Conversation

mhowlett
Contributor

@ewencp @edenhill

This PR is primarily a refactor of the Producer class.

Intended as an incremental step forward - still working out stuff, lots of open questions + notes to self.

Templating Producer on TKey, TValue seems almost certain at this point.


using (Producer producer = new Producer(new Dictionary<string, string> { { "bootstrap.servers", brokerList } }))
using (Topic topic = producer.Topic(topicName))
var config = new Dictionary<string, string> { { "bootstrap.servers", brokerList } };
Contributor

will this support generic values?
E.g.:
var config = new Dictionary<string, ...> { { "bootstrap.servers": "localhost", "session.timeout.ms": 6000, "enable.auto.commit": false } }

Contributor Author

@mhowlett Nov 15, 2016

No. There is a discussion on this in the previous PR. I'm in favor of changing this to Dictionary<string, object>, and having an extension method that takes Dictionary<string, string> and furthermore understands the dotnet nested configuration notation of using ':' as a separator.
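A rough sketch of the typed-values part of that direction (ToKafkaConfig is an illustrative name and the ':' handling is left out; none of this is in the PR):

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ConfigExtensions
{
    // Hypothetical helper: flatten a typed config into the Dictionary<string, string>
    // form that librdkafka ultimately consumes.
    public static Dictionary<string, string> ToKafkaConfig(this Dictionary<string, object> config)
        => config.ToDictionary(
            kv => kv.Key,
            kv => kv.Value is bool b
                ? (b ? "true" : "false")   // normalize booleans to "true"/"false"
                : Convert.ToString(kv.Value, CultureInfo.InvariantCulture));
}

// Usage with typed values, as suggested above:
// var config = new Dictionary<string, object>
// {
//     { "bootstrap.servers", "localhost" },
//     { "session.timeout.ms", 6000 },
//     { "enable.auto.commit", false }
// }.ToKafkaConfig();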

producer.KeySerializer = (ISerializer<string>)new Confluent.Kafka.Serialization.Utf8StringSerializer();
producer.ValueSerializer = producer.KeySerializer;

Console.WriteLine($"{producer.Name} producing on {topicName}. q to exit.");
Contributor

Can we use ctrl-c instead? 'q' isn't very out-of-bandish.

Contributor Author

ok, i was just updating existing examples, which were like this, but i'll make this change now.
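For reference, the standard way to hook ctrl-c in a console app (a minimal sketch, nothing PR-specific):

using System;
using System.Threading;

var cts = new CancellationTokenSource();

Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true;   // don't terminate the process; let the loop below exit cleanly
    cts.Cancel();
};

while (!cts.IsCancellationRequested)
{
    // read input / produce messages here
}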

}

Task<DeliveryReport> deliveryReport = topic.Produce(data, key);
Task<DeliveryReport> deliveryReport = producer.Produce(topicName, key, text);
Contributor

Seems like the key part wasn't removed from 'text'

Contributor Author

yeah i noticed that too, but just left the example as is (doesn't really matter). ok i'll change it.

using (var topic = producer.Topic(topicName))
var config = new Dictionary<string, string> { { "bootstrap.servers", broker } };

using (var producer = new Producer<Empty, byte[]>(config, null))
Contributor

Are the key and value types specified per producer?
What if you want to produce to two different topics with different object types?
I don't think any other client does it this way.

Contributor Author

Yes. I talked at length with @ewencp about this. You can argue with him :-). I think I'm in favor of it, but I'm uneasy that it doesn't model what's going on properly. The Java producer works like this.

I want to expose a way of writing byte[], byte[] efficiently (+ offset / length) so people can write something more general if they really need it.

@ewencp points out that usually the formatter is the same (e.g. JSON, Avro), even though the schema might be different.

i'm still uneasy about it.

ewencp

@edenhill Why would that be an issue? You just use Producer<Object, Object> if you need something more general. With Avro types it also wouldn't be uncommon to have something like Producer<string, SpecificRecord> if you need to capture multiple concrete types.

The value in doing this is that you get type safety in the vast majority of cases. While I think it's important to support producing different messages to different topics, I think it's important to optimize for the common case. I'm pretty sure producing messages of a single type to a single topic is the dominant usage pattern.

Contributor Author

I expect we're going to leave this as is. Regardless, I'm keen to not undo these changes in this PR.

Contributor

@ewencp So if you use <Object,Object> you lose type safety and you are back to square one, right?

Producer instances are quite heavy and I want to avoid giving people the impression that they should use one per K,V type.

Additionally, if we add interceptor support that is exposed to the .NET client, and there's an interceptor that needs to produce to some arbitrary topic using its own format, that wouldn't really work with specific types for <K,V>, right?

callbackCts.Cancel();
callbackTask.Wait();

// TODO: Why is this necessary only when disposing?
Contributor

I'm guessing the finalizer path (disposing=false) shouldn't block.

Contributor Author

yep, I expect you're right.
I assume the callbackCts.Cancel / callbackTask can never block. I'm not 100% sure of this though, and want to verify further so will leave the todo above this one in (but take this one out).

Contributor

I'm not sure how this works in .NET, but for the other clients we typically don't want to flush/wait for outstanding requests to finish before shutting down if the client instance is automatically destructed (goes out of scope, GCed, etc).

Is there an explicit dispose you call? If so, that's the case where we want to block and flush, and that seems to be the case in the code already, right?

ewencp

This is part of the Dispose pattern. See the discussion of the Dispose(Boolean) method: https://msdn.microsoft.com/en-us/library/fs2xkftw(v=vs.110).aspx

Contributor Author

@mhowlett Nov 15, 2016

This was a note to self that I want to check this very carefully. Still deferring it until later. What's happening in this implementation is not strictly what the Dispose pattern dictates (and I looked a bit closer now and I think it's incorrect). Also, I don't like that there are producer-specific things happening here in Handle, which is the base class of Producer / Consumer.

Further notes about the dispose method pattern:

The method can block. If it blocks indefinitely, the .NET runtime will deal with it and give up eventually.

Before the if statement we're supposed to dispose any unmanaged resources owned directly by this class regardless of whether the method is called by the finalizer or via Dispose. In the implementation, the poll thread is cancelled instead. I think it's possible that the callbackCts and callbackTask objects have already been collected in the event the method is being called by the finalizer (not good!), because these are managed objects and cleanup order is not deterministic. Therefore, I think they should be in the if statement (see below).

In the if statement we're supposed to call Dispose methods by any managed objects that are IDisposable.

Also, as noted, I probably don't like the factoring here of having Handle as a base class of both Producer and Consumer. But I have to think that through further.

For now I'm going to move the callbackCts and callbackTask stuff inside the if statement.
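For reference, the general shape the Dispose(bool) pattern dictates, with the poll-loop handling inside the if statement as described above (a generic sketch, not the actual Handle implementation):

using System;
using System.Threading;
using System.Threading.Tasks;

public class HandleSketch : IDisposable
{
    private bool disposed;
    private readonly CancellationTokenSource callbackCts = new CancellationTokenSource();
    private Task callbackTask;   // the poll loop, as in the PR

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed)
        {
            return;
        }

        if (disposing)
        {
            // Managed objects: only safe to touch on an explicit Dispose() call.
            // On the finalizer path they may already have been collected.
            callbackCts.Cancel();
            callbackTask?.Wait();
            callbackCts.Dispose();
        }

        // Unmanaged resources (e.g. the librdkafka handle) are released here,
        // on both the Dispose() and finalizer paths.

        disposed = true;
    }

    ~HandleSketch()
    {
        Dispose(false);
    }
}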

Contributor Author

Actually, I'm going to leave it as is and put a TODO noting the above potential flaw. I'll work all this through in a future PR.

}
}

private Topic getKafkaTopic(string topic)
Contributor

Maybe we should provide this abstraction in librdkafka instead, seeing how all bindings are now doing this exact same thing.

Contributor Author

yeah.

// TODO: Support the other function overloads in Topic.
// TODO: I'd like a way to produce as (byte[], offset, length) as well if possible all the way down to librdkafka (need to investigate).
Contributor

Are there slices in .NET?

Contributor Author

There is ArraySegment, which is a value type, so it's probably no less efficient; you probably have a good point.
There is a related PR (that got reverted) in rdkafka-dotnet.
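For reference, ArraySegment<byte> is the closest built-in slice equivalent (plain BCL usage, independent of this PR):

using System;
using System.Text;

byte[] buffer = Encoding.UTF8.GetBytes("key:payload");

// A struct wrapping (array, offset, count) - no copy of the underlying bytes is made.
var value = new ArraySegment<byte>(buffer, 4, buffer.Length - 4);

Console.WriteLine($"offset={value.Offset}, count={value.Count}");   // offset=4, count=7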

Contributor

I think I saw some work very recently to fix that PR

Contributor Author

yes. i'm monitoring it.


// TODO: Support the other function overloads in Topic.
// TODO: I'd like a way to produce as (byte[], offset, length) as well if possible all the way down to librdkafka (need to investigate).
// TODO: Name these Produce or Send?
Contributor

the other bindings use [pP]roduce()

Contributor Author

sounds good to me, will remove todo.

=> getKafkaTopic(topic).Produce(ValueSerializer.Serialize(val), KeySerializer.Serialize(key));

public Topic Topic(string topic, IEnumerable<KeyValuePair<string, string>> config = null) => new Topic(handle, this, topic, config);
// TODO: do we need both the callback way of doing this and the Task way?
Contributor

Qs:

  • What's the performance and memory impact of one-task-per-message?
  • How do you poll completion for multiple tasks? (thousands)
  • Can you bind variables to Tasks?
  • Can you bind variables to DeliveryHandlers (callbacks)?

Contributor Author

#1 I don't know, but it worries me (question for another PR).
#2 It's easy to wait on many tasks with Task.WaitAll.
#3, #4 I don't think so. The only way I can think of right now is managing this yourself in a dictionary external to the Task or callback (not straightforward).
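To illustrate #2 and the external-dictionary idea for #3/#4 (a sketch that assumes the producer/topicName from the examples above and the Task<DeliveryReport>-returning Produce used in this PR):

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var pending = new Dictionary<string, Task<DeliveryReport>>();

for (int i = 0; i < 1000; i++)
{
    string key = $"key-{i}";
    // Correlate each task with whatever state you need via an external dictionary.
    pending[key] = producer.Produce(topicName, key, $"value-{i}");
}

// Block until every outstanding delivery report has arrived.
Task.WaitAll(pending.Values.ToArray());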

Contributor

So re 4 (and maybe 3, not sure how though), we would need an opaque value for the produce call that is later supplied to the callback

ewencp

Don't you want both since tasks give an easy way to block and wait for completion but callbacks let you follow up on something you're doing with the message as soon as the delivery report happens?

Contributor

@ewencp We argued about the same for the Python client and concluded that it is easy enough to add Tasks/Futures using callbacks, so we should be fine with just providing callbacks.
But that was for Python, with a dead-slow Futures implementation.

Contributor Author

I just remembered Task has the ContinueWith method, which effectively provides a way of specifying a callback when the task completes (it's non-blocking). I'll add a note about this in the todo (and think about all the details later).
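i.e. something like (a sketch, same assumptions as above):

producer.Produce(topicName, key, text).ContinueWith(t =>
{
    // Runs once the delivery report is available, without blocking the caller.
    // The closure captures key/text, which also covers binding variables (#3/#4 above).
    DeliveryReport report = t.Result;
    Console.WriteLine($"delivered message with key '{key}'");
});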

ewencp

@edenhill Also, the expectations across languages are different, so there are other possible tradeoffs as well. But I agree it'd be good to have an idea of the performance impact. @mhowlett What about async/await? If this is expected to work these days, then we may not be able to avoid the Task?

Contributor Author

@ewencp No, we can't avoid Task.
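With a Task-returning Produce, callers can simply await it (sketch, same assumptions as above):

public static async Task ProduceOneAsync(Producer<string, string> producer, string topicName)
{
    // await frees the calling thread until the delivery report arrives.
    DeliveryReport report = await producer.Produce(topicName, "my-key", "my-value");
    Console.WriteLine("delivered");
}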

using Confluent.Kafka.Internal;
using Confluent.Kafka.Interop;

// TODO: probably move this to Confluent.Kafka.Internal and also create Confluent.Kafka.KafkaException.
Contributor

would calling it just "Exception" (and the underlying KafkaError -> Error) be problematic?
I know it is in Python, but not in Go.

Contributor Author

It's normal to prefix specific exception types with what they are, so KafkaException would be more idiomatic. Actually this applies to Producer as well (see new comment there). Generally, things are often prefixed, even when the prefix is redundant with the namespace.

Contributor

+1 on KafkaException and KafkaError to avoid confusion,
-1 on Consumer and Producer, unnecessarily redundant in my view (and disjoint from other bindings).

Console.WriteLine($"{producer.Name} producing on {topic.Name}. q to exit.");
// TODO: work out why explicit cast is needed here.
// TODO: remove need to explicitly specify string serializers - assume Utf8StringSerializer in Producer as default.
producer.KeySerializer = (ISerializer<string>)new Confluent.Kafka.Serialization.Utf8StringSerializer();
ewencp

I would consider changing the way this setup works. In the Java client serializers are handled specially as the only special thing you can include in constructors in addition to the config (and can be auto-instantiated via reflection if you specify them in your config). Having half of the producer setup after the constructor feels odd (and means you can't make those fields readonly).

Contributor Author

right. i'll fix this in a future PR though.


{
Console.WriteLine($"{producer.Name} producing on {topic.Name}. q to exit.");
// TODO: figure out why the cast below is necessary and how to avoid it.
// TODO: There should be no need to specify a serializer for common types like string - I think it should default to the UTF8 serializer.
ewencp

Java serializer doesn't do this since it supports multiple encodings, with UTF8 being the default.

Aside from string are there any other cases where you could reasonably set a default? Seems like everything else would be specific to the serialization format.

Contributor Author

Int, Long, Empty

@@ -0,0 +1,4 @@
namespace Confluent.Kafka
{
public class Empty {}
ewencp

It occurred to me that you could also call this Null. There's apparently an implicit null type which is the type for the null value, but you can't actually use that. I think the ideal implementation of this class would be sealed and make it impossible to instantiate it at all so it guarantees you have to use null, though I'm not sure you can do that in C#.

Contributor Author

great points.

We can have a sealed class with a private constructor.

If we ever want to pass in a value for this type - which we must do with the current Produce method overloads if TValue is Empty and TKey isn't - it either needs to be an instance of Empty or null.

That leads me to think Null is a better name: null should be the value, and Null should be a class that can't be instantiated.
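i.e. something like (sketch):

namespace Confluent.Kafka
{
    /// <summary>
    ///     Placeholder type for a null key or value. Sealed with a private
    ///     constructor, so null is the only possible value of type Null.
    /// </summary>
    public sealed class Null
    {
        private Null() {}
    }
}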


public void Dispose(bool disposing)
{
// TODO: Think carefully about whether the implementation of this method is correct.
if (this.Disposed)
ewencp

Do you even need this? Isn't it invalid to invoke Dispose twice?

Contributor Author

I don't need a finalizer, so this is all irrelevant now.

}

// this is atomic.
readDictionary = writeDictionary;
ewencp

Is writeDictionary ever replaced? I feel like I must be missing something, but it doesn't seem like this provides the guarantees you want?
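For context, this is the shape a copy-on-write swap needs to have for the "this is atomic" comment to give the intended guarantee (a generic sketch; whether the PR actually replaces writeDictionary rather than mutating it is exactly the question here):

using System.Collections.Generic;

class CopyOnWriteTopicCache
{
    // Readers only ever dereference readDictionary; writers build a fresh copy
    // and publish it with a single reference assignment, which is atomic in .NET.
    private volatile Dictionary<string, object> readDictionary = new Dictionary<string, object>();
    private readonly object writeLock = new object();

    public void Add(string name, object topicHandle)
    {
        lock (writeLock)
        {
            // Copy, mutate the copy, then swap - never mutate the published instance.
            var copy = new Dictionary<string, object>(readDictionary) { [name] = topicHandle };
            readDictionary = copy;
        }
    }

    public bool TryGet(string name, out object topicHandle)
        => readDictionary.TryGetValue(name, out topicHandle);
}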


@@ -0,0 +1,11 @@
namespace Confluent.Kafka.Serialization
{
public class EmptySerializer : ISerializer<Empty>
ewencp

Do we actually need this class?

Contributor Author

@mhowlett Nov 15, 2016

I think so. In the case where we have Producer<string, Null>, the only applicable produce method overload will still require a value argument, which will be put through the serializer.

public static byte[] result = new byte[0];
public byte[] Serialize(Empty val)
{
return result;
ewencp

Shouldn't this just be null?

Contributor Author

there might be a debate here, but good point, null will work.
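i.e. (sketch of the suggested change):

namespace Confluent.Kafka.Serialization
{
    public class EmptySerializer : ISerializer<Empty>
    {
        public byte[] Serialize(Empty val)
        {
            // A null payload maps to a null message value on the wire,
            // which is what Empty/Null is meant to represent.
            return null;
        }
    }
}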
