Producer Refactor Mk II #8
Conversation
return (long) LibRdKafka.produce(
    handle,
    partition,
    (IntPtr) (MsgFlags.MSG_F_COPY | (blockIfQueueFull ? MsgFlags.MSG_F_BLOCK : 0)),
How long does librdkafka need the memory for? Rather than copying it, we could pin it until after the delivery report comes back. This may or may not be more performant.
Until the delivery report callback returns.
Re .._MSG_F_BLOCK:
since the message queue threshold limit also includes delivery reports, some other thread of the application will need to call poll() for a blocking produce() to ever unblock.
Re: .._MSG_F_BLOCK - there is a thread in Handle which is devoted to calling poll. If the Task based ProduceAsync methods are used, I believe the continuations are always on a different thread, so there will never be a problem. If the callback ProduceAsync methods are used, and the callbacks produce messages, then there is potentially a problem though.
Re memcpy vs pinning, I'm currently thinking I'll leave as is for this version and possibly investigate in a future version.
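For illustration, here is a minimal sketch of the pinning alternative, assuming a hypothetical nativeProduce delegate standing in for the LibRdKafka.produce P/Invoke (GCHandle is the real .NET API; the wiring is illustrative):

GCHandle gch = GCHandle.Alloc(payload, GCHandleType.Pinned);
try
{
    // No MSG_F_COPY: librdkafka keeps referencing this memory until the
    // delivery callback fires, so the array must stay pinned until then.
    nativeProduce(gch.AddrOfPinnedObject(), payload.Length);
}
catch
{
    gch.Free(); // produce failed synchronously; no callback will come
    throw;
}
// ...and in the delivery report callback, once librdkafka is done:
// gch.Free();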
if (val != null)
{
    gchValue = GCHandle.Alloc(val, GCHandleType.Pinned);
I expect marshaling of byte[] does this guff behind the scenes.
if (val != null)
{
    gchValue = GCHandle.Alloc(val, GCHandleType.Pinned);
    pValue = Marshal.UnsafeAddrOfPinnedArrayElement(val, valOffset);
The safety of this is enforced higher up by use of ArraySegment, which performs run time bounds checking on its parameters.
This should arguably be enforced here (it would be better). But then we get run time bounds checking twice. Yay! Maybe we do want (byte[], int, int) in the API after all instead of ArraySegment.
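For reference, a small standalone sketch of the run time bounds checking that ArraySegment gives you for free (buffer sizes made up):

var buffer = new byte[10];

// Fine: offset + count fits within the array.
var ok = new ArraySegment<byte>(buffer, 2, 8);

// Throws ArgumentException at run time: 5 + 8 > buffer.Length, so
// out-of-range segments can never reach the produce path.
var bad = new ArraySegment<byte>(buffer, 5, 8);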
public Task<DeliveryReport> ProduceAsync(string topic, byte[] key, byte[] val, int? partition = null, bool blockIfQueueFull = true)
    => getKafkaTopic(topic).Produce(val, 0, val.Length, key, 0, key.Length, partition ?? RD_KAFKA_PARTITION_UA, blockIfQueueFull);

public Task<DeliveryReport> ProduceAsync(string topic, ArraySegment<byte> key, ArraySegment<byte> val, int? partition = null, bool blockIfQueueFull = true)
I've used ArraySegment here rather than explicit (byte[], int, int) parameters. This is at odds with many methods in the .NET Framework (e.g. BitConverter.ToString). I don't know if there is a good reason for this or whether it's a function of the relative age of the ArraySegment struct compared to these other methods. One benefit is that ArraySegment provides runtime bounds checking (we could of course do this explicitly). Note that ArraySegment is a struct (stack allocated), so no GC overhead.
examples/AdvancedProducer/Program.cs
Outdated
var config = new Dictionary<string, object> { { "bootstrap.servers", brokerList } };

using (var producer = new Producer<string, string>(config))
using (var producer = new Producer<string, string>(config, new StringSerializer(), new StringSerializer()))
@ewencp argues that these should be explicit: most of the time defaults won't be obvious, so if they're left off there's a good chance of user error. I agree, though it's kind of a shame.
var sProducer2 = producer.Wrap<Null, int>(new NullSerializer(), new IntSerializer());

// write (string, string) data to topic "first-topic", statically type checked.
sProducer1.ProduceAsync("first-topic", "my-key-value", "my-value");
Oh, I forgot to wait on these tasks.
As mentioned elsewhere, ideally this could just be a Flush() on the unwrapped producer.
For that matter, this does raise the question of how "aggregate" operations behave for the wrapper producers. I.e. would Flush() in any way isolate itself to messages for the single format (presumably no, given the way this is implemented)? How about things like Dispose?
I don't think we want Flush (though I'm not 100% sure I'm seeing the world correctly here). I think we just want to wait on the tasks, or a collection of tasks.
If we have a flush method, it doesn't make sense to put it on ISerializingProducer. It'd be on the concrete Producer and Producer<TKey, TValue> only.
The dispose method of Producer effectively flushes.
I think it makes sense to have an explicit Flush() method, and not to do an implicit Flush() when disposing, to allow applications to exit quickly without waiting for message transmission (which may block for a long time).
Good point about not wanting to flush in Dispose... shutdown is not the only consideration - a using statement will typically be used to wrap a producer and this is equivalent to try / finally. We probably don't want messages being flushed before an exception gets handled.
I guess Flush is a good thing to have in addition to the Tasks, as some people will probably just want to fire and forget and ignore the Tasks.
On the other hand not flushing in .Dispose makes the API more error prone as users will often be in a situation where there may be messages in flight when they want to exit because all the calls are async. So, calling Flush will usually be an appropriate thing to do right at the end of the Producer using block.
Also, if an exception makes it outside the Producer scope, there is no reference to the producer, so no option to Flush.
Thoughts on having a property .FlushOnDispose, which by default is true?
None of the other clients does an implicit flush on dispose, so I don't think we should alter that behaviour in this client.
Also note that flush, and thus dispose, might block for up to message.timeout.ms, which defaults to 5 minutes.
Yeah, I agree.
I think I'm only still considering flush on dispose because that is the existing behavior of rdkafka-dotnet (and there are some positives to the idea).
But now I'm seeing it from a different point of view - it's not normal to wait a long time on dispose, so it's a counterintuitive thing to do.
I'll work out something similar to the other clients.
As an indication that it's counterintuitive to flush in dispose, see my first comment in this thread ("Oh, I forgot to wait on these tasks") - this was before I started thinking about what was happening in the dispose method, and before people started suggesting a flush method. My intuition then was that dispose was not going to wait on anything.
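To make the settled-on usage concrete, a sketch of application code under an explicit Flush() with no implicit flush in Dispose (config, topic and payloads are illustrative):

using (var producer = new Producer(config))
{
    for (int i = 0; i < payloads.Length; i++)
    {
        producer.ProduceAsync("my-topic", null, payloads[i]);
    }

    // Explicit: without this, Dispose() tears the producer down with
    // messages potentially still in flight, matching the other clients.
    producer.Flush();
}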
examples/Benchmark/Program.cs
Outdated
public static void Produce(string broker, string topicName, long numMessages)
{
    var deliveryHandler = new DeliveryHandler();
public static void WaitForAllDeliveryReports()
Are we just lacking Flush() right now? That's how I'd normally expect to wait for all sends to complete.
For this benchmark, I'm using the non-Task ProduceAsync method. It's probably a premature optimization to even have this option - maybe we want to get rid of it. One property of this approach is that all the callbacks happen on the same thread.
Anyway, with Tasks, you can do WaitAll on a collection of Tasks, and that is probably the idiomatic way to 'flush'.
Also, if order is guaranteed, you could just wait on the last Task - though I think if messages are produced to different partitions, order is not guaranteed?
If we include the non-Task based methods, you're right, Flush is arguably necessary.
Currently thinking they should be removed.
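A sketch of the Task-based 'flush' just described (topicName, key, val and numMessages stand in for the benchmark's variables):

var tasks = new List<Task<DeliveryReport>>();
for (int i = 0; i < numMessages; i++)
{
    tasks.Add(producer.ProduceAsync(topicName, key, val));
}

// Blocks until every delivery report has come back - the idiomatic
// equivalent of Flush() when all produces go through Tasks.
Task.WaitAll(tasks.ToArray());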
I just did another benchmark test using the Task produce method / WaitAll. It's substantially terser, and there is no noticeable change in perf. But I can't easily profile memory usage - the Task approach allocates an additional 5M objects.
The callback methods are here because they're in rdkafka-dotnet and I wanted to take them out only after careful consideration.
I'm taking them out. I can't think of any strong reason to have them, and if it turns out there is one, it'll be much easier to add them back into the API than to take them out.
examples/Benchmark/Program.cs
Outdated
byte cnt = 0;
var val = new byte[100].Select(a => ++cnt).ToArray();
var key = new byte[0];
Is this required to be byte[0] rather than null? I'd normally expect null. I think they end up having the same overhead assuming compression is off, but strictly speaking they are different since null gets encoded as a length of -1 and a zero length array is encoded as 0 followed by 0 bytes. Just want to make sure we're not losing the ability to encode null in this patch.
Are you sure about the len == -1 encoding of null? cc @edenhill - quickly looking in rdkafka_msg.c, it looks to me like it gets set to 0 if the data is null.
rdkafka-dotnet handles null ok and sets the length to 0.
And whoops, I'd previously noticed this and forgot to make null work.
librdkafka treats key=NULL as a Kafka null key, while key!=NULL with size=0 is an empty key; these are not the same thing.
We should allow for the same semantics in this client.
The value is treated identically.
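A small sketch of the distinction at the API level (the wire encoding itself is handled lower down; per the discussion above, null maps to a Kafka null key and an empty array to a zero-length key):

byte[] nullKey = null;         // Kafka null key
byte[] emptyKey = new byte[0]; // empty key: present, but zero bytes

// These must stay distinguishable through the client API - log
// compaction and partitioning can treat them differently.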
@@ -0,0 +1,8 @@
namespace Confluent.Kafka
{
    public struct DeliveryReport
The equivalent in the Python client fills in more information - the callbacks accept (err, msg) parameters, where the latter is a Message (http://docs.confluent.io/3.1.0/clients/confluent-kafka-python/index.html#confluent_kafka.Message). Same deal with Go, where the DR channel gets one of these: http://docs.confluent.io/3.1.0/clients/confluent-kafka-go/index.html#Message. Is this being kept more minimal intentionally?
This is unchanged from rdkafka-dotnet.
If we include the message in the delivery report (given the precedent, we should), we're going to need to think about generics here too. I propose doing this in the next PR which is going to be a refactor of the Consumer.
A delivery report will need at least:
- topic
- partition
- offset
- error
- msg opaque (or bound variable through other means)
Other stuff that might be useful:
- value-object
- key-object
- value
- key
- timestamp
- future fields that the community makes up, e.g. headers
Wrapping this in a Message type is consistent with other clients.
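For concreteness, a hypothetical shape for such a Message type (field names are illustrative, not the final API; Error stands in for whatever error type the client settles on):

public struct Message<TKey, TValue>
{
    public string Topic;
    public int Partition;
    public long Offset;
    public Error Error;        // assumed error type
    public TKey Key;
    public TValue Value;
    public DateTime Timestamp; // plus future fields, e.g. headers
}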
src/Confluent.Kafka/Null.cs
Outdated
{
    public sealed class Null
    {
        public static Null Instance = new Null();
Why do we need this? The point of this is that you can't instantiate it, right? It's not Singleton, it's Null.
It's 'cause I screwed up thinking about the deserializer. It's unnecessary.
}

public class Producer<TKey, TValue> : ISerializingProducer<TKey, TValue>, IDisposable
I don't think this should implement IDisposable. Supposedly you should only implement IDisposable if you directly handle unmanaged resources. In this case you are just using an IDisposable.
This definitely needs to implement IDisposable otherwise producer's resources can't be deterministically cleaned up.
Ack, I think it's easy to lose track of where the disposable pattern vs SafeHandle needs to be used. Seems we have one layer of SafeHandles around the underlying C resources and then use IDisposable everywhere else to allow proactive cleanup.
I think part of the confusion comes from overloading what IDisposable means. The docs for IDisposable even say:
"Implement IDisposable only if you are using unmanaged resources directly. If your app simply uses an object that implements IDisposable, don't provide an IDisposable implementation."
Unfortunately it seems people also use it as the equivalent of Closeable in Java (which doesn't imply anything about whether something will be garbage collected even if you don't call close()).
Yes. Note there is still some cleaning up to do, I think, when I get to reviewing Handle. Also worth noting that inheritance makes this more difficult to think through properly.
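A compressed sketch of that layering - one SafeHandle around the native rd_kafka_t, IDisposable everywhere else for proactive cleanup (names illustrative):

using System;
using System.Runtime.InteropServices;

class RdKafkaHandle : SafeHandle
{
    public RdKafkaHandle() : base(IntPtr.Zero, ownsHandle: true) { }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        // would call rd_kafka_destroy(handle) here; the finalizer path
        // guarantees cleanup even if Dispose is never called.
        return true;
    }
}

public class Producer : IDisposable
{
    readonly RdKafkaHandle handle = new RdKafkaHandle();

    public void Dispose() => handle.Dispose(); // deterministic cleanup
}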
src/Confluent.Kafka/Producer.cs
Outdated
public void Dispose()
{
    producer.Dispose();
I'm pretty sure this implementation is incorrect anyway. If I call wrap() twice and then Dispose() on the first wrapper, I'll dispose all the underlying resources and the other ISerializingProducer will break. Docs seem to indicate a SafeHandle or implementing Finalize on the wrapped class is the way to fix this.
You can't call Dispose on a wrapped object (Wrap returns ISerializingProducer, concretely the internal SerializingProducer).
If you call Dispose on a producer that is wrapped, the SerializingProducer will no longer work and will throw some sort of exception when used. I could be more explicit about detecting this and throw a more descriptive exception.
No one's going to do that in practice; I don't see it as a problem with the concept.
}

public class Producer<TKey, TValue> : ISerializingProducer<TKey, TValue>, IDisposable
How do docs work for classes with the same name but different # of generic parameters? Are we going to have to constantly maintain duplicate docstrings (as I assume we are going to fill these all in to get automated generation of docs)?
I'm not 100% sure, but guess it's going to mean duplication.
Yes, I'll fill all these out, but want to get the API right first.
{
    public Null Deserialize(byte[] data)
    {
        return Null.Instance;
This is kind of weird. I can put a null into a serializer and get an object back out of the deserializer...
Yes, you're right, this is completely weird and unnecessary. I didn't think too hard about this yet, as deserializers aren't used yet.
src/Confluent.Kafka/Producer.cs
Outdated
public class Producer<TKey, TValue> : ISerializingProducer<TKey, TValue>, IDisposable
{
    private Producer producer;
Both should be readonly as they are always set in the constructor and should never change.
Yep.
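I.e. something like (a sketch; the serializer field name is assumed, as the second field isn't shown in this hunk):

private readonly Producer producer;
private readonly ISerializer<TKey> keySerializer;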
examples/Benchmark/Program.cs
Outdated
{ "queue.buffering.max.messages", 500000 }, | ||
{ "message.send.max.retries", 3 }, | ||
{ "retry.backoff.ms", 500 }, | ||
{ "queued.min.messages", 1000000 }, |
This is a consumer property, and so is session.timeout.ms.
Oh, your example probably tests the consumer as well. Removed.
var config = new Dictionary<string, object>
{
    { "bootstrap.servers", broker },
    { "queue.buffering.max.messages", 500000 },
Is there a reason for setting these properties?
"mirrors the librdkafka performance test example.". Was trying to get something directly comparable to your numbers.
examples/SimpleProducer/Program.cs
Outdated
var config = new Dictionary<string, object> { { "bootstrap.servers", brokerList } };

using (var producer = new Producer<Null, string>(config))
using (var producer = new Producer<Null, string>(config, new NullSerializer(), new StringSerializer()))
What's a NullSerializer, and how is it different from not setting a serializer?
The choice is to either have a NullSerializer class or an explicit check whether the serializer is null in the ProduceAsync method. I'm sort of on the fence here, erring on the side of having NullSerializer.
Or simply defaulting to NullSerializer if null is passed as the serializer in the constructor.
Not sure if this matters though. @ewencp?
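For reference, the NullSerializer option amounts to something like this (a sketch; the Serialize method name mirrors the Deserialize shape shown elsewhere in this PR):

public class NullSerializer : ISerializer<Null>
{
    // Always produces no payload: maps directly to a Kafka null value.
    public byte[] Serialize(Null val) => null;
}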
// TODO: There should be no need to specify a serializer for common types like string - I think it should default to the UTF8 serializer.
producer.ValueSerializer = (ISerializer<string>)new Confluent.Kafka.Serialization.Utf8StringSerializer();

Console.WriteLine($"{producer.Name} producing on {topicName}. q to exit.");
Ctrl-c baby, not "q".
I've decided I disagree. The purpose of these examples is to demonstrate usage of the client in a straightforward, easy to understand way. It turns out that detecting Ctrl-C is a bit convoluted - in fact there are as many lines dedicated to doing this properly as to demonstrating the producer. None of the code is rocket science of course, so I'm sort of indifferent, but on balance I think using q to exit is better.
This might seem like a tiny nitpick thing, but there are two proper reasons:
- examples should be correct, even if for unrelated stuff, people will base their own code on this.
- out-of-band cancellation shows an interesting problem: how do I break out of the consume loop. If we can't show people how to do that in an effective and correct manner they will get it wrong and that will bite us back.
Is AppDomain.ProcessExit not a workable solution that will be relatively small?
Regarding #1 - I wouldn't call using 'q' to exit 'incorrect' as such.
Regarding #2 - you've convinced me it's useful. People won't be using Console.ReadLine, but many will be making console apps, and the CancelKeyPress handler is the way to detect Ctrl-C.
How about I leave it in the advanced producer example and keep it out of the simple producer example - keeping that as dead simple as possible. I want the first example people look at to be inviting and not scary in any way.
👍
@ewencp AppDomain is changed a lot or doesn't exist in .NET Core: http://www.michael-whelan.net/replacing-appdomain-in-dotnet-core/
The new assembly unload event mentioned in that article is not effective at catching Ctrl-C. Examples I see around the web use Console.CancelKeyPress; I'm not certain there isn't a better way, but CancelKeyPress seems likely to be the right choice.
Another reason not to include this: there are higher priorities than figuring it out.
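For the advanced example, the Console.CancelKeyPress pattern would look roughly like this (a sketch, not the exact example code):

bool cancelled = false;
Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true;   // keep the process alive so we can exit cleanly
    cancelled = true;  // signal the loop below to stop
};

while (!cancelled)
{
    // read input / produce messages here, checking the flag between
    // iterations so Ctrl-C breaks out of the loop.
}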
// TODO: specify serializers via config file as well? In which case, default keySerializer and valueSerializer params to null.

if (KeySerializer == null)
{
Do we really require a KeySerializer? Can't we allow this to be null instead of having the phony NullSerializer thingie?
I added the option of using null rather than NullSerializer. Can remove NullSerializer if that is the consensus. See other comment.
/// <paramref name="val" /> cannot be null.
/// TODO: well, it shouldn't be, otherwise there is ambiguity on deserialization. check this.
/// </remarks>
public class StringSerializer : ISerializer<string>
Is it safe to assume that people will know this means UTF-8? Maybe being explicit about it is better.
I'm on the fence on this. Removed.
namespace Confluent.Kafka.Serialization
{
    public class IntSerializer : ISerializer<int>
Document what serialization this is in practice. Big endian? Varint?
Depends on architecture. Added a remark (will do proper docs later).
Hmm, you sure? A serializer shouldn't depend on the architecture; that's what makes the serialized format portable.
Yes. Maybe I need to go home for a nap.
}

public Task<DeliveryReport> Produce(byte[] val, int valLength, byte[] key = null, int keyCount = 0, Int32 partition = RD_KAFKA_PARTITION_UA, bool blockIfQueueFull = true)
public Task<DeliveryReport> Produce(byte[] val, int valOffset, int valLength, byte[] key = null, int keyOffset = 0, int keyLength = 0, Int32 partition = RD_KAFKA_PARTITION_UA, bool blockIfQueueFull = true)
This valOffset API is funky; aren't there slices or similar in .NET?
I'll leave this for now and get rid of it when I refactor out the topic method.
src/Confluent.Kafka/Topic.cs
Outdated
{
    throw RdKafkaException.FromErr(LibRdKafka.last_error(), "Could not produce message");
}
return;
Use proper error string (rd_kafka_err2str(..last_error))
I'll put a todo for this; there's a separate JIRA for sorting out exceptions / errors.
The risk of doing it outside the review process is that we might lose track of requested changes, and that means we'll eventually end up having to re-review the entire code base.
If things are fixed in a followup commit in the same PR it is much easier to track.
Well, this class is going to be completely removed, so it will need to be re-reviewed at some point anyway - I'm in the middle of a big refactor. I see a lot of value in getting things broadly in place first, then focusing on the detail, and capturing things as todos which get copied around is pretty efficient (keeping in mind my first point in this comment). It's also the best way for me to get a holistic view of the whole project, which I believe reduces the risk of making bad design decisions. This is particularly important for me as I'm new to clients, so I can foresee less than I otherwise might.
public void Flush()
{
    while (OutQueueLength > 0)
There is actually a rd_kafka_flush() call that should be used.
Yeah, I checked and this wasn't exposed in the LibRdKafka layer yet whereas the OutQueueLength stuff was. Agreed that we should use the correct internal version though.
Yep. I was just doing whatever ah- did here. Will change.
I'll put a todo. I want to prioritize getting the high level API right, and I'm going to be reviewing / addressing a lot more lower level stuff in future PRs.
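Once it is exposed, the rd_kafka_flush based version would look something like this (a sketch; the LibRdKafka.flush binding is assumed, mirroring the existing P/Invoke wrappers):

// rd_kafka_flush(rk, timeout_ms) blocks until all outstanding produce
// requests complete or the timeout expires.
public void Flush(int millisecondsTimeout = -1)
{
    ErrorCode err = LibRdKafka.flush(handle, millisecondsTimeout); // assumed binding
    if (err != ErrorCode.NO_ERROR)
    {
        throw RdKafkaException.FromErr(err, "Failed to flush producer");
    }
}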
src/Confluent.Kafka/Producer.cs
Outdated
{
    producer.Dispose();
}
public bool FlushOnDispose => producer.FlushOnDispose;
I think we should really try to keep everything as config dict properties.
This seems very application-code specific to me (users of an application shouldn't ever want to set it), and setting it in the config feels unnatural, so I think it's best left as a property. @ewencp?
Yes, but this is what we do in librdkafka, Python and Go.
Since it is up to the application to allow users to set configuration properties, it can also block these ones if it so desires (we could add a helper that assists in this: rd_kafka_conf_property_is_probably_not_for_the_user(str) bool).
But it's more difficult to use and not at all idiomatic the way you suggest.
And it should never be exposed outside the app (why should a user even know what Dispose is - it's a C# thing - let alone how and when the app uses it).
public class IntDeserializer : IDeserializer<int>
{
    /// <remark>
    /// Endianness depends on architecture
The serializer must be stable and not dependent on arch.
If we have a producer on little endian and a consumer on big endian they need to be compatible using the same Serializer.
Right you are.
examples/Wrapped/Program.cs
Outdated
// (null, int) serialization. When you do not wish to write any data to a key
// or value, the Null type should be used.
var sProducer2 = producer.Wrap<Null, int>(null, new IntSerializer());
var sProducer2 = producer.Wrap<Null, int>(null, new IntSerializer(Endianness.LittleEndian));
I've made a new enum Endianness. Alternatively we could use a bool here. The enum is more self-descriptive, but if the value is determined at runtime it could be a bit more annoying to use. Opinions?
If the IntSerializer is only aimed at being compatible with itself, then it must not have a configuration option to specify endianness; it should instead be hardcoded to little or big endian.
On the other hand, if we think this Serializer will need to be compatible with IntSerializers in other languages, we should investigate whether there is any prior art and, if so, adhere to that endianness.
E.g., Avro uses little endian, Kafka uses big endian.
I don't really see the point of having the endianness configurable; that'll create more problems than it solves.
Good point. Let's just make it network order.
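I.e. hardcoded big endian, along these lines (a sketch; the ISerializer shape follows the interfaces in this PR):

public class IntSerializer : ISerializer<int>
{
    public byte[] Serialize(int val)
    {
        // Shifts operate on values, not memory, so this is big endian
        // (network order) on every host architecture.
        return new byte[]
        {
            (byte)(val >> 24),
            (byte)(val >> 16),
            (byte)(val >> 8),
            (byte)val
        };
    }
}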
// TODO: Add timeout parameter (with default option == block indefinitely) when using rd_kafka_flush.
public void Flush()
{
    // TODO: use rd_kafka_flush here instead..
It would've been less code calling rd_kafka_flush() than adding these comments ;)
public int Deserialize(byte[] data)
{
    return BitConverter.ToInt32(data, 0);
    // network byte order -> big endian -> most significant byte in the smallest address.
There must be a system lib function for this (which also avoids doing anything on big-endian systems)
http://stackoverflow.com/questions/2420227/ntohs-and-ntohl-equivalent
What I'm wondering is whether calling that and then BitConverter will be faster or slower than what I've got, also considering that I expect most people won't be running on big endian systems (I have no idea).
Something irks me about using a function that takes an int and returns an int, because what it's returning semantically isn't actually an int... I'm also not convinced it's going to be faster than the arithmetic expression I've got (I think it's fine to optimize for little endian systems).
Using System.Net.IPAddress.HostToNetworkOrder() also brings in a dependency on System.Net, which is not currently required. I don't have a clear idea of what this means in practice - it should always be on the host system because it's part of the platform, but it might mean a larger memory footprint (a dll loaded when it otherwise wouldn't have been).
return BitConverter.ToInt32(data, 0);
// network byte order -> big endian -> most significant byte in the smallest address.
return
    (((int)data[0]) << 24) +
Assuming this is for a little-endian host, I think this might be wrong; it should be the other way around:
return (data[3] << 24) | (data[2] << 16) | (data[1] << 8) | (data[0]);
Endianness of the host doesn't matter with the << and >> operators.
I believe I have it the right way around.
Good point about the | operator though; that'll be quicker.
Ah, yes, you are right, sorry.
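So the big-endian decode stands, with | per the suggestion above (sketch):

public int Deserialize(byte[] data)
{
    // data[0] is the most significant byte (network order).
    return (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3];
}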
Same deal with Go where the DR channel gets one of these:
http://docs.confluent.io/3.1.0/clients/confluent-kafka-go/index.html#Message Is this being
kept more minimal intentionally?
*/
As discussed on Slack, the message metadata contains an ever growing number of fields, so passing a rich Message object to the delivery report, like the other clients do, is most likely the best way forward.
And that Message object should be the same as the one returned by consumer.poll().
Right. However, if a very common use case is to just produce a key and value (I think it is), then it's worth having a Producer.ProduceAsync overload for this as well, to make the interface easier to use.
It is hard to make assumptions on what information the dr callback needs based on the produce() arguments. We should provide whatever librdkafka provides in the dr.
@ewencp @edenhill
new producer API: