Integrate client with Confluent schema registry and Avro #36

Closed · criccomini opened this issue Aug 23, 2016 · 10 comments

criccomini commented Aug 23, 2016

We are looking at integrating this client with the Confluent schema registry and Avro. The proposal is to add a new submodule in this repo that has:

  • Caching schema registry client
  • Message serializer
  • Wrapper around base producer/consumer

The wrapper would override the send/receive methods on the base producer/consumer and pass the objects/bytes through the message serializer. The message serializer would, in turn, use the caching schema registry client to fetch the schema and encode/decode the message. The wrapper would be pluggable so other serde implementations could be used as well (e.g. Thrift/Protobuf), though we're not planning to implement any others. A rough sketch follows below.
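
For illustration, a minimal sketch of what the wrapper could look like. The CachedSchemaRegistryClient and MessageSerializer names, the encode_record_with_schema method, and the constructor shape are all assumptions about the proposed components, not an existing API:

    from confluent_kafka import Producer

    # Placeholder import: these components don't exist yet; names and
    # signatures are assumptions based on the proposal above.
    from confluent_kafka.avro import CachedSchemaRegistryClient, MessageSerializer


    class AvroProducer(Producer):
        """Wraps the base Producer and Avro-encodes values before sending."""

        def __init__(self, config, schema_registry_url):
            super(AvroProducer, self).__init__(config)
            registry = CachedSchemaRegistryClient(schema_registry_url)
            self._serializer = MessageSerializer(registry)

        def produce(self, topic, value, value_schema, **kwargs):
            # Fetch/register the schema via the caching client, encode the
            # object, and hand the resulting bytes to the base producer.
            encoded = self._serializer.encode_record_with_schema(
                topic, value_schema, value)
            super(AvroProducer, self).produce(topic, value=encoded, **kwargs)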

Does this sound OK?

/cc @roopahc


ewencp commented Aug 23, 2016

@criccomini All the components you mention sound fine; this is essentially what I would do. Given how small the overrides are for send/receive, I might not even make it pluggable in the same way the Java ones are. Python's duck typing makes it easy to just wrap the two locations that need serialization with normal overloads, so a simple AvroProducer and AvroConsumer might be simpler for the user than having to separately configure/instantiate the serializer.

The one question I have is whether we want it in this repo. Were you thinking about moving confluent_kafka to a subdirectory and still publishing it separately from this new package, or including it directly? I just wonder whether requiring extra dependencies on confluent_kafka will be acceptable to folks not using Avro (and if we add Thrift/Protobuf eventually, the dependencies would grow even more).

criccomini commented

@ewencp

> Were you thinking about moving confluent_kafka to a subdirectory and still publishing it separately from this new package, or including it directly?

Not sure. I'm open to suggestions. One thing that I've seen work with Airflow is keeping everything in one package, but using setup.py extras_require. This would mean you'd run:

pip install confluent-kafka
pip install confluent-kafka[avro]

The second one would include the Avro dependencies. We could have confluent-kafka[thrift], etc. Thoughts?
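
For reference, a minimal sketch of how that could be wired up in setup.py; the package names and version here are illustrative, not the project's actual metadata:

    from setuptools import setup

    setup(
        name='confluent-kafka',
        version='0.9.1',  # illustrative
        packages=['confluent_kafka', 'confluent_kafka.avro'],
        extras_require={
            # Pulled in only by: pip install confluent-kafka[avro]
            'avro': ['avro', 'requests'],
        },
    )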


ewencp commented Aug 25, 2016

@criccomini Ah, cool, extras_require seems like a pretty nice solution. Crazy that people have encountered this issue before and created a reasonable solution for it :) Do you just implement things normally and people who don't want to use the optional features just need to avoid the relevant imports? If so, this seems like a pretty nice solution to the problem without requiring much effort on our part -- compared to requiring completely separate libraries in many other languages, this might be a really easy solution.

criccomini commented

> Do you just implement things normally and people who don't want to use the optional features just need to avoid the relevant imports?

Yep, exactly. As long as you don't touch the code that you don't need at runtime, you don't need to install the extra requirements.
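
For example, the optional module might guard its imports so users who skip the extra get a helpful error rather than a bare ImportError (a sketch, assuming a hypothetical confluent_kafka/avro module):

    # At the top of the hypothetical confluent_kafka/avro module: the extra
    # dependencies are only imported when this module itself is imported.
    try:
        import avro.schema
    except ImportError:
        raise ImportError(
            "Avro support requires optional dependencies; install them with "
            "`pip install confluent-kafka[avro]`"
        )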


ewencp commented Aug 25, 2016

Seems reasonable then. @edenhill, any thoughts?

edenhill commented

Great work @criccomini!

The extras_require solution sounds fine to me; I'm still not sure where the actual code would reside, though. I do think we want to avoid git submodules if possible, since they are poorly integrated into people's standard git workflows.


ewencp commented Aug 25, 2016

@edenhill I think the point is that we can house it all under this repository directly, without submodules, since Python provides a reasonable way to exclude unneeded dependencies. This would be a bit different from our many-repos approach so far, but we end up doing this for other things as well; e.g., schema-registry has the service implementation, serializers, Connect converters, etc.


ewencp commented Aug 25, 2016

@edenhill And perhaps a bit more concretely, I imagine compared to doing

from confluent_kafka import Producer

you could do something like

from confluent_kafka.avro import AvroProducer

if you want the Avro/schema-registry-enabled one instead of the bare-bytes one.
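
Hypothetical usage of the Avro-enabled flavor, following the wrapper sketch earlier in the thread (the constructor shape and module path are assumptions):

    import avro.schema

    from confluent_kafka.avro import AvroProducer  # hypothetical module

    # A trivial record schema for the example. `Parse` is the avro-python3
    # spelling; Python 2's avro library uses `parse`.
    value_schema = avro.schema.Parse(
        '{"type": "record", "name": "User",'
        ' "fields": [{"name": "name", "type": "string"}]}')

    p = AvroProducer({'bootstrap.servers': 'localhost:9092'},
                     schema_registry_url='http://localhost:8081')
    p.produce('users', value={'name': 'Alice'}, value_schema=value_schema)
    p.flush()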

edenhill commented

Awesome, let's do it!

criccomini commented

Awesome. @edenhill @ewencp, one other thing: it looks like you guys are Python 2/Python 3 compatible. We will implement the schema registry code to be compatible as well, but the encoding/decoding syntax changes a bit between Python 2 and Python 3. We're planning to branch this portion of the code on the Python runtime, so we'll have two different encode/decode util methods depending on the Python version.

The second rub is that Avro itself ships as different libraries for Python 2 and Python 3 (avro and avro-python3). We'll try to figure out what to do about this, but it might mean different extras_require depending on the Python runtime. A sketch of the runtime branching is below.
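
A sketch of what that runtime branching might look like; the helper name is illustrative:

    import sys

    if sys.version_info >= (3, 0):
        def utf8_bytes(s):
            # Python 3: str and bytes are distinct types; encode explicitly.
            return s.encode('utf-8') if isinstance(s, str) else s
    else:
        def utf8_bytes(s):
            # Python 2: unicode needs encoding; str is already bytes.
            return s.encode('utf-8') if isinstance(s, unicode) else s  # noqa: F821

For the dual Avro libraries, PEP 508 environment markers in extras_require (e.g. 'avro; python_version < "3"') would be one way to select the right package per runtime.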

dtheodor pushed a commit to dtheodor/confluent-kafka-python that referenced this issue Sep 4, 2018: Add Consumer.Subscription() and Assignment() APIs