BufferError [Local] Queue full for producer even after changing librdkafka config #16
Oddly, it looks like every 4-6 minutes the BufferError occurs for about 1-2 seconds, and then again 4-6 minutes later (see timestamps).
You need to call poll() at regular intervals to serve the producer's delivery report callbacks. poll(0) is a cheap call if nothing needs to be done, so it is typically put in the producer loop. You should typically only call flush() on termination to make sure all outstanding messages are sent.
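For example, something along these lines (broker and topic names are just placeholders):

```python
from confluent_kafka import Producer

p = Producer({'bootstrap.servers': 'localhost:9092'})  # placeholder broker

def delivery_report(err, msg):
    # Served by poll()/flush(); reports per-message delivery success/failure.
    if err is not None:
        print('Delivery failed: {}'.format(err))

records = [b'message-1', b'message-2', b'message-3']   # stand-in data

for record in records:
    p.produce('my_topic', value=record, callback=delivery_report)
    p.poll(0)   # cheap when idle; serves delivery callbacks inside the loop

p.flush()       # only on termination: wait for outstanding messages
```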
Since the diff module instantiates my Producer class, it should never terminate, but I may add a flush method to that class and call flush() from it on shutdown. FYI - I edited the producer config to the following and let it run 3 hours, processing 400K+ msgs, with no BufferError. I do think that config is important; with the defaults I was running into the error, so perhaps the Python client docs should mention something about those params.
Oddly, I added logging every 100 records in the loop to print the current count for each. I don't understand how it's even possible, but my Produced count passed my Consumed count along the way:
I only increment each counter on a successful call, and eventually the Produced count surpasses Consumed and stays ahead for the rest of the program, through over 400K records.
It should not be possible for produce to be called more times than consume; the counts should be, and are, equal for the first 25-30K messages, and then Produced starts jumping ahead. BufferError is not appearing in the logs, but something is really strange.
poll() is cheap to call, it will not have a performance impact, so please add it to your producer loop. Regarding the 3 hour run, if it only did 400k messages then that is still below the internal queue size of 1M messages, but eventually that limit will be exceeded without calls to poll().
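If the queue does fill up anyway, one option is to catch BufferError, poll briefly to drain delivery reports, and retry. A rough sketch with placeholder names:

```python
from confluent_kafka import Producer

p = Producer({'bootstrap.servers': 'localhost:9092'})  # placeholder config

def produce_with_retry(producer, topic, value):
    """Produce one message, waiting briefly whenever the local queue is full."""
    while True:
        try:
            producer.produce(topic, value=value)
            producer.poll(0)      # serve delivery callbacks
            return
        except BufferError:
            # Queue full: block up to 1s so deliveries can free space, then retry.
            producer.poll(1)

produce_with_retry(p, 'my_topic', b'example payload')
p.flush()
```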
I will add poll(0) to my producer loop.
How are you counting consumed and produced messages?
I set a class member variable in my Producer class and increment it on each successful produce call.
I did the same in the Consumer class. Not using multithreading; single consumer group and a single program for testing.
In the diff_processor module that instantiates both classes (consumer, producer), for every msg from consume it does the Cassandra query/upsert and then this:
In theory the produced count should exactly match the consumed count, as I only call produce once per consumed msg.
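Roughly this pattern, simplified, with hypothetical names and the Cassandra step stubbed out:

```python
from confluent_kafka import Consumer, Producer

# Hypothetical names throughout; the Cassandra diff/upsert step is stubbed out.
consumer = Consumer({'bootstrap.servers': 'localhost:9092',
                     'group.id': 'diff_processor'})
producer = Producer({'bootstrap.servers': 'localhost:9092'})
consumer.subscribe(['ListingEvent'])   # placeholder source topic

consumed_count = 0
produced_count = 0

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    consumed_count += 1
    # ... Cassandra query / diff / upsert would happen here ...
    producer.produce('ListingEditEvent', value=msg.value())
    producer.poll(0)          # serve delivery reports so the queue can drain
    produced_count += 1
    if consumed_count % 100 == 0:
        print('consumed={} produced={}'.format(consumed_count, produced_count))
```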
Ignore the publish count... Doh! If the program detects that the photo modified date in a listing record changed, it also publishes a record to another topic for a photo puller program to process, so the produced count legitimately runs ahead of consumed. I will try adding poll() to the loop. I'll close this issue and reopen it if the error comes back.
It should be noted that the queue.buffering.max.kbytes limit can also trigger the "BufferError: Local: Queue full" error. The maximum value of queue.buffering.max.kbytes is 2097151, which is about 2GB (400MB by default). There is also a write-up for Chinese readers: https://zhuanlan.zhihu.com/p/37016944
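For example (illustrative values only, not a recommendation):

```python
from confluent_kafka import Producer

# Illustrative values; 2097151 kbytes is the documented upper limit (~2GB).
p = Producer({
    'bootstrap.servers': 'localhost:9092',       # placeholder
    'queue.buffering.max.messages': 1000000,
    'queue.buffering.max.kbytes': 2097151,
})
```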
I can confirm that calling poll after every message causes a drastic performance hit. I had to revert it in prod on code that creates millions of records.
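One possible compromise, sketched here with arbitrary values rather than anything from this thread, is to call poll(0) every N messages instead of after every single produce:

```python
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:9092'})    # placeholder
records = (('payload-%d' % i).encode() for i in range(10000))   # stand-in data

POLL_EVERY = 1000   # arbitrary interval; tune for your throughput

for i, record in enumerate(records):
    producer.produce('my_topic', value=record)
    if i % POLL_EVERY == 0:
        producer.poll(0)   # periodically serve delivery reports

producer.flush()           # drain whatever is still queued at the end
```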
Ubuntu 14.04.4 LTS / Python 2.7.x / Kafka 0.10 (Confluent Platform 3) / Python client (latest)
I'm seeing a large number of BufferError [Local] Queue full errors in the logs for the Producer client. I searched for the error yesterday and found a 2014 librdkafka issue that was resolved by changing a few configuration parameters. I posted in that issue and changed my config; the initial errors went away, but as the program ran overnight a flood of errors filled the logs. Out of 500,000 messages consumed from the topics, I'm missing over 100,000 in the subsequent topic.

I have a Python stream processor that instantiates both Consumer and Producer classes and consumes from 6 topics, performing a diff operation/upsert against the matching record if it exists in the Cassandra cluster, and then publishing the diff'ed object to another topic (...ListingEditEvent). When it tries to publish to that subsequent topic, messages are getting lost. A Transformer program picks up from the ListingEditEvent topic, converts to our schema, and publishes to the ListingEditTransformed topic for Logstash consumption into Elasticsearch. I'm seeing differences between the records in ES and the Kafka topics and am trying to resolve that. I appreciate any tips on how to solve this, or better configuration values.
I edited the config for Producer client to the following:
I'm thinking of reducing the max buffering time and increasing max messages: perhaps 5000ms, a batch size of 250, and a 1 million message max?
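Something like this, assuming those map onto the librdkafka parameters the way I think they do (untested):

```python
from confluent_kafka import Producer

producer_conf = {
    'bootstrap.servers': 'localhost:9092',     # placeholder
    'queue.buffering.max.ms': 5000,            # "max time" -> 5000ms
    'batch.num.messages': 250,                 # "batch size" -> 250
    'queue.buffering.max.messages': 1000000,   # "max messages" -> 1 million
}
p = Producer(producer_conf)
```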
The errors aren't constant, so it must be exceeding the buffer as it processes, recovering, and then exceeding it again:
My producer class doesn't call flush() like your example client, since the calling module connects and keeps publishing. I also don't call poll(0) like the example, but I'm unsure if that matters.