Consumer keeps resetting to -1001, and what is the difference between topic offset and consumer offset? #291
Offset -1001 is a special value meaning Invalid or Default, depending on context. If you want to manually specify the starting offset, you can set the offset of the TopicPartition to a value of your liking; you can use assign() for this.
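For illustration, a minimal sketch of assigning a partition with an explicit starting offset (the broker address, topic, and group names are placeholders, not taken from this thread):

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',  # placeholder broker address
    'group.id': 'mygroup',                  # placeholder group id
})

# Start reading partition 0 of "mytopic" at offset 1234 instead of the
# default OFFSET_INVALID (-1001), which means "use the committed offset".
consumer.assign([TopicPartition('mytopic', 0, 1234)])
```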
@edenhill every time I stop the consumer the offset resets to -1001. topic = TopicPartition(FROM_TOPIC, i)
Can you provide your config and some code? When you say you have a consumer for each partition, are you manually assigning the partitions, or do you have three consumers in the same consumer group which share the offsets? In your case, you should use the same group id for each consumer, as you seem to want them to read all data from the topics. It is your call whether you use subscribe (automatic rebalancing) or assign (you decide which partition each consumer reads). If you assign the consumer to a partition, you should leave the offset as invalid in the assignment; this will use the stored offset for the consumer group id. With some code it will be easier to help you :)
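A minimal sketch of the two options just described (topic, group, and broker names are placeholders):

```python
from confluent_kafka import Consumer, TopicPartition, OFFSET_INVALID

conf = {'bootstrap.servers': 'localhost:9092', 'group.id': 'mygroup'}

# Option 1: subscribe() and let the group protocol balance partitions
# across all consumers that share the same group id.
c1 = Consumer(conf)
c1.subscribe(['mytopic'])

# Option 2: assign() a specific partition yourself; leaving the offset
# as OFFSET_INVALID resumes from the offset committed for the group.
c2 = Consumer(conf)
c2.assign([TopicPartition('mytopic', 0, OFFSET_INVALID)])
```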
@treziac @edenhill this is how I set up my consumers. I use the same group id and I assign the partitions manually.
Since the client is asynchronous internally and it takes some time to set up connections, query leaders, and start fetching, it is very likely that calling position() right after assign() will not return an actual offset. What I think you are after is actually the committed offset, for which you would use committed() (and you can do so right after assign() since it will block while waiting for the broker to respond). Unrelated, check …
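A minimal sketch of the difference (all names are placeholders): committed() waits for the broker, so it returns real offsets even right after assign(), whereas position() may still report -1001 at that point.

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({'bootstrap.servers': 'localhost:9092',
                     'group.id': 'mygroup'})
partitions = [TopicPartition('mytopic', p) for p in range(3)]
consumer.assign(partitions)

# committed() blocks until the broker responds and returns the input
# list with the .offset fields filled in (-1 if nothing was committed).
for tp in consumer.committed(partitions, timeout=10.0):
    print('partition %d: committed offset %d' % (tp.partition, tp.offset))
```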
@edenhill so to your point I need to wait before fetching the offset? So a time.sleep() should help? Is that right?
What is it that you are trying to achieve by getting the fetch position?
@edenhill I'm trying what you mentioned: if for some reason the consumer crashes and dies, I would like to start this script again. That way, when the consumer launches and I start assigning partitions to each consumer, I want to get the last offset position and continue from there. I check for position < 0 because of the -1001, so if there's a better way, could you reconstruct my code?
The client already has built-in support for this by two means:
- committed offsets, which are stored on the broker per group.id, topic, and partition, and
- auto.offset.reset, which decides where to start when there is no committed offset.
Which means you only need to decide if you want to start reading the oldest (earliest) or the newest (latest) messages when no committed offset exists; see the config sketch below.
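A minimal config sketch of that decision (broker address and names are placeholders; on the 0.11.x client used in this thread, topic-level settings such as auto.offset.reset go under default.topic.config, while newer clients also accept them at the top level):

```python
from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'mygroup',
    # Applies only when the group has no committed offset for a partition:
    # 'earliest' = oldest available message, 'latest' = end of partition.
    'default.topic.config': {'auto.offset.reset': 'earliest'},
})
consumer.subscribe(['mytopic'])
```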
@edenhill okay, will take a look.
It is not different, it is just handled automatically in the client itself, so there is typically no reason to implement the same logic in the application.
Cool, let me try your changes and get back.
@edenhill I removed the code that set the offset to 0, and I noticed from the fetch logs that the consumer gets stuck on the latest offset and doesn't cover the lag:
%7|1514946451.579|FETCH|rdkafka#consumer-28| [thrd:172.32.231.127:9092/2]: 172.32.231.127:9092/2: Fetch topic qa_gcdb_shipment_queue [26] at offset 2413 (v2)
What do you mean by "gets stuck on"? If there are no previously committed offsets it will revert to auto.offset.reset, which you've set to latest, i.e., the end of the partition (the next new message to arrive).
@edenhill so basically this is the state of my topic partitions. For example, when I launch my consumer, I can see the consumer for partition 2 (with Debug=Fetch) start at 17108 rather than at 16911, so it never consumes the remaining 197 entries.
Can you show me your latest code?
If you go down to where the message is consumed, add a printout for … Nit: you should use …
Oh sorry, set …
Thank you! So the consumer starts up, gets your assignment and then asks the broker for committed offsets for topic qa_gcdb_shipment_queue partition 2 in group maxwellConsumer, but the broker responds with offset -1 which means there are no committed offsets for that combination.
There are generally two reasons for this:
- no offsets were ever committed for that group, topic, and partition, or
- previously committed offsets have expired and been deleted by the broker (see the offsets retention setting discussed below).
Either way the client needs to know what to do when there are no committed offsets to use, so it looks at auto.offset.reset. Do note that this consumer will only commit offsets for messages it has actually seen; in this case it has not seen any actual message, only the end of the partition, so nothing is committed.
@edenhill I can't seem to find the offset.rentention parameter in consumer.properties or server.properties. Where would it be? Or do I have to add it myself?
That is a broker-level configuration; look for offsets.retention.minutes in the broker's server.properties.
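For reference, a sketch of where this would sit, in the broker's server.properties rather than any client config (the value below is only illustrative, not a recommendation):

```properties
# How long the broker retains committed consumer group offsets.
offsets.retention.minutes=10080
```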
@edenhill the link has no mention of …
You misspelled it, see my last comment.
@edenhill got it. Sounds like that fixes the problem. Producer produces: 1,2,3,4,5,6, but the consumer only sees: 1,2,3,4. Why would that happen?
When you restart the client, does it progress from message 4 and see messages 5 and 6 as well?
If it does see the remaining messages: …
I'm not able to reproduce it right away, but this is what the logs had shown:
%3|1515100056.487|ERROR|rdkafka#consumer-69| [thrd:172.31.230.234:9092/bootstrap]: 172.31.230.234:9092/0: Receive failed: Disconnected
Is this from when it fails to consume? Also, the timestamp indicates that it is the broker's idle connection reaper that is closing idle connections. See https://github.com/edenhill/librdkafka/wiki/FAQ#why-am-i-seeing-receive-failed-disconnected
Changed my code quite a bit since then and not able to reproduce. Will reopen this with the logs if and when it occurs again.
@edenhill I have the same issue when I try to implement manual commits after setting enable.auto.commit=False. I have also tried to use the store_offsets method you mentioned in another ticket, but I end up getting negative lag.
My consumer configuration is as given below.
Every time I start the consumer, the lag goes to -1001 and then rises and falls back again. Let me know if I am doing anything wrong.
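A minimal sketch of manual offset handling for comparison (broker, topic, and the process() helper are placeholder assumptions): store each message's offset only after it has been processed, then commit the stored offsets.

```python
from confluent_kafka import Consumer, KafkaError

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'mygroup',
    'enable.auto.commit': False,
    # store_offsets() requires disabling the automatic offset store too.
    'enable.auto.offset.store': False,
})
consumer.subscribe(['mytopic'])

def process(message):
    # Hypothetical processing step.
    print(message.value())

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                continue  # end of partition, not an actual error
            raise Exception(msg.error())
        process(msg)
        consumer.store_offsets(msg)  # mark this message as processed
        consumer.commit()            # commit the stored offsets
finally:
    consumer.close()
```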
I don't see anywhere that you are calculating the lag.
You are not checking the message for errors (message objects are reused for error signalling). Nit: you are adding the same topic,partition combo multiple times to the topicpartitions list; better to use a dict.
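A minimal sketch of both points (names are placeholders): check msg.error() before using a message, and key offsets by (topic, partition) so each combination is tracked once.

```python
from confluent_kafka import Consumer

consumer = Consumer({'bootstrap.servers': 'localhost:9092',
                     'group.id': 'mygroup'})
consumer.subscribe(['mytopic'])

offsets = {}  # (topic, partition) -> last consumed offset
for _ in range(100):
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        # Message objects double as error events (e.g. partition EOF).
        print('consumer event: %s' % msg.error())
        continue
    offsets[(msg.topic(), msg.partition())] = msg.offset()
```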
I am sorry that I didn't add the code for tracking the lag. I will remove the tracking of the offset if the message results in an error. Although, while reading the documentation more thoroughly, I realized that if I consume a batch and process those messages before I consume the next batch, I don't need a manual commit, because the offsets of the first batch are committed when I consume the next batch. So even if the consumer breaks down, I still won't lose any messages, because my consumption and processing happen synchronously. I did some tests, and they happen to validate my hypothesis, but the scale of those tests isn't that reliable. Correct me if I am wrong. Thanks for the help :)
Description
So I'm currently troubleshooting my consumer and trying different things with partitions. I have 3 partitions and a consumer for each partition.
I initially had an issue where the commit offset would be -1001, but I figured that was only because of the timeout. So I put in code to reset it to 0 if it was < 0, but now every time I rerun the consumer it always returns -1001 as my offset.
Is there a way to find the latest commit of a particular topic partition?
And also, what is the difference between topic and consumer offset?
Thanks in advance
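To the two questions above, a minimal sketch (names are placeholders): the topic's offsets are the partition's low/high watermarks on the broker, while the consumer offset is what was last committed for a group; committed() answers the first question directly.

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({'bootstrap.servers': 'localhost:9092',
                     'group.id': 'mygroup'})
tp = TopicPartition('mytopic', 0)

# Topic side: the range of offsets that exist in the partition.
low, high = consumer.get_watermark_offsets(tp, timeout=10.0)

# Consumer side: the offset committed for this group (-1 if none).
committed = consumer.committed([tp], timeout=10.0)[0].offset

print('partition holds offsets %d..%d, group committed %d'
      % (low, high, committed))
```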
How to reproduce
confluent_kafka.version() - ('0.11.0', 720896)
confluent_kafka.libversion() - ('0.11.0', 721151)
Broker version 0.9.0.1
OS: ubuntu
Checklist
Please provide the following information:
- confluent_kafka.version() and confluent_kafka.libversion()
- Client configuration ({...})
- Client logs (set 'debug': '..' as necessary)