Bug: ConsumerOffsetCommitter goes into failure state after broker downtime #203

@nioertel

During testing we observed that, after a broker outage, the ConsumerOffsetCommitter apparently keeps trying to commit offsets indefinitely without ever succeeding.
Test setup:

  • Application is started (4 Java processes reading from the same topic which has 24 partitions)
  • Application is processing data without issues
  • Kafka Broker is killed
  • Application has cached messages that were already polled but not processed yet
  • Kafka Broker is started again
  • Application starts processing again, but half of the instances have very low throughput
  • In the logs we keep seeing this error:
     [ERROR] 2022-03-01 09:16:37.934 [pc-broker-poll] i.c.p.i.ConsumerOffsetCommitter - Error committing offsets org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
     Caused by: org.apache.kafka.common.errors.DisconnectException

It seems that the application tries to process messages it still had cached, but which came from partitions that have since been assigned to another application instance, and therefore the offset commits for them fail.
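
To illustrate the suspected mechanism, here is a minimal sketch of how a buffer of polled-but-unprocessed records could be pruned when partitions are taken away. This is not the parallel-consumer implementation; `PartitionBuffer` and all of its methods are hypothetical names made up for this example. In a real application the pruning would presumably be triggered from the Kafka client's `ConsumerRebalanceListener.onPartitionsRevoked` callback; plain strings stand in for `TopicPartition` here so the sketch runs without the Kafka dependency.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: NOT part of parallel-consumer or the Kafka client API.
class PartitionBuffer {

    // Offsets of records that were polled but not yet processed,
    // keyed by "topic-partition" (for example "events-3").
    private final Map<String, Deque<Long>> buffered = new HashMap<>();

    // Builds the map key for a topic and partition number.
    static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    // Remember a polled record that is waiting to be processed.
    void add(String topic, int partition, long offset) {
        buffered.computeIfAbsent(key(topic, partition), k -> new ArrayDeque<>()).add(offset);
    }

    // Intended to be called from a rebalance callback: discard buffered work
    // for partitions this instance no longer owns, so that no offsets are
    // committed for them later. Returns the number of discarded records.
    int onPartitionsRevoked(Collection<String> revokedKeys) {
        int dropped = 0;
        for (String k : revokedKeys) {
            Deque<Long> queue = buffered.remove(k);
            if (queue != null) {
                dropped += queue.size();
            }
        }
        return dropped;
    }

    // True if any records are still buffered for the given partition.
    boolean hasBuffered(String topic, int partition) {
        return buffered.containsKey(key(topic, partition));
    }

    // Total number of buffered records across all partitions.
    int size() {
        int total = 0;
        for (Deque<Long> queue : buffered.values()) {
            total += queue.size();
        }
        return total;
    }

    public static void main(String[] args) {
        PartitionBuffer buffer = new PartitionBuffer();
        buffer.add("events", 0, 100L);
        buffer.add("events", 0, 101L);
        buffer.add("events", 1, 7L);

        // Simulate a rebalance in which partition 0 moves to another instance.
        int dropped = buffer.onPartitionsRevoked(Arrays.asList(key("events", 0)));
        System.out.println("dropped=" + dropped + " remaining=" + buffer.size());
    }
}
```

If the hypothesis is right, records like the two buffered for `events-0` above are the ones that would still get processed, and committed, by an instance that no longer owns the partition.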

Related:
