Description
We are using an MSK (AWS Managed Streaming for Apache Kafka) cluster with SASL/SCRAM authentication, where credentials are auto-rotated every 90 days while our application is running. The password change has no impact until the client, for whatever reason, connects to a new broker (essentially re-establishing the TCP connection). At that point authentication runs again and fails with a "SASL authentication error":
%7|1692263746.489|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://b-3.test.kafka.us-east-1.am]: sasl_ssl://b-3.test.kafka.us-east-1.amazonaws.com:9096/3: SASL authentication error: Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512 (after 11ms in state AUTH_REQ) (_AUTHENTICATION): identical to last error: error log suppressed
Our issue is that this error never invokes the error handler we have assigned to the consumer, which would let us rebuild the consumer with the latest password on authentication failure. Instead the client retries in a loop inside the underlying Kafka library, and the call never returns to our code.
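To make the expectation concrete, here is a minimal sketch of the kind of handler we register (config values are placeholders; the `rebuildRequested` flag stands in for our real rebuild signalling). We would expect an `_AUTHENTICATION` failure to reach this callback:

```csharp
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "b-1.test.kafka.us-east-1.amazonaws.com:9096", // placeholder
    GroupId = "example-group",
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslMechanism = SaslMechanism.ScramSha512,
    SaslUsername = "user",
    SaslPassword = "fetched-from-secrets-manager", // placeholder
};

var rebuildRequested = false;

using var consumer = new ConsumerBuilder<Ignore, string>(config)
    .SetErrorHandler((c, e) =>
    {
        // If the SASL failure surfaced here, we could tear this consumer
        // down and rebuild it with a freshly fetched password.
        if (e.Code == ErrorCode.Local_Authentication || e.IsFatal)
        {
            rebuildRequested = true;
        }
    })
    .Build();
```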
For context, this is how we read the password: whenever we, or the Confluent-kafka-dotnet library, read the sasl.password key from the config dictionary, we go to Secrets Manager and fetch the latest password. We have overridden IDictionary so that any Get call for the sasl.password key goes to Secrets Manager.
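A simplified sketch of that pattern, assuming a hypothetical `ISecretStore` abstraction in place of the real AWS Secrets Manager client (`ConsumerBuilder` accepts any `IEnumerable<KeyValuePair<string, string>>`, which is read when the consumer is built):

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical abstraction over AWS Secrets Manager.
public interface ISecretStore
{
    string GetLatestPassword();
}

// Config wrapper that resolves "sasl.password" from the secret store every
// time the config is enumerated, so a rebuilt consumer always picks up the
// latest rotated password.
public class RotatingPasswordConfig : IEnumerable<KeyValuePair<string, string>>
{
    private readonly IDictionary<string, string> _inner;
    private readonly ISecretStore _secrets;

    public RotatingPasswordConfig(IDictionary<string, string> inner, ISecretStore secrets)
    {
        _inner = inner;
        _secrets = secrets;
    }

    public IEnumerator<KeyValuePair<string, string>> GetEnumerator()
    {
        foreach (var kv in _inner)
        {
            yield return kv.Key == "sasl.password"
                ? new KeyValuePair<string, string>(kv.Key, _secrets.GetLatestPassword())
                : kv;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```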
How to reproduce
- Set up a multi-node Kafka cluster with SCRAM authentication.
- Start a consumer that subscribes to a topic.
- Call Consume(cancellationToken) in a loop.
- Observe that messages are consumed successfully.
- Rotate the password. We use AWS Secrets Manager, so we simply rotated it there; without AWS you would change the user's password and associate it with the cluster (I am not familiar with how to do this in a non-AWS environment).
- Observe that the password change has no immediate impact, since authentication happens only when the TCP connection to a broker is established.
- Take down a node, forcing the client to connect to another node; this retriggers authentication, as it involves a new TLS handshake.
- Observe that Consume now never returns, and the following error repeats in the logs:
%7|1692263746.489|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://b-3.test.kafka.us-east-1.am]: sasl_ssl://b-3.test.kafka.us-east-1.amazonaws.com:9096/3: SASL authentication error: Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512 (after 11ms in state AUTH_REQ) (_AUTHENTICATION): identical to last error: error log suppressed
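The consume loop from the steps above looks roughly like this (consumer construction omitted). The desired behaviour is for the authentication failure to surface as a ConsumeException or via the error handler so we can break out and rebuild; today, Consume() simply never returns in this situation:

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

static void ConsumeLoop(IConsumer<Ignore, string> consumer, CancellationTokenSource cts)
{
    while (!cts.IsCancellationRequested)
    {
        try
        {
            var result = consumer.Consume(cts.Token);
            Console.WriteLine($"Consumed message at {result.TopicPartitionOffset}");
        }
        catch (ConsumeException ex) when (ex.Error.Code == ErrorCode.Local_Authentication)
        {
            // Desired behaviour: dispose this consumer, fetch the latest
            // password from Secrets Manager, and build a new consumer.
            break;
        }
    }
}
```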
If this error somehow reached our consumer's error handler, we could rebuild the consumer. MSK does not allow us to set connections.max.reauth.ms; it returns the error "Key 'connections.max.reauth.ms' is not supported by at least one Apache Kafka version. Key checked against versions: [2.8.1.2, 2.6.2, 2.7.1, 2.8.0, 2.6.3, 2.7.2, 2.8.1, 2.5.1, 2.6.0, 2.6.1, 2.7.0, 2.3.1, 2.2.1, 2.4.1.1, 3.3.2, 3.1.1, 3.2.0, 3.3.1, 3.4.0]".
We are looking for some way to surface the authentication error on the client side so we can rebuild the consumer.
- Confluent.Kafka NuGet version: 1.9.3, librdkafka: 1.9.2
- Apache Kafka version: 2.7.0
- Client configuration: using SCRAM authentication
- Operating system: Amazon Linux
- Debug logs: provided above
- Critical issue: yes