-
Notifications
You must be signed in to change notification settings - Fork 558
Description
Affected Version: v1.5.0 (as used in Telegraf 1.35.4)
We are observing a panic in the paho.mqtt.golang library when an MQTT client loses its network connection and attempts to acknowledge a delivered message after the connection has been lost - specifically when QoS 2 is enabled and a persistent session is active.
The panic happens in the onDelivered function of the mqtt_consumer input plugin of Telegraf, which internally calls msg.Ack() when the persistent session is enabled and track.Delivered() is true — particularly when QoS 2 is used. If the network drops before the acknowledgment can be sent, the client attempts to send the acknowledgment on a closed channel, triggering the panic.
Logs from Telegraf
Sep 2 17:52:21 tvikgge telegraf: 2025-09-02T09:52:21Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: EOF
Sep 2 17:52:21 tvikgge telegraf: panic: send on closed channel
Sep 2 17:52:21 tvikgge telegraf: goroutine 52 [running]:
Sep 2 17:52:21 tvikgge telegraf: github.com/eclipse/paho%2emqtt%2egolang.(*router).matchAndDispatch.func2.ackFunc.4()
Sep 2 17:52:21 tvikgge telegraf: /home/dev/go/pkg/mod/github.com/eclipse/[email protected]/net.go:464 +0x28e
Sep 2 17:52:21 tvikgge telegraf: sync.(*Once).doSlow(0x0?, 0x56280?)
Sep 2 17:52:21 tvikgge telegraf: /home/dev/go-pkg/src/sync/once.go:78 +0xab
Sep 2 17:52:21 tvikgge telegraf: sync.(*Once).Do(...)
Sep 2 17:52:21 tvikgge telegraf: /home/dev/go-pkg/src/sync/once.go:69
Sep 2 17:52:21 tvikgge telegraf: github.com/eclipse/paho%2emqtt%2egolang.(*message).Ack(0x1047e80?)
Sep 2 17:52:21 tvikgge telegraf: /home/dev/go/pkg/mod/github.com/eclipse/[email protected]/message.go:77 +0x25
Sep 2 17:52:21 tvikgge telegraf: github.com/influxdata/telegraf/plugins/inputs/mqtt_consumer.(*MQTTConsumer).onDelivered(0xc0000ca508, {0x13cf190, 0xc000143c80})
Sep 2 17:52:21 tvikgge telegraf: /home/dev/telegraf-1.35.4/plugins/inputs/mqtt_consumer/mqtt_consumer.go:240 +0x12d
Sep 2 17:52:21 tvikgge telegraf: github.com/influxdata/telegraf/plugins/inputs/mqtt_consumer.(*MQTTConsumer).Start.func1()
Sep 2 17:52:21 tvikgge telegraf: /home/dev/telegraf-1.35.4/plugins/inputs/mqtt_consumer/mqtt_consumer.go:145 +0x79
Sep 2 17:52:21 tvikgge telegraf: created by github.com/influxdata/telegraf/plugins/inputs/mqtt_consumer.(*MQTTConsumer).Start in goroutine 1
Sep 2 17:52:21 tvikgge telegraf: /home/dev/telegraf-1.35.4/plugins/inputs/mqtt_consumer/mqtt_consumer.go:138 +0x147
Sep 2 17:52:21 tvikgge systemd: telegraf.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 2 17:52:21 tvikgge systemd: Unit telegraf.service entered failed state.
Sep 2 17:52:21 tvikgge systemd: telegraf.service failed.
Sep 2 17:52:21 tvikgge systemd: telegraf.service holdoff time over, scheduling restart.
Sep 2 17:52:21 tvikgge systemd: Stopped Telegraf.
Sep 2 17:52:21 tvikgge systemd: Starting Telegraf...
Ideally, this race condition should be handled upstream in paho.mqtt.golang, rather than requiring workarounds at the Telegraf plugin level.
This issue is already reported in the Telegraf issue: influxdata/telegraf#17564 (reference if needed).
Thank you for your time and attention to this issue.