Skip to content

Panic on Message Ack when MQTT Client Connection is Lost(QoS 2 + Persistent Session) #726

@lnxpgn

Description

@lnxpgn

Affected Version: v1.5.0 (as used in Telegraf 1.35.4)

We are observing a panic in the paho.mqtt.golang library when an MQTT client loses its network connection and attempts to acknowledge a delivered message after the connection has been lost - specifically when QoS 2 is enabled and a persistent session is active.

The panic happens in the onDelivered function of the mqtt_consumer input plugin of Telegraf, which internally calls msg.Ack() when the persistent session is enabled and track.Delivered() is true — particularly when QoS 2 is used. If the network drops before the acknowledgment can be sent, the client attempts to send the acknowledgment on a closed channel, triggering the panic.

Logs from Telegraf

Sep  2 17:52:21 tvikgge telegraf: 2025-09-02T09:52:21Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: EOF
Sep  2 17:52:21 tvikgge telegraf: panic: send on closed channel
Sep  2 17:52:21 tvikgge telegraf: goroutine 52 [running]:
Sep  2 17:52:21 tvikgge telegraf: github.com/eclipse/paho%2emqtt%2egolang.(*router).matchAndDispatch.func2.ackFunc.4()
Sep  2 17:52:21 tvikgge telegraf: /home/dev/go/pkg/mod/github.com/eclipse/[email protected]/net.go:464 +0x28e
Sep  2 17:52:21 tvikgge telegraf: sync.(*Once).doSlow(0x0?, 0x56280?)
Sep  2 17:52:21 tvikgge telegraf: /home/dev/go-pkg/src/sync/once.go:78 +0xab
Sep  2 17:52:21 tvikgge telegraf: sync.(*Once).Do(...)
Sep  2 17:52:21 tvikgge telegraf: /home/dev/go-pkg/src/sync/once.go:69
Sep  2 17:52:21 tvikgge telegraf: github.com/eclipse/paho%2emqtt%2egolang.(*message).Ack(0x1047e80?)
Sep  2 17:52:21 tvikgge telegraf: /home/dev/go/pkg/mod/github.com/eclipse/[email protected]/message.go:77 +0x25
Sep  2 17:52:21 tvikgge telegraf: github.com/influxdata/telegraf/plugins/inputs/mqtt_consumer.(*MQTTConsumer).onDelivered(0xc0000ca508, {0x13cf190, 0xc000143c80})
Sep  2 17:52:21 tvikgge telegraf: /home/dev/telegraf-1.35.4/plugins/inputs/mqtt_consumer/mqtt_consumer.go:240 +0x12d
Sep  2 17:52:21 tvikgge telegraf: github.com/influxdata/telegraf/plugins/inputs/mqtt_consumer.(*MQTTConsumer).Start.func1()
Sep  2 17:52:21 tvikgge telegraf: /home/dev/telegraf-1.35.4/plugins/inputs/mqtt_consumer/mqtt_consumer.go:145 +0x79
Sep  2 17:52:21 tvikgge telegraf: created by github.com/influxdata/telegraf/plugins/inputs/mqtt_consumer.(*MQTTConsumer).Start in goroutine 1
Sep  2 17:52:21 tvikgge telegraf: /home/dev/telegraf-1.35.4/plugins/inputs/mqtt_consumer/mqtt_consumer.go:138 +0x147
Sep  2 17:52:21 tvikgge systemd: telegraf.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep  2 17:52:21 tvikgge systemd: Unit telegraf.service entered failed state.
Sep  2 17:52:21 tvikgge systemd: telegraf.service failed.
Sep  2 17:52:21 tvikgge systemd: telegraf.service holdoff time over, scheduling restart.
Sep  2 17:52:21 tvikgge systemd: Stopped Telegraf.
Sep  2 17:52:21 tvikgge systemd: Starting Telegraf...

Ideally, this race condition should be handled upstream in paho.mqtt.golang, rather than requiring workarounds at the Telegraf plugin level.

This issue is already reported in the Telegraf issue: influxdata/telegraf#17564 (reference if needed).

Thank you for your time and attention to this issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions