QQ: crash after snapshot installation #12635

### Describe the bug

A quorum queue member occasionally crashes after a snapshot installation because it attempts to read a command for a message that has already been consumed. The reproduction steps are highly artificial, but this crash has been seen in the wild a couple of times and could happen when a follower member runs slowly on a node where consumers come and go.

```
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> ** Stacktrace =
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> **  [{lists,zipwith,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>             [#Fun<rabbit_fifo.60.126061837>,[],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>              [{1,[7352901|4]},{2,[7352904|4]}],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>              fail],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>             [{file,"lists.erl"},{line,844}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {lists,zipwith,4,[{file,"lists.erl"},{line,845}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {rabbit_fifo,'-delivery_effect/3-anonymous-5-',4,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>                   [{file,"rabbit_fifo.erl"},{line,2062}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {ra_server_proc,handle_effect,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>                      [{file,"src/ra_server_proc.erl"},{line,1385}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {ra_server_proc,handle_effects,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>                      [{file,"src/ra_server_proc.erl"},{line,1301}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {lists,foldl_1,3,[{file,"lists.erl"},{line,2151}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {ra_server_proc,handle_effects,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>                      [{file,"src/ra_server_proc.erl"},{line,1301}]}]
```

### Reproduction steps

This is easiest to re-create on 4.0.x but can also happen on 3.13.x.

1. Create a quorum queue "q1" in a 3-node cluster with the leader on rabbit-1 and the `quorum_min_checkpoint_interval` application config set to 1.
2. Stop the member on rabbit-3, e.g. `ra:stop_server(quorum_queues, {'%2F_q1', node()}).`
3. Publish 2 messages.
4. Trigger a checkpoint for the leader member: `ra:cast_aux_command({'%2F_q1', 'rabbit-1@HOST'}, force_checkpoint).`
5. Publish 1 more message.
6. Attach, then detach, a consumer for the queue that is connected to rabbit-3 (no messages should be delivered, but they will show as unacked).
7. Purge the queue.
8. Restart the member on rabbit-3: `ra:restart_server(quorum_queues, {'%2F_q1', node()}).`
9. Observe a member crash on rabbit-3.

The member _may_ recover after step 9 - this is also, in fact, a bug.


### Expected behavior

No crash

### Additional context

Currently, a queue that experiences this error can be fixed by removing the faulty member from the quorum queue cluster with `rabbitmq-queues delete_member`, waiting a bit, and then re-adding it with `rabbitmq-queues add_member`.
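
The workaround can be sketched as a small shell function. This is only an illustration: the vhost, queue name, node name, and wait duration are placeholder assumptions (the report only says to wait "a bit"), and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience for previewing the commands before running them against a real cluster.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust to the affected queue and node.
VHOST="/"
QUEUE="q1"
NODE="rabbit-3@HOST"
WAIT_SECONDS=30   # assumed pause; the report does not give a duration

# Hypothetical helper: with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Remove the faulty member, pause, then add it back.
recover_member() {
  run rabbitmq-queues delete_member -p "$VHOST" "$QUEUE" "$NODE"
  run sleep "$WAIT_SECONDS"
  run rabbitmq-queues add_member -p "$VHOST" "$QUEUE" "$NODE"
}
```

Calling `DRY_RUN=1 recover_member` prints the three commands without touching the cluster; calling `recover_member` without `DRY_RUN` set executes them.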
