Using the patch for #714, in a 3-node cluster configured to test #545, the GM might eventually crash when processing an activity message:
```
** {{case_clause,
     {{value,
       {33059,
        {publish,<9042.1151.1>,flow,
         {message_properties,undefined,false,2048},
         {basic_message,
          {resource,<<"/">>,exchange,<<"testExchange">>},
          [<<>>],
          {content,60,
           ....
   [{gm,find_common,3,[{file,"src/gm.erl"},{line,1369}]},
    {gm,'-handle_msg/2-fun-2-',7,[{file,"src/gm.erl"},{line,881}]},
    {gm,with_member_acc,3,[{file,"src/gm.erl"},{line,1386}]},
    {lists,foldl,3,[{file,"lists.erl"},{line,1262}]},
    {gm,handle_msg,2,[{file,"src/gm.erl"},{line,871}]},
    {gm,handle_cast,2,[{file,"src/gm.erl"},{line,661}]},
    {gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1049}]},
    {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,250}]}]}
```
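For anyone parsing the dump: a `{case_clause, Value}` exit means an Erlang `case` expression received a term that none of its clauses matched, so `gm:find_common/3` appears to have pulled a queued `{SeqNo, {publish, ...}}` entry it did not expect at that point. Here is a minimal, self-contained illustration of that failure mode (illustrative only, not the actual `gm` code; the module and function names are made up):

```erlang
-module(case_clause_demo).
-export([run/0]).

%% queue:out/1 returns {empty, Q2} or {{value, Item}, Q2}. If the
%% case below only anticipates some item shapes, any other shape
%% exits the process with {case_clause, ...}, as in the GM crash.
classify(Q) ->
    case queue:out(Q) of
        {empty, _Q2}                -> empty;
        {{value, {_Seq, ack}}, _Q2} -> ack
        %% no clause for {_Seq, {publish, ...}} items
    end.

run() ->
    Q = queue:in({33059, {publish, self(), flow}}, queue:new()),
    classify(Q).  %% exits with a {case_clause, ...} like the one above
```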
I believe this crash then leads, on the other nodes, to:
```
=ERROR REPORT==== 12-Apr-2016::16:18:05 ===
** Generic server <0.19461.2> terminating
** Last message in was {'$gen_cast',join}
** When Server state == {state,{9,<0.19461.2>},
                               {{9,<0.19461.2>},undefined},
                               {{9,<0.19461.2>},undefined},
                               {resource,<<"/">>,queue,<<"myQuueue_a_2">>},
                               rabbit_mirror_queue_slave,undefined,-1,
                               undefined,
                               [<0.19460.2>],
                               {[],[]},
                               [],0,undefined,
                               #Fun<rabbit_misc.execute_mnesia_transaction.1>,
                               false}
** Reason for termination ==
** {{bad_return_value,
     {bad_flying_ets_update,1,2,
      {<<212,124,127,183,143,75,237,208,132,9,251,34,112,92,244,166>>,
       <<202,95,0,178,134,57,152,103,126,177,128,73,15,248,54,106>>}}},
    {gen_server2,call,
     [<5629.28766.2>,{add_on_right,{9,<0.19461.2>}},infinity]}}
```
and
```
=ERROR REPORT==== 12-Apr-2016::16:16:38 ===
** Generic server <0.27327.2> terminating
** Last message in was go
** When Server state == {not_started,
                         {amqqueue,
                          {resource,<<"/">>,queue,<<"myQuueue_a_2">>},
                          true,false,none,[],<0.26808.2>,[],[],[],
                          [{vhost,<<"/">>},
                           {name,<<"all">>},
                           {pattern,<<>>},
                           {'apply-to',<<"all">>},
                           {definition,
                            [{<<"ha-mode">>,<<"all">>},
                             {<<"ha-sync-mode">>,<<"automatic">>}]},
                           {priority,0}],
                          [{<32227.15062.2>,<32227.14697.2>},
                           {<0.26809.2>,<0.26808.2>}],
                          [],live}}
** Reason for termination ==
** {duplicate_live_master,'rabbit@t-srv-rabbit04'}
```
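As a side note, the policy embedded in the `amqqueue` record above (name `all`, empty pattern, `ha-mode: all`, `ha-sync-mode: automatic`) corresponds to a declaration roughly like the following; this invocation is a sketch and the exact flags may vary by RabbitMQ version:

```sh
rabbitmqctl set_policy -p / --apply-to all all "" \
    '{"ha-mode":"all","ha-sync-mode":"automatic"}'
```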
This is not suspected to have been introduced by #714 itself, but rather to be a consequence of the deadlock it resolves: with the deadlock gone, the system keeps running through partial partitions under pause_minority and eventually reaches an inconsistent state.
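For completeness, pause_minority is the `cluster_partition_handling` setting; in the classic `rabbitmq.config` (Erlang-term) format the relevant stanza is along these lines:

```erlang
%% rabbitmq.config -- classic Erlang-term configuration
[
 {rabbit,
  [
   %% Pause nodes that find themselves in a minority partition,
   %% rather than letting both sides keep running.
   {cluster_partition_handling, pause_minority}
  ]}
].
```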