Windows eacces / eexist error on journal.jif file during pause minority #545

@Gsantomaggio

The following errors occur only on Windows, after the node is paused by pause_minority:

   ** {{badmatch,{error,{"c:/rmqdata/db/rabbit@WIN-NCNMMKRPVMR-mnesia/queues/997D2EGEQM85T4DGH6EEPV91Y",
                          eexist}}},

and

** Reason for termination == 
** {{badmatch,{error,{"c:/rmqdata/db/rabbit@WIN-NCNMMKRPVMR-mnesia/queues/997D2EGEQM85T4DGH6EEPV91Y/journal.jif",
                      eacces}}},

Steps to reproduce:

Versions: RabbitMQ 3.5.7 on Erlang 17.3

STEP 1 - Create a RabbitMQ cluster with 3 machines:

  1. Debian
  2. Debian
  3. Windows Server 2012

Policy: {"ha-mode":"all","ha-sync-mode":"automatic"}
rabbitmq.config:

[
  {kernel, [
    {net_ticktime,  15}
  ]},
  {rabbit, [
    {queue_index_embed_msgs_below, 128},
    {queue_index_max_journal_entries, 512},
    {cluster_partition_handling, pause_minority}
  ]}
].
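
To make the expected behaviour of this configuration concrete: with cluster_partition_handling set to pause_minority, a node pauses itself when it can reach half or fewer of the cluster members (itself included). Below is a minimal Java sketch of that rule for the 3-node cluster above. The class and method names are illustrative only and are not part of RabbitMQ.

```java
final class PauseMinority {
    // True when a node that can reach `visibleNodes` members (itself included)
    // out of `clusterSize` is in a minority, i.e. sees half or fewer of the nodes.
    public static boolean isMinority(int visibleNodes, int clusterSize) {
        return 2 * visibleNodes <= clusterSize;
    }

    public static void main(String[] args) {
        int clusterSize = 3; // two Debian nodes plus one Windows node
        for (int visible = 1; visible <= clusterSize; visible++) {
            System.out.println(visible + " of " + clusterSize
                    + " visible, pauses: " + isMinority(visible, clusterSize));
        }
    }
}
```

In this cluster a node that can only see itself (1 of 3) is expected to pause, while a node that still sees one peer (2 of 3) keeps running.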

STEP 2 - Publish persistent messages:

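// threadChannels (an ExecutorService), connection and exchange_name are set up elsewhere.
for (int i = 0; i < 10; i++) {
    threadChannels.submit(new Runnable() {
        public void run() {
            try {
                // One channel per publishing thread.
                Channel channel = connection.createChannel();

                for (int j = 0; j < 50000; j++) {
                    // 1 KiB persistent message, empty routing key.
                    channel.basicPublish(exchange_name, "", MessageProperties.PERSISTENT_BASIC, new byte[1024]);
                    Thread.sleep(30);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
}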

STEP 3 - When RabbitMQ starts writing the queue index files in the directory %RABBITMQ_BASE%\db\rabbit@WIN-NCNMMKRPVMR-mnesia\queues,
add a Windows firewall rule to put the Windows node into a network partition.

Firewall rule note: it is enough to block incoming connections from one of the Debian machines.
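
The timing in STEP 3 matters, so it can help to detect automatically when the index files show up instead of watching the directory by hand. Below is a rough Java sketch that polls the queues directory until it finds a journal.jif file (the name seen in the errors above) or a file ending in .idx. The .idx suffix for index segment files, the class name and the 500 ms polling interval are my assumptions, not something taken from the logs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

final class QueueIndexWatch {
    // Assumption: queue index data lives in journal.jif and *.idx segment files.
    public static boolean isIndexFile(String fileName) {
        return fileName.equals("journal.jif") || fileName.endsWith(".idx");
    }

    // True if any index file exists somewhere below queuesDir.
    public static boolean hasIndexFiles(Path queuesDir) {
        if (!Files.isDirectory(queuesDir)) {
            return false;
        }
        try (Stream<Path> files = Files.walk(queuesDir)) {
            return files.filter(Files::isRegularFile)
                        .anyMatch(p -> isIndexFile(p.getFileName().toString()));
        } catch (IOException | UncheckedIOException e) {
            // The broker may create or remove files while we walk; report "not yet".
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: QueueIndexWatch <queues directory>");
            return;
        }
        Path queuesDir = Paths.get(args[0]);
        while (!hasIndexFiles(queuesDir)) {
            Thread.sleep(500); // arbitrary polling interval
        }
        System.out.println("Queue index files present; add the firewall rule now.");
    }
}
```

Run it on the Windows node with the queues directory as the only argument, and add the firewall rule as soon as it prints its message.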

LOGS

 * pause_minority mode enabled
    We will therefore pause until the *entire* cluster recovers

While the node is stopping:

=INFO REPORT==== 12-Jan-2016::16:00:45 ===
Mirrored queue 'myQuueue3' in vhost '/': Synchronising: complete

=ERROR REPORT==== 12-Jan-2016::16:01:17 ===
** Generic server <0.1374.0> terminating
** Last message in was {'$gen_cast',go}
** When Server state == {not_started,
                            {amqqueue,
                                {resource,<<"/">>,queue,<<"myQuueue1">>},
                                true,false,none,[],<7603.11283.0>,
                                [<7602.2511.0>],
                                [],[],
                                [{vhost,<<"/">>},
                                 {name,<<"asdsada">>},
                                 {pattern,<<>>},
                                 {'apply-to',<<"all">>},
                                 {definition,
                                     [{<<"ha-mode">>,<<"all">>},
                                      {<<"ha-sync-mode">>,<<"automatic">>}]},
                                 {priority,0}],
                                [{<7602.2512.0>,<7602.2511.0>},
                                 {<7603.11285.0>,<7603.11283.0>}],
                                [],live}}
** Reason for termination == 
** {{badmatch,{error,{"c:/rmqdata/db/rabbit@WIN-NCNMMKRPVMR-mnesia/queues/77FUPFJL6LYV1XUWFXRXYOTE",
                      eexist}}},
    [{rabbit_mirror_queue_slave,handle_go,1,[]},
     {rabbit_mirror_queue_slave,handle_cast,2,[]},
     {gen_server2,handle_msg,2,[]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
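
Since two different POSIX codes show up (eexist on the queue directory, eacces on journal.jif), it may be useful to pull the path and the code out of the logs when comparing runs. Below is a small Java sketch that extracts both from a termination reason like the one above. The class name and the regular expression are mine and only cover the {error,{"path", code}} shape seen in these reports.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BadmatchReason {
    // Matches {error,{"<path>", <code>}} where the code may sit on the next line.
    private static final Pattern FILE_ERROR =
            Pattern.compile("\\{error,\\{\"([^\"]+)\",\\s*([a-z]+)\\}\\}");

    // Returns {path, posixCode} for the first match, or empty if there is none.
    public static Optional<String[]> parse(String logText) {
        Matcher m = FILE_ERROR.matcher(logText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2)});
    }

    public static void main(String[] args) {
        String reason =
                "** {{badmatch,{error,{\"c:/rmqdata/db/rabbit@WIN-NCNMMKRPVMR-mnesia/queues/77FUPFJL6LYV1XUWFXRXYOTE\",\n"
              + "                      eexist}}},";
        parse(reason).ifPresent(r -> System.out.println(r[1] + " on " + r[0]));
    }
}
```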

Note: The error sometimes also occurs during start_app after pause_minority.

Here are the full logs

I haven't tested yet whether RabbitMQ loses messages in this situation.
