@lukebakken
Collaborator

@lukebakken lukebakken commented Sep 28, 2017

See #693 and #151098578 in tracker.

These changes are already in the stable branch as they were not reverted there: https://github.com/rabbitmq/rabbitmq-server/blob/stable/src/rabbit_cli.erl#L117-L146

I found issue #693 and it turns out that, yes, systemd does look at the exit status of the ExecStop command to determine if the rabbitmq-server.service unit should be put into a failed state. Here is the problematic sequence of events as provided by @dumbbell in #693:

  • User runs rabbitmqctl stop which causes RabbitMQ server to eventually exit with status 0
  • systemd notices that the server process exited, and runs the ExecStop command - rabbitmqctl stop
  • At this point, if rabbitmqctl stop returns anything other than 0, systemd puts the unit into a failed state

The other reason rabbitmqctl stop could (or should) exit with 0 is due to the change introduced in rabbitmq/rabbitmq-server-release#49. With the change to Restart=on-failure, the above sequence of events would cause systemd to restart RabbitMQ if rabbitmqctl stop returns 69 during the ExecStop phase.
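
To make the sequence above concrete, here is a toy model of the decision as I understand it from #693. It is only a sketch, not systemd's implementation: `unit_result` is a hypothetical helper, and the optional extra arguments stand in for any exit statuses the unit is configured to accept as success.

```shell
# Toy model of the behaviour described above; NOT systemd's actual logic.
# Usage: unit_result EXECSTOP_STATUS [ACCEPTED_STATUS...]
# Prints "inactive" when the ExecStop exit status is 0 or is listed as
# accepted, otherwise prints "failed".
unit_result() {
    rc=$1
    shift
    for ok in 0 "$@"; do
        if [ "$rc" -eq "$ok" ]; then
            echo inactive
            return 0
        fi
    done
    echo failed
}

unit_result 0       # rabbitmqctl stop reached the node: inactive
unit_result 69      # node already gone, nothing extra accepted: failed
unit_result 69 69   # node already gone, 69 accepted as success: inactive
```

With Restart=on-failure, the "failed" case is the one that would trigger an unwanted restart.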

Other Options

  • Introduce a safe_stop command for rabbitmqctl that will always return 0. This command could also check to see if the RABBITMQ_PID_FILE environment variable is set and use it to synchronously wait for the server to exit, which would remove the need for this code. The code to wait for RabbitMQ to exit is already available.

  • Use the systemd SuccessExitStatus=69 setting. I have tested it, and systemd does treat exit status 69 from the ExecStop command as OK:

● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2017-09-28 19:08:38 UTC; 4s ago
  Process: 9671 ExecStop=/bin/sh -c while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done (code=exited, status=0/SUCCESS)
  Process: 9518 ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl stop (code=exited, status=69)
 Main PID: 8819 (code=exited, status=0/SUCCESS)
   Status: "Initialized"

Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]: attempted to contact: ['rabbit@UBUNTU-16']
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]: rabbit@UBUNTU-16:
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]:   * connected to epmd (port 4369) on UBUNTU-16
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]:   * epmd reports: node 'rabbit' not running at all
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]:                   no other nodes on UBUNTU-16
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]:   * suggestion: start the node
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]: current node details:
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]: - node name: 'rabbitmq-cli-46@UBUNTU-16'
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]: - home dir: /var/lib/rabbitmq
Sep 28 19:08:38 UBUNTU-16 rabbitmqctl[9518]: - cookie hash: AsRShfl0Qc4l/Srxe0zd5w==
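
For the first option, a rough sketch of what a safe_stop could do, written as a shell wrapper rather than as a real rabbitmqctl command. Everything here is an assumption for illustration: the `safe_stop` function name, the overridable `RABBITMQCTL` variable, and the one-second polling interval. It also assumes the caller is allowed to signal the server process, since it uses `kill -0` to check liveness.

```shell
# Hypothetical safe_stop wrapper: a sketch, not an existing rabbitmqctl command.
# RABBITMQCTL can be overridden so the function can be tried without a broker.
RABBITMQCTL=${RABBITMQCTL:-rabbitmqctl}

safe_stop() {
    # Ask the node to stop and ignore the exit status, e.g. 69 when the
    # node is already down.
    "$RABBITMQCTL" stop >/dev/null 2>&1 || true

    # If a pid file is configured and readable, wait synchronously for
    # that process to exit.
    if [ -n "${RABBITMQ_PID_FILE:-}" ] && [ -r "$RABBITMQ_PID_FILE" ]; then
        pid=$(cat "$RABBITMQ_PID_FILE")
        while [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; do
            sleep 1
        done
    fi

    # Always report success to the caller (systemd's ExecStop in this case).
    return 0
}
```

A script wrapping this would call `safe_stop` as its last line and be used as the ExecStop command. The trade-off is that genuine stop failures are hidden from systemd, so they would have to be surfaced some other way.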

@michaelklishin
Collaborator

Was this meant to be submitted against stable, since the CLI tools reside in a separate repo in master?

@michaelklishin
Collaborator

@lukebakken some team members strongly believe that rabbitmqctl stop exiting with a code of 0 does more harm than good. We introduced rabbitmqctl shutdown in rabbitmq/rabbitmq-cli#181 (and also in 3.6.x) to work around some problems with rabbitmqctl stop (that are specific to certain scenarios).

This change improves stop for some fairly specific cases but potentially hides node termination problems elsewhere, say, in PCF RabbitMQ. I think a safe_stop command that systemd will use is the only alternative that seems feasible at this point.

Also, rabbit_cli is meant to be gone in master, although I see one usage.

@lukebakken
Collaborator Author

I'll check out the shutdown command, thanks.

@lukebakken
Collaborator Author

@michaelklishin @hairyhum this change just makes the badrpc_multi case behave the same way as badrpc.

@lukebakken lukebakken deleted the rabbitmq-server-1362 branch September 29, 2017 14:17
lukebakken added a commit to rabbitmq/rabbitmq-server-release that referenced this pull request Sep 29, 2017
HoloRin pushed a commit to rabbitmq/rabbitmq-packaging that referenced this pull request Jan 4, 2021