
Create an easy way to kill all currently running message queue consumer processes #181



Open
hostep opened this issue Jul 13, 2019 · 13 comments

Comments

@hostep

hostep commented Jul 13, 2019

Hi folks

It would be great if we could have an easy way to stop all message queue consumer processes currently running on the server.
We are having problems in our deploy flow where old message queue consumer processes keep running with old code loaded in memory, even after new code has been deployed.
I know there is the new poison pill mechanism, where message queue consumer processes kill themselves if they notice some configuration has changed, but only once they receive at least one new incoming message.

I would like to see something similar, where we can tell the message queue consumers that new code has been deployed and that they should stop themselves, so they can start up again using the new code.
The other problem is that the poison pill feature currently only kicks in after a new message has been added to the queue. For a consumer that is used only very infrequently, that could take weeks, months or even years, so that's not really ideal.

We are currently working around this in our deploy flow by finding all process IDs in the var/ directory and killing them one by one with the kill command.
But I don't know how safe that is if a message is being processed at the moment we kill the process. Can the message queue consumer system in Magento deal with that gracefully?

So having a command-line option to tell the message queue consumers to kill themselves would be really great, I think.

A potential idea would be a command that changes the poison pill version in the database and also sends a single dummy message to every consumer queue, a message that does nothing except signal to the consumer that it should kill itself.

Thoughts, other ideas?

Thanks!

@hostep hostep changed the title Create an easy to use way to kill all current message queue consumer processes Create an easy way to kill all current message queue consumer processes Jul 13, 2019
@hostep hostep changed the title Create an easy way to kill all current message queue consumer processes Create an easy way to kill all currently running message queue consumer processes Jul 22, 2019
@FreekVandeursen

Another solution could be a configurable maximum wait timeout. If the timeout is exceeded, the consumer would perform the PoisonPill check and then continue waiting for a new message. I'm not familiar enough with the message queue implementation, though, to know whether this is technically possible or how easy or difficult it would be to implement.
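To make the timeout suggestion concrete, here is a minimal bash sketch of a consumer loop that compares the poison pill version it saw at startup both when a message arrives (the current behaviour described in this issue) and when an idle timeout expires (the proposed addition). Everything in it is hypothetical: a file at `PILL_FILE` stands in for the version Magento keeps in the database, messages arrive as lines on stdin, and `IDLE_TIMEOUT` is an invented setting, not an existing Magento option.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the poison pill version stored in the database.
PILL_FILE="${PILL_FILE:-/tmp/poison-pill-version}"
# Hypothetical setting: seconds to wait for a message before re-checking.
IDLE_TIMEOUT="${IDLE_TIMEOUT:-5}"

current_version() {
  cat "$PILL_FILE" 2>/dev/null || echo 0
}

# Reads messages (one per line) from stdin until the version changes or input ends.
consume() {
  local started_version msg status
  started_version=$(current_version)
  while true; do
    if read -r -t "$IDLE_TIMEOUT" msg; then
      if [ "$(current_version)" != "$started_version" ]; then
        # Existing behaviour: leave the message in the queue and exit,
        # so a freshly started process (with new code) handles it.
        echo "rejected $msg"
        return 0
      fi
      echo "processed $msg"
    else
      status=$?
      # read returns a status above 128 on timeout; anything else means end of input.
      [ "$status" -gt 128 ] || return 0
      # Proposed addition: also check the version while idle.
      if [ "$(current_version)" != "$started_version" ]; then
        echo "version changed while idle"
        return 0
      fi
    fi
  done
}
```

For example, `printf 'a\nb\n' | consume` prints `processed a` and `processed b` as long as the version file is unchanged. If the version changes while the loop is waiting, it exits within about `IDLE_TIMEOUT` seconds even though no message arrived; a shorter timeout means a faster shutdown after a deploy at the cost of more frequent version lookups.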

@kandy

kandy commented Jul 22, 2019

Our common recommendation is to not introduce infrastructure code in Magento if possible.

For example, if you use a Kubernetes Job, you don't need any code in Magento: you can just delete the Job and start a new one after deployment.

@hostep
Author

hostep commented Jul 22, 2019

@kandy: Yes, I agree, we shouldn't rely on the server infrastructure to restart these processes. That is why this ticket was created: Magento should have an easy way to tell the consumer processes that they should self-destruct somehow.
From what I understand about Magento Cloud (I might be wrong), they also kill these processes during deploys, so they could probably also benefit from an easier way to stop these consumer processes than running kill commands.

@FreekVandeursen: yes, that could be a solution: in addition to performing the poison pill check only after a new message has been added to the queue, the consumers could also check at regular (configurable?) intervals whether the poison pill version has changed. But that interval can't be too long. Especially in deployment scenarios, you want the old consumers to exit as fast as possible, so that newly deployed code (which is used by the MQ consumers) is executed as soon as the deploy is finished.
I would love to see a new bin/magento command that simply tells all consumers to stop running once they are done processing the current message (if any).

@hostep
Author

hostep commented Jul 28, 2019

I just stumbled upon the symfony/messenger component, which looks like a similar implementation to what Magento is doing.
Apparently they also have a feature to stop running consumers/workers: https://symfony.com/doc/current/messenger.html#deploying-to-production (messenger:stop-workers)

Maybe it makes sense to take a look at that component to see if there are other interesting things they are doing?

@Luwdo

Luwdo commented Oct 11, 2019

The Symfony mechanism that stops the workers actually just stores a flag in the cache; each queue consumer reads that flag and exits after its next job is finished.

https://github.com/symfony/messenger/blob/master/Command/StopWorkersCommand.php
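The flag-based approach described above can be sketched in bash as follows. It mirrors the idea (a stop command records when a restart was requested, and each worker compares that with its own start time between jobs) rather than reproducing Symfony's code; the `RESTART_FLAG` file and all function names are hypothetical, and Symfony keeps the value in a cache pool rather than a file.

```shell
#!/usr/bin/env bash
# Hypothetical location of the restart flag (Symfony stores it in a cache pool).
RESTART_FLAG="${RESTART_FLAG:-/tmp/consumers-restart-requested}"

# "Stop workers" side: record when the restart was requested (Unix seconds).
request_restart() {
  date +%s > "$RESTART_FLAG"
}

# Succeeds if a restart was requested at or after the given start time.
restart_requested_since() {
  local started_at="$1" requested_at
  [ -r "$RESTART_FLAG" ] || return 1
  requested_at=$(cat "$RESTART_FLAG")
  [ "$requested_at" -ge "$started_at" ]
}

# Worker side: handle one job per line of stdin, checking the flag only
# between jobs, so a job that has started is always finished first.
run_worker() {
  local started_at job
  started_at=$(date +%s)
  while read -r job; do
    echo "processed $job"
    if restart_requested_since "$started_at"; then
      echo "restart requested, exiting" >&2
      return 0
    fi
  done
}
```

Comparing timestamps instead of using a plain on/off flag means workers started after the request are not stopped, and nobody has to clear the flag afterwards. In this sketch a worker blocked waiting for input only sees the flag after its next job, which is the same limitation the poison pill has today.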

In theory, the poison pill mechanism already does this, just not in the way we need.

The callback invoker checks for a change in the version:
https://github.com/magento/magento2/blob/269b47af3e37fbbe76e9f38d45fdb0cf969d45e3/lib/internal/Magento/Framework/MessageQueue/CallbackInvoker.php

It rejects the next item in the queue, then exits the process with exit(0).

I see a few options here. We could just make a CLI command that calls
https://github.com/magento/magento2/blob/59b3d7249559ae40667165b88699117406d140c3/app/code/Magento/MessageQueue/Model/ResourceModel/PoisonPill.php#L35

That would just put a new version into the DB, making the processes kill themselves on the next message they process.

The only issue I see with this is that if you don't get a message for a long time after a deployment, your consumer becomes more and more out of date.

A second option: we could lock the queue, check it for any items still being processed, and kill the consumer the moment there are no longer any processing items.
This would have to be a cron job of some sort, because we do not know when the next job is going to finish.
I see what appears to be a lock mechanism in the invoker; I will need to do some testing before I am sure that this would work.

@hostep
Author

hostep commented Oct 11, 2019

Thanks for the feedback @Luwdo!

The questions posed in this issue and in #180 have already been combined into an architectural proposal, which you can see here:
https://github.com/magento/architecture/pull/232/files?short_path=81c5aa0#diff-81c5aa0b55a519b20c0ffd8b3f57b21b
Problem 2 at the end might be of interest to you.

Feel free to leave comments or suggestions on that proposal (just because it has already been accepted doesn't mean it's perfect).

@Luwdo

Luwdo commented Oct 11, 2019

Problem 2 is exactly the issue I have been working on.

I just finished a stopgap measure for our production sites and published it here:
https://github.com/humanelement/module-advanced-message-queue-options

I do not see any mention of moving the PID files themselves to a standard folder under var/, which I found to be a necessary step to keep the symlink consistent across releases without having to worry about 3rd-party modules that add their own consumers.

@hostep
Author

hostep commented Oct 11, 2019

Nice work!

But be aware that the PID files have been removed in the latest version of Magento (2.3.3) by magento/magento2@1d9e07b; I think they now use locking in the database to make sure no duplicate consumers are spawned.
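For comparison, the "only one process per consumer" guarantee that the PID files (and, reportedly, the newer database locks) provide can be illustrated with an advisory file lock in bash. This is an analogy, not Magento's implementation: it relies on `flock` from util-linux, and the `run_single_consumer` helper and `LOCK_DIR` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command only if no other process holds the
# lock for the same consumer name (non-blocking advisory lock via flock).
run_single_consumer() {
  local name="$1"; shift
  local lock="${LOCK_DIR:-/tmp}/consumer-$name.lock"
  (
    # The lock file is open on fd 9 for the whole subshell; -n makes flock
    # fail immediately instead of waiting if another process holds the lock.
    flock -n 9 || { echo "consumer $name is already running" >&2; exit 1; }
    "$@"
  ) 9>"$lock"
}
```

For example, `run_single_consumer "$name" php bin/magento queue:consumers:start "$name"` would refuse to start a second copy while the first still holds the lock, which is released automatically when the process exits. As with the PID files discussed above, the lock only works across releases if its location is shared between them, e.g. through a symlinked var/ directory.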

@davidwindell

We've just come across this issue (customers complaining about exports not working); it turned out the consumers were still running from old deployment release directories and creating the files in the wrong root folders.

Our fix for now is a quick killall php after deployment, but it's messy.

@hostep
Author

hostep commented Dec 10, 2020

@davidwindell: we do something like this: davidalger/capistrano-magento2#133 (comment)
Also messy, but a little bit less messy 🙂

Magento should really implement something easy we can call from the command line, like proposed in https://github.com/magento/architecture/blob/master/design-documents/asynchronous-operations/consumer-processes-improvements.md#problem-2-deployment-problems (problem 2), but neither I nor the Magento devs have found time to implement that yet.

@davidwindell

Thanks so much @hostep this inspired me to replace the killall with:

pgrep -u "$(whoami)" -f "[q]ueue:consumers:start" | tee /dev/stderr | awk '{print $1}' | xargs -r kill

In our Deploybot config 👍
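A possible extension of that one-liner for deploy scripts: send SIGTERM to the matching consumers, wait until they have actually exited, and only fall back to SIGKILL after a timeout, so the deploy does not continue while old-release consumers are still alive. The `stop_consumers` function and its defaults are hypothetical. Note that plain `kill` already sends SIGTERM, so this sketch does not make an in-flight message any safer than the original command; it only confirms the processes are gone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop processes whose command line matches a pattern,
# wait for them to exit, and escalate to SIGKILL after a timeout.
stop_consumers() {
  # [q] keeps the pattern from matching a shell whose own command line contains it.
  local pattern="${1:-[q]ueue:consumers:start}"
  local timeout="${2:-30}"   # seconds to wait before sending SIGKILL
  local pids alive pid i
  pids=$(pgrep -u "$(id -un)" -f "$pattern") || return 0   # nothing to stop
  echo "sending SIGTERM to:" $pids >&2
  kill -TERM $pids 2>/dev/null || true
  for ((i = 0; i < timeout; i++)); do
    alive=""
    for pid in $pids; do
      # kill -0 sends no signal; it only tests whether the PID still exists.
      if kill -0 "$pid" 2>/dev/null; then alive="$alive $pid"; fi
    done
    if [ -z "$alive" ]; then return 0; fi
    sleep 1
  done
  echo "still running after ${timeout}s, sending SIGKILL to:$alive" >&2
  kill -KILL $alive 2>/dev/null || true
  return 0
}
```

Called without arguments it targets the same `queue:consumers:start` processes as the one-liner and waits up to 30 seconds; the second argument shortens or lengthens that wait.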

@onlinebizsoft

The other problem is that this poison pill feature currently only kicks in after a new message was added to the queue. This could potentially take many week/months/years until this happens if a specific consumer is only used very infrequently. So that's not really ideal.

Why is that a problem? We can just leave the consumer running until a new message comes in; do you see any issue with that? @hostep

@hostep
Author

hostep commented Aug 28, 2021

There's a pending PR with a solution for this BTW: magento/magento2#31495

@onlinebizsoft: so far we haven't seen big issues with this new solution (which we've patched into some of our shops), besides the process using a bunch of memory, but that's about it.


7 participants