| 1 | +# Improvements to message queue consumer processes |
| 2 | + |
| 3 | +## The current situation |
| 4 | + |
| 5 | +Currently, all defined message queue consumers are spawned by a cronjob called `consumers_runner`. You can disable this entirely, or allow only specific consumers to run, through config settings in the `app/etc/env.php` file (see [docs](https://devdocs.magento.com/guides/v2.3/config-guide/mq/manage-message-queues.html#configuration)). |
| 6 | +The `consumers_runner` job first checks the server(s) for consumer processes that are already running for the same queue. If it finds one, it won't spawn a new one; if no consumer process is found for a queue, it spawns one. By default, a consumer handles at most 10,000 messages and then terminates itself. The next time the `consumers_runner` cronjob triggers, it spawns a new process. |
| 7 | + |
| 8 | +There is also a poison pill feature in Magento, which is basically just a random hash stored in the database in the table `queue_poison_pill`. If some configuration value gets changed using the Magento backend, the poison pill version is changed. |
| 9 | +The running consumer processes check this version, but only when a new message is in the queue. When a consumer discovers a new message in the queue, it first checks whether the poison pill version has changed. If the version is unchanged, the message is handled. If the version has changed, the consumer terminates itself without handling the message; the cronjob spawns a new consumer on its next run, and only then is that message handled. |
| 10 | + |
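The check described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not Magento's actual implementation; `PoisonPillConsumer`, `read_version` and `handle` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PoisonPillConsumer:
    """Toy consumer that compares the poison pill version before each message."""

    read_version: Callable[[], str]  # hypothetical: reads the current version from storage
    handle: Callable[[str], None]    # hypothetical: processes one message
    started_with: str = field(init=False)

    def __post_init__(self) -> None:
        # Remember the version that was current when the process started.
        self.started_with = self.read_version()

    def on_message(self, message: str) -> bool:
        """Return True if the message was handled, False if the consumer must stop."""
        if self.read_version() != self.started_with:
            # Version changed: leave the message for the next consumer and stop.
            return False
        self.handle(message)
        return True


# Usage: the version only matters at the moment a message arrives.
version = {"value": "abc"}
handled = []
consumer = PoisonPillConsumer(lambda: version["value"], handled.append)
consumer.on_message("first")   # handled
version["value"] = "def"       # e.g. a config value was changed in the backend
consumer.on_message("second")  # not handled, consumer should stop
```

Note that nothing in this sketch reacts to the version change until `on_message` is called, which is the limitation that Problem 2 comes back to.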
| 11 | +Magento 2.3.0 Open Source shipped with the message queue system and used it for the bulk API import feature. |
| 12 | +In Magento 2.3.2, some existing features were converted to use the message queue system as well. These are tasks that are triggered from the Magento backend and can take a while to execute. To prevent the webserver or php-fpm from running into a timeout, these tasks are now sent to the message queue system and executed asynchronously. Some examples are: |
| 13 | + |
| 14 | +- Generating coupon codes |
| 15 | +- Mass editing products |
| 16 | +- Exporting data |
| 17 | +- ... |
| 18 | + |
| 19 | +Currently, in Magento 2.3.2, the cron system spawns 4 consumer processes. Each of these processes takes up memory and CPU, and regularly queries the database (or RabbitMQ if that broker is being used) to see if new messages are available and then processes them. |
| 20 | + |
| 21 | + |
| 22 | +## Problem 1: not enough options to keep consumers under control |
| 23 | + |
| 24 | +### The problem |
| 25 | + |
| 26 | +There is currently too little control over these consumer processes. |
| 27 | +What if people use one of the features mentioned above only very rarely? Let's say they export all their products only once a year to check their inventory levels. |
| 28 | +Then you have a consumer process sitting there doing nothing for 364 days of the year, wasting precious CPU cycles and taking up precious memory, until once a year the shopowner finally triggers the one task it can execute. |
| 29 | + |
| 30 | +There is also the potential problem that the consumer process will take up more and more memory due to some memory leak in the code it executes. |
| 31 | + |
| 32 | +### The suggested solution |
| 33 | + |
| 34 | +The suggestion is to give the shopowner, or the developers managing the shop, more control per consumer. |
| 35 | +We could add some additional options to the consumer processes to keep them under control. Currently there is one limit available: `max-messages`. Once that number of messages has been processed, the consumer terminates itself. |
| 36 | +I'd like to suggest some other limits which we can set: |
| 37 | + |
| 38 | +- `max-idle-time`: terminate if no message has been handled for xx seconds |
| 39 | +- `max-time`: terminate after xx seconds in total, once the current message has been handled |
| 40 | +- `memory-limit`: terminate once the process uses xx MB of memory and the current message has been handled (a way to work around potential memory leaks) |
| 41 | + |
| 42 | +Next to these limits, a configurable sleep time might be nice: |
| 43 | + |
| 44 | +- `sleep`: xx milliseconds to sleep before checking if a new message is available (currently this is [hardcoded to 1 second](https://github.com/magento/magento2/blob/2.3.2/lib/internal/Magento/Framework/MessageQueue/CallbackInvoker.php#L59)) |
| 45 | + |
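To make the interaction between these limits concrete, here is a minimal sketch of a consumer loop that checks all of them between messages. It is written in Python purely for illustration and is not Magento code; `Limits`, `run_consumer`, `poll`, `handle` and `memory_mb` are hypothetical names, and how memory usage would actually be measured is left open.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Limits:
    max_messages: Optional[int] = None      # the limit that already exists
    max_idle_time: Optional[float] = None   # seconds without handling a message
    max_time: Optional[float] = None        # total runtime in seconds
    memory_limit_mb: Optional[float] = None
    sleep_ms: int = 1000                    # wait between polls of an empty queue


def run_consumer(
    poll: Callable[[], Optional[str]],       # hypothetical: next message or None
    handle: Callable[[str], None],           # hypothetical: processes one message
    limits: Limits,
    memory_mb: Callable[[], float] = lambda: 0.0,  # hypothetical memory probe
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Process messages until a limit is hit and return the reason for stopping."""
    started = last_activity = clock()
    handled = 0
    while True:
        # Limits are only checked between messages, never in the middle of one.
        now = clock()
        if limits.max_messages is not None and handled >= limits.max_messages:
            return "max-messages"
        if limits.max_time is not None and now - started >= limits.max_time:
            return "max-time"
        if limits.memory_limit_mb is not None and memory_mb() >= limits.memory_limit_mb:
            return "memory-limit"
        if limits.max_idle_time is not None and now - last_activity >= limits.max_idle_time:
            return "max-idle-time"

        message = poll()
        if message is None:
            sleep(limits.sleep_ms / 1000)
            continue
        handle(message)
        handled += 1
        last_activity = clock()
```

Returning the reason for stopping is a design choice of this sketch; it would make it easy to log why a consumer exited.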
| 46 | +I'd also like to see an option that is defined on the consumer but used by the `consumers_runner` cronjob: |
| 47 | + |
| 48 | +- `only-spawn-when-message-available`: the idea is that the `consumers_runner` job checks the queue before spawning a consumer, to see if there is actually a message pending in the queue. If there is one, then go ahead and spawn a consumer (only if one isn't already running). If there isn't a message in the queue, then don't spawn a consumer. |
| 49 | + |
| 50 | +### Some options combined |
| 51 | + |
| 52 | +The problem outlined above, where a specific consumer only needs to run very infrequently, could be solved by combining these options: |
| 53 | + |
| 54 | +- `only-spawn-when-message-available` |
| 55 | +- `max-idle-time` |
| 56 | + |
| 57 | +The consumer will only be spawned when it is needed, and it will terminate itself once it has been idle for a certain period. |
| 58 | +That should save some precious server resources. |
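The spawn decision that `consumers_runner` would make with `only-spawn-when-message-available` can be sketched as a small function. This is an illustrative Python sketch under the assumptions of this proposal; `should_spawn` and its parameters are hypothetical names, and how the number of pending messages is obtained from the broker is not shown.

```python
def should_spawn(
    consumer_running: bool,
    pending_messages: int,
    only_spawn_when_message_available: bool,
) -> bool:
    """Decide whether the runner should start a consumer for one queue."""
    if consumer_running:
        # Existing behaviour: never start a second consumer for the same queue.
        return False
    if only_spawn_when_message_available and pending_messages == 0:
        # Proposed behaviour: an empty queue does not need a consumer.
        return False
    return True


# A rarely used queue stays without a consumer until a message arrives.
print(should_spawn(False, 0, True))  # False
print(should_spawn(False, 1, True))  # True
```

Combined with `max-idle-time`, the consumer started by the second call would exit again on its own once the queue has been empty for the configured period.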
| 59 | + |
| 60 | +### Making these options configurable and have some defaults |
| 61 | + |
| 62 | +These options should be configurable per consumer type. |
| 63 | +Some sensible defaults could be set in the [`queue_consumer.xml`](https://devdocs.magento.com/guides/v2.3/extension-dev-guide/message-queues/config-mq.html#queueconsumerxml) file for some of these options. |
| 64 | +In addition, developers or shopowners should be able to override these values per consumer type. Being able to override them in the `app/etc/env.php` file would be the minimum; a backend interface for configuring them would also be very nice (but could maybe be done in a later phase?). |
| 65 | + |
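The intended precedence (defaults per consumer, overridden per consumer by deployment-specific settings) can be sketched like this. It is a Python illustration only: the dictionaries stand in for values that would come from `queue_consumer.xml` and `app/etc/env.php`, and the consumer name `example.consumer`, the option values and `resolve_options` are all hypothetical.

```python
from typing import Any, Dict

Options = Dict[str, Any]


def resolve_options(
    defaults: Dict[str, Options],
    overrides: Dict[str, Options],
    consumer: str,
) -> Options:
    """Merge per-consumer defaults with per-consumer overrides (overrides win)."""
    merged = dict(defaults.get(consumer, {}))
    merged.update(overrides.get(consumer, {}))
    return merged


# Stand-in for defaults that a module could ship with.
defaults = {
    "example.consumer": {"max-messages": 10000, "max-idle-time": 300, "sleep": 1000},
}
# Stand-in for overrides a shop could set for its own environment.
overrides = {
    "example.consumer": {"max-idle-time": 60, "only-spawn-when-message-available": True},
}

options = resolve_options(defaults, overrides, "example.consumer")
```

Options that are not overridden keep their default, so a shop only has to list the values it wants to change.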
| 66 | +### Credits |
| 67 | + |
| 68 | +Some of these ideas were taken from the [`symfony/messenger` component](https://github.com/symfony/messenger/blob/3d65f22f9a56f6475c19999fdbc3a897cefc8900/Command/ConsumeMessagesCommand.php#L77-L80). |
| 69 | + |
| 70 | +## Problem 2: deployment problems |
| 71 | + |
| 72 | +### The problem |
| 73 | + |
| 74 | +When deploying a new version of Magento, third-party, or custom code to a server, we can run into a problem where consumers are still using old code loaded in memory. This applies not only to the code a consumer executes for its messages, but also to the Magento core code that runs the consumer processes themselves (which might change when upgrading Magento to a newer version). |
| 75 | + |
| 76 | +Also, when using a capistrano-style deployment where a symlinked `current` directory points to a certain release directory, the running consumers will supposedly still run from the old release directory after a new deploy, each referencing a `.pid` file with its process id inside that old release directory. The cronjob `consumers_runner` will look for that file in the new release directory and won't find it, causing a second consumer to start up even though the old one is still running. (This is probably already fixed by this unreleased commit: [MC-18477](https://github.com/magento/magento2/commit/1d9e07b218c7c8ad1f05706828cb2dd47d2d2d58)) |
| 77 | + |
| 78 | +### The suggested solution |
| 79 | + |
| 80 | +The suggestion here is to give deployment scripts a way to signal the running consumer processes to finish handling their current message and then terminate as soon as possible. |
| 81 | +We already have the poison pill functionality, which could be used here, but the problem is that it is only checked when a new message appears in the queue. If no messages appear in the queue during a deploy and we only update the poison pill version, the consumer will keep running, because it doesn't check the poison pill version until a new message appears. |
| 82 | +So I would suggest creating some kind of dummy message in each queue whose only purpose is to make the consumers check the poison pill version. That message should then be removed from the queue again without anything else being done with it. |
| 83 | + |
| 84 | +I see this as a new command added to `bin/magento` (`queue:consumers:suicide`?) that does both of these things: |
| 85 | + |
| 86 | +- updates the poison pill version |
| 87 | +- sends a dummy message to all queues |
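The two steps of the proposed command, and the way a consumer would react to the dummy message, can be sketched as follows. This is a Python illustration of the idea only, not Magento code; `DUMMY`, `signal_consumers`, `consume_next`, the `store` dictionary and the queue names are hypothetical stand-ins for the `queue_poison_pill` table and the real queues.

```python
import uuid
from collections import deque
from typing import Callable, Deque, Dict

DUMMY = "__poison_pill_check__"  # hypothetical marker for the dummy message


def signal_consumers(store: Dict[str, str], queues: Dict[str, Deque[str]]) -> None:
    """What the proposed command would do: bump the version, then wake every queue."""
    store["poison_pill_version"] = uuid.uuid4().hex
    for queue in queues.values():
        queue.append(DUMMY)


def consume_next(
    store: Dict[str, str],
    queue: Deque[str],
    started_with: str,
    handle: Callable[[str], None],
) -> bool:
    """Look at the next message; return False when the consumer should stop."""
    is_dummy = queue[0] == DUMMY
    if is_dummy:
        queue.popleft()  # the dummy is discarded, nothing else is done with it
    if store["poison_pill_version"] != started_with:
        return False     # real messages stay queued for the next consumer
    if not is_dummy:
        handle(queue.popleft())
    return True


# Usage: an idle consumer on an empty queue now notices the deploy.
store = {"poison_pill_version": "v1"}
queues = {"queue.a": deque(), "queue.b": deque(["real message"])}
signal_consumers(store, queues)
keep_running = consume_next(store, queues["queue.a"], "v1", print)  # False
```

In this sketch a real message that is ahead of the dummy is left untouched when the version has changed, which matches the current poison pill behaviour of handing it to the next consumer.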