Description
Motivation
Currently, the `pm.max_children` value defaults to 5. As a result, PHP-FPM only runs 5 PHP processes concurrently and rejects further requests, causing a Gateway Timeout error from the upstream web server (e.g. nginx). This is not a problem for small or properly optimized applications, as each individual request is usually processed quite quickly. However, as soon as you have a longer-running request, it can very quickly cause a denial of service of your entire web application.
Example
Let's consider the following scenario: `/download.php` is a proxy script that fetches a file from disk and passes it through to the client, so the request stays open until the download has concluded. Nextcloud, for example, uses a similar architecture, which is where I noticed this issue.
- 5 users (or one user via 5 connections) download a file using `/download.php`. The download takes a few minutes.
- During this time, a new user visits the website (or the current user tries to initiate a 6th download, etc.). At this point, because `pm.max_children` is exhausted, php-fpm no longer accepts new requests. After a timeout, all users trying to access the website will receive a Gateway Timeout from the web server.
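To make the scenario concrete, here is a minimal, hypothetical sketch of such a proxy script (file path and chunk size are illustrative; a real implementation would add authorization, error handling, and range support). It shows why the FPM worker stays occupied for the whole transfer: the loop only finishes when the last byte has reached the client, so a slow client connection ties up the process for minutes.

```php
<?php
// download.php — hypothetical, simplified proxy script.
$path = '/srv/files/example.bin'; // assumed file location

header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));

$fh = fopen($path, 'rb');
while (!feof($fh)) {
    echo fread($fh, 8192); // stream in 8 KiB chunks
    flush();               // push data to the client immediately
}
fclose($fh);
```

For the duration of this loop, one of the 5 default workers is blocked even though it is mostly waiting on I/O.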
This issue can quickly appear not only in special download cases like the one described above, but also in several other scenarios, such as:
- A frontend makes multiple simultaneous requests to a PHP API with several expensive database queries.
- An internal dependency (such as an external API called from within PHP) has a delay, causing a few processes to run for a couple of seconds.
- Someone forgot to remove a `sleep(1)` from the code while debugging.
- The website is poorly optimized.
- etc.
Why this is suddenly an issue
As soon as 5 instances of a PHP-FPM script are running simultaneously, the entire web application becomes unavailable, even if the FPM server is just waiting on I/O or the network during script execution. This can obviously be exploited by malicious attackers and leads to bad experiences during traffic spikes; I would assume a non-trivial number of "hugs of death" can be attributed to this value being exhausted. I would argue that in $currentYear, 5 concurrent requests is no longer a sensible default, considering how many web frontends today regularly send 5 or more concurrent requests to (PHP) backend servers, and how, in the age of AI crawlers, even simple websites occasionally get hit with many concurrent requests.
Solution
The simple solution would be to increase `pm.max_children` to a more sensible value matching modern hardware targets. I would suggest raising it to at least 20 if you want to be conservative (a limit that low usually won't even make a $5 VPS break a sweat), but I see no harm in setting it to 128 or even 1000. In the end, the real limiting factor should be the performance of the system anyway. However, with this value capped at 5 by default, scaling the system beyond anything with more than 640 MB of RAM (5 × the 128M default memory limit) has no effect unless you know to adjust this variable.
Alternative Solution
An alternative approach would be to introduce a more flexible scaling limit that takes into account the total system memory, the `memory_limit` configuration, and a percentage threshold, scaling dynamically based on available memory.
But I'm not sure it's worth spending development effort on such a sophisticated mechanism when increasing the limit doesn't come with any significant downsides: either way, the server stops serving requests once it hits the limit or runs out of system resources.
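As a rough illustration of what such memory-based scaling could compute (this is not an existing FPM feature; the threshold, the Linux-specific `/proc/meminfo` source, and the 128M limit are all assumptions for the sketch):

```php
<?php
// Hypothetical sketch: derive pm.max_children from total RAM,
// the pool's memory_limit, and a configurable threshold.
$memoryLimit = 128 * 1024 * 1024; // parsed from memory_limit (128M default)
$threshold   = 0.8;               // use at most 80% of total RAM for PHP

// Linux-specific; other platforms would need a different source.
preg_match('/MemTotal:\s+(\d+) kB/', file_get_contents('/proc/meminfo'), $m);
$totalBytes = (int)$m[1] * 1024;

$maxChildren = max(5, (int) floor($totalBytes * $threshold / $memoryLimit));
echo "pm.max_children = $maxChildren\n";
// e.g. on a 4 GiB machine: floor(4096 MiB * 0.8 / 128 MiB) = 25
```

Of course, `memory_limit` is a worst-case per-process figure, so a formula like this still tends to be conservative in practice.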