multiprocessing.get_logger() logger deadlock on first call by subprocess to logger.info("...") due to internal logger.debug(...) call by multiprocessing.Queue._start_thread #91555
I reproduced this with the latest 3.11 on macOS: both the lockup, and success when setting a higher logging level.
See also: #90321.
minimal-mpq-repro.py:

```python
import logging.handlers
import multiprocessing

queue = multiprocessing.Queue()
handler = logging.handlers.QueueHandler(queue)
logger = multiprocessing.get_logger()
logger.addHandler(handler)
logger.setLevel("DEBUG")
logger.debug("deadlock")
```

Queues have a `_notempty` condition built on a non-recursive lock: `put()` holds it while `_start_thread()` logs a debug message, so a handler that writes back into the same queue re-enters `put()` on the same thread and deadlocks.
We could make the lock recursive, however that would just turn the deadlock into a stack overflow.

At present, while a handler is processing a log record for a logger, it must not log anything to that same logger. Arguably this is fair enough: that seems like an inherently bad idea, and perhaps this issue (and #90321) should just be closed as a user error. In fact, there is a warning about exactly this sort of possible deadlock in the logging documentation.

As an alternative, we could relatively easily eliminate this whole class of bug by temporarily disabling logging for a given logger while its handlers are being called. I'll push a PR for consideration shortly.

Either way, I don't think this is a multiprocessing bug per se.
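For illustration only, a minimal sketch of that idea (the actual change went into the logging core; the `ReentrancySafeLogger` name and the thread-local guard here are assumptions about how it could look, not the real patch):

```python
import logging
import threading

class ReentrancySafeLogger(logging.Logger):
    """Illustrative only: drop records emitted by a thread that is already
    inside this logger's handlers, instead of re-entering them."""

    _guard = threading.local()  # per-thread flag, shared by these loggers

    def handle(self, record):
        if getattr(self._guard, "active", False):
            return  # a handler (or code it called) logged again: drop it
        self._guard.active = True
        try:
            super().handle(record)  # run filters and handlers as usual
        finally:
            self._guard.active = False

# Loggers created after this call use the guarded subclass.
logging.setLoggerClass(ReentrancySafeLogger)
```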
Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging) by disabling logging while the logger is handling log messages.
What value does that internal debug logging in multiprocessing actually provide? Could it simply be removed?
I can't comment on the value of that logging, but removing it would indeed fix this issue and #90321, since in this case the re-entrant call comes from that internal debug message.

I've done a quick code inspection of the other standard logging handlers and I don't think any of them have the same issue, although some do fairly involved things like making HTTP requests. More widely, custom handlers calling into third-party code may run into this issue and either have latent bugs or need tricky work-arounds; see the Sentry logging integration code, for example.

If folks think removing the logging from multiprocessing is acceptable, that would of course solve this problem too.
Well, regarding the Sentry code: I only had a quick look, but it seems to just sidestep its own internal logging because it would cause recursion errors.
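For context, the kind of guard such integrations end up writing looks roughly like this (a sketch of the general pattern, not Sentry's actual code; `send_to_backend` is a placeholder):

```python
import logging
import threading

class GuardedHandler(logging.Handler):
    """Ignore records emitted while this handler is already running on the
    current thread, so that any logging done by the transport code cannot
    recurse back into this handler."""

    _local = threading.local()

    def emit(self, record):
        if getattr(self._local, "in_emit", False):
            return
        self._local.in_emit = True
        try:
            self.send_to_backend(self.format(record))
        finally:
            self._local.in_emit = False

    def send_to_backend(self, message):
        # Placeholder for transport code (HTTP request, queue put, ...) that
        # may itself log and would otherwise re-enter emit().
        pass
```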
I would suggest opening a thread on discuss.python.org to get wider input on this.
OK, will do, thanks!
As it happens I was looking at a different bug in multiprocessing yesterday, and with this fresh in my mind used the built-in logging: it was very useful. It is also a publicly documented part of the module's API. Removing it doesn't feel appropriate to me, but let's see what other people think.
If not completely removing it, then perhaps consider keeping the fix in multiprocessing?
I disagree that the fix belongs in multiprocessing: the underlying problem is that a lower-level data structure is being re-entered via an unrelated module, and any handler that logs (directly or indirectly) can trigger it, not just QueueHandler.

Moreover, this is just one example of a whole class of potential bugs, which I think we can and should prevent entirely by making the logging framework more robust, as per the PR.
Sorry, which lower-level data structure do you mean, and which unrelated module? Sorry if I'm being dense.
Not at all! "Lower-level" was inaccurate and a poor choice of words on my part, sorry. Let me see if I can explain myself better.
So, I don't think we should treat it as a user error, but I don't think it is a multiprocessing bug either.

As another example, an external logging handler which sends log messages to an HTTP endpoint will probably be implemented using one of the existing, mature third-party HTTP libraries, and any of those could log something of its own while the handler is running, with exactly the same re-entrancy problem.

In my opinion, the best and most practical way forward is to deal with this in the logging core itself. My preference is to just drop any such recursive logging (this potentially loses useful log messages). Another option would be to store and send them separately, later (this doesn't work if sending a log message produces another log message, ad infinitum).
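A rough sketch of that second option (a hypothetical `DeferringLogger`, not proposed code), which also shows why it breaks down if handling a record always produces another one:

```python
import logging
import threading

class DeferringLogger(logging.Logger):
    """Sketch of the 'store and send later' option: records produced while
    the handlers are already running are queued and handled afterwards."""

    _state = threading.local()

    def handle(self, record):
        pending = getattr(self._state, "pending", None)
        if pending is not None:
            pending.append(record)   # already handling: defer this record
            return
        self._state.pending = pending = []
        try:
            super().handle(record)
            while pending:           # flush anything the handlers logged;
                super().handle(pending.pop(0))  # this may defer yet more
        finally:
            self._state.pending = None
```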
With these changes:

```diff
diff --git a/Lib/multiprocessing/queues.py b/Lib/multiprocessing/queues.py
index 925f0439000..73f1535581b 100644
--- a/Lib/multiprocessing/queues.py
+++ b/Lib/multiprocessing/queues.py
@@ -70,7 +70,7 @@ def _reset(self, after_fork=False):
if after_fork:
self._notempty._at_fork_reinit()
else:
- self._notempty = threading.Condition(threading.Lock())
+ self._notempty = threading.Condition(threading.RLock())
self._buffer = collections.deque()
self._thread = None
self._jointhread = None
diff --git a/Lib/multiprocessing/util.py b/Lib/multiprocessing/util.py
index b7192042b9c..5d54a3bcb79 100644
--- a/Lib/multiprocessing/util.py
+++ b/Lib/multiprocessing/util.py
@@ -35,33 +35,35 @@
INFO = 20
SUBWARNING = 25
+META_LOGGER_NAME = 'multiprocessing.meta'
LOGGER_NAME = 'multiprocessing'
DEFAULT_LOGGING_FORMAT = '[%(levelname)s/%(processName)s] %(message)s'
_logger = None
+_meta_logger = None
_log_to_stderr = False
def sub_debug(msg, *args):
- if _logger:
- _logger.log(SUBDEBUG, msg, *args, stacklevel=2)
+ if _meta_logger:
+ _meta_logger.log(SUBDEBUG, msg, *args, stacklevel=2)
def debug(msg, *args):
- if _logger:
- _logger.log(DEBUG, msg, *args, stacklevel=2)
+ if _meta_logger:
+ _meta_logger.log(DEBUG, msg, *args, stacklevel=2)
def info(msg, *args):
- if _logger:
- _logger.log(INFO, msg, *args, stacklevel=2)
+ if _meta_logger:
+ _meta_logger.log(INFO, msg, *args, stacklevel=2)
def sub_warning(msg, *args):
- if _logger:
- _logger.log(SUBWARNING, msg, *args, stacklevel=2)
+ if _meta_logger:
+ _meta_logger.log(SUBWARNING, msg, *args, stacklevel=2)
def get_logger():
'''
Returns logger used by multiprocessing
'''
- global _logger
+ global _logger, _meta_logger
import logging
with logging._lock:
@@ -70,6 +72,9 @@ def get_logger():
_logger = logging.getLogger(LOGGER_NAME)
_logger.propagate = 0
+ _meta_logger = logging.getLogger(META_LOGGER_NAME)
+ _meta_logger.propagate = 0
+
# XXX multiprocessing should cleanup before logging
if hasattr(atexit, 'unregister'):
atexit.unregister(_exit_function)
```

The hang or stack overflow does not occur with either of the reproducers (this issue's or the one in #90321).

What problems do you see with implementing my suggested changes in multiprocessing?

Update: tweaked the diff above to add a missing declaration.
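To make the intent of that split concrete: a user who still wants the internal messages would configure the meta logger separately from the logger that feeds the queue. A sketch (only meaningful with the diff above applied; on current Python the last line still deadlocks):

```python
import logging
import logging.handlers
import multiprocessing

queue = multiprocessing.Queue()

# Application records go through the public multiprocessing logger + queue.
logger = multiprocessing.get_logger()
logger.addHandler(logging.handlers.QueueHandler(queue))
logger.setLevel(logging.DEBUG)

# Internal messages (e.g. from Queue._start_thread) would instead go to the
# separate 'multiprocessing.meta' logger introduced by the diff, which uses
# an ordinary handler and so never feeds back into the queue.
meta = logging.getLogger('multiprocessing.meta')
meta.addHandler(logging.StreamHandler())
meta.setLevel(logging.DEBUG)

logger.debug("no deadlock under the proposed split")
```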
Also, please add a link here to the discussion thread. Never mind, found it: https://discuss.python.org/t/dealing-with-unbounded-recursion-in-logging-handlers/86365
Once again, as I see it, there are a couple of ways this could go.

If we decide the user is responsible for configuring logging correctly in order to avoid these issues (i.e. we treat this issue as a "user error"), we don't need to do anything, except maybe add a note to the docs.

As for the diff above: it would potentially break existing code that configures the current logger and wants those logs, and it doesn't prevent the bug, it just alters the configuration required to trigger it.
Then we can leave things as they are. Your proposed solution (disabling the logger while handlers are active) seems to potentially leave the logger disabled for a long period of time (indeterminate, depending on what other handlers are configured). Certain handlers, such as those that make network requests, can take a relatively long time.
Well, in this case the handler involved only puts the record on a queue, so the window is short. Ultimately, the reason for this problem seems to be that the queue logs internally, to the same logger, while holding its own lock.
But surely it prevents the bug in the simple configurations described in the example reproducers here and in #90321? As things are, they wouldn't get the logs anyway, because of the deadlock problem, right? A number of people who would currently run into the bug would no longer do so, so couldn't that be considered an improvement?

If the configuration required to trigger it is more complicated, then someone would have to go out of their way to trigger it, right? What would be the use case for such a configuration?

I'm not actually proposing to make the changes in the above diff - it's only to show that other approaches are possible and to spark discussion.
For sure we can. I think we can do better, but I could be wrong, and it isn't a big deal. Treating it as user error is a perfectly reasonable way to resolve the issue.
It disables logging on a per-thread basis and only while the thread runs the message handlers (+filters). Which is to say, it is disabled only while running code that would trigger this bug if it logged a message. Or at least, that is my intent: if I've overlooked anything or there is a bug, please let me know! It doesn't stop other threads from logging to the same logger and/or handlers.
If we cannot find an acceptable solution to prevent this in the code then I will indeed prepare a documentation patch. I remain optimistic we can prevent it, however :-)
The way I've been thinking of it is: if some code is sending logs to a handler, then nothing that runs as part of handling those records should log back to the same logger.
Understood, and thanks. I find the feedback very valuable, from you and others. For now, however, I still think suppressing the logging while running the log handling code is the best solution.
Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging), by disabling logging while the logger is handling log messages.
…1812) Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging), by disabling logging while the logger is handling log messages. (cherry picked from commit 2561e14) Co-authored-by: Duane Griffin <[email protected]>
…GH-133898) Co-authored-by: Duane Griffin <[email protected]>
…GH-133899) Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging), by disabling logging while the logger is handling log messages. (cherry picked from commit 2561e14) Co-authored-by: Duane Griffin <[email protected]>
A Python bug (python/cpython#91555) was preventing logs from the embeddings maintainer process from printing. The bug is fixed in Python 3.14, but a viable workaround is to use the multiprocessing Manager, which better manages mp queues and causes the logging to work correctly.
Python: v3.10.2
Platform: Windows 10
Description: A deadlock occurs when a subprocess uses a logging.Logger returned by multiprocessing.get_logger() and the logging level is DEBUG or lower at the time of the first call to logger.info(...). That first call triggers an internal logger.debug(...) call (at the top of multiprocessing.Queue._start_thread), which causes the same thread to attempt to re-acquire a lock it already holds.
Workaround: Set the logging level to INFO or higher (anything above DEBUG) so that the logging.debug() call at the top of Queue._start_thread is not handled, and therefore cannot re-enter the queue while its lock is already held.
The following example exhibits the issue when SHOW_THE_DEADLOCK==True. Set SHOW_THE_DEADLOCK==False to observe the workaround.
The following is an annotated stack from the above example when SHOW_THE_DEADLOCK==True.
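A minimal sketch of such a reproducer, consistent with the description above (the SHOW_THE_DEADLOCK flag and the worker function here are illustrative, not the original attachment):

```python
import logging
import logging.handlers
import multiprocessing

SHOW_THE_DEADLOCK = True  # set to False to apply the workaround (INFO level)

def worker(queue):
    logger = multiprocessing.get_logger()
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.setLevel(logging.DEBUG if SHOW_THE_DEADLOCK else logging.INFO)
    # First call: QueueHandler.emit -> Queue.put -> Queue._start_thread, which
    # itself calls logger.debug(...). At DEBUG level that record is handled
    # too, re-entering Queue.put while the queue's lock is still held.
    logger.info("hello from the subprocess")

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(queue,))
    proc.start()
    print(queue.get().getMessage())  # blocks forever when SHOW_THE_DEADLOCK is True
    proc.join()
```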
Linked PRs