-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
GH-95704: Don't suppress errors from tasks when TG is cancelled #95761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-95704: Don't suppress errors from tasks when TG is cancelled #95761
Conversation
When a task catches CancelledError and raises some other error, the other error should not silently be suppressed. Exception: when uncancel() != 0, always propagate CancelledError. NOTE: This represents a change in behavior (hence the need to change three tests). But it is only an edge case. We need to discuss with the RM whether to backport to 3.11RC2, or 3.11.1, or not at all.
@Tinche Please review as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the spirit of the old boy scouts rule, 'Always leave the campground cleaner than you found it', can we give the changed tests meaningful names and docstrings?
The functionality change looks ok, although I was a little confused at first as to why we're now also checking self._parent_cancel_requested
, so might want to add a small comment there?
Hm, I have some issues with boy scouts. :-) I know I let you do that in the past, but when we started comparing the code to the original in EdgeDb I realized that it would be better to keep the names and style consistent with the original. Eventually we could do a separate cleanup PR that adds docstrings (actually for tests I prefer comments, since the unittest package does something funny with docstrings) and maybe add something to the name (e.g.
Yeah, that I can do. |
Not sure if the comment explains things well enough, but I am loath to insert a novella. Everything about propagating cancellation is confusingly complicated. |
Lib/asyncio/taskgroups.py
Outdated
# Propagate CancelledError if there is one, except if there | ||
# are other errors and we're not required by uncancel() != 0. | ||
# (The latter means we *must* propagate the cancellation.) | ||
if (propagate_cancellation_error is not None and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this now suppress the TimeoutError when used with async with timeout():
?
I'll try and make a repro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the condition is true in fewer cases than the original code (where it was just if propagate_cancellation_error is not None:
) I don't see how it could suppress something that wouldn't be suppressed previously. But I'm not sure what you're thinking of, so please make a repro.
Misc/NEWS.d/next/Library/2022-08-08-01-42-11.gh-issue-95704.MOPFfX.rst
Outdated
Show resolved
Hide resolved
@gvanrossum I'm trying grok your newly added comment, finding the state machine a little complicated. IIUC, the comments are saying if |
Co-authored-by: Thomas Grainger <[email protected]>
Me too, but it's the only thing that makes all the original tests pass.
Note that "propagate" here only refers to In the remaining case where we swallow child exceptions (this is the
It looks like test 16 is similar but with a nested runner task. So what this does at a high level is, it propagates Are these the right semantics? We can debate that, I'm not sure myself. I feel that given there were two tests that validated this behavior (with not a race condition in sight, unlike tests 08, 11, 12) it is defensible. OTOH what we did with tests 08, 11, 12 was that we decided that if a child raised an error we should propagate that, and so maybe as long as we are changing tests and semantics anyway, we may as well decide to propagate the child's error in this case as well. It would certainly simplify the logic a bit. Thoughts? I'd really like @1st1's opinion here, too. And maybe this is what @graingert was alluding to? |
I think what you need to do here is raise This happens here in trio https://github.com/python-trio/trio/blob/2eed34c9e01ce279bfe30fd9d8f7ae897f31a515/trio/_core/_run.py#L930-L936 Here |
But that is exactly what we're contorting ourselves not to do -- mainly because We have to decide what's more important -- the |
We don't want people to do that anyway
Right, but code can't get itself into this situation without using 3.11 features anyway. The problem only presents itself when an except* is sandwiched between a timeout and a taskgroup |
Maybe I'll take a minute and see what Trio does here. Might be an interesting data point. My gut feeling is: we never swallow child errors. But I haven't thought about it much. |
Ah I built a tool to do this. https://gist.github.com/graingert/898a76796c925da7b26a183811a410fa#file-demo-py-L76 |
What Trio does is not necessarily what we should do, they have different constraints (its MultiError or whatever behaves differently than ExceptionGroup). Anyway, here's a scenario that I was thinking of: t = asyncio.create_task(run())
<do other stuff>
t.cancel()
try:
res = await t # Wait for task to complete
except CancelledError:
<handle it>
except ZeroDivisionError:
<handle that>
# etc. This is perfectly legit code in 3.10 and before (except for the syntax to wait for a task), and I don't want to break it. Now suppose run() was not completely under our control (say it invoked some other library that might be upgraded to use TaskGroup), and if that ends up somehow wrapping the CancelledError in an exception, we'd have broken this code. This is why CancelledError is always propagated unwrapped. So we just have to decide whether to raise CancelledError or the child error(s) -- we can't raise both. |
Sorry for being hard to reach. It took me some time to unpack this discussion (I glanced over it a few times and could miss a few details, sorry!) and then to debug the example from #95704 (comment) on the latest I think the correct solution is to always propagate all errors, otherwise asyncio programs would be too hard to debug. Thomas' example is a perfect illustration of that--without AttrubuteErrors in the printed traceback it's almost impossible to understand why his program is abruptly crashing. The current How do we fix the current behavior? The key thing we should probably keep is supporting class CancelledErrorGroup(BaseExceptionGroup, CancelledError):
"""A group of errors containing a CancelledError."""
# This error type needed for "backwards compatibility" with code that
# has `except CancelledError: ...; raise`-style blocks.
def derive(self, excs):
et = BaseExceptionGroup
if any(e for e in excs if isinstance(e, CancelledError)):
et = CancelledErrorGroup
return et(self.message, excs) The most important fix then is in these lines of diff --git a/Lib/asyncio/taskgroups.py b/Lib/asyncio/taskgroups.py
index 097b4864f7..68391d080c 100644
--- a/Lib/asyncio/taskgroups.py
+++ b/Lib/asyncio/taskgroups.py
@@ -115,14 +121,14 @@ async def __aexit__(self, et, exc, tb):
if self._base_error is not None:
raise self._base_error
- if propagate_cancellation_error is not None:
- # The wrapping task was cancelled; since we're done with
- # closing all child tasks, just propagate the cancellation
- # request now.
- raise propagate_cancellation_error
+ group_type = BaseExceptionGroup
- if et is not None and et is not exceptions.CancelledError:
- self._errors.append(exc)
+ if propagate_cancellation_error is not None:
+ self._errors.append(propagate_cancellation_error)
+ group_type = exceptions.CancelledErrorGroup
+ else:
+ if erroring_out and not issubclass(et, exceptions.CancelledError):
+ self._errors.append(exc)
if self._errors:
# Exceptions are heavy objects that can have object
@@ -131,7 +137,7 @@ async def __aexit__(self, et, exc, tb):
errors = self._errors
self._errors = None
- me = BaseExceptionGroup('unhandled errors in a TaskGroup', errors)
+ me = group_type('unhandled errors in a TaskGroup', errors)
raise me from None This seems to do the trick and all asyncio tests pass unmodified. Thomas' program now outputs a correct and debuggable traceback: Click to see the better tracebaack
And the old-style code that would expect a Here's the full diff that I have with a couple more fixes & nits.diff --git a/Lib/asyncio/exceptions.py b/Lib/asyncio/exceptions.py
index 5ece595aad..961d039aed 100644
--- a/Lib/asyncio/exceptions.py
+++ b/Lib/asyncio/exceptions.py
@@ -2,7 +2,8 @@
__all__ = ('BrokenBarrierError',
- 'CancelledError', 'InvalidStateError', 'TimeoutError',
+ 'CancelledError', 'CancelledErrorGroup',
+ 'InvalidStateError', 'TimeoutError',
'IncompleteReadError', 'LimitOverrunError',
'SendfileNotAvailableError')
@@ -11,6 +12,16 @@ class CancelledError(BaseException):
"""The Future or Task was cancelled."""
+class CancelledErrorGroup(BaseExceptionGroup, CancelledError):
+ """A group of errors containing a CancelledError."""
+
+ # This error type needed for "backwards compatibility" with code that
+ # has `except CancelledError: ...; raise`-style blocks.
+
+ def derive(self, excs):
+ return CancelledErrorGroup(self.message, excs)
+
+
TimeoutError = TimeoutError # make local alias for the standard exception
diff --git a/Lib/asyncio/taskgroups.py b/Lib/asyncio/taskgroups.py
index 097b4864f7..9505793b03 100644
--- a/Lib/asyncio/taskgroups.py
+++ b/Lib/asyncio/taskgroups.py
@@ -55,29 +55,35 @@ async def __aenter__(self):
async def __aexit__(self, et, exc, tb):
self._exiting = True
- if (exc is not None and
+ # True if the context manager is aborting with an error.
+ erroring_out = et is not None
+
+ if (erroring_out and
self._is_base_error(exc) and
self._base_error is None):
self._base_error = exc
- propagate_cancellation_error = \
- exc if et is exceptions.CancelledError else None
+ propagate_cancellation_error = None
+ if et is not None and issubclass(et, exceptions.CancelledError):
+ propagate_cancellation_error = exc
+
if self._parent_cancel_requested:
- # If this flag is set we *must* call uncancel().
+ # If this flag is set it means that this TaskGroup requested
+ # the parent task to be cancelled. We now *must* call uncancel().
if self._parent_task.uncancel() == 0:
# If there are no pending cancellations left,
# don't propagate CancelledError.
propagate_cancellation_error = None
- if et is not None:
+ if erroring_out:
if not self._aborting:
- # Our parent task is being cancelled:
+ # Either the parent task is being cancelled:
#
# async with TaskGroup() as g:
# g.create_task(...)
- # await ... # <- CancelledError
+ # await ... # <- errors out with CancelledError
#
- # or there's an exception in "async with":
+ # or there's an exception the "async with" block:
#
# async with TaskGroup() as g:
# g.create_task(...)
@@ -97,7 +103,7 @@ async def __aexit__(self, et, exc, tb):
await self._on_completed_fut
except exceptions.CancelledError as ex:
if not self._aborting:
- # Our parent task is being cancelled:
+ # The parent task is being cancelled:
#
# async def wrapper():
# async with TaskGroup() as g:
@@ -113,16 +119,31 @@ async def __aexit__(self, et, exc, tb):
assert not self._tasks
if self._base_error is not None:
- raise self._base_error
+ # asyncio currently can't handle SystemExit or KeyboardIterrupt
+ # without corrupting its state. In an event of such an error
+ # we simply immediately propagate it ignoring other errors
+ # (but we still want to at least log them).
+ try:
+ for error in self._errors:
+ if isinstance(error, exceptions.CancelledError):
+ continue
+ cause = type(self._base_error).__name__
+ self._loop.call_exception_handler({
+ 'message': f'ignoring an error due to {cause}',
+ 'exception': error,
+ 'task': self._parent_task,
+ })
+ finally:
+ raise self._base_error
+
+ group_type = BaseExceptionGroup
if propagate_cancellation_error is not None:
- # The wrapping task was cancelled; since we're done with
- # closing all child tasks, just propagate the cancellation
- # request now.
- raise propagate_cancellation_error
-
- if et is not None and et is not exceptions.CancelledError:
- self._errors.append(exc)
+ self._errors.append(propagate_cancellation_error)
+ group_type = exceptions.CancelledErrorGroup
+ else:
+ if erroring_out and not issubclass(et, exceptions.CancelledError):
+ self._errors.append(exc)
if self._errors:
# Exceptions are heavy objects that can have object
@@ -131,7 +152,7 @@ async def __aexit__(self, et, exc, tb):
errors = self._errors
self._errors = None
- me = BaseExceptionGroup('unhandled errors in a TaskGroup', errors)
+ me = group_type('unhandled errors in a TaskGroup', errors)
raise me from None
def create_task(self, coro, *, name=None, context=None):
@@ -195,10 +216,10 @@ def _on_task_done(self, task):
return
if not self._aborting and not self._parent_cancel_requested:
- # If parent task *is not* being cancelled, it means that we want
- # to manually cancel it to abort whatever is being run right now
- # in the TaskGroup. But we want to mark parent task as
- # "not cancelled" later in __aexit__. Example situation that
+ # If the parent task *is not* being cancelled, it means that we
+ # want to manually cancel it, to abort whatever is being currently
+ # awaited in the TaskGroup. But we want to mark the parent task as
+ # "not cancelled" later in __aexit__. An example situation that
# we need to handle:
#
# async def foo():
@@ -206,13 +227,14 @@ def _on_task_done(self, task):
# async with TaskGroup() as g:
# g.create_task(crash_soon())
# await something # <- this needs to be canceled
- # # by the TaskGroup, e.g.
+ # # by the TaskGroup, i.e.
# # foo() needs to be cancelled
# except Exception:
# # Ignore any exceptions raised in the TaskGroup
# pass
# await something_else # this line has to be called
- # # after TaskGroup is finished.
+ # # after the TaskGroup is
+ # # finished.
self._abort()
self._parent_cancel_requested = True
self._parent_task.cancel()
@gvanrossum Guido I'm happy to push to your branch or make a new one with this diff if you want me to. I think it would need a couple new tests (and |
I don't think you actually need to support this usecase. Because the only way to end up in this situation is when you use the new TaskGroup code, don't handle the grouped exceptions, and want to catch a single CancelledError only. Consider what would happen if someone replaced their If TaskGroup now takes special effort to re-raise a bare import asyncio
import contextlib
class Sock:
async def aclose():
pass
async def foo():
with contextlib.closing(Sock()):
await asyncio.sleep(1)
print("hello")
async def demo():
task = asyncio.current_task()
task.get_loop().call_later(0.1, task.cancel)
try:
await asyncio.gather(foo(), foo())
except asyncio.CancelledError:
pass # the intent here is to only reach it if asyncio.gather was cancelled cleanly
asyncio.run(demo()) |
That's not the only way. You can have a library that takes an arbitrary task/coroutine and awaits on it, reacting specifically to CancelledError and re-raising it immediately. It's those scenarios that I care about.
Perfect compatibility with Also "break existing code" isn't an argument here, because a user would be switching from old API to new API that's not declared as a drop-in replacement. |
so this would be something like this? async def some_library(async_fn):
try:
await async_fn()
except asyncio.CancelledError:
raise
except BaseException:
# accidentally catching a BaseExceptionGroup('unhandled errors in a TaskGroup', [asyncio.CancelledError()])
await cleanup()
raise
yeah that's the behavior I'm suggesting here |
Yeah, along the lines of this. FWIW I don't want to push hard for the approach I suggested in #95761 (comment) -- the I do feel strongly about including all errors into the group though--my own opinion is that that's key for the usability of asyncio. I always viewed exception groups as a mechanism to deliver the cancellation signal without hiding other errors and IMO we should use the mechanism fully. |
We've established one thing clearly: even though there were several tests that seemed to be written to ignore "side errors" (let's call them that) accompanying a cancellation, we shouldn't assume that's the best design, and it's not too late to do something better -- in particular, we should not ignore side errors. We're past RC1, and I don't think that a bug in the cancel behavior of the brand new My proposal is to keep the PR as it is, except remove the Now, this will mean that code (old or new) using async with asyncio.timeout(1):
try:
await asyncio.sleep(2) # Will be cancelled by the timeout
finally:
1/0 # oops, crash in cleanup this will report Note that a I considered proposing adding the |
Any scenario where a task crashes in cleanup upon cancellation will now result in an ExceptionGroup wrapping the crash(es) instead of propagating CancelledError and ignoring the side errors.
Remember that when we designed exception groups, we decided that changing any API from raising When we cancel a task (or a future -- remember tasks are futures), and the task handles the cancellation cleanly, then we expect waiting for the task to raise I looked at the if self._task.uncancel() == 0 and exc_type is exceptions.CancelledError: I must assume there's plenty of careful cancellation handling code out there that similarly needs CE to be not wrapped. Anyway, this PR is updated, please have a look. |
Yes, I agree. Although TaskGroups is a new API, so we can tweak it within certain limits.
Yes. Unless we implement what I'm suggesting in #95761 (comment) -- to make a special EG type that is a subclass of CancelledError. It's really quite elegant -- if an error group contains a CancelledError it itself becomes a CancelledError.
This can be fixed to work with my proposal - we just need to replace I'm approving this PR because it's better to land something than nothing in this case. But I think that its suboptimal in its design because it obstructs the cancellation in certain cases and can make things very hard to understand. Consider the traceback of the Thomas' script. With this PR:
With the approach I'm suggesting in #95761 (comment):
Guido, I'm curious what you think specifically about my idea. That said this PR is still a huge improvement over what's in the |
@1st1 can you make a draft PR for your CancelledError as an ExceptionGroup change? I suspect it will end up hiding exceptions again in |
I think it will be full of surprises. I did a few experiments, and it seems that if you have a a subclass of (Even if we were to decide that that as a bug in the exception group implementation, it would be too late to fix in 3.11.)
It may be an antipattern, but the taskgroup code I inherited from you also uses this, and I suspect it's a common pattern, because people don't expect In addition, in the scenario where we bubble up the So I think that, alas, your idea is not going to work, and we're better of with my "conservative" approach which replaces the
Thanks -- I will give it a little more time but not too much since RC2 is already looming. |
I believe this anti-pattern comes from people trying to emulate the |
As long as the door is open for a fix in 3.12, 3.11.1? We can fix it in the backport, people are going to need to use the backport until Oct 2026 already hopefully they're happy to wait till October 2027 |
We're porting to EG already so I'm not sure what you mean by this |
And what is in your opinion the difference between |
That depends on whether it is backwards compatible. And whether we actually think it is a useful feature. I wouldn't get my hopes up.
And by "the backport" you mean the emulation on PyPI, right? That seems of limited relevance given that we're talking about the semantics of My personal opinion: give up on this idea, it is more complicated than you think. |
IIRC this was when we were still deciding whether to throw away "side errors" or not. That has been decided (we don't), so I see no reason to further litigate this. |
I'm talking about |
🤦 I meant to link to the top of the discussion, but scrolling discourse changes the URL in the browser. It is indeed totally irrelevant |
@graingert Please focus. Do you agree with Yury that the current PR is an improvement over what’s in main? Do you have any specific feedback on the PR? (Do any others still reading have feedback? @Tinche ?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM.
I was warming up to the cleverness of Yury's idea but if you think there be dragons I'm convinced.
@pablogsal told me that he's open to backporting this to 3.11, so I'll go through the movements. |
Thanks @gvanrossum for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11. |
GH-96048 is a backport of this pull request to the 3.11 branch. |
…pythonGH-95761) When a task catches CancelledError and raises some other error, the other error should not silently be suppressed. Any scenario where a task crashes in cleanup upon cancellation will now result in an ExceptionGroup wrapping the crash(es) instead of propagating CancelledError and ignoring the side errors. NOTE: This represents a change in behavior (hence the need to change several tests). But it is only an edge case. Co-authored-by: Thomas Grainger <[email protected]> (cherry picked from commit f51f54f) Co-authored-by: Guido van Rossum <[email protected]>
…5761) When a task catches CancelledError and raises some other error, the other error should not silently be suppressed. Any scenario where a task crashes in cleanup upon cancellation will now result in an ExceptionGroup wrapping the crash(es) instead of propagating CancelledError and ignoring the side errors. NOTE: This represents a change in behavior (hence the need to change several tests). But it is only an edge case. Co-authored-by: Thomas Grainger <[email protected]> (cherry picked from commit f51f54f) Co-authored-by: Guido van Rossum <[email protected]>
When a task catches CancelledError and raises some other error,
the other error should not silently be suppressed.
Exception: when uncancel() != 0, always propagate CancelledError.
NOTE: This represents a change in behavior (hence the need to
change three tests). But it is only an edge case.
We need to discuss with the RM whether to backport to 3.11RC2,
or 3.11.1, or not at all.