GH-95704: Don't suppress errors from tasks when TG is cancelled #95761

gvanrossum · 2022-08-07T15:42:24Z

When a task catches CancelledError and raises some other error,
the other error should not silently be suppressed.

Exception: when uncancel() != 0, always propagate CancelledError.

NOTE: This represents a change in behavior (hence the need to
change three tests). But it is only an edge case.

We need to discuss with the RM whether to backport to 3.11RC2,
or 3.11.1, or not at all.

Issue: TaskGroup suppresses grouped exceptions due to parent task cancellation #95704

When a task catches CancelledError and raises some other error, the other error should not silently be suppressed. Exception: when uncancel() != 0, always propagate CancelledError. NOTE: This represents a change in behavior (hence the need to change three tests). But it is only an edge case. We need to discuss with the RM whether to backport to 3.11RC2, or 3.11.1, or not at all.

gvanrossum · 2022-08-07T15:43:17Z

@Tinche Please review as well.

Tinche

In the spirit of the old boy scouts rule, 'Always leave the campground cleaner than you found it', can we give the changed tests meaningful names and docstrings?

The functionality change looks ok, although I was a little confused at first as to why we're now also checking self._parent_cancel_requested, so might want to add a small comment there?

gvanrossum · 2022-08-08T01:40:34Z

In the spirit of the old boy scouts rule, 'Always leave the campground cleaner than you found it', can we give the changed tests meaningful names and docstrings?

Hm, I have some issues with boy scouts. :-)

I know I let you do that in the past, but when we started comparing the code to the original in EdgeDb I realized that it would be better to keep the names and style consistent with the original.

Eventually we could do a separate cleanup PR that adds docstrings (actually for tests I prefer comments, since the unittest package does something funny with docstrings) and maybe add something to the name (e.g. test_taskgroup_08_cancellation_in_body) but I am loath to clean up just the changed tests -- it will look weird (PEP 8 has a rule about respecting the local style over almost everything else: "Consistency within one module or function is the most important").

The functionality change looks ok, although I was a little confused at first as to why we're now also checking self._parent_cancel_requested, so might want to add a small comment there?

Yeah, that I can do.

gvanrossum · 2022-08-08T02:00:10Z

Not sure if the comment explains things well enough, but I am loath to insert a novella. Everything about propagating cancellation is confusingly complicated.

graingert · 2022-08-08T07:52:48Z

Lib/asyncio/taskgroups.py

+        # Propagate CancelledError if there is one, except if there
+        # are other errors and we're not required by uncancel() != 0.
+        # (The latter means we *must* propagate the cancellation.)
+        if (propagate_cancellation_error is not None and


Doesn't this now suppress the TimeoutError when used with async with timeout():?

I'll try and make a repro

Since the condition is true in fewer cases than the original code (where it was just if propagate_cancellation_error is not None:) I don't see how it could suppress something that wouldn't be suppressed previously. But I'm not sure what you're thinking of, so please make a repro.

Misc/NEWS.d/next/Library/2022-08-08-01-42-11.gh-issue-95704.MOPFfX.rst

Tinche · 2022-08-08T13:45:16Z

@gvanrossum I'm trying grok your newly added comment, finding the state machine a little complicated.

IIUC, the comments are saying if uncancel() returns non-zero (i.e. there are others up-stack that have also cancelled and are now waiting for the CancelledError), we should swallow child exceptions? But I think we should always propagate, so exceptions don't pass silently. WYT?

Co-authored-by: Thomas Grainger <[email protected]>

gvanrossum · 2022-08-08T15:35:13Z

@gvanrossum I'm trying grok your newly added comment, finding the state machine a little complicated.

Me too, but it's the only thing that makes all the original tests pass.

IIUC, the comments are saying if uncancel() returns non-zero (i.e. there are others up-stack that have also cancelled and are now waiting for the CancelledError), we should swallow child exceptions? But I think we should always propagate, so exceptions don't pass silently. WYT?

Note that "propagate" here only refers to CancelledError. When we don't "propagate" because there are other errors, those will be bundled in an ExceptionGroup and raised below.

In the remaining case where we swallow child exceptions (this is the self._parent_cancel_requested part) because otherwise test_taskgroup_{15,16} fail. Maybe we need to consider these tests in detail. The first one does the following:

Create a task group with a child that will crash after 0.3s
in the runner (parent) task, in the body of the async with TaskGroup()... statement, sleep 10s
after 0.1s the parent task is cancelled
it catches the CancelledError, and sleeps 0.5s more (this is followed by a bare raise but that's not reached)
0.2s into the latter sleep (0.3 into the whole thing), the child crashes
This cancels the parent task again, setting _parent_cancel_requested
TaskGroup.__aexit__ sets propagate_cancellation_error and calls uncancel(), finding the latter returns 1 (so we don't clear propagate_cancellation_error)
We collect the ZeroDivisionError from the child task
We reach the code with the comment, and decide to propagate the CancelledError, ignoring the child error

It looks like test 16 is similar but with a nested runner task.

So what this does at a high level is, it propagates CancelledError in favor of ZeroDivisionError because (through the magic of uncancel()) it knows that the parent taks was cancelled "from the outside".

Are these the right semantics? We can debate that, I'm not sure myself. I feel that given there were two tests that validated this behavior (with not a race condition in sight, unlike tests 08, 11, 12) it is defensible. OTOH what we did with tests 08, 11, 12 was that we decided that if a child raised an error we should propagate that, and so maybe as long as we are changing tests and semantics anyway, we may as well decide to propagate the child's error in this case as well. It would certainly simplify the logic a bit.

Thoughts? I'd really like @1st1's opinion here, too. And maybe this is what @graingert was alluding to?

graingert · 2022-08-08T15:51:03Z

So what this does at a high level is, it propagates CancelledError in favor of ZeroDivisionError because (through the magic of uncancel()) it knows that the parent taks was cancelled "from the outside".

I think what you need to do here is raise CancelledError wrapped in a BaseExceptionGroup allong with the other self._errors

This happens here in trio https://github.com/python-trio/trio/blob/2eed34c9e01ce279bfe30fd9d8f7ae897f31a515/trio/_core/_run.py#L930-L936

Here await checkpoint() is equivalent to the uncancel() == 0 check

gvanrossum · 2022-08-08T15:59:58Z

I think what you need to do here is raise CancelledError wrapped in a BaseExceptionGroup allong with the other self._errors

But that is exactly what we're contorting ourselves not to do -- mainly because except CancelledError would not catch that, and we don't want to require people to use except* CancelledError (especially since that would not work in pre-3.11 code).

We have to decide what's more important -- the CancelledError or the child task errors. (Maybe I'll write more later.)

graingert · 2022-08-08T17:11:41Z

mainly because except CancelledError would not catch that,

We don't want people to do that anyway

especially since that would not work in pre-3.11 code).

Right, but code can't get itself into this situation without using 3.11 features anyway. The problem only presents itself when an except* is sandwiched between a timeout and a taskgroup

Tinche · 2022-08-08T17:58:32Z

Maybe I'll take a minute and see what Trio does here. Might be an interesting data point.

My gut feeling is: we never swallow child errors. But I haven't thought about it much.

graingert · 2022-08-08T18:04:20Z

Maybe I'll take a minute and see what Trio does here. Might be an interesting data point.

My gut feeling is: we never swallow child errors. But I haven't thought about it much.

Ah I built a tool to do this. ~~I'll edit this comment to link it~~

https://gist.github.com/graingert/898a76796c925da7b26a183811a410fa#file-demo-py-L76

gvanrossum · 2022-08-08T18:30:22Z

What Trio does is not necessarily what we should do, they have different constraints (its MultiError or whatever behaves differently than ExceptionGroup).

Anyway, here's a scenario that I was thinking of:

t = asyncio.create_task(run())
<do other stuff>
t.cancel()
try:
    res = await t  # Wait for task to complete
except CancelledError:
    <handle it>
except ZeroDivisionError:
    <handle that>
# etc.

This is perfectly legit code in 3.10 and before (except for the syntax to wait for a task), and I don't want to break it.

Now suppose run() was not completely under our control (say it invoked some other library that might be upgraded to use TaskGroup), and if that ends up somehow wrapping the CancelledError in an exception, we'd have broken this code.

This is why CancelledError is always propagated unwrapped.

So we just have to decide whether to raise CancelledError or the child error(s) -- we can't raise both.

1st1 · 2022-08-09T06:06:05Z

Sorry for being hard to reach. It took me some time to unpack this discussion (I glanced over it a few times and could miss a few details, sorry!) and then to debug the example from #95704 (comment) on the latest taskgroups.py in the main branch. I think I'm more or less up to date now.

I think the correct solution is to always propagate all errors, otherwise asyncio programs would be too hard to debug. Thomas' example is a perfect illustration of that--without AttrubuteErrors in the printed traceback it's almost impossible to understand why his program is abruptly crashing.

The current taskgroup.py implementation ignores side errors and propagates a single CancelledError. This could be a simple oversight by me (and now I feel dumb because I think I remember debugging pretty complex async/await code wondering why some errors appear to be silenced). It could also be the result of the current TaskGroup design predating exception groups. Who knows...

How do we fix the current behavior? The key thing we should probably keep is supporting except CancelledError blocks of code that some libraries can have. A solution would be to introduce the following subclass of BaseException that we'll use whenever we're wrapping a set of errors that also has a CancelledError:

class CancelledErrorGroup(BaseExceptionGroup, CancelledError):
    """A group of errors containing a CancelledError."""

    # This error type needed for "backwards compatibility" with code that
    # has `except CancelledError: ...; raise`-style blocks.

    def derive(self, excs):
        et = BaseExceptionGroup
        if any(e for e in excs if isinstance(e, CancelledError)):
            et = CancelledErrorGroup
        return et(self.message, excs)

The most important fix then is in these lines of taskgroups.py:

diff --git a/Lib/asyncio/taskgroups.py b/Lib/asyncio/taskgroups.py
index 097b4864f7..68391d080c 100644
--- a/Lib/asyncio/taskgroups.py
+++ b/Lib/asyncio/taskgroups.py
@@ -115,14 +121,14 @@ async def __aexit__(self, et, exc, tb):
         if self._base_error is not None:
             raise self._base_error
 
-        if propagate_cancellation_error is not None:
-            # The wrapping task was cancelled; since we're done with
-            # closing all child tasks, just propagate the cancellation
-            # request now.
-            raise propagate_cancellation_error
+        group_type = BaseExceptionGroup
 
-        if et is not None and et is not exceptions.CancelledError:
-            self._errors.append(exc)
+        if propagate_cancellation_error is not None:
+            self._errors.append(propagate_cancellation_error)
+            group_type = exceptions.CancelledErrorGroup
+        else:
+            if erroring_out and not issubclass(et, exceptions.CancelledError):
+                self._errors.append(exc)
 
         if self._errors:
             # Exceptions are heavy objects that can have object
@@ -131,7 +137,7 @@ async def __aexit__(self, et, exc, tb):
             errors = self._errors
             self._errors = None
 
-            me = BaseExceptionGroup('unhandled errors in a TaskGroup', errors)
+            me = group_type('unhandled errors in a TaskGroup', errors)
             raise me from None

This seems to do the trick and all asyncio tests pass unmodified. Thomas' program now outputs a correct and debuggable traceback:

Click to see the better tracebaack

  + Exception Group Traceback (most recent call last):
  |   File "/Users/yury/dev/python/cpython/t.py", line 31, in <module>
  |     asyncio.run(main())
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/runners.py", line 187, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/runners.py", line 119, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/base_events.py", line 650, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/t.py", line 20, in main
  |     async with asyncio.TaskGroup() as tg:
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/taskgroups.py", line 156, in __aexit__
  |     raise me from None
  | asyncio.exceptions.CancelledErrorGroup: unhandled errors in a TaskGroup (3 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t.py", line 6, in child
    |     with contextlib.closing(sock):
    |   File "/Users/yury/dev/python/cpython/Lib/contextlib.py", line 345, in __exit__
    |     self.thing.close()
    |     ^^^^^^^^^^^^^^^^
    | AttributeError: 'Sock' object has no attribute 'close'. Did you mean: 'aclose'?
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t.py", line 6, in child
    |     with contextlib.closing(sock):
    |   File "/Users/yury/dev/python/cpython/Lib/contextlib.py", line 345, in __exit__
    |     self.thing.close()
    |     ^^^^^^^^^^^^^^^^
    | AttributeError: 'Sock' object has no attribute 'close'. Did you mean: 'aclose'?
    +---------------- 3 ----------------
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t.py", line 25, in main
    |     await asyncio.sleep(2)
    |   File "/Users/yury/dev/python/cpython/Lib/asyncio/tasks.py", line 644, in sleep
    |     return await future
    |            ^^^^^^^^^^^^
    | asyncio.exceptions.CancelledError
    |
    | During handling of the above exception, another exception occurred:
    |
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t.py", line 24, in main
    |     async with contextlib.aclosing(sock):
    |   File "/Users/yury/dev/python/cpython/Lib/asyncio/tasks.py", line 644, in sleep
    |     return await future
    |            ^^^^^^^^^^^^
    | asyncio.exceptions.CancelledError
    +------------------------------------

And the old-style code that would expect a CancelledError should also be happy.

Here's the full diff that I have with a couple more fixes & nits.

diff --git a/Lib/asyncio/exceptions.py b/Lib/asyncio/exceptions.py
index 5ece595aad..961d039aed 100644
--- a/Lib/asyncio/exceptions.py
+++ b/Lib/asyncio/exceptions.py
@@ -2,7 +2,8 @@
 
 
 __all__ = ('BrokenBarrierError',
-           'CancelledError', 'InvalidStateError', 'TimeoutError',
+           'CancelledError', 'CancelledErrorGroup',
+           'InvalidStateError', 'TimeoutError',
            'IncompleteReadError', 'LimitOverrunError',
            'SendfileNotAvailableError')
 
@@ -11,6 +12,16 @@ class CancelledError(BaseException):
     """The Future or Task was cancelled."""
 
 
+class CancelledErrorGroup(BaseExceptionGroup, CancelledError):
+    """A group of errors containing a CancelledError."""
+
+    # This error type needed for "backwards compatibility" with code that
+    # has `except CancelledError: ...; raise`-style blocks.
+
+    def derive(self, excs):
+        return CancelledErrorGroup(self.message, excs)
+
+
 TimeoutError = TimeoutError  # make local alias for the standard exception
 
 
diff --git a/Lib/asyncio/taskgroups.py b/Lib/asyncio/taskgroups.py
index 097b4864f7..9505793b03 100644
--- a/Lib/asyncio/taskgroups.py
+++ b/Lib/asyncio/taskgroups.py
@@ -55,29 +55,35 @@ async def __aenter__(self):
     async def __aexit__(self, et, exc, tb):
         self._exiting = True
 
-        if (exc is not None and
+        # True if the context manager is aborting with an error.
+        erroring_out = et is not None
+
+        if (erroring_out and
                 self._is_base_error(exc) and
                 self._base_error is None):
             self._base_error = exc
 
-        propagate_cancellation_error = \
-            exc if et is exceptions.CancelledError else None
+        propagate_cancellation_error = None
+        if et is not None and issubclass(et, exceptions.CancelledError):
+            propagate_cancellation_error = exc
+
         if self._parent_cancel_requested:
-            # If this flag is set we *must* call uncancel().
+            # If this flag is set it means that this TaskGroup requested
+            # the parent task to be cancelled. We now *must* call uncancel().
             if self._parent_task.uncancel() == 0:
                 # If there are no pending cancellations left,
                 # don't propagate CancelledError.
                 propagate_cancellation_error = None
 
-        if et is not None:
+        if erroring_out:
             if not self._aborting:
-                # Our parent task is being cancelled:
+                # Either the parent task is being cancelled:
                 #
                 #    async with TaskGroup() as g:
                 #        g.create_task(...)
-                #        await ...  # <- CancelledError
+                #        await ...  # <- errors out with CancelledError
                 #
-                # or there's an exception in "async with":
+                # or there's an exception the "async with" block:
                 #
                 #    async with TaskGroup() as g:
                 #        g.create_task(...)
@@ -97,7 +103,7 @@ async def __aexit__(self, et, exc, tb):
                 await self._on_completed_fut
             except exceptions.CancelledError as ex:
                 if not self._aborting:
-                    # Our parent task is being cancelled:
+                    # The parent task is being cancelled:
                     #
                     #    async def wrapper():
                     #        async with TaskGroup() as g:
@@ -113,16 +119,31 @@ async def __aexit__(self, et, exc, tb):
         assert not self._tasks
 
         if self._base_error is not None:
-            raise self._base_error
+            # asyncio currently can't handle SystemExit or KeyboardIterrupt
+            # without corrupting its state. In an event of such an error
+            # we simply immediately propagate it ignoring other errors
+            # (but we still want to at least log them).
+            try:
+                for error in self._errors:
+                    if isinstance(error, exceptions.CancelledError):
+                        continue
+                    cause = type(self._base_error).__name__
+                    self._loop.call_exception_handler({
+                        'message': f'ignoring an error due to {cause}',
+                        'exception': error,
+                        'task': self._parent_task,
+                    })
+            finally:
+                raise self._base_error
+
+        group_type = BaseExceptionGroup
 
         if propagate_cancellation_error is not None:
-            # The wrapping task was cancelled; since we're done with
-            # closing all child tasks, just propagate the cancellation
-            # request now.
-            raise propagate_cancellation_error
-
-        if et is not None and et is not exceptions.CancelledError:
-            self._errors.append(exc)
+            self._errors.append(propagate_cancellation_error)
+            group_type = exceptions.CancelledErrorGroup
+        else:
+            if erroring_out and not issubclass(et, exceptions.CancelledError):
+                self._errors.append(exc)
 
         if self._errors:
             # Exceptions are heavy objects that can have object
@@ -131,7 +152,7 @@ async def __aexit__(self, et, exc, tb):
             errors = self._errors
             self._errors = None
 
-            me = BaseExceptionGroup('unhandled errors in a TaskGroup', errors)
+            me = group_type('unhandled errors in a TaskGroup', errors)
             raise me from None
 
     def create_task(self, coro, *, name=None, context=None):
@@ -195,10 +216,10 @@ def _on_task_done(self, task):
             return
 
         if not self._aborting and not self._parent_cancel_requested:
-            # If parent task *is not* being cancelled, it means that we want
-            # to manually cancel it to abort whatever is being run right now
-            # in the TaskGroup.  But we want to mark parent task as
-            # "not cancelled" later in __aexit__.  Example situation that
+            # If the parent task *is not* being cancelled, it means that we
+            # want to manually cancel it, to abort whatever is being currently
+            # awaited in the TaskGroup.  But we want to mark the parent task as
+            # "not cancelled" later in __aexit__.  An example situation that
             # we need to handle:
             #
             #    async def foo():
@@ -206,13 +227,14 @@ def _on_task_done(self, task):
             #            async with TaskGroup() as g:
             #                g.create_task(crash_soon())
             #                await something  # <- this needs to be canceled
-            #                                 #    by the TaskGroup, e.g.
+            #                                 #    by the TaskGroup, i.e.
             #                                 #    foo() needs to be cancelled
             #        except Exception:
             #            # Ignore any exceptions raised in the TaskGroup
             #            pass
             #        await something_else     # this line has to be called
-            #                                 # after TaskGroup is finished.
+            #                                 # after the TaskGroup is
+            #                                 # finished.
             self._abort()
             self._parent_cancel_requested = True
             self._parent_task.cancel()

@gvanrossum Guido I'm happy to push to your branch or make a new one with this diff if you want me to. I think it would need a couple new tests (and CancelledErrorGroup.derive looks wrong to me)

graingert · 2022-08-09T07:06:09Z

The key thing we should probably keep is supporting except CancelledError blocks of code that some libraries can have.

I don't think you actually need to support this usecase. Because the only way to end up in this situation is when you use the new TaskGroup code, don't handle the grouped exceptions, and want to catch a single CancelledError only.

Consider what would happen if someone replaced their asyncio.gather calls with TaskGroup: In this situation with the old style when gather is cancelled if any of its child task throws an error it replaces the CancelledError entirely which means the except CancelledError: block wouldn't run anyway.

If TaskGroup now takes special effort to re-raise a bare CancelledError when faced with additional errors from crashing tasks it would actually break existing code by reaching except CancelledError: clauses that were previously unreachable

import asyncio
import contextlib

class Sock:
    async def aclose():
        pass

async def foo():
    with contextlib.closing(Sock()):
        await asyncio.sleep(1)
        print("hello")


async def demo():
    task = asyncio.current_task()
    task.get_loop().call_later(0.1, task.cancel)
    try:
        await asyncio.gather(foo(), foo())
    except asyncio.CancelledError:
        pass # the intent here is to only reach it if asyncio.gather was cancelled cleanly


asyncio.run(demo())

1st1 · 2022-08-09T14:40:40Z

Because the only way to end up in this situation is when you use the new TaskGroup code, don't handle the grouped exceptions, and want to catch a single CancelledError only.

That's not the only way. You can have a library that takes an arbitrary task/coroutine and awaits on it, reacting specifically to CancelledError and re-raising it immediately. It's those scenarios that I care about.

If TaskGroup now takes special effort to re-raise a bare CancelledError when faced with additional errors from cashing tasks it would actually break existing code by reaching except CancelledError: clauses that were previously unreachable

Perfect compatibility with asyncio.gather() with never a design goal. As a user I would expect your example code to actually propagate a CancelledError (amongst others) when used with a TaskGroup.

Also "break existing code" isn't an argument here, because a user would be switching from old API to new API that's not declared as a drop-in replacement.

graingert · 2022-08-09T15:11:09Z

That's not the only way. You can have a library that takes an arbitrary task/coroutine and awaits on it, reacting specifically to CancelledError and re-raising it immediately. It's those scenarios that I care about.

so this would be something like this?

async def some_library(async_fn):
    try:
        await async_fn()
    except asyncio.CancelledError:
        raise
    except BaseException:
        # accidentally catching a BaseExceptionGroup('unhandled errors in a TaskGroup', [asyncio.CancelledError()])
        await cleanup()
        raise

As a user I would expect your example code to actually propagate a CancelledError (amongst others) when used with a TaskGroup.

yeah that's the behavior I'm suggesting here

1st1 · 2022-08-09T22:54:52Z

so this would be something like this?

Yeah, along the lines of this.

FWIW I don't want to push hard for the approach I suggested in #95761 (comment) -- the CancelledErrorGroup error feels hacky. I'd be OK if we just say: (a) manually handling CancelledError is an anti-pattern, and (b) we don't want to make that easier for the existing or for the future code that uses regular (non-group) try..except. In which case we can simplify the approach and use a regular BaseExceptionGroup instead of CancelledErrorGroup.

I do feel strongly about including all errors into the group though--my own opinion is that that's key for the usability of asyncio. I always viewed exception groups as a mechanism to deliver the cancellation signal without hiding other errors and IMO we should use the mechanism fully.

gvanrossum · 2022-08-10T04:36:29Z

We've established one thing clearly: even though there were several tests that seemed to be written to ignore "side errors" (let's call them that) accompanying a cancellation, we shouldn't assume that's the best design, and it's not too late to do something better -- in particular, we should not ignore side errors.

We're past RC1, and I don't think that a bug in the cancel behavior of the brand new TaskGroup and timeout async context managers is worth disrupting the release cycle. Therefore I'd like to do something smaller that we can consider it a bug fix when it lands in 3.11.1. I don't want to wait for 3.12 because by the time that rolls around, the existing 3.11 behavior will be set in stone. But 3.11.1 (which I expect arund two months after 3.11) seems soon enough.

My proposal is to keep the PR as it is, except remove the self._parent_cancel_requested or part from the condition for propagating the CancelledError, and fix the tests that this breaks (I think 15 and 16 -- I added that condition specifically to make those pass).

Now, this will mean that code (old or new) using except CancelledError: will miss a cancellation when one of the tasks crashes during cleanup. I think that's a reasonable price to pay -- for example, if we write

async with asyncio.timeout(1):
    try:
        await asyncio.sleep(2)  # Will be cancelled by the timeout
    finally:
        1/0  # oops, crash in cleanup

this will report ZeroDivisionError, not TimeoutError. So if we move the try/except block to a task and wait for it using a TaskGroup and then wrap that in a timeout(1), it's reasonable that we get the same effect (but wrapped in an ExceptionGroup).

Note that a TaskGroup wrapping another TaskGroup will still call uncancel() when needed, due to #95602 (landed 6 days ago, just before RC1).

I considered proposing adding the CancelledError to the ExceptionGroup and just another exception, but that just complicates the logic, and I think it's a cleaner rule that CancelledError is either propagated singly, or not at all.

Any scenario where a task crashes in cleanup upon cancellation will now result in an ExceptionGroup wrapping the crash(es) instead of propagating CancelledError and ignoring the side errors.

gvanrossum · 2022-08-10T20:04:42Z

Why do you think it's cleaner to only propagate CancelledError alone? IMO it's totally OK to have it propagated in a group with other errors. I don't see the downsides of that or the complications (we don't have to implement my idea of CancelledErrorGroup, fwiw).

Remember that when we designed exception groups, we decided that changing any API from raising FooError() to raising ExceptionGroup("msg", [FooError()]) a backwards incompatible API change?

When we cancel a task (or a future -- remember tasks are futures), and the task handles the cancellation cleanly, then we expect waiting for the task to raise CancelledError. If under certain circumstances it will now instead an ExceptionGroup wrapping a CancelledError that would be an API change. OTOH, if the task crashes during cleanup, other errors can be raised, so that is not a new outcome.

I looked at the timeout() code, and it checks for CancelledError in a way that would not work for an EG wrapping a CE:

            if self._task.uncancel() == 0 and exc_type is exceptions.CancelledError:

I must assume there's plenty of careful cancellation handling code out there that similarly needs CE to be not wrapped.

Anyway, this PR is updated, please have a look.

1st1 · 2022-08-12T04:50:46Z

Remember that when we designed exception groups, we decided that changing any API from raising FooError() to raising ExceptionGroup("msg", [FooError()]) a backwards incompatible API change?

Yes, I agree. Although TaskGroups is a new API, so we can tweak it within certain limits.

When we cancel a task (or a future -- remember tasks are futures), and the task handles the cancellation cleanly, then we expect waiting for the task to raise CancelledError. If under certain circumstances it will now instead an ExceptionGroup wrapping a CancelledError that would be an API change.

Yes. Unless we implement what I'm suggesting in #95761 (comment) -- to make a special EG type that is a subclass of CancelledError.

It's really quite elegant -- if an error group contains a CancelledError it itself becomes a CancelledError.

if self._task.uncancel() == 0 and exc_type is exceptions.CancelledError:

This can be fixed to work with my proposal - we just need to replace is with issubclass. Using is for checks like this is an anti-pattern anyways (we never documented CancelledError as a type that must not be subclassed).

I'm approving this PR because it's better to land something than nothing in this case.

But I think that its suboptimal in its design because it obstructs the cancellation in certain cases and can make things very hard to understand.

Consider the traceback of the Thomas' script. With this PR:

child task got error: type(e)=<class 'AttributeError'> e=AttributeError("'Sock' object has no attribute 'close'")
child task got error: type(e)=<class 'AttributeError'> e=AttributeError("'Sock' object has no attribute 'close'")
handled attribute error

With the approach I'm suggesting in #95761 (comment):

child task got error: type(e)=<class 'AttributeError'> e=AttributeError("'Sock' object has no attribute 'close'")
child task got error: type(e)=<class 'AttributeError'> e=AttributeError("'Sock' object has no attribute 'close'")
handled attribute error
  + Exception Group Traceback (most recent call last):
  |   File "/Users/yury/dev/python/cpython/t3.py", line 30, in <module>
  |     asyncio.run(main())
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/runners.py", line 187, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/runners.py", line 119, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/base_events.py", line 650, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/t3.py", line 20, in main
  |     async with asyncio.TaskGroup() as tg:
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/taskgroups.py", line 157, in __aexit__
  |     raise me from None
  | asyncio.exceptions.CancelledErrorGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t3.py", line 25, in main
    |     await asyncio.sleep(2)
    |   File "/Users/yury/dev/python/cpython/Lib/asyncio/tasks.py", line 644, in sleep
    |     return await future
    |            ^^^^^^^^^^^^
    | asyncio.exceptions.CancelledError
    |
    | During handling of the above exception, another exception occurred:
    |
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t3.py", line 24, in main
    |     async with contextlib.aclosing(sock):
    |   File "/Users/yury/dev/python/cpython/Lib/asyncio/tasks.py", line 644, in sleep
    |     return await future
    |            ^^^^^^^^^^^^
    | asyncio.exceptions.CancelledError
    +------------------------------------

Guido, I'm curious what you think specifically about my idea.

That said this PR is still a huge improvement over what's in the main branch. I'm OK to land it now and continue the discussion if you are open to it.

graingert · 2022-08-12T05:43:07Z

@1st1 can you make a draft PR for your CancelledError as an ExceptionGroup change? I suspect it will end up hiding exceptions again in async with timeout() and I think changing EG semantics would actually be easier

gvanrossum · 2022-08-12T16:47:27Z

When we cancel a task (or a future -- remember tasks are futures), and the task handles the cancellation cleanly, then we expect waiting for the task to raise CancelledError. If under certain circumstances it will now instead an ExceptionGroup wrapping a CancelledError that would be an API change.

Yes. Unless we implement what I'm suggesting in #95761 (comment) -- to make a special EG type that is a subclass of CancelledError.

It's really quite elegant -- if an error group contains a CancelledError it itself becomes a CancelledError.

I think it will be full of surprises. I did a few experiments, and it seems that if you have a a subclass of SomeException and ExceptionGroup, and you raise one of those, if you write except* SomeException: then it will catch just that -- other except* blocks that match one or more of the exceptions in the group will not trigger.

(Even if we were to decide that that as a bug in the exception group implementation, it would be too late to fix in 3.11.)

if self._task.uncancel() == 0 and exc_type is exceptions.CancelledError:

This can be fixed to work with my proposal - we just need to replace is with issubclass. Using is for checks like this is an anti-pattern anyways (we never documented CancelledError as a type that must not be subclassed).

It may be an antipattern, but the taskgroup code I inherited from you also uses this, and I suspect it's a common pattern, because people don't expect CancelledError to be subclassed.

In addition, in the scenario where we bubble up the CancelledError plus some other "side exceptions", it would not be okay for timeout() to recognize the CancelledError and raise TimeoutError, dropping the side errors on the floor. The same would be true for other scenarios where CancelledError is being handled.

So I think that, alas, your idea is not going to work, and we're better of with my "conservative" approach which replaces the CancelledError with an exception group bundling the side errors (if there are any).

I'm approving this PR because it's better to land something than nothing in this case.

Thanks -- I will give it a little more time but not too much since RC2 is already looming.

graingert · 2022-08-12T17:05:09Z

It may be an antipattern, but the taskgroup code I inherited from you also uses this, and I suspect it's a common pattern, because people don't expect CancelledError to be subclassed.

I believe this anti-pattern comes from people trying to emulate the except T as e: vs isinstance(e, T) differences exposed to implementers of __exit__ https://discuss.python.org/t/expose-pyerr-givenexceptionmatches-in-inspect/17420/4

graingert · 2022-08-12T17:06:29Z

(Even if we were to decide that that as a bug in the exception group implementation, it would be too late to fix in 3.11.)

As long as the door is open for a fix in 3.12, 3.11.1? We can fix it in the backport, people are going to need to use the backport until Oct 2026 already hopefully they're happy to wait till October 2027

graingert · 2022-08-12T17:10:46Z

What Trio does is not necessarily what we should do, they have different constraints (its MultiError or whatever behaves differently than ExceptionGroup).

We're porting to EG already so I'm not sure what you mean by this

gvanrossum · 2022-08-12T17:15:18Z

It may be an antipattern, but the taskgroup code I inherited from you also uses this, and I suspect it's a common pattern, because people don't expect CancelledError to be subclassed.

I believe this anti-pattern comes from people trying to emulate the except T as e: vs isinstance(e, T) differences exposed to implementers of __exit__ https://discuss.python.org/t/expose-pyerr-givenexceptionmatches-in-inspect/17420/4

And what is in your opinion the difference between except T as e: vs isinstance(e, T)? There are differences related to virtual base classes but nothing that would lead people to write type(e) is T. The example by Steven D'Aprano is also irrelevant (it tries to do something clever about raise T vs. raise T() but that is also irrelevant to this discussion.

gvanrossum · 2022-08-12T17:20:07Z

(Even if we were to decide that that as a bug in the exception group implementation, it would be too late to fix in 3.11.)

As long as the door is open for a fix in 3.12, 3.11.1?

That depends on whether it is backwards compatible. And whether we actually think it is a useful feature. I wouldn't get my hopes up.

We can fix it in the backport, people are going to need to use the backport until Oct 2026 already hopefully they're happy to wait till October 2027

And by "the backport" you mean the emulation on PyPI, right? That seems of limited relevance given that we're talking about the semantics of except* here (and in particular the bit of semantics that allows us to write except* FooError and catch a bare FooError, i.e. one that isn't wrapped in an ExceptionGroup).

My personal opinion: give up on this idea, it is more complicated than you think.

gvanrossum · 2022-08-12T17:23:30Z

What Trio does is not necessarily what we should do, they have different constraints (its MultiError or whatever behaves differently than ExceptionGroup).

We're porting to EG already so I'm not sure what you mean by this

IIRC this was when we were still deciding whether to throw away "side errors" or not. That has been decided (we don't), so I see no reason to further litigate this.

graingert · 2022-08-12T17:24:14Z

I'm talking about with exceptiongroup.catch(): where we have the same flexibility as except*

graingert · 2022-08-12T17:34:44Z

The example by Steven D'Aprano is also irrelevant (it tries to do something clever about raise T vs. raise T() but that is also irrelevant to this discussion.

🤦 I meant to link to the top of the discussion, but scrolling discourse changes the URL in the browser. It is indeed totally irrelevant

gvanrossum · 2022-08-12T18:24:57Z

@graingert Please focus. Do you agree with Yury that the current PR is an improvement over what’s in main? Do you have any specific feedback on the PR?

(Do any others still reading have feedback? @Tinche ?)

Tinche

This LGTM.

I was warming up to the cleverness of Yury's idea but if you think there be dragons I'm convinced.

gvanrossum · 2022-08-17T16:32:52Z

@pablogsal told me that he's open to backporting this to 3.11, so I'll go through the movements.

miss-islington · 2022-08-17T16:33:24Z

Thanks @gvanrossum for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

bedevere-bot · 2022-08-17T16:33:33Z

GH-96048 is a backport of this pull request to the 3.11 branch.

…pythonGH-95761) When a task catches CancelledError and raises some other error, the other error should not silently be suppressed. Any scenario where a task crashes in cleanup upon cancellation will now result in an ExceptionGroup wrapping the crash(es) instead of propagating CancelledError and ignoring the side errors. NOTE: This represents a change in behavior (hence the need to change several tests). But it is only an edge case. Co-authored-by: Thomas Grainger <[email protected]> (cherry picked from commit f51f54f) Co-authored-by: Guido van Rossum <[email protected]>

…5761) When a task catches CancelledError and raises some other error, the other error should not silently be suppressed. Any scenario where a task crashes in cleanup upon cancellation will now result in an ExceptionGroup wrapping the crash(es) instead of propagating CancelledError and ignoring the side errors. NOTE: This represents a change in behavior (hence the need to change several tests). But it is only an edge case. Co-authored-by: Thomas Grainger <[email protected]> (cherry picked from commit f51f54f) Co-authored-by: Guido van Rossum <[email protected]>

bedevere-bot added the awaiting core review label Aug 7, 2022

gvanrossum requested a review from graingert August 7, 2022 15:42

gvanrossum marked this pull request as ready for review August 7, 2022 15:42

gvanrossum requested review from 1st1 and asvetlov as code owners August 7, 2022 15:42

Tinche suggested changes Aug 7, 2022

View reviewed changes

graingert mentioned this pull request Aug 7, 2022

rename asyncio tests so they have readable names by adding a suffix to them #95772

Closed

blurb-it bot and others added 2 commits August 8, 2022 01:42

📜🤖 Added by blurb_it.

afeab71

Improve comment explaining when/why we propagate cancellations

4698b4b

graingert reviewed Aug 8, 2022

View reviewed changes

Misc/NEWS.d/next/Library/2022-08-08-01-42-11.gh-issue-95704.MOPFfX.rst Outdated Show resolved Hide resolved

Add markup to NEWS file

6763dda

Co-authored-by: Thomas Grainger <[email protected]>

Never ignore side errors

79a4972

Any scenario where a task crashes in cleanup upon cancellation will now result in an ExceptionGroup wrapping the crash(es) instead of propagating CancelledError and ignoring the side errors.

1st1 approved these changes Aug 12, 2022

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Aug 12, 2022

Tinche approved these changes Aug 12, 2022

View reviewed changes

Remove blank line from middle of blurb

63a1956

gvanrossum mentioned this pull request Aug 17, 2022

TimeoutError not raised by 'timeout' context manager when "side errors" occur #96037

Closed

gvanrossum merged commit f51f54f into python:main Aug 17, 2022

bedevere-bot removed the awaiting merge label Aug 17, 2022

gvanrossum deleted the fix-taskgroup-fail-on-cancel branch August 17, 2022 01:23

gvanrossum added the needs backport to 3.11 only security fixes label Aug 17, 2022

bedevere-bot removed the needs backport to 3.11 only security fixes label Aug 17, 2022

Uh oh!

GH-95704: Don't suppress errors from tasks when TG is cancelled #95761

GH-95704: Don't suppress errors from tasks when TG is cancelled #95761

Uh oh!

Conversation

gvanrossum commented Aug 7, 2022 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gvanrossum commented Aug 7, 2022

Uh oh!

Tinche left a comment

Choose a reason for hiding this comment

Uh oh!

gvanrossum commented Aug 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gvanrossum commented Aug 8, 2022

Uh oh!

graingert Aug 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gvanrossum Aug 8, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Tinche commented Aug 8, 2022

Uh oh!

gvanrossum commented Aug 8, 2022

Uh oh!

graingert commented Aug 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gvanrossum commented Aug 8, 2022

Uh oh!

graingert commented Aug 8, 2022

Uh oh!

Tinche commented Aug 8, 2022

Uh oh!

graingert commented Aug 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gvanrossum commented Aug 8, 2022

Uh oh!

1st1 commented Aug 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

graingert commented Aug 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

1st1 commented Aug 9, 2022

Uh oh!

graingert commented Aug 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

1st1 commented Aug 9, 2022

Uh oh!

gvanrossum commented Aug 10, 2022

Uh oh!

gvanrossum commented Aug 10, 2022

Uh oh!

1st1 commented Aug 12, 2022

Uh oh!

graingert commented Aug 12, 2022

Uh oh!

gvanrossum commented Aug 12, 2022

Uh oh!

graingert commented Aug 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

graingert commented Aug 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

graingert commented Aug 12, 2022

Uh oh!

gvanrossum commented Aug 12, 2022

Uh oh!

gvanrossum commented Aug 12, 2022

Uh oh!

gvanrossum commented Aug 12, 2022

gvanrossum commented Aug 7, 2022 •

edited by bedevere-bot

Loading

gvanrossum commented Aug 8, 2022 •

edited

Loading

graingert Aug 8, 2022 •

edited

Loading

graingert commented Aug 8, 2022 •

edited

Loading

graingert commented Aug 8, 2022 •

edited

Loading

1st1 commented Aug 9, 2022 •

edited

Loading

graingert commented Aug 9, 2022 •

edited

Loading

graingert commented Aug 9, 2022 •

edited

Loading

graingert commented Aug 12, 2022 •

edited

Loading

graingert commented Aug 12, 2022 •

edited

Loading

graingert commented Aug 12, 2022 •

edited

Loading

gvanrossum commented Aug 12, 2022 •

edited

Loading