Safe synchronous cancellation in asyncio #103486

Open
evgeny-osipenko opened this issue Apr 12, 2023 · 17 comments
Labels
stdlib Python modules in the Lib dir topic-asyncio type-feature A feature request or enhancement

Comments

@evgeny-osipenko

evgeny-osipenko commented Apr 12, 2023

Feature or enhancement

I propose adding a function to asyncio to cancel a task and then safely wait for the cancellation to complete.

The main difficulty with this is deciding whether to swallow or re-raise the CancelledError once we catch one. If we call the task that we're waiting on the "dependency", and the task doing the waiting the "controller", then there are two distinct possibilities:

  1. The "dependency" has finished its cleanup, passed its own CancelledError all the way up its own stack, and then further up through the await in the "controller". In this case, the CancelledError should be swallowed, and the "controller" can continue its work normally.
  2. The "controller" itself was cancelled, while being on the await. In this case, the CancelledError is the signal to cancel the "controller" itself, and should be either re-raised further up, or its swallowing should be accompanied by a call to uncancel.

The example in the documentation for asyncio.Task.cancel, in fact, does not make this decision correctly, and would thus be a bug in real code. Copying the example, and changing the names to match this issue's terminology:

async def dependency():
    print('dependency(): start')
    try:
        await asyncio.sleep(3600) # a long or infinite loop
    except asyncio.CancelledError:
        print('dependency(): cancel')
        raise
    finally:
        print('dependency(): cleanup')

async def controller():
    dependency_task = asyncio.create_task(dependency())
    await asyncio.sleep(1)
    dependency_task.cancel()
    try:
        await dependency_task # controller itself may be cancelled at this moment
    except asyncio.CancelledError:
        print("controller(): dependency is cancelled now")
        # CancelledError swallowed unconditionally

If I'm not missing anything, the correct procedure would look like this:

async def controller():
    dependency_task = asyncio.create_task(dependency())
    await asyncio.sleep(1)
    dependency_task.cancel()
    try:
        await dependency_task
    except asyncio.CancelledError:
        print("controller(): dependency is cancelled now")
        if asyncio.current_task().cancelling() > 0:
            raise

Thus, I propose to make these changes:

  1. Introduce a function to asyncio or Task:

    async def cancel_and_wait(task, msg=None):
        task.cancel(msg)
        try:
            await task
        except asyncio.CancelledError:
            if asyncio.current_task().cancelling() == 0:
                raise
            else:
                return # this is the only non-exceptional return
        else:
            raise RuntimeError("Cancelled task did not end with an exception")

    Having a specialized function would reduce the possibility of someone making this mistake in their code (like the author of the example probably did :) ), and allow the implementation to be changed or improved in the future. One such possible enhancement, for example, could be adding a repeat parameter to instruct the function, in case the task uncancels itself, to keep cancelling it again in a loop.

  2. In the documentation for asyncio, add a warning for this kind of mistake, in the "Task Cancellation" section or in the description of asyncio.Task.cancel.

  3. Change the code example for asyncio.Task.cancel to account for cancellation of main. I know that, in this specific snippet, it is impossible for main itself to be cancelled; but a developer unsuspecting of this issue may copy this example into a situation where the controller is indeed cancellable, and end up with a bug in their code.

@evgeny-osipenko evgeny-osipenko added the type-feature A feature request or enhancement label Apr 12, 2023
@github-project-automation github-project-automation bot moved this to Todo in asyncio Apr 13, 2023
@gvanrossum
Member

This feels very similar to gh-102847, which was just closed. Since that has much more discussion we should probably keep that one open instead?

@achimnol
Contributor

achimnol commented Apr 14, 2023

I think this issue shares some context with gh-102847, but targets a more specific use case: safe synchronous cancellation. If we reopen that issue, it may be better to collect the discussion there. I'd love to see some improvements to the "canceller"-side APIs and ergonomics, as the OP says.

@gvanrossum
Member

I don't think that adding a new API is going to help much. Realistic use cases of cancellation (e.g. as used in task groups and the timeout context manager) wouldn't be able to use it. Most use cases are better off using something like timeout anyway.

You may have made a case for updating the example, but I resent the suggestion that the author of the example "probably made a mistake". As you mention, in the example there is no way main() itself can be cancelled. But if you submit a PR that updates the example and explains (in not too many words) why the cancelling() check is needed it will probably be accepted.

Alternatively, you could update the docs for cancelling() to include the improved example, and just plant a link there in the text accompanying the basic cancel() example. If you go that way, the cancel_me() function should probably be reduced to just await sleep(3600), since the rest of the code is just there to demonstrate the execution flow through a coroutine that catches CancelledError. (That's also true for main() BTW -- so it's probably better to update the cancelling() docs than the cancel() docs.)

@evgeny-osipenko
Author

Let me describe the context where I encountered this question.

I am building a system composed of multiple services that run in parallel in separate Python instances. To provide synchronization between the services where necessary, I use a shared database with a "locks" table, where the lock's name field, constructed from a resource's global id, is covered with a unique index. When a service takes a lock, it tries to create a row in this table, and to release the lock, the row is deleted.

In addition, each lock is equipped with a watchdog timer, so that if a service bugs out, crashes or "forgets" to unlock the resource in any other way, the lock would time out by itself. When a service genuinely needs to hold the lock for a long time, however, it can periodically reset the watchdog, thus preventing it from timing out for as long as the service is actually functional.

And so, what I wanted to have, is a context manager that takes care of both creating/removing the row and resetting the watchdog timer, by spawning an asyncio task for the duration of the "managed" code. An important caveat is that the "resetter" task must be stopped and joined before the row is removed; because, if the "resetter" discovers that the row has been removed and there is nothing to update anymore, it's treated as if the lock has timed out, the resource may already be in the middle of another process, and it's no longer safe to continue, so it raises an exception and interrupts the parent task.

In the end, though, I implemented this behavior by constructing a TaskGroup from within the lock manager, calling its __aenter__ and __aexit__ manually, and putting the "resetter" task under its control:

class LockConflict(RuntimeError):
    pass

class LockLost(RuntimeError):
    pass

class Lock:
    def __init__(self, name):
        self.name = name
        self.nonce = random.randbytes(8)
        self.task_group = None
        self.resetter_task = None

    async def take(self):
        created_successfully = await db.transaction(
            lambda c: db_try_create_lock(c, self.nonce, self.name),
        )
        if not created_successfully:
            raise LockConflict(
                f"Lock {self.name} is already taken. Are you trying to run "
                f"multiple instances of a non-multiplicable service at once?"
            )

    async def touch(self):
        updated_successfully = await db.transaction(
            lambda c: db_update_lock_watchdog(c, self.nonce, self.name),
        )
        if not updated_successfully:
            raise LockLost(f"Lock {self.name} timed out")

    async def release(self):
        await db.transaction(
            lambda c: db_release_lock(c, self.nonce, self.name),
        )

    async def async_touch_loop(self):
        while True:
            await asyncio.sleep(settings.service_watchdog_timer_seconds // 2)
            await self.touch()

    async def __aenter__(self):
        if self.task_group is not None:
            raise RuntimeError('Lock context manager entered twice')
        try:
            await self.take()
            self.task_group = await asyncio.TaskGroup().__aenter__()
            self.resetter_task = self.task_group.create_task(self.async_touch_loop())
            return self
        except:
            # handle KeyboardInterrupt during initialization
            await self.release()
            raise

    async def __aexit__(self, *exc_info):
        try:
            self.resetter_task.cancel()
            await self.task_group.__aexit__(None, None, None)
        finally:
            await self.release()

# example usage:
async with Lock(f"record:{my_record.id}"):
    await process_record(my_record)

So, in the end, it might indeed be better to just add a note to the cancel example saying stuff like "check cancelling before swallowing CancelledError" and "actually, just use a TaskGroup, it already has all that covered for you".

@achimnol
Contributor

achimnol commented May 1, 2023

The same suggestion from another person: https://discuss.python.org/t/asyncio-cancel-a-cancellation-utility-as-a-coroutine-this-time-with-feeling/26304

@fried
Contributor

fried commented May 1, 2023

My talk at PyCon US 2022 https://youtu.be/XW7yv6HuWTE brought this up as an issue I see a lot at Meta: people screwing up cancelling tasks. I even wrote my own utility in the later library: https://github.com/facebookincubator/later/blob/main/later/task.py#L59

Though looking at my implementation, you can't cancel it if the task it's trying to cancel won't finish :P So yeah, this is a work in progress. It would be nice to have this in the stdlib. And I can't just tell developers to use task groups, especially if they are interfacing with 3rd party code.

@gvanrossum
Member

I suspect this will need a PEP, since so many people have broken their heads over this (even @fried admits his solution isn't perfect :-) ). See https://discuss.python.org/t/asyncio-cancel-a-cancellation-utility-as-a-coroutine-this-time-with-feeling/26304/3 (I'm not sure where we should discuss this -- here on GitHub, or there on Discourse.)

@xitop

xitop commented Jan 15, 2024

Speaking from my experience (yes, it's limited), this is an issue that should be solved while dealing with the whole task life-cycle, i.e. at a level above asyncio, not inside asyncio.

Very closely related to cancellation are other problems I've encountered in my work: task monitoring (essential tasks may not exit, or else let's shut down the whole app) and shutdown management itself (task cancelling order, cleanup actions, etc.). I have solved these "good enough" for my projects, i.e. far from being suitable for general use, but that work is the reason I'm now quite sure that a fully fledged tool for managing tasks and taking care of all the details, edge cases, etc. would amount to a complete asyncio add-on library, and this particular issue belongs there.

@arthur-tacca

The root problem here is that await task could raise either that task's exception or the current task's own CancelledError. But you're only calling it to wait till the task is done – you can check the result later by looking at task.cancelled(). In fact, you probably don't even care if it ends that way (rather than e.g. catching CancelledError and returning).

So what you all really need is a way to just wait for the task to finish. Something like await task.wait_done(). Then your synchronous cancellation is two steps:

task.cancel()
await task.wait_done()

await task.wait_done() would never raise regardless of how task finished, but (like any await) would raise CancelledError if the current task is cancelled. This is safe even if the task is already cancelled or already done. await task.wait_done() would also be a lot more generally useful than something tailored to cancellation.

For the record, I don't have any need for this. This is just some drive-by API design 😊

@arthur-tacca

arthur-tacca commented Mar 18, 2024

async def cancel_and_wait(task, msg=None):
    task.cancel(msg)
    try:
        await task
    except asyncio.CancelledError:
        if asyncio.current_task().cancelling() == 0:
            raise
        else:
            return # this is the only non-exceptional return
    else:
        raise RuntimeError("Cancelled task did not end with an exception")

Isn't the if condition the wrong way round? If asyncio.current_task().cancelling() == 0 then that means the outer task is NOT being cancelled, so the cancellation is from the inner task (await task) so you would want to just return in that case.

[edit: there was a second half to this comment that had a stupid idea which I have now deleted]

@rrevans

rrevans commented Dec 22, 2024

So what you all really need is a way to just wait for the task to finish. Something like await task.wait_done(). Then your synchronous cancellation is two steps:

task.cancel()
await task.wait_done()

I think this would be the same as asyncio.wait:

task.cancel()
await asyncio.wait([task])

After that, it's guaranteed both task is done and the current task was not cancelled during the wait. Then the outcome can be checked with Task.cancelled(), Task.result(), or Task.exception().
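A quick runnable sketch of that claim (the child coroutine here is just illustrative, not from the docs):

```python
import asyncio

async def main() -> bool:
    async def child():
        await asyncio.sleep(3600)

    task = asyncio.create_task(child())
    await asyncio.sleep(0)  # give the child a chance to start
    task.cancel()
    # asyncio.wait only waits for the "done" state; it does not
    # re-raise the task's CancelledError into the waiter.
    done, pending = await asyncio.wait([task])
    assert task in done and not pending
    # The outcome is inspected on the task object, not raised.
    return task.cancelled()

print(asyncio.run(main()))  # True
```

(If the waiter itself is cancelled during `await asyncio.wait(...)`, CancelledError still propagates there, as with any await.)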

@arthur-tacca

I think this would be the same as asyncio.wait:

task.cancel()
await asyncio.wait([task])

After that, it's guaranteed both task is done and the current task was not cancelled during the wait. Then the outcome can be checked with Task.cancelled(), Task.result(), or Task.exception().

Very interesting, I think you're right. I had assumed that asyncio.wait() would re-raise whatever exception (including asyncio.CancelledError) had been raised from the task. In fairness the documentation for asyncio.wait() isn't explicit about what happens in that case; maybe all that's needed is to update those docs.
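For what it's worth, the non-raising behaviour is easy to confirm for ordinary exceptions too (a small sketch, not from the thread):

```python
import asyncio

async def main() -> str:
    async def boom():
        raise ValueError("inner failure")

    task = asyncio.create_task(boom())
    # asyncio.wait returns normally even though the task failed;
    # the exception stays stored on the task object.
    done, _ = await asyncio.wait([task])
    assert task in done
    # Retrieving the exception also silences the
    # "Task exception was never retrieved" warning.
    return type(task.exception()).__name__

print(asyncio.run(main()))  # ValueError
```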

@cbornet

cbornet commented Dec 22, 2024

So it seems a safer version of the example in the cancel() doc would be:

async def cancel_me():
    print('cancel_me(): before sleep')
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        print('cancel_me(): cancel sleep')
        raise
    finally:
        print('cancel_me(): after sleep')

async def main():
    task = asyncio.create_task(cancel_me())
    await asyncio.sleep(1)
    task.cancel()
    await asyncio.wait([task])
    if task.cancelled():
        print("main(): cancel_me is cancelled now")

asyncio.run(main())

Does that look correct ?

@rrevans

rrevans commented Dec 23, 2024

Yes that looks correct and more robust against main being cancelled.

As to what to do about the docs: they should probably be enhanced, but I'm not sure it makes sense to dump all the complexity into the first sample. I'm also receptive to the argument that main can't be cancelled in the sample code.

Maybe there should be some note after the existing sample that the plain await raising CancelledError is ambiguous when the waiter itself is subject to being cancelled? Then asyncio.wait can be introduced as a possible solution?

That explanatory text could also make mention of TaskGroup as a possible alternative. (But they're quite different: the snippet above can itself be cancelled leaving the task still running while TaskGroup context manager exit would always cancel at that point and wait until the task actually finishes deferring any cancellation of the waiter.)

@VietThan
Contributor

Based on @cbornet's comment, I've made the following implementation; is this overkill?

import asyncio
import logging

logger = logging.getLogger(__name__)

async def cancel_task(task: asyncio.Task, task_name: str):
    """
    Cancel a task and wait for the cancelling to finish.

    Args:
        task (asyncio.Task): The task to cancel.
        task_name (str): Human-readable name of the task.
    """
    task_cancelled = task.cancelled()
    task_done = task.done()
    logger.debug(
        f"{task_name = } {task_cancelled = } {task_done = }"
    )

    if task_cancelled or task_done:
        logger.debug(
            f"{task_name} task was already cancelled or done"
        )
        return

    logger.info(f"Cancelling {task_name} task.")
    task_cancelling = task.cancel()

    if not task_cancelling:
        logger.debug(
            f"{task_name} task was already cancelled or done by the time we tried to cancel it"
        )
        return

    logger.debug(
        f"{task_name} task was not cancelled yet but is cancelling now"
    )
    await asyncio.wait([task])
    logger.debug(f"Awaited cancelling of {task_name} task")

    if not task.cancelled() and not task.done():
        logger.error(
            f"{task_name} task was not cancelled or done. This is VERY UNEXPECTED.",
            stack_info=True,
        )
        return

    logger.info(f"{task_name} task is now cancelled or done")

@LukasLendvorsky

I propose a better implementation of cancel_and_wait:

async def cancel_and_wait(task: asyncio.Task, msg=None):
    # if task is already done, return immediately
    if task.done():
        return

    # After our cancellation, cancelling() should be incremented by 1. If it's incremented by more than 1, it means a cancel was requested externally.
    # In that case, CancelledError should be raised to also end the waiting task.
    cancelling_expected = task.cancelling() + 1
    task.cancel(msg)
    try:
        await task
    except asyncio.CancelledError:
        if task.cancelling() != cancelling_expected:
            raise
        else:
            return
    else:
        raise RuntimeError("Cancelled task did not end with an exception")

The original one has the following flaws:

  • If the task is already done when cancelling, it will not raise CancelledError, and that would result in RuntimeError("Cancelled task did not end with an exception"). That is a race condition which should be avoided.
  • Using current_task().cancelling() was causing undesired behavior in some special cases. For example, when the current task is already terminating due to some other exception, CancelledError would be raised as well. A better solution is to check how many times cancelling() was incremented on the cancelled task itself. If it was incremented by more than 1, somebody else requested a cancel externally, which is the actual condition you want to detect.

@picnixz picnixz added the stdlib Python modules in the Lib dir label Mar 7, 2025
@loucadufault

loucadufault commented Mar 25, 2025

Is it safe to assume asyncio.wait_for behaves the same as asyncio.wait with respect to not re-raising the CancelledError of the awaited Task?

Edit: Answer: yes. wait_for calls timeouts.timeout which documents this behaviour:

long_running_task() is interrupted by raising asyncio.CancelledError,
the top-most affected timeout() context manager converts CancelledError
into TimeoutError.

So the example in the docs, if one wishes to allow some time for the Task to cancel before moving on:

async def cancel_me():
    print('cancel_me(): before sleep')
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        print('cancel_me(): cancel sleep')
        print('cancel_me(): async cleanup before raise')
        await asyncio.sleep(random.randint(0, 10))
        print('cancel_me(): after async cleanup') # this line may or may not be reached, depending on whether we are timed out and cancelled by the wait_for call
        raise
    finally:
        print('cancel_me(): after sleep')

async def main():
    task = asyncio.create_task(cancel_me())
    await asyncio.sleep(1)
    task.cancel()
    try:
        await asyncio.wait_for(task, 5)
    except asyncio.TimeoutError:
        pass
    finally:
        if task.cancelled():
            print("main(): cancel_me is cancelled now")

asyncio.run(main())

The shortcoming is that the code does not distinguish whether the wrapped coroutine propagated the CancelledError that was thrown into it by the first cancel, or only by the second cancel after timing out.
