"Coroutines and Tasks" page does not appear to be clear about what asyncio.create_task does #96023

Closed
ffunenga opened this issue Aug 16, 2022 · 9 comments
Labels
docs (Documentation in the Doc dir), topic-asyncio

Comments

@ffunenga

In the documentation (specifically, in the "Coroutines and Tasks" page), I am having trouble understanding what happens when asyncio.create_task is called. I have read the headings Coroutines and Creating tasks, but I still don't get it.

Replicating the issue

Read the "Coroutines and Tasks" page and try to follow its explanations of, and references to, asyncio.create_task.

"Coroutines" heading

In the third point under the "three main mechanisms", the following example is provided:

async def main():
    task1 = asyncio.create_task(say_after(1, 'hello'))
    task2 = asyncio.create_task(say_after(2, 'world'))
    print(f"started at {time.strftime('%X')}")
    # Wait until both tasks are completed (should take around 2 seconds.)
    await task1
    await task2
    # (...)

My question here is: when does say_after(2, 'world') start running? I see three options:

  1. It starts running at the line task2 = asyncio.create_task(say_after(2, 'world'))
  2. It starts running at the line await task1
  3. It starts running at the line await task2

In the third point under the "three main mechanisms", it is further explained (bold added here):

Note that expected output now shows that the snippet runs 1 second faster than before:

Why is the output "expected"? I am not finding any explanation of why this 1-sec-faster behavior is expected.
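A rough sketch of why the shorter runtime is expected. It uses a variant of the page's say_after that returns its text instead of printing it, with delays shortened to 0.1 s and 0.2 s; both changes are assumptions made here to keep the run short. Awaiting the coroutines one after the other takes roughly the sum of the delays, while wrapping them in tasks lets the two sleeps overlap, so the total is roughly the longer delay.

```python
import asyncio
import time


async def say_after(delay, what):
    # Variant of the docs' helper: returns the text instead of printing it.
    await asyncio.sleep(delay)
    return what


async def sequential():
    # Each await finishes before the next coroutine starts: about 0.1 + 0.2 s.
    start = time.perf_counter()
    await say_after(0.1, "hello")
    await say_after(0.2, "world")
    return time.perf_counter() - start


async def concurrent():
    # Both tasks sleep at the same time: about max(0.1, 0.2) s.
    start = time.perf_counter()
    task1 = asyncio.create_task(say_after(0.1, "hello"))
    task2 = asyncio.create_task(say_after(0.2, "world"))
    await task1
    await task2
    return time.perf_counter() - start


sequential_elapsed = asyncio.run(sequential())
concurrent_elapsed = asyncio.run(concurrent())
print(f"sequential: {sequential_elapsed:.2f}s, concurrent: {concurrent_elapsed:.2f}s")
```

With the original delays of 1 and 2 seconds, the same reasoning gives about 3 seconds versus about 2 seconds.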

"Creating tasks" heading

In this heading, the first line of the __doc__ string for asyncio.create_task explains the following (bold added here):

Wrap the coro coroutine into a Task and schedule its execution. Return the Task object.

I don't understand what "schedule its execution" means. Does the coro get scheduled to run now, or only when the program hits the next await statement?

Suggested solution

Either in the "Coroutines" heading or in the "Creating tasks" heading, inserting an explicit clarification of when the task starts running.

@AlexWaygood added the docs and topic-asyncio labels on Aug 16, 2022
@ffunenga
Author

ffunenga commented Aug 16, 2022

For further consideration: I now see that an asyncio tutorial appears to have been in the works since September 2018. This looks nice. The "Three Ways To Execute Async Functions" page in the future asyncio tutorial seems to be a better place to clarify how asyncio.create_task works.

(
Note to @cjrh: I have read the provisional version of "Three Ways To Execute Async Functions" page, and I am still having trouble understanding it, specifically with this paragraph:

Even though f() is called first, async function g() will finish first (5 seconds is shorter than 10 seconds), and you'll see "g is done" printed before "f is done". This is because although create_task() does schedule the given async function to be executed, it does not wait for the call to complete, unlike when the await keyword is used.

This paragraph explains create_task() by explaining what it does not do ("...it does not wait for the call to complete, unlike when the await keyword is used"). As a reader, I still don't understand what create_task() does. What is the exact meaning of "...although create_task() does schedule the given async function to be executed"? Does the execution only start when the program hits the next await statement? It is also unclear to me why the return value of asyncio.create_task(f()) is not kept in a variable, and why there is no await statement for it.
)

@kumaraditya303
Contributor

kumaraditya303 commented Sep 25, 2022

Either in the "Coroutines" heading or in the "Creating tasks" heading, inserting an explicit clarification of when the task starts running.

This is an implementation detail: when the task starts running depends on the number of loop iterations required before it hits __step. In practice that is currently 1 iteration, but it is subject to change.

You should use the new TaskGroup which is easier to use.

@gvanrossum
Member

Using TaskGroup doesn't change the confusion though.

The key thing to know is that tasks and coroutines (async functions) only run when you await them, directly or indirectly. So create_task(), since it is not a coroutine and is not invoked using await, does not run anything; it just creates a data structure that tells the event loop there is a new task that it should run.

The next thing to know is that await does not automatically run the event loop. It only does so if the thing being awaited (directly or indirectly) ends up actually waiting (blocking) for I/O or for a timer (e.g. asyncio.sleep(0), as a special case). So any tasks created aren't running until the event loop regains control, and that is definitely not before the next await but it could be later if that await ends up producing a result without "blocking".
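The two points above can be sketched as follows, assuming the default event loop with no eager task factory installed. The names child, no_suspend, and events are hypothetical and exist only for this illustration. The child task does not start at create_task(), and it does not start at an await that completes without suspending; it starts once main yields to the event loop at asyncio.sleep(0).

```python
import asyncio

events = []  # hypothetical log used to record the order of execution


async def child():
    events.append("child started")


async def no_suspend():
    # Awaiting this coroutine never yields control to the event loop.
    return 42


async def main():
    task = asyncio.create_task(child())
    events.append("after create_task")
    await no_suspend()
    events.append("after non-suspending await")
    await asyncio.sleep(0)  # yields to the event loop once
    events.append("after sleep(0)")
    await task  # already finished, so this returns immediately


asyncio.run(main())
print(events)
```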

I'm not sure that the docs make this clear enough, someone should have a look and then report back here or submit a PR.

@alexpovel
Contributor

Thanks @gvanrossum, I hadn't seen those insights elsewhere yet; very helpful.

It's an important distinction to make: create_task returns a Task, but that task is cold, to borrow from C# parlance (archived link):

Tasks that are created by the public Task constructors are referred to as cold tasks, because they begin their life cycle in the non-scheduled Created state and are scheduled only when Start is called on these instances.

...whereas all other...

[...] tasks begin their life cycle in a hot state, which means that the asynchronous operations they represent have already been initiated

which is the usual case in C# async code. So create_task is kind of like using C#'s Task constructors directly? It's a more low-level operation, where the user is then responsible for actually managing that task. I suppose in Python we do not have the concept of "starting", only awaiting it. A consequence is that in C#, a previously started Task can turn out to be already done when it's merely awaited later; it can then return synchronously with its result. Would it be right to say that cannot happen in Python, as coroutines/Tasks aren't even scheduled to run until the first await is hit? Calling an async function without await returns us a Coroutine object, which is, as you say, just a data structure.

Is the difference rooted in .NET being natively multi-threaded, so it can spin up those "background jobs" trivially? Python doesn't have that luxury (GIL). It (the event loop) has to patiently wait until control is yielded back to it cooperatively, as control (in the sense of "instruction pointer"/"program counter") can only ever be in one place at a time. C# async is also based around coroutines though, so that part seems surprisingly similar.

Hope I'm not talking rubbish here.


As for the linked guide, it seems outdated. It claims:

Even though f() is called first, async function g() will finish first (5 seconds is shorter than 10 seconds), and you'll see "g is done" printed before "f is done". This is because although create_task() does schedule the given async function to be executed, it does not wait for the call to complete, unlike when the await keyword is used.

for the snippet:

import asyncio


async def f():
    await asyncio.sleep(10)
    print('f is done')

async def g():
    await asyncio.sleep(5)
    print('g is done')

async def main():
    asyncio.create_task(f())  # (1)
    await g()                 # (2)

asyncio.run(main())

which, when run under Python 3.11.2, prints only g is done and never outputs f is done. Makes sense, as it's not awaited. The snippet also had a syntax error, and all these docs, while promising, haven't been updated in 4 years, sadly! A long time in the Python async world.

But adjusting the snippet makes for a good showcase of what @gvanrossum had mentioned:

import asyncio


async def f():
    print('starting f')
    await asyncio.sleep(10)
    print('f is done')

async def g():
    print('starting g')
    await asyncio.sleep(5)
    print('g is done')

async def main():
    print('starting main')
    asyncio.create_task(f())
    print('scheduled f')
    await g()
    print('bye')

asyncio.run(main())

outputs:

starting main
scheduled f
starting g
starting f
g is done
bye

which seems deterministic on my machine. So clearly, f is never started until await g() is called, and g is even started before f (so it's LIFO).
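One possible reading of that output, sketched under the assumption of the default event loop with no eager task factory: await g() runs g's body inside main's own task rather than through the scheduler, so g prints first, and the pending task only starts when g suspends at asyncio.sleep. With two tasks queued (the hypothetical f1 and f2 below, with shortened delays), they start in creation order, which points to FIFO scheduling among tasks rather than LIFO.

```python
import asyncio

events = []  # hypothetical log used to record the order of execution


async def f(name):
    events.append(f"{name} started")
    await asyncio.sleep(0.01)


async def g():
    events.append("g started")
    await asyncio.sleep(0.05)


async def main():
    first = asyncio.create_task(f("f1"))
    second = asyncio.create_task(f("f2"))
    await g()  # runs g inline in main's task until g suspends
    await first
    await second


asyncio.run(main())
print(events)
```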

An open question would still be the behavior when commenting out await g(). Then, the snippet prints:

starting main
scheduled f
bye
starting f

which is... interesting! So the event loop gets control just before interpreter shutdown, it seems (?). Would not have expected that.

@gvanrossum
Member

Um, sorry, I don't think the issue tracker is the right place to educate you. You are drawing parallels to C# tasks, which is probably just confusing. If you want to have a discussion, please start a thread on discuss.python.org.

@gvanrossum closed this as not planned on Aug 31, 2023
@cjrh
Contributor

cjrh commented Sep 2, 2023

@alexpovel I wrote that now-old documentation. You are correct that f is done is never printed, and this was my error. Since I made the error, I would like to correct it for you, but I have no intention of reopening this thread.

It was correct about f being started implicitly (create_task() does indeed schedule the coroutine it is given on the event loop), but it was incorrect in saying that f is done would be printed. In fact, f() does start running, but the error I made when putting this simple example together was in thinking that the asyncio.run() call would wait for pending coroutines to complete before returning. This is not what happens. Instead, run() will cancel any still-running coroutines once main() returns.

You can verify this by modifying the example slightly:

import asyncio

async def f():
    print('start running f')
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        print('f was cancelled')
        raise
    print('f is done')

async def g():
    await asyncio.sleep(5)
    print('g is done')

async def main():
    asyncio.create_task(f())  # (1)
    print('about to start g')
    await g()                 # (2)

asyncio.run(main())

This is the output:

$ python3.11 main.py 
about to start g
start running f
g is done
f was cancelled

You can see that f does start running, but it never completes because of the cancellation after g (and therefore main) is complete.
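If the goal is for f to finish, one option is to keep a reference to the task and await it before main returns. This is a minimal sketch with shortened delays and a hypothetical events list in place of print, not a correction taken from the unmerged tutorial.

```python
import asyncio

events = []  # hypothetical log used to record the order of completion


async def f():
    await asyncio.sleep(0.02)
    events.append("f is done")


async def g():
    await asyncio.sleep(0.01)
    events.append("g is done")


async def main():
    task = asyncio.create_task(f())  # keep a reference to the task
    await g()
    await task  # main does not return until f has finished


asyncio.run(main())
print(events)
```

Because main now waits on the task, asyncio.run() has nothing left to cancel when main returns.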

@gvanrossum
Member

@cjrh (or anyone still reading this): If the docs are actually incorrect, could you send a PR to fix it? (Or if it was already fixed, indicate here by which PR.)

@cjrh
Contributor

cjrh commented Sep 10, 2023

The error was in my asyncio-tutorial PR, which never got merged and is now closed (so there are no docs in the main branch to fix here). I don't know who, if anyone, is going to take that work further, but I'll have a look around.

@gvanrossum
Member

Sorry for the confusion. Good luck! And thanks for trying.
