Skip to content

Crash in asyncio when printing a task object at teardown #96232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pablogsal opened this issue Aug 24, 2022 · 11 comments
Closed

Crash in asyncio when printing a task object at teardown #96232

pablogsal opened this issue Aug 24, 2022 · 11 comments
Labels
3.10 only security fixes 3.11 only security fixes 3.12 only security fixes pending The issue will be closed if no feedback is provided topic-asyncio type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@pablogsal
Copy link
Member

pablogsal commented Aug 24, 2022

If a task object is still alive at interpreter teardown, something can try to print it. This will end in _asyncio.Future._repr_info that in turn calls:

return PyObject_CallOneArg(asyncio_future_repr_info_func, (PyObject *)self);

but asyncio_future_repr_info_func at that point can be NULL because module_free of _asynciomodule.c has been executed already, called by Py_FinalizeEx -> finalize_modules -> _PyInterpreterState_ClearModules -> ... -> module_dealloc.

The crash happens in the last _PyGC_CollectNoFail of finalize_modules so somehow the task survived the first collection and then was garbage in the second one.

I don;t have a simple repro for now, but you can get the failure by cloning https://github.com/Textualize/textual checking commit 5fe47da058564b72e5081a5658c054f55a5d85cd, installing the project and test deps and running pytest.

@pablogsal
Copy link
Member Author

CC: @graingert

@pablogsal
Copy link
Member Author

Backtrace of the crash:

    frame #0: 0x00000001ac496d98 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x00000001ac4cbee0 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x00000001ac406340 libsystem_c.dylib`abort + 168
    frame #3: 0x00000001ac405754 libsystem_c.dylib`__assert_rtn + 272
  * frame #4: 0x0000000103ef8984 _asyncio.cpython-310d-darwin.so`PyVectorcall_Function(callable=0x0000000000000000) at abstract.h:68:5
    frame #5: 0x0000000103ef8870 _asyncio.cpython-310d-darwin.so`_PyObject_VectorcallTstate(tstate=0x0000000100a04270, callable=0x0000000000000000, args=0x000000016fdfc9e0, nargsf=9223372036854775809, kwnames=0x0000000000000000) at abstract.h:109:12
    frame #6: 0x0000000103ef8754 _asyncio.cpython-310d-darwin.so`PyObject_CallOneArg(func=0x0000000000000000, arg=0x0000000106be8d10) at abstract.h:184:12
    frame #7: 0x0000000103efe81c _asyncio.cpython-310d-darwin.so`_asyncio_Future__repr_info_impl(self=0x0000000106be8d10) at _asynciomodule.c:1399:12
    frame #8: 0x0000000103efe390 _asyncio.cpython-310d-darwin.so`_asyncio_Future__repr_info(self=0x0000000106be8d10, _unused_ignored=0x0000000000000000) at _asynciomodule.c.h:309:12
    frame #9: 0x00000001000eac5c python`method_vectorcall_NOARGS(func=0x00000001040141d0, args=0x000000016fdfcbb8, nargsf=1, kwnames=0x0000000000000000) at descrobject.c:432:24
    frame #10: 0x00000001000dcba0 python`_PyObject_VectorcallTstate(tstate=0x0000000100a04270, callable=0x00000001040141d0, args=0x000000016fdfcbb8, nargsf=1, kwnames=0x0000000000000000) at abstract.h:114:11
    frame #11: 0x00000001000dca30 python`PyObject_VectorcallMethod(name=0x0000000103ee4d00, args=0x000000016fdfcbb8, nargsf=1, kwnames=0x0000000000000000) at call.c:770:24
    frame #12: 0x0000000103efad44 _asyncio.cpython-310d-darwin.so`_PyObject_VectorcallMethodId(name=0x0000000103f09010, args=0x000000016fdfcbb8, nargsf=9223372036854775809, kwnames=0x0000000000000000) at abstract.h:233:12
    frame #13: 0x0000000103efc100 _asyncio.cpython-310d-darwin.so`_PyObject_CallMethodIdNoArgs(self=0x0000000106be8d10, name=0x0000000103f09010) at abstract.h:239:12
    frame #14: 0x0000000103efc718 _asyncio.cpython-310d-darwin.so`FutureObj_repr(fut=0x0000000106be8d10) at _asynciomodule.c:1409:23
    frame #15: 0x000000010015a928 python`PyObject_Repr(v=0x0000000106be8d10) at object.c:440:11
    frame #16: 0x00000001001a20bc python`unicode_fromformat_arg(writer=0x000000016fdfce60, f="R is not in deque", vargs=0x000000016fdfcea0) at unicodeobject.c:3090:16
    frame #17: 0x00000001001a1020 python`PyUnicode_FromFormatV(format="%R is not in deque", vargs="\U00000010\x8d\xbe\U00000006\U00000001") at unicodeobject.c:3154:17
    frame #18: 0x00000001002b3a34 python`_PyErr_FormatV(tstate=0x0000000100a04270, exception=0x000000010048cfd0, format="%R is not in deque", vargs="\U00000010\x8d\xbe\U00000006\U00000001") at errors.c:1059:14
    frame #19: 0x00000001002b3a9c python`PyErr_Format(exception=0x000000010048cfd0, format="%R is not in deque") at errors.c:1101:5
    frame #20: 0x000000010037b9ec python`deque_remove(deque=0x0000000106f53e50, value=0x0000000106be8d10) at _collectionsmodule.c:1260:9
    frame #21: 0x00000001000ead8c python`method_vectorcall_O(func=0x0000000100e5e1b0, args=0x00000001389049e0, nargsf=9223372036854775810, kwnames=0x0000000000000000) at descrobject.c:460:24
    frame #22: 0x0000000100287938 python`_PyObject_VectorcallTstate(tstate=0x0000000100a04270, callable=0x0000000100e5e1b0, args=0x00000001389049e0, nargsf=9223372036854775810, kwnames=0x0000000000000000) at abstract.h:114:11
    frame #23: 0x0000000100283c5c python`PyObject_Vectorcall(callable=0x0000000100e5e1b0, args=0x00000001389049e0, nargsf=9223372036854775810, kwnames=0x0000000000000000) at abstract.h:123:12
    frame #24: 0x0000000100283dac python`call_function(tstate=0x0000000100a04270, trace_info=0x000000016fdfe358, pp_stack=0x000000016fdfd558, oparg=2, kwnames=0x0000000000000000) at ceval.c:5891:13
    frame #25: 0x000000010027e348 python`_PyEval_EvalFrameDefault(tstate=0x0000000100a04270, f=0x0000000138904850, throwflag=1) at ceval.c:4198:23
    frame #26: 0x0000000100100528 python`_PyEval_EvalFrame(tstate=0x0000000100a04270, f=0x0000000138904850, throwflag=1) at pycore_ceval.h:46:12
    frame #27: 0x00000001001000fc python`gen_send_ex2(gen=0x0000000106ff43b0, arg=0x00000001004984c8, presult=0x000000016fdfe550, exc=1, closing=1) at genobject.c:213:14
    frame #28: 0x00000001000ffb14 python`gen_send_ex(gen=0x0000000106ff43b0, arg=0x00000001004984c8, exc=1, closing=1) at genobject.c:279:9
    frame #29: 0x00000001000fd7f4 python`gen_close(gen=0x0000000106ff43b0, args=0x0000000000000000) at genobject.c:384:14
    frame #30: 0x00000001000ffa28 python`gen_close_iter(yf=0x0000000106ff43b0) at genobject.c:320:18
    frame #31: 0x00000001000fd790 python`gen_close(gen=0x0000000106ff4320, args=0x0000000000000000) at genobject.c:378:15
    frame #32: 0x00000001000ffa28 python`gen_close_iter(yf=0x0000000106ff4320) at genobject.c:320:18
    frame #33: 0x00000001000fd790 python`gen_close(gen=0x0000000106ff4200, args=0x0000000000000000) at genobject.c:378:15
    frame #34: 0x00000001000ffa28 python`gen_close_iter(yf=0x0000000106ff4200) at genobject.c:320:18
    frame #35: 0x00000001000fd790 python`gen_close(gen=0x000000013886fe30, args=0x0000000000000000) at genobject.c:378:15
    frame #36: 0x00000001000fd52c python`_PyGen_Finalize(self=0x000000013886fe30) at genobject.c:87:15
    frame #37: 0x0000000100329efc python`finalize_garbage(tstate=0x0000000100a04270, collectable=0x000000016fdfe878) at gcmodule.c:982:13
    frame #38: 0x0000000100326164 python`gc_collect_main(tstate=0x0000000100a04270, generation=2, n_collected=0x0000000000000000, n_uncollectable=0x0000000000000000, nofail=1) at gcmodule.c:1287:5
    frame #39: 0x0000000100325d84 python`_PyGC_CollectNoFail(tstate=0x0000000100a04270) at gcmodule.c:2123:9
    frame #40: 0x00000001002e6be8 python`finalize_modules(tstate=0x0000000100a04270) at pylifecycle.c:1565:5
    frame #41: 0x00000001002e6824 python`Py_FinalizeEx at pylifecycle.c:1784:5
    frame #42: 0x0000000100323d54 python`Py_RunMain at main.c:668:9
    frame #43: 0x00000001003240e0 python`pymain_main(args=0x000000016fdfea00) at main.c:696:12
    frame #44: 0x0000000100324130 python`Py_BytesMain(argc=4, argv=0x000000016fdfebc8) at main.c:720:12
    frame #45: 0x0000000100003bf8 python`main(argc=4, argv=0x000000016fdfebc8) at python.c:15:12
    frame #46: 0x000000010062108c dyld`start + 520

In debug mode we get an assert failing:

* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #4: 0x0000000103ef8984 _asyncio.cpython-310d-darwin.so`PyVectorcall_Function(callable=0x0000000000000000) at abstract.h:68:5
   65       Py_ssize_t offset;
   66       vectorcallfunc ptr;
   67
-> 68       assert(callable != NULL);
   69       tp = Py_TYPE(callable);
   70       if (!PyType_HasFeature(tp, Py_TPFLAGS_HAVE_VECTORCALL)) {
   71           return NULL;
Target 0: (python) stopped.

And here is the problem already-cleaned object:

frame #7: 0x0000000103efe81c _asyncio.cpython-310d-darwin.so`_asyncio_Future__repr_info_impl(self=0x0000000106be8d10) at _asynciomodule.c:1399:12
   1396 _asyncio_Future__repr_info_impl(FutureObj *self)
   1397 /*[clinic end generated code: output=fa69e901bd176cfb input=f21504d8e2ae1ca2]*/
   1398 {
-> 1399     return PyObject_CallOneArg(asyncio_future_repr_info_func, (PyObject *)self);
   1400 }
   1401
   1402 static PyObject *
(lldb) p asyncio_future_repr_info_func
(PyObject *) $0 = NULL

@graingert graingert self-assigned this Aug 24, 2022
@pablogsal
Copy link
Member Author

pablogsal commented Aug 24, 2022

This is enough to fix the segfault, but is a very incomplete fix:

diff --git a/Modules/_asynciomodule.c b/Modules/_asynciomodule.c
index 1f6de2177a..0be5ca1e8c 100644
--- a/Modules/_asynciomodule.c
+++ b/Modules/_asynciomodule.c
@@ -1396,6 +1396,10 @@ static PyObject *
 _asyncio_Future__repr_info_impl(FutureObj *self)
 /*[clinic end generated code: output=fa69e901bd176cfb input=f21504d8e2ae1ca2]*/
 {
+    if (!asyncio_future_repr_info_func) {
+        PyErr_SetString(PyExc_RuntimeError, "Printing a _asyncio.Future object at shutdown");
+        return NULL;
+    }
     return PyObject_CallOneArg(asyncio_future_repr_info_func, (PyObject *)self);
 }

Technically the crash can happen with any method.

@AlexWaygood AlexWaygood added topic-asyncio type-crash A hard crash of the interpreter, possibly with a core dump labels Aug 24, 2022
@pablogsal
Copy link
Member Author

pablogsal commented Aug 24, 2022

Would be great if someone can work on a simple reproducer. We need a task/future object that survives the first collection at shutdown and something tries to print it in the second. In the reproducer, what tries to remove it is this code:

frame #20: 0x000000010037b9ec python`deque_remove(deque=0x0000000106653e50, value=0x00000001062e8d10) at _collectionsmodule.c:1260:9
   1257         }
   1258     }
   1259     if (i == n) {
-> 1260         PyErr_Format(PyExc_ValueError, "%R is not in deque", value);
   1261         return NULL;
   1262     }
   1263     rv = deque_del_item(deque, I);

that value is the task object. That happens while calling gen_close_iter on a coroutine object MessagePump._process_messages object.

@pablogsal
Copy link
Member Author

pablogsal commented Aug 24, 2022

CC: @kumaraditya303 in case you want a challenge :)

@pablogsal pablogsal added 3.11 only security fixes 3.10 only security fixes 3.12 only security fixes labels Aug 24, 2022
@graingert graingert removed their assignment Aug 24, 2022
@graingert
Copy link
Contributor

graingert commented Aug 24, 2022

I started messing around triggering Task GC in this reproducer, I'm not sure if it will help here

#94972 (comment)

@gvanrossum
Copy link
Member

@pablogsal Which exact Python version and commit were you using? (The line numbers in the stack trace seem slightly off.)

Maybe this was already obvious to everybody on this issue, but from the stack trace it looks like a bunch of nested await calls are finally being terminated.

There are several await calls in MessagePump._process_messages (I didn't find this code -- Pablo mentioned it, it's in the textual package) and the last one is inside a try-finally whose finally block does more await calls (dispatching "on_idle" events, which is a textual thing).

It would seem that something is finally cancelling this stack of await calls -- the stack trace shows that it's throwing a GeneratorExit into the innermost coroutine. I suspect that this might be calling asyncio.Queue.get() (but I'm not sure).

I also note that the message pump code uses a WeakSet of tasks, and in close_messages() there is this code:

        for task in self._child_tasks:
            task.cancel()
            await task
        self._child_tasks.clear()

Would it be possible that this loop doesn't cancel all the tasks because some tasks are GC'ed elsewhere? We know that keeping a weak reference to a task is not enough to keep it alive, GC can kill it without closing it. This mechanism is only used for timers created by textual/timer.py.

Someone should probably investigate that timer class more (since there may be a clue to a simpler repro here), but I'm tired and I need to go to bed.

@gvanrossum
Copy link
Member

Oh darn, looks like I was looking at the HEAD of the main branch of textualize, instead of the commit 5fe47da058564b72e5081a5658c054f55a5d85cd as Pablo said. The finally block in _process_messages is even more complex there. :-(

@kumaraditya303
Copy link
Contributor

I still haven't managed to create a smaller reproducer but hopefully this will be fixed once #91375 is fixed.

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Nov 29, 2022

PR #99122 is merged now so this shouldn't happen on 3.12+. Since nobody was able to create a reproducer for this, I propose to close this as won't fix.

@kumaraditya303 kumaraditya303 added skip news pending The issue will be closed if no feedback is provided and removed skip news labels Nov 29, 2022
@ezio-melotti ezio-melotti moved this to Todo in asyncio Nov 29, 2022
@gvanrossum
Copy link
Member

Agreed.

@kumaraditya303 kumaraditya303 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 29, 2022
Repository owner moved this from Todo to Done in asyncio Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes 3.11 only security fixes 3.12 only security fixes pending The issue will be closed if no feedback is provided topic-asyncio type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Status: Done
Development

No branches or pull requests

5 participants