-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-108973: Fix asyncio test_subprocess_consistent_callbacks() #109431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-108973: Fix asyncio test_subprocess_consistent_callbacks() #109431
Conversation
I’m on vacation until Sept. 25. Can this wait? |
This bug is affecting many buildbots, it fails quite often :-( I will try to get a review from someone else :-) @sorcio seems to now have a good knowledge of what's going on. |
|
||
def exit_maybe(self): | ||
# Only exit when we got all expected events | ||
if len(events) >= len(expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, we stopped as soon as we seen process_exited. But sometimes, process_exited is get earlier than other events, and so events only contains some expected events and the test fails also in this case :-(
Procedure to reproduce the bug, by @sorcio. Create #include <unistd.h>
int main(int argc, char** argv) {
write(1, "stdout", 6);
write(2, "stderr", 6);
return 0;
} Build it with: cc subprocess-test-program.c -O3 -o subprocess-test-program Apply this patch: diff --git a/Lib/test/test_asyncio/test_subprocess.py b/Lib/test/test_asyncio/test_subprocess.py
index eeeca40c15..882c20a7ad 100644
--- a/Lib/test/test_asyncio/test_subprocess.py
+++ b/Lib/test/test_asyncio/test_subprocess.py
@@ -773,8 +773,9 @@ async def main() -> None:
loop = asyncio.get_running_loop()
exit_future = asyncio.Future()
code = 'import sys; sys.stdout.write("stdout"); sys.stderr.write("stderr")'
+ program = os.path.join(os.path.dirname(sys.executable), 'subprocess-test-program')
transport, _ = await loop.subprocess_exec(lambda: MyProtocol(exit_future),
- sys.executable, '-c', code, stdin=None)
+ program, stdin=None)
await exit_future
transport.close()
self.assertEqual(events, [ Run:
On Linux, it takes less than 5 seconds to reproduce the issue. |
With this change, I can no longer reproduce the bug on Linux. I stressed the test with (my laptop has 6 threads / 12 CPU cores):
I stopped the test after 7 minutes and 2656 success:
|
me:
Oh, just a recent example (1 hour ago): I just fixed AMD64 RHEL8 FIPS Only Blake2 Builtin Hash 3.x buildbot which always failed. The buildbot turned green for its first time since it was set up! The build 6002 is a success. And you know what? The following build 6003 is a failure because of:
This whack-a-mole game is less funny if I see the same bugs over and over :-( |
@kumaraditya303 @sorcio: Would you mind to review this test_asyncio fix? See the issue for the rationale. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, this test was introduced on the faulty assumption that process_exited
should come last, but asyncio doesn't try to do that, and it doesn't always make sense. It has value in that it's the only subprocess test case that also covers pipe events, though. Unless we add better coverage, your version of the test is okay. The question is whether it's necessary. I suggest to remove it entirely, or at least to make it clear what it's testing (the name is not descriptive at all).
Oh. The asyncio doc has an example which makes the assumption that process_exit() will be first called last. As soon as it's called, the program stops: https://docs.python.org/dev/library/asyncio-protocol.html#loop-subprocess-exec-and-subprocessprotocol.
@sorcio: Would you mind to review the updated PR? |
I also reverted commit 282edd7 as @sorcio suggested in his comment: #108973 (comment) |
@@ -227,7 +227,7 @@ async def _make_subprocess_transport(self, protocol, args, shell, | |||
|
|||
def _child_watcher_callback(self, pid, returncode, transp): | |||
# Skip one iteration for callbacks to be executed | |||
self.call_soon_threadsafe(self.call_soon, transp._process_exited, returncode) | |||
self.call_soon_threadsafe(transp._process_exited, returncode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safe to call the method immediately. It's up to the child watcher to take care to call this in the loop thread. ThreadedChildWatcher
already does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The child watcher runs in a different thread. call_soon_threadsafe() is needed to make sure that the callback is run in the same thread that the event loop. It's an important principle in asyncio, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is that the child watcher itself already uses call_soon_threadsafe()
to schedule the callback. But I've just noticed a little complication. The documentation for AbstractChildWatcher.add_child_handler
requires the callback itself to be thread-safe, which could mean that it can be called from any thread. One of the calls to call_soon_threadsafe()
is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a bit of digging and the comment requiring the callback to be thread-safe was there already before there was even support for running the loop on different threads. The watchers have been updated to use call_soon_threadsafe()
for callbacks after that. Since watchers are public API (thankfully deprecated), the requirement can't be relaxed.
Possibly one of the redundant calls can still be removed, but considering 3.14 will remove watchers altogether it might not be worth it. Let's say that I'm skeptical of this approach of several layers of scheduling callbacks (there is one more when the transport calls the protocol's process_exited
method) because it can hide race conditions such as the one in this bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be ok if this call is removed in a separated PR? I'm not comfortable to touch asyncio, but the bug is impacting many CIs and it's very annoying. I would prefer that someone who is more comfortable with asyncio does this change :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, clearly this would need more discussion/investigation/etc. Let's forget about it for this PR.
209b7e5
to
22489d4
Compare
SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more determistic: it can still be called before pipe_connection_lost() for example. Co-authored-by: Davide Rizzo <[email protected]>
22489d4
to
59400e9
Compare
@sorcio: Would you mind to review my updated PR? |
I stress tested my fix: it works as expected:
I interrupted it after 4 min 30 sec. The only failure is unrelated to the initial issue:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
I came to appreciate that the example now shows the correct relationship between events for a simple case like this. It makes the documentation a lot clearer, so it doesn't matter if (if!) it's not perfectly idiomatic.
Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11, 3.12. |
Sorry, @vstinner, I could not cleanly backport this to |
…ythonGH-109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more determistic: it can still be called before pipe_connection_lost() for example. (cherry picked from commit ced6924) Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Davide Rizzo <[email protected]>
GH-109609 is a backport of this pull request to the 3.12 branch. |
@sorcio: In the main branch, I didn't want to remove call_soon_threadsafe() in _child_watcher_callback(), since this change is going to be backported to stable versions. Also, as I wrote, I'm not comfortable with touching asyncio in general. Do you want to propose a PR to change this code in the main branch? |
Oh, the test was added to Python 3.11 and so test_asyncio is not affected in Python 3.11. |
GH-109610 is a backport of this pull request to the 3.11 branch. |
I just wrote PR #109610 to backport the documentation changes. |
Thanks a lot for the very constructive review @sorcio, apparently, I misunderstood how subprocess events are delivered in asyncio and you helped me to fix the doc, the doc example and the test. I think that the situation is now well better documented and accurate than before. |
|
gh-108973: Fix asyncio SubprocessProtocol doc (#109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the example in the doc. Co-authored-by: Davide Rizzo <[email protected]> (cherry picked from commit ced6924)
Do we need an issue for this, or just a PR referencing the same issue? |
|
|
|
|
|
…ython#109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more determistic: it can still be called before pipe_connection_lost() for example. Co-authored-by: Davide Rizzo <[email protected]>
…H-109431) (#109609) gh-108973: Fix asyncio test_subprocess_consistent_callbacks() (GH-109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more determistic: it can still be called before pipe_connection_lost() for example. (cherry picked from commit ced6924) Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Davide Rizzo <[email protected]>
…ython#109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more determistic: it can still be called before pipe_connection_lost() for example. Co-authored-by: Davide Rizzo <[email protected]>
SubprocessProtocol process_exited() method can be called before
pipe_data_received() and pipe_connection_lost() methods. Document it
and adapt the test for that.
Revert commit 282edd7.
_child_watcher_callback() calls immediately _process_exited(): don't
add an additional delay with call_soon(). The reverted change didn't
make _process_exited() more determistic: it can still be called
before pipe_connection_lost() for example.
Co-authored-by: Davide Rizzo [email protected]