-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-87597: Decode subprocess output in text mode when timeout is hit #95579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e5f4ee1
to
aecc55b
Compare
Misc/NEWS.d/next/Core and Builtins/2022-08-02-18-12-34.gh-issue-87597.UHFR0H.rst
Outdated
Show resolved
Hide resolved
…e-87597.UHFR0H.rst
I'm not sure why this is failing on MacOS, I would've expected it to follow the posix code paths, but I don't know much about mac... Any suggestions? |
Any chance of a review @eryksun? :) Do we think the fix should be backported, since it's marked as a bug? It would be great to get this in 3.9 so I can remove a workaround I currently have in place! |
encoding, | ||
errors) | ||
else: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception would be raised instead of the timeout error, which isn't going to be helpful. I considered three alternatives, but I think the last is the best:
- start raising the TimeoutExpired, then raise the UnicodeDecodeError
from
it - defer decoding to the
TimeoutExpired
class (so it learns to keep encoding/errors around and do the decode on demand) - decode up to
exc.start
here, then decode the rest with thereplace
error handler and concatenate it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I like that latter strategy. I think it is important to have the TimeoutExpired
be the primary error. Otherwise the caller could wrongly interpret the UnicodeDecodeError
as the process exiting naturally while producing undecodable output. If the process did emit undecodable data on its own, that isn't important in this situation.
if stdout_seq is not None: | ||
stdout = b''.join(stdout_seq) | ||
if self.text_mode: | ||
stdout = translate_newlines_partial_output( | ||
stdout, self.stdout.encoding, self.stdout.errors) | ||
else: | ||
stdout = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this whole pattern could go into the helper function? join_and_translate_newlines
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the if/else around the None makes sense here as is, but the join and text_mode conditional could be done in-function.
basically
def join_and_maybe_decode(output_seq, data, encoding, errors):
output = b''.join(output_seq)
if self.text_mode:
try:
... # existing translate_newlines_partial_output code
and here you'd have things like
if stdout_seq is not None:
stdout = join_and_maybe_decode(stdout_seq, data, encoding, errors)
else:
stdout = None
# and the same for stderr
encoding, | ||
errors) | ||
else: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I like that latter strategy. I think it is important to have the TimeoutExpired
be the primary error. Otherwise the caller could wrongly interpret the UnicodeDecodeError
as the process exiting naturally while producing undecodable output. If the process did emit undecodable data on its own, that isn't important in this situation.
if stdout_seq is not None: | ||
stdout = b''.join(stdout_seq) | ||
if self.text_mode: | ||
stdout = translate_newlines_partial_output( | ||
stdout, self.stdout.encoding, self.stdout.errors) | ||
else: | ||
stdout = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the if/else around the None makes sense here as is, but the join and text_mode conditional could be done in-function.
basically
def join_and_maybe_decode(output_seq, data, encoding, errors):
output = b''.join(output_seq)
if self.text_mode:
try:
... # existing translate_newlines_partial_output code
and here you'd have things like
if stdout_seq is not None:
stdout = join_and_maybe_decode(stdout_seq, data, encoding, errors)
else:
stdout = None
# and the same for stderr
@@ -1129,6 +1129,37 @@ def test_universal_newlines_communicate_encodings(self): | |||
stdout, stderr = popen.communicate(input='') | |||
self.assertEqual(stdout, '1\n2\n3\n4') | |||
|
|||
@unittest.skipIf(mswindows, "behavior currently not supported on Windows") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is needed for this to work on Windows?
having inconsistent behavior in the returned stderr/stdout types on Timeout between the two platforms will make people's code difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, I didn't even notice this.
I'm guessing this should actually be "platforms that don't default to UTF-8 don't trigger the decoding error", which luckily is easily resolved by explicitly specifying that the encoding should be utf-8.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
We cannot. I don't even think we can accept this as written today without a deprecation period as this is changing a public API so that makes this a feature. People have already written code less pedantic than your own isinstance checking example that blindly assumes on TimeoutExpired that the stdout/stderr data is bytes or None. So this is an API change that breaks code in existing releases. That also means we cannot accept this behavior change for 3.12 and need to modify this PR to either:
Whatever behavior we have needs to become consistent across platforms as well. |
This is true. What we might be able to do now is to make
This is safe enough, because all versions will support setting that attribute, it just won't have any effect on old ones. We can then also show a deprecation warning for accessing the attributes without setting |
When using
text=True
and a timeout is hit fromsubprocess.run(cmd, timeout=T, text=True, capture_output=True)
then the resultingsubprocess.TimeoutExpired
exception incorrectly storesstdout
andstderr
in bytes, or if no output was received sets the attributes toNone
rather than the empty string.This results in the need for ugly workarounds such as:
The complexity is around the fact that the subprocess that was timed out may have given partial output that cannot be decoded (only some bytes from a codepoint), but this can be handled by ignoring a partial trailing codepoint.
Credit to @JessToudic for the suggestion of using the info already available on the
UnicodeDecodeError
exception and for helping to reproduce the issue/come up with the fix!@eryksun, @macdjord (participants on the issue)