Skip to content

gh-94606: Fix error when message with Unicode surrogate not surrogateescaped string #94641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Dec 11, 2023

Conversation

sidney
Copy link
Contributor

@sidney sidney commented Jul 7, 2022

Fix for issue #94606 email.message.get_payload raises UnicodeEncodeError when the message body contains a Unicode surrogate character but is not a valid string that can be decoded using surrogateescape

Added a more strict test for string being surrogateescape decoded that is only used on strings that pass the fast heuristic test, which should be rare.

@sidney sidney requested a review from a team as a code owner July 7, 2022 05:21
@ghost
Copy link

ghost commented Jul 7, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@serhiy-storchaka serhiy-storchaka self-requested a review July 7, 2022 08:31
erlend-aasland and others added 4 commits July 7, 2022 20:49
…timization for blocks that have a line number (pythonGH-94592)

Inlining of code that corresponds to source code lines, can make it hard to distinguish later between code which is only reachable from except handlers, and that which is reachable in normal control flow. This caused problems with the debugger's jump feature.

This PR turns off the inlining optimisation for code which has line numbers. We still inline things like the implicit "return None".
@sidney
Copy link
Contributor Author

sidney commented Jul 7, 2022

Somehow I ended up with four commits by erland-assland for sqlite3 showing up in this PR. All I thought I did was add a commit to my branch for a change to the unit tests. Did I do something wrong in git, and is this PR now messed when it comes to merging?

@arhadthedev
Copy link
Member

Whatever it was, all commits will be squashed while merging anyway. So there is nothing worth fixing.

Comment on lines 293 to 294
if utils._has_decoded_with_surrogateescape(payload):
bpayload = payload.encode('ascii', 'surrogateescape')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

payload.encode('ascii', 'surrogateescape') is calculated twice. First in _has_decoded_with_surrogateescape(), the, if it successes, the result is dropped and it is encoded again here. It is possible to optimize this, but it requires larger change.

@serhiy-storchaka serhiy-storchaka added needs backport to 3.11 only security fixes needs backport to 3.12 only security fixes labels Dec 11, 2023
@serhiy-storchaka serhiy-storchaka merged commit 27a5fd8 into python:main Dec 11, 2023
@miss-islington-app
Copy link

Thanks @sidney for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11, 3.12.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Dec 11, 2023
…rogateescaped string (pythonGH-94641)

(cherry picked from commit 27a5fd8)

Co-authored-by: Sidney Markowitz <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented Dec 11, 2023

GH-112971 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Dec 11, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Dec 11, 2023
…rogateescaped string (pythonGH-94641)

(cherry picked from commit 27a5fd8)

Co-authored-by: Sidney Markowitz <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented Dec 11, 2023

GH-112972 is a backport of this pull request to the 3.11 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.11 only security fixes label Dec 11, 2023
serhiy-storchaka added a commit that referenced this pull request Dec 11, 2023
…rrogateescaped string (GH-94641) (GH-112972)

(cherry picked from commit 27a5fd8)

Co-authored-by: Sidney Markowitz <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
serhiy-storchaka added a commit that referenced this pull request Dec 11, 2023
…rrogateescaped string (GH-94641) (GH-112971)

(cherry picked from commit 27a5fd8)

Co-authored-by: Sidney Markowitz <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

email.message get_payload throws UnicodeEncodeError with some surrogate Unicode characters
6 participants