gh-94606: Fix error when message with Unicode surrogate not surrogateescaped string #94641
…ot valid surrogateescaped string
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool.
…timization for blocks that have a line number (pythonGH-94592) Inlining of code that corresponds to source code lines can make it hard to distinguish later between code that is only reachable from except handlers and code that is reachable in normal control flow. This caused problems with the debugger's jump feature. This PR turns off the inlining optimisation for code which has line numbers. We still inline things like the implicit "return None".
…ython#94629) Co-authored-by: CAM Gerlach <[email protected]>
Somehow I ended up with four commits by erland-assland for sqlite3 showing up in this PR. All I thought I did was add a commit to my branch for a change to the unit tests. Did I do something wrong in git, and is this PR now messed up when it comes to merging?
Whatever it was, all commits will be squashed while merging anyway. So there is nothing worth fixing.
Lib/email/message.py
if utils._has_decoded_with_surrogateescape(payload):
    bpayload = payload.encode('ascii', 'surrogateescape')
`payload.encode('ascii', 'surrogateescape')` is calculated twice: first in `_has_decoded_with_surrogateescape()`, and then, if it succeeds, the result is dropped and the payload is encoded again here. It is possible to optimize this, but it requires a larger change.
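One way to picture the optimization mentioned in this review comment is to attempt the encoding once and keep the result. The snippet below is only a rough sketch under that assumption: `encode_if_surrogateescaped` is a hypothetical helper name, not a function from the `email` package and not necessarily what was merged.

```python
from typing import Optional


def encode_if_surrogateescaped(payload: str) -> Optional[bytes]:
    """Hypothetical helper: encode once and reuse the result.

    Returns the bytes when the payload can be encoded with the
    surrogateescape handler, or None when it cannot (for example,
    when it contains a surrogate outside U+DC80..U+DCFF).
    """
    try:
        return payload.encode("ascii", "surrogateescape")
    except UnicodeEncodeError:
        return None


# A string produced by surrogateescape decoding round-trips to bytes.
bpayload = encode_if_surrogateescaped("caf\udce9")

# A lone high surrogate cannot be encoded this way, so we get None.
rejected = encode_if_surrogateescaped("caf\ud800")
```

Note that plain ASCII text also encodes successfully here, so a caller would presumably still run the fast surrogate check first and only then try the encoding.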
Thanks @sidney for the PR, and @serhiy-storchaka for merging it 🌮🎉. I'm working now to backport this PR to: 3.11, 3.12.
…rogateescaped string (pythonGH-94641) (cherry picked from commit 27a5fd8) Co-authored-by: Sidney Markowitz <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
GH-112971 is a backport of this pull request to the 3.12 branch.
…rogateescaped string (pythonGH-94641) (cherry picked from commit 27a5fd8) Co-authored-by: Sidney Markowitz <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
GH-112972 is a backport of this pull request to the 3.11 branch.
…rrogateescaped string (GH-94641) (GH-112972) (cherry picked from commit 27a5fd8) Co-authored-by: Sidney Markowitz <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…rrogateescaped string (GH-94641) (GH-112971) (cherry picked from commit 27a5fd8) Co-authored-by: Sidney Markowitz <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…rogateescaped string (pythonGH-94641) Co-authored-by: Serhiy Storchaka <[email protected]>
Fix for issue #94606: email.message.get_payload raises UnicodeEncodeError when the message body contains a Unicode surrogate character but is not a valid surrogateescaped string (that is, it cannot be encoded back to bytes with the surrogateescape error handler).
Added a stricter test for whether a string was decoded with surrogateescape. It is only used on strings that pass the fast heuristic test, which should be rare.
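To illustrate the distinction the description relies on, here is a minimal sketch of a fast heuristic followed by a stricter check. The function names `has_surrogates` and `is_surrogateescaped` are assumptions made for this example and are not claimed to match the helpers in `email.utils`.

```python
def has_surrogates(s: str) -> bool:
    """Fast heuristic: True if s contains any surrogate code point.

    UTF-8 can encode every string that has no surrogates, so a
    failure here means at least one surrogate is present.
    """
    try:
        s.encode("utf-8")
        return False
    except UnicodeEncodeError:
        return True


def is_surrogateescaped(s: str) -> bool:
    """Stricter check, run only when the heuristic passes.

    True only if s can be encoded back to bytes with the
    surrogateescape handler.
    """
    if not has_surrogates(s):
        return False
    try:
        s.encode("ascii", "surrogateescape")
        return True
    except UnicodeEncodeError:
        return False


# Produced by surrogateescape decoding: byte 0xE9 becomes U+DCE9.
escaped = b"caf\xe9".decode("ascii", "surrogateescape")

# Contains a surrogate, but not one that surrogateescape produces.
lone = "caf\ud800"
```

Both `escaped` and `lone` pass the heuristic, but only `escaped` passes the stricter check; `lone` is the kind of input that raised UnicodeEncodeError in the reported issue.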