bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. #14304

Merged

Conversation

serhiy-storchaka
Member

@serhiy-storchaka serhiy-storchaka commented Jun 22, 2019

  • The UTF-8 incremental decoder now fails fast if it encounters
    a sequence that can't be handled by the error handler.
  • The UTF-16 incremental decoder with the surrogatepass error
    handler now decodes a lone low surrogate with final=False.

https://bugs.python.org/issue24214
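
The second point can be illustrated with a minimal sketch. It assumes an interpreter that already includes this fix; the helper name `decode_in_chunks` and the sample code point U+DC02 are illustrative choices, not part of the patch.

```python
import codecs

def decode_in_chunks(encoding, chunks, errors="surrogatepass"):
    """Feed each chunk to one incremental decoder, always with final=False."""
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    return [decoder.decode(chunk, False) for chunk in chunks]

# A lone low surrogate (U+DC02) encoded as UTF-16-LE is the two bytes b'\x02\xdc'.
data = "\udc02".encode("utf-16-le", "surrogatepass")

# Split the code unit across two calls: the first call can only buffer,
# the second completes the surrogate without needing final=True.
pieces = decode_in_chunks("utf-16-le", [data[:1], data[1:]])
print(pieces)
```

With the fix applied, the first element should be the empty string and the second the lone surrogate itself.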

@tirkarthi
Member

Is there a test case in test_codecs.py, similar to the one in the wsproto project's tests, that checks for UnicodeDecodeError? With this PR, the test below raises UnicodeDecodeError, as in the older behavior, whereas on master it returns 'f'.

from codecs import getincrementaldecoder
decoder = getincrementaldecoder("utf-8")()
print(decoder.decode(b'f\xf1\xf6rd', False))
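
A check of this kind could be sketched as follows. This is not the test that was added to test_codecs.py; the helper name `raises_decode_error` is hypothetical, and the sketch assumes an interpreter with the behavior described in this PR. It contrasts an invalid sequence, which should raise immediately, with a merely truncated one, which should be buffered.

```python
import codecs

def raises_decode_error(data):
    """Return True if a strict UTF-8 incremental decoder rejects data with final=False."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, False)
    except UnicodeDecodeError:
        return True
    return False

# b'\xf1' starts a four-byte sequence, but b'\xf6' is not a continuation byte,
# so more input can never make this valid: the decoder should fail right away.
invalid_rejected = raises_decode_error(b"f\xf1\xf6rd")

# b'\xe2\x82' is only the first two bytes of the euro sign; it is incomplete,
# not invalid, so the decoder should wait for more data instead of raising.
truncated_rejected = raises_decode_error(b"\xe2\x82")

# Completing the truncated sequence in a later call yields the character.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(b"\xe2\x82", False)
second = decoder.decode(b"\xac", False)
print(invalid_rejected, truncated_rejected, repr(first), repr(second))
```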

@serhiy-storchaka
Member Author

I was not sure that we should guarantee this behavior, but the new tests helped to narrow the fix.

@miss-islington
Contributor

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

@serhiy-storchaka serhiy-storchaka deleted the utf8-utf16-incremental-decoder branch June 25, 2019 08:54
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2019
…-14304)

(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-bot

GH-14368 is a backport of this pull request to the 3.8 branch.

@bedevere-bot

GH-14369 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2019
…-14304)

(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <[email protected]>
miss-islington added a commit that referenced this pull request Jun 25, 2019
(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <[email protected]>
vstinner pushed a commit that referenced this pull request Jun 25, 2019
…-14304) (GH-14369)

* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)

(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <[email protected]>
ned-deily pushed a commit to ned-deily/cpython that referenced this pull request Jul 2, 2019
…thonGH-14304) (pythonGH-14369)

* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (pythonGH-14304)

(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <[email protected]>
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
…-14304)

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
…-14304)

Labels
type-bug An unexpected behavior, bug, or error