bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. #14304
* The UTF-8 incremental decoders now fail fast if they encounter a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error handler now decode a lone low surrogate with final=False.
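A minimal sketch of the first change, assuming a Python release that includes this fix (3.8, or a 3.7 release with the backport). The helper `fails_fast` is hypothetical and written only for illustration. Under the default strict handler, a truncated but still-valid sequence is buffered while final=False, whereas a sequence that no later input can repair raises UnicodeDecodeError immediately.

```python
import codecs


def fails_fast(data: bytes) -> bool:
    """Hypothetical helper: return True if a strict incremental UTF-8
    decoder rejects `data` right away, i.e. while final=False."""
    decoder = codecs.getincrementaldecoder("utf-8")()  # errors="strict"
    try:
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return True
    return False


# First two bytes of U+1F600 (F0 9F 98 80): valid so far, so the decoder
# buffers them and waits for more input instead of raising.
assert not fails_fast(b"\xf0\x9f")

# 0xF1 opens a 4-byte sequence, but 0xF6 is not a continuation byte, so
# no later input can make the sequence valid: the error is raised now.
assert fails_fast(b"\xf1\xf6")

# The buffered prefix is completed by the next chunk.
decoder = codecs.getincrementaldecoder("utf-8")()
assert decoder.decode(b"\xf0\x9f") == ""
assert decoder.decode(b"\x98\x80", final=True) == "\U0001F600"
```

The second assertion is the case the PR targets: waiting for more bytes cannot turn 0xF1 0xF6 into well-formed UTF-8, so deferring the error only hides it.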
Is there a test case in test_codecs.py, similar to the one in the wsproto project's tests, that checks this behavior?

from codecs import getincrementaldecoder
decoder = getincrementaldecoder("utf-8")()
print(decoder.decode(b'f\xf1\xf6rd', False))
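Whether test_codecs.py has an exact equivalent is not stated in this thread. As a hedged sketch, the question's snippet can be turned into a pytest-style test; the test name is hypothetical. It expects UnicodeDecodeError even though final=False, with the error starting at index 1, where 0xF1 begins the sequence that 0xF6 cannot continue.

```python
from codecs import getincrementaldecoder


def test_utf8_incremental_decoder_fails_fast():
    # Hypothetical test mirroring the snippet from the question.
    decoder = getincrementaldecoder("utf-8")()
    try:
        decoder.decode(b'f\xf1\xf6rd', False)
    except UnicodeDecodeError as exc:
        # 'f' is valid; the bad sequence starts at index 1 (0xF1),
        # because 0xF6 is not a valid continuation byte.
        assert exc.start == 1
        assert exc.reason == "invalid continuation byte"
    else:
        raise AssertionError("expected UnicodeDecodeError with final=False")
```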
I was not sure we should guarantee this behavior, but the new tests helped make the fix more limited.
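The new tests are not quoted in this thread. As a rough sketch of the second change (not the PR's actual tests), the code below feeds a lone surrogate to a UTF-16-LE incremental decoder in two chunks; the helper `decode_in_two_steps` and the choice of UTF-16-LE are assumptions made for illustration. A lone low surrogate can never start a surrogate pair, so it is decoded while final=False; a lone high surrogate might still be followed by a low surrogate, so it stays buffered until final=True.

```python
import codecs


def decode_in_two_steps(data: bytes, split: int, final: bool) -> tuple:
    """Hypothetical helper: feed `data` to a UTF-16-LE incremental decoder
    using the surrogatepass handler, split at byte offset `split`, and
    return the text produced by each of the two decode() calls."""
    decoder = codecs.getincrementaldecoder("utf-16-le")("surrogatepass")
    first = decoder.decode(data[:split])           # final=False (default)
    second = decoder.decode(data[split:], final)
    return first, second


# U+DC02 (low surrogate) encodes to b'\x02\xdc' in UTF-16-LE.
low = "\udc02".encode("utf-16-le", "surrogatepass")
# The odd first byte is buffered; once both bytes arrive, the low
# surrogate is emitted even though final is still False.
assert decode_in_two_steps(low, 1, final=False) == ("", "\udc02")

# U+D901 (high surrogate) encodes to b'\x01\xd9'.
high = "\ud901".encode("utf-16-le", "surrogatepass")
# Without final=True the high surrogate is held back, waiting for a
# possible low surrogate...
assert decode_in_two_steps(high, 1, final=False) == ("", "")
# ...and is only emitted once the caller signals the end of input.
assert decode_in_two_steps(high, 1, final=True) == ("", "\ud901")
```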
Thanks @serhiy-storchaka for the PR 🌮🎉. I'm now working to backport this PR to: 3.7, 3.8.
…-14304)

* The UTF-8 incremental decoders now fail fast if they encounter a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error handler now decode a lone low surrogate with final=False.

(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka <[email protected]>
GH-14368 is a backport of this pull request to the 3.8 branch.
GH-14369 is a backport of this pull request to the 3.7 branch.
https://bugs.python.org/issue24214