Add support of UnicodeTranslateError in standard error handlers #67864
Comments
The proposed patch adds support for UnicodeTranslateError to the standard error handlers "xmlcharrefreplace", "namereplace" and "surrogatepass". Support in "backslashreplace" was added in bpo-22286; "strict", "ignore" and "replace" have always supported it; support in "surrogateescape" is unlikely to be possible. This can be used with bpo-18814.
Fixed a bug in "surrogatepass" with translating and added the versionchanged directive.
I think I saw your patch for bpo-18814 proposes to use UnicodeTranslateError. Is there any other case where it is used, either currently or in the past? All I know of it is the documentation, which says it is raised “during translating”. Experimenting with the constructor reveals that the “object” attribute is only allowed to be a text string (not bytes). So perhaps “translating” actually means converting from text strings to text strings, like “rot-13”. It would be nice if this were documented somewhere, rather than just saying translating is now supported.
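The constructor experiment mentioned above can be reproduced with a short sketch like the following. The sample string, offsets and reason text are made up for illustration; only the constructor and its attributes come from the interpreter.

```python
# Sketch: UnicodeTranslateError takes (object, start, end, reason).
# Unlike UnicodeEncodeError/UnicodeDecodeError there is no encoding argument.
# The sample values below are illustrative only.
err = UnicodeTranslateError("a\u20acb", 1, 2, "example reason")
attrs = (err.object, err.start, err.end, err.reason)

# Passing bytes as the object is rejected, which suggests that
# "translating" means a str-to-str conversion.
try:
    UnicodeTranslateError(b"abc", 1, 2, "example reason")
except TypeError:
    bytes_rejected = True
else:
    bytes_rejected = False

print(attrs, bytes_rejected)
```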
No, UnicodeTranslateError is currently not used in the stdlib in 3.x. But it is documented and supported by some error handlers. I think it should be used more widely in text-to-text translations, similar to what is proposed in bpo-18814.
I'm sorry, I don't understand this issue. Could you please elaborate on the use case? Why do you want to support more error handlers? str.translate() calls _PyUnicode_TranslateCharmap() with errors="ignore", so it's not possible to choose the error handler. Many codecs are implemented in Python and some of them are implemented with "charmap". Does this issue enhance the codecs implemented with "charmap"? "a\udc80".encode("latin9", "surrogatepass") raises UnicodeEncodeError with and without the patch, and b"\x81".decode("cp1252", "surrogatepass") raises UnicodeDecodeError with and without the patch. Hum, I'm not sure that codecs.charmap_build() is related to str.translate().
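The two calls quoted above, and the fact that str.translate() offers no way to pick an error handler, can be checked with a small sketch. The helper name `raised_type` and the sample translation table are hypothetical, introduced only for this illustration.

```python
def raised_type(func):
    """Hypothetical helper: return the exception type func raises, or None."""
    try:
        func()
    except Exception as exc:
        return type(exc)
    return None

# The charmap-based codecs from the comment, used with "surrogatepass".
encode_result = raised_type(lambda: "a\udc80".encode("latin9", "surrogatepass"))
decode_result = raised_type(lambda: b"\x81".decode("cp1252", "surrogatepass"))

# str.translate() accepts only a table, with no errors argument.
# Characters mapped to None are dropped; unmapped ones pass through.
translated = "abc".translate({ord("b"): None})

print(encode_result, decode_result, translated)
```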
str.encode, bytes.decode and str.translate are unrelated to UnicodeTranslateError. But str.transform could be.
Serhiy Storchaka added the comment:
Can you please give an example of Python code to show your change?
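One way to illustrate the change, as a rough sketch rather than the patch's own test code, is to build a UnicodeTranslateError by hand and pass it to handlers obtained from codecs.lookup_error(). The sample string and reason are made up. The results for "xmlcharrefreplace", "namereplace" and "surrogatepass" depend on whether the running interpreter includes the patch, so the sketch only records them instead of asserting a value.

```python
import codecs

# Illustrative exception: the euro sign at index 1 "failed" to translate.
exc = UnicodeTranslateError("a\u20acb", 1, 2, "illustrative failure")

# Handlers that already accept UnicodeTranslateError return a
# (replacement, resume_position) tuple.
results = {}
for name in ("ignore", "replace", "backslashreplace"):
    results[name] = codecs.lookup_error(name)(exc)

# "strict" simply raises the exception it was given.
strict_reraises = False
try:
    codecs.lookup_error("strict")(exc)
except UnicodeTranslateError as caught:
    strict_reraises = caught is exc

# Handlers targeted by the patch: record either the returned tuple or the
# name of the exception raised, since this varies with the Python build.
for name in ("xmlcharrefreplace", "namereplace", "surrogatepass"):
    try:
        results[name] = codecs.lookup_error(name)(exc)
    except (TypeError, UnicodeError) as err:
        results[name] = type(err).__name__

for name, value in results.items():
    print(name, value)
```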
…r handlers Changed error handlers are: 'xmlcharrefreplace', 'namereplace' and 'surrogatepass'. Error handlers 'strict', 'ignore', 'replace', and 'backslashreplace' already supported it. All standard error handlers except 'surrogateescape' now support translating.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.