
Add support of UnicodeTranslateError in standard error handlers #67864

Open
serhiy-storchaka opened this issue Mar 15, 2015 · 8 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented Mar 15, 2015

BPO 23676
Nosy @malemburg, @doerwalter, @ncoghlan, @vstinner, @vadmium, @serhiy-storchaka
Files
  • translate_error_handlers.patch
  • translate_error_handlers_2.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = None
    created_at = <Date 2015-03-15.21:59:08.104>
    labels = ['interpreter-core', 'type-feature']
    title = 'Add support of UnicodeTranslateError in standard error handlers'
    updated_at = <Date 2015-03-26.22:45:41.845>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2015-03-26.22:45:41.845>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2015-03-15.21:59:08.104>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['38502', '38504']
    hgrepos = []
    issue_num = 23676
    keywords = ['patch']
    message_count = 8.0
    messages = ['238163', '238180', '238973', '239018', '239353', '239355', '239357', '239358']
    nosy_count = 6.0
    nosy_names = ['lemburg', 'doerwalter', 'ncoghlan', 'vstinner', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23676'
    versions = ['Python 3.5']


    @serhiy-storchaka
    Member Author

    The proposed patch adds support for UnicodeTranslateError to the standard error handlers "xmlcharrefreplace", "namereplace" and "surrogatepass". Support in "backslashreplace" was added in bpo-22286; "strict", "ignore" and "replace" have always supported it; support in "surrogateescape" is unlikely to be possible.
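
A minimal sketch of what "support" means here, assuming only the documented `codecs.lookup_error()` API: a handler is called with the exception instance and either returns a `(replacement, resume_position)` tuple or raises. The sample string, the positions and the `probe` helper are illustrative, not taken from the patch. The three handlers the patch targets are probed without any claim about the outcome, since that depends on the Python version and on whether the patch is applied.

```python
import codecs

# Illustrative error: the character at index 1 of "a\u20acc" could not be translated.
exc = UnicodeTranslateError("a\u20acc", 1, 2, "illustrative reason")

def probe(handler_name):
    """Call a standard error handler directly with the translate error.

    Returns the handler's (replacement, position) tuple, or the name of
    the exception type it raised.
    """
    handler = codecs.lookup_error(handler_name)
    try:
        return handler(exc)
    except Exception as err:
        return type(err).__name__

results = {name: probe(name) for name in (
    "strict", "ignore", "replace", "backslashreplace",      # already supported
    "xmlcharrefreplace", "namereplace", "surrogatepass",    # targeted by the patch
)}

for name, result in results.items():
    print(f"{name:18} {result!r}")
```

On an unpatched interpreter the last three handlers are expected to reject the exception type, which is the gap the patch addresses.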

    This can be used with bpo-18814.

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Mar 15, 2015
    @serhiy-storchaka
    Member Author

    Fixed a bug in how "surrogatepass" handles translating, and added the versionchanged directive.

    @vadmium
    Member

    vadmium commented Mar 23, 2015

    I think I saw your patch for bpo-18814 proposes to use UnicodeTranslateError. Is there any other case where it is used, either currently or in the past? All I know of it is the documentation, which says it is raised “during translating”.

    Experimenting with the constructor reveals that the “object” attribute is only allowed to be a text string (not bytes). So perhaps “translating” actually means converting from text strings to text strings, like “rot-13”. It would be nice if this were documented somewhere, rather than just saying translating is now supported.
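
The constructor experiment described above can be reproduced with a short sketch. The argument values are illustrative; the only assumption is the documented four-argument form `(object, start, end, reason)`, which has no encoding argument.

```python
# A text string is accepted as the "object" attribute.
exc = UnicodeTranslateError("abc", 1, 2, "illustrative reason")
print(exc.object, exc.start, exc.end, exc.reason)

# A bytes object is rejected when the exception is constructed.
try:
    UnicodeTranslateError(b"abc", 1, 2, "illustrative reason")
except TypeError as err:
    bytes_rejected = True
    print("bytes rejected:", err)
else:
    bytes_rejected = False
```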

    @serhiy-storchaka
    Member Author

    No, currently UnicodeTranslateError is not used in the stdlib in 3.x. But it is documented and supported by some error handlers. I think it should be used more widely in text-to-text translations, similar to what is proposed in bpo-18814.

    @serhiy-storchaka serhiy-storchaka self-assigned this Mar 23, 2015
    @vstinner
    Member

    I'm sorry, I don't understand this issue. Could you please elaborate on the use case? Why do you want to support more error handlers? str.translate() calls _PyUnicode_TranslateCharmap() with errors="ignore", so it's not possible to choose the error handler.

    Many codecs are implemented in Python and some of them are implemented with "charmap". Does this issue enhance the codecs implemented with "charmap"?

    "a\udc80".encode("latin9", "surrogatepass") raises UnicodeEncodeError with and without the patch, b"\x81".decode("cp1252", "surrogatepass") raises UnicodeDecodeError with and without the patch.

    Hum, I'm not sure that codecs.charmap_build() is related to str.translate().
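
The two calls quoted in this comment can be checked with a small sketch. The `raised` helper is hypothetical and exists only to capture the exception type; the calls themselves are the ones from the comment.

```python
def raised(func):
    """Return the name of the UnicodeError subclass raised by func, or None."""
    try:
        func()
    except UnicodeError as err:
        return type(err).__name__
    return None

# "surrogatepass" only applies to the UTF-8/16/32 codecs, so charmap-based
# codecs such as latin9 and cp1252 re-raise the original error.
encode_result = raised(lambda: "a\udc80".encode("latin9", "surrogatepass"))
decode_result = raised(lambda: b"\x81".decode("cp1252", "surrogatepass"))
print(encode_result, decode_result)
```

Neither call involves UnicodeTranslateError, which matches the observation that the patch does not change their behavior.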

    @serhiy-storchaka
    Member Author

    str.encode, bytes.decode and str.translate are unrelated to UnicodeTranslateError. But str.transform could be.
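
For context, a brief sketch of how str.translate() behaves today, which is why no error handler is ever consulted: it takes only a table, passes unmapped characters through unchanged, and deletes characters mapped to None. The table contents are illustrative.

```python
table = {ord("a"): "A", ord("b"): None}

# "a" is replaced, "b" is deleted, "c" has no entry and is kept as-is.
translated = "abc".translate(table)
print(translated)

# str.translate() has no errors parameter, so a handler cannot be chosen.
try:
    "abc".translate(table, "strict")
except TypeError:
    accepts_errors = False
else:
    accepts_errors = True
print("accepts an errors argument:", accepts_errors)
```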

    @vstinner
    Member

    Serhiy Storchaka added the comment:

    str.encode, bytes.decode and str.translate are unrelated to UnicodeTranslateError. But str.transform could be.

    Can you please give an example of Python code to show your change?

    @serhiy-storchaka
    Member Author

    bpo-18814

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Jul 1, 2024
    …r handlers
    
    Changed error handlers are: 'xmlcharrefreplace', 'namereplace' and
    'surrogatepass'.  Error handlers 'strict', 'ignore', 'replace', and
    'backslashreplace' already supported it.
    
    All standard error handlers except 'surrogateescape' now support
    translating.
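
To illustrate what it means for a handler to support all three error types, here is a sketch of a custom handler registered with `codecs.register_error()`. The handler name `example-question-marks` and the function `question_marks` are hypothetical and not part of the commit. Because nothing in the stdlib raises UnicodeTranslateError (as noted earlier in the thread), the translate case is exercised by calling the handler directly.

```python
import codecs

def question_marks(exc):
    """Hypothetical handler: replace each offending unit with '?'."""
    if isinstance(exc, (UnicodeEncodeError, UnicodeDecodeError,
                        UnicodeTranslateError)):
        # Return the replacement text and the position to resume at.
        return "?" * (exc.end - exc.start), exc.end
    raise exc

codecs.register_error("example-question-marks", question_marks)

# Encoding and decoding reach the handler through the codec machinery.
encoded = "a\u20acc".encode("ascii", "example-question-marks")
decoded = b"a\xffc".decode("ascii", "example-question-marks")

# The translate case is driven by hand with an illustrative exception.
handler = codecs.lookup_error("example-question-marks")
translated = handler(UnicodeTranslateError("a\u20acc", 1, 2, "illustrative reason"))

print(encoded, decoded, translated)
```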

    3 participants