gh-107607: Update comment about utf-8 BOM being ignored #107858

Merged
merged 5 commits into python:main on Mar 19, 2024

rscarrera27
Contributor

@rscarrera27 rscarrera27 commented Aug 11, 2023

According to the Python codecs docs [1], Python's name for UTF-8 with a BOM is "utf-8-sig". Using "UTF-8-Sig" here therefore seems more appropriate.

(...) To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python calls "utf-8-sig")
(...) On encoding the utf-8-sig codec will write 0xef, 0xbb, 0xbf as the first three bytes to the file.

[1] docs.python.org/3/library/codecs.html#encodings-and-unicode
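
For illustration, the behavior quoted from the codecs docs can be checked with a minimal sketch like the one below. The sample string and variable names are made up for this example and are not part of the PR; only the standard `codecs` module and the built-in `utf-8` and `utf-8-sig` codecs are used.

```python
import codecs

# Hypothetical sample text; any string would do.
text = "print('hello')\n"

# Plain UTF-8 writes no signature; utf-8-sig prepends the 3-byte BOM.
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")
bom_is_prefixed = with_bom == codecs.BOM_UTF8 + plain

# Decoding with utf-8-sig strips the BOM again, while plain utf-8
# keeps it as a leading U+FEFF character.
decoded_sig = with_bom.decode("utf-8-sig")
decoded_plain = with_bom.decode("utf-8")

print(with_bom[:3])          # the three signature bytes
print(bom_is_prefixed)
print(decoded_sig == text)
print(decoded_plain[0] == "\ufeff")
```

The point of the sketch is only that "utf-8-sig" names a codec (a way to read and write the signature), which is part of why the wording in the Windows docs was under discussion.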


📚 Documentation preview 📚: https://cpython-previews--107858.org.readthedocs.build/

@rscarrera27
Contributor Author

@corona10 🙏

@corona10 corona10 requested a review from terryjreedy August 11, 2023 08:31
@ambv
Contributor

ambv commented Aug 11, 2023

Closing and re-opening to retrigger CLA checks. Sorry for the noise.

@ambv ambv closed this Aug 11, 2023
@ambv ambv reopened this Aug 11, 2023
@ghost

ghost commented Aug 11, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

Member

@terryjreedy terryjreedy left a comment

This is not yet ready to merge. I am working on a comment on the issue.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@rscarrera27
Contributor Author

If we are not limited to choosing between "UTF-8-Sig" and "UTF_8_Sig", I would also be happy to use "UTF-8 with BOM", the term Windows itself already uses, since this page is aimed at Windows users.

@terryjreedy
Member

terryjreedy commented Aug 12, 2023

As explained in #107607 (comment), I now think that the entire sentence in question should be replaced with "If the implicit or explicit encoding of a file is UTF-8, a UTF-8 byte-order mark (b'\xef\xbb\xbf') is ignored rather than being a syntax error." or something very close to this. Edit: Done.
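
The behavior described by the proposed sentence can be sketched as follows. This is an illustrative check and not code from the PR: the source bytes, the `<bom-demo>` filename, and the variable names are assumptions made for the example. It relies on `compile()` accepting a bytes source and on `tokenize.detect_encoding()` from the standard library.

```python
import io
import tokenize

# Hypothetical UTF-8 source that starts with the UTF-8 byte-order mark.
source = b"\xef\xbb\xbfanswer = 6 * 7\n"

# The BOM is skipped instead of being reported as a syntax error.
namespace = {}
exec(compile(source, "<bom-demo>", "exec"), namespace)

# The tokenize module reports the same file as "utf-8-sig".
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)

print(namespace["answer"])
print(encoding)
```

Here the name "utf-8-sig" shows up only as the detected encoding label; the source itself is ordinary UTF-8 with the three signature bytes in front.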

@terryjreedy terryjreedy changed the title gh-107607: Correct encoding name in docs gh-107607: Update comment about utf-8 BOM being ignored Mar 19, 2024
@terryjreedy terryjreedy merged commit 7f64ae3 into python:main Mar 19, 2024
@terryjreedy terryjreedy added the needs backport to 3.11 only security fixes label Mar 19, 2024
@terryjreedy terryjreedy added the needs backport to 3.12 only security fixes label Mar 19, 2024
@miss-islington-app

Thanks @rscarrera27 for the PR, and @terryjreedy for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington-app

Thanks @rscarrera27 for the PR, and @terryjreedy for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 19, 2024
…GH-107858)

---------
Co-authored-by: Terry Jan Reedy <[email protected]>
(cherry picked from commit 7f64ae3)

Co-authored-by: Sunghyun Kim <[email protected]>
@bedevere-app

bedevere-app bot commented Mar 19, 2024

GH-117015 is a backport of this pull request to the 3.11 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.11 only security fixes label Mar 19, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 19, 2024
…GH-107858)

---------
Co-authored-by: Terry Jan Reedy <[email protected]>
(cherry picked from commit 7f64ae3)

Co-authored-by: Sunghyun Kim <[email protected]>
@bedevere-app

bedevere-app bot commented Mar 19, 2024

GH-117016 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Mar 19, 2024
terryjreedy pushed a commit that referenced this pull request Mar 19, 2024
terryjreedy pushed a commit that referenced this pull request Mar 19, 2024
vstinner pushed a commit to vstinner/cpython that referenced this pull request Mar 20, 2024
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 25, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
Labels
docs (Documentation in the Doc dir), skip news
5 participants