gh-107607: Update comment about utf-8 BOM being ignored #107858
Closing and re-opening to retrigger CLA checks. Sorry for the noise.
This is not yet ready to merge. I am working on a comment on the issue.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
If we aren't limited to choosing between "UTF-8-Sig" and "UTF_8_Sig", I would also support "UTF-8 with BOM", the name Windows already uses, since this page is for Windows users.
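For reference, the two spellings discussed above name the same codec: `codecs.lookup()` normalizes case and the `-`/`_` separators before searching, and reports the canonical name `utf-8-sig`. A small sketch; the helper `canonical_codec_name` is hypothetical and only wraps `codecs.lookup()`:

```python
import codecs


def canonical_codec_name(label: str) -> str:
    """Return the canonical name of the codec registered for an encoding label."""
    # codecs.lookup() normalizes the label (case, '-' vs '_') before searching;
    # CodecInfo.name holds the codec's canonical name.
    return codecs.lookup(label).name


for label in ("UTF-8-Sig", "UTF_8_Sig", "utf-8-sig"):
    # Every spelling resolves to the same codec.
    print(f"{label!r} -> {canonical_codec_name(label)!r}")

# "UTF-8 with BOM" is a display label, not a registered codec name.
try:
    canonical_codec_name("UTF-8 with BOM")
except LookupError as exc:
    print("LookupError:", exc)
```

So "UTF-8-Sig" and "UTF_8_Sig" are interchangeable as far as Python is concerned, while "UTF-8 with BOM" only makes sense as the Windows-facing description.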
As explained in #107607 (comment), I now think that the entire sentence in question should be replaced with "If the implicit or explicit encoding of a file is UTF-8, a UTF-8 byte-order mark (b'\xef\xbb\xbf') is ignored rather than being a syntax error." or something very close to this. Edit: Done.
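The proposed wording can be checked with the built-in `compile()`, which, for bytes input, also honours a leading BOM and a coding cookie. A minimal sketch under that assumption; the filename `<bom-demo>` is arbitrary, and the last case (a BOM together with a non-UTF-8 cookie) shows where the "implicit or explicit encoding is UTF-8" condition matters:

```python
import codecs

# Source bytes starting with the UTF-8 byte-order mark b'\xef\xbb\xbf'.
src_with_bom = codecs.BOM_UTF8 + b"x = 1\n"

# Implicit encoding (no coding cookie): UTF-8 is the default, so the BOM is skipped.
namespace = {}
exec(compile(src_with_bom, "<bom-demo>", "exec"), namespace)
print(namespace["x"])

# Explicit UTF-8 cookie: it agrees with the BOM, so the BOM is skipped as well.
compile(codecs.BOM_UTF8 + b"# -*- coding: utf-8 -*-\nx = 1\n", "<bom-demo>", "exec")

# A BOM combined with a non-UTF-8 cookie is rejected as a SyntaxError.
rejected = False
try:
    compile(codecs.BOM_UTF8 + b"# coding: latin-1\nx = 1\n", "<bom-demo>", "exec")
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc)
```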
Thanks @rscarrera27 for the PR, and @terryjreedy for merging it 🌮🎉. I'm working now to backport this PR to: 3.11.
Thanks @rscarrera27 for the PR, and @terryjreedy for merging it 🌮🎉. I'm working now to backport this PR to: 3.12.
…GH-107858) Co-authored-by: Terry Jan Reedy (cherry picked from commit 7f64ae3) Co-authored-by: Sunghyun Kim
GH-117015 is a backport of this pull request to the 3.11 branch.
GH-117016 is a backport of this pull request to the 3.12 branch.
…7858) (#117015) (cherry picked from commit 7f64ae3) Co-authored-by: Terry Jan Reedy
…7858) (#117016) (cherry picked from commit 7f64ae3) Co-authored-by: Terry Jan Reedy
…#107858) Co-authored-by: Terry Jan Reedy
According to the Python codecs docs [1], Python refers to UTF-8 with BOM as "utf-8-sig". Therefore, "UTF-8-Sig" seems the more appropriate term.
[1] docs.python.org/3/library/codecs.html#encodings-and-unicode
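To illustrate the distinction the codecs docs draw, the sketch below contrasts the plain `utf-8` codec, which keeps a leading BOM as the character U+FEFF, with `utf-8-sig`, which writes a BOM when encoding and strips one (if present) when decoding. Only standard-library calls are used; the sample text is arbitrary:

```python
import codecs

text = "print('hello')"

# Encoding with 'utf-8-sig' prepends the 3-byte UTF-8 BOM.
encoded = text.encode("utf-8-sig")
print(encoded[:3] == codecs.BOM_UTF8)  # True

# Decoding with plain 'utf-8' keeps the BOM as U+FEFF at the start of the string...
print(repr(encoded.decode("utf-8")[0]))  # '\ufeff'

# ...while 'utf-8-sig' strips a leading BOM and also accepts data without one.
print(encoded.decode("utf-8-sig") == text)  # True
print(text.encode("utf-8").decode("utf-8-sig") == text)  # True
```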
📚 Documentation preview 📚: https://cpython-previews--107858.org.readthedocs.build/