
Commit 4827483

bpo-43510: Implement PEP 597 opt-in EncodingWarning. (GH-19481)
See [PEP 597](https://www.python.org/dev/peps/pep-0597/).

* Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`.
* Add EncodingWarning.
* Add io.text_encoding().
* open() and TextIOWrapper() emit EncodingWarning when encoding is omitted and warn_default_encoding is enabled.
* _pyio.TextIOWrapper() falls back to UTF-8 as the default encoding when the locale module cannot be imported (this can happen while Python is being built).
* The bz2, configparser, gzip, lzma, pathlib, and tempfile modules use io.text_encoding().
* Add a What's New entry.
1 parent 261a452 commit 4827483

32 files changed, +366 -18 lines changed

Doc/c-api/init_config.rst

+9
@@ -583,6 +583,15 @@ PyConfig

       Default: ``0``.

+   .. c:member:: int warn_default_encoding
+
+      If non-zero, emit a :exc:`EncodingWarning` warning when :class:`io.TextIOWrapper`
+      uses its default encoding. See :ref:`io-encoding-warning` for details.
+
+      Default: ``0``.
+
+      .. versionadded:: 3.10
+
    .. c:member:: wchar_t* check_hash_pycs_mode

       Control the validation behavior of hash-based ``.pyc`` files:

Doc/library/exceptions.rst

+9
@@ -741,6 +741,15 @@ The following exceptions are used as warning categories; see the
    Base class for warnings related to Unicode.


+.. exception:: EncodingWarning
+
+   Base class for warnings related to encodings.
+
+   See :ref:`io-encoding-warning` for details.
+
+   .. versionadded:: 3.10
+
+
 .. exception:: BytesWarning

    Base class for warnings related to :class:`bytes` and :class:`bytearray`.

Doc/library/io.rst

+81
@@ -106,6 +106,56 @@ stream by opening a file in binary mode with buffering disabled::
 The raw stream API is described in detail in the docs of :class:`RawIOBase`.


+.. _io-text-encoding:
+
+Text Encoding
+-------------
+
+The default encoding of :class:`TextIOWrapper` and :func:`open` is
+locale-specific (:func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`).
+
+However, many developers forget to specify the encoding when opening text files
+encoded in UTF-8 (e.g. JSON, TOML, Markdown, etc...) since most Unix
+platforms use UTF-8 locale by default. This causes bugs because the locale
+encoding is not UTF-8 for most Windows users. For example::
+
+   # May not work on Windows when non-ASCII characters in the file.
+   with open("README.md") as f:
+       long_description = f.read()
+
+Additionally, while there is no concrete plan as of yet, Python may change
+the default text file encoding to UTF-8 in the future.
+
+Accordingly, it is highly recommended that you specify the encoding
+explicitly when opening text files. If you want to use UTF-8, pass
+``encoding="utf-8"``. To use the current locale encoding,
+``encoding="locale"`` is supported in Python 3.10.
+
+When you need to run existing code on Windows that attempts to open
+UTF-8 files using the default locale encoding, you can enable the UTF-8
+mode. See :ref:`UTF-8 mode on Windows <win-utf8-mode>`.
+
+.. _io-encoding-warning:
+
+Opt-in EncodingWarning
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. versionadded:: 3.10
+   See :pep:`597` for more details.
+
+To find where the default locale encoding is used, you can enable
+the ``-X warn_default_encoding`` command line option or set the
+:envvar:`PYTHONWARNDEFAULTENCODING` environment variable, which will
+emit an :exc:`EncodingWarning` when the default encoding is used.
+
+If you are providing an API that uses :func:`open` or
+:class:`TextIOWrapper` and passes ``encoding=None`` as a parameter, you
+can use :func:`text_encoding` so that callers of the API will emit an
+:exc:`EncodingWarning` if they don't pass an ``encoding``. However,
+please consider using UTF-8 by default (i.e. ``encoding="utf-8"``) for
+new APIs.
+
+
 High-level Module Interface
 ---------------------------

@@ -143,6 +193,32 @@ High-level Module Interface
    .. versionadded:: 3.8


+.. function:: text_encoding(encoding, stacklevel=2)
+
+   This is a helper function for callables that use :func:`open` or
+   :class:`TextIOWrapper` and have an ``encoding=None`` parameter.
+
+   This function returns *encoding* if it is not ``None`` and ``"locale"`` if
+   *encoding* is ``None``.
+
+   This function emits an :class:`EncodingWarning` if
+   :data:`sys.flags.warn_default_encoding <sys.flags>` is true and *encoding*
+   is None. *stacklevel* specifies where the warning is emitted.
+   For example::
+
+      def read_text(path, encoding=None):
+          encoding = io.text_encoding(encoding)  # stacklevel=2
+          with open(path, encoding=encoding) as f:
+              return f.read()
+
+   In this example, an :class:`EncodingWarning` is emitted for the caller of
+   ``read_text()``.
+
+   See :ref:`io-text-encoding` for more information.
+
+   .. versionadded:: 3.10
+
+
 .. exception:: BlockingIOError

    This is a compatibility alias for the builtin :exc:`BlockingIOError`
@@ -869,6 +945,8 @@ Text I/O
    *encoding* gives the name of the encoding that the stream will be decoded or
    encoded with. It defaults to
    :func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`.
+   ``encoding="locale"`` can be used to specify the current locale's encoding
+   explicitly. See :ref:`io-text-encoding` for more information.

    *errors* is an optional string that specifies how encoding and decoding
    errors are to be handled. Pass ``'strict'`` to raise a :exc:`ValueError`
@@ -920,6 +998,9 @@ Text I/O
       locale encoding using :func:`locale.setlocale`, use the current locale
       encoding instead of the user preferred encoding.

+   .. versionchanged:: 3.10
+      The *encoding* argument now supports the ``"locale"`` dummy encoding name.
+
    :class:`TextIOWrapper` provides these data attributes and methods in
    addition to those from :class:`TextIOBase` and :class:`IOBase`:

Doc/using/cmdline.rst

+15
@@ -453,6 +453,9 @@ Miscellaneous options
    * ``-X pycache_prefix=PATH`` enables writing ``.pyc`` files to a parallel
      tree rooted at the given directory instead of to the code tree. See also
      :envvar:`PYTHONPYCACHEPREFIX`.
+   * ``-X warn_default_encoding`` issues a :class:`EncodingWarning` when the
+     locale-specific default encoding is used for opening files.
+     See also :envvar:`PYTHONWARNDEFAULTENCODING`.

    It also allows passing arbitrary values and retrieving them through the
    :data:`sys._xoptions` dictionary.
@@ -482,6 +485,9 @@ Miscellaneous options

       The ``-X showalloccount`` option has been removed.

+   .. versionadded:: 3.10
+      The ``-X warn_default_encoding`` option.
+
    .. deprecated-removed:: 3.9 3.10
       The ``-X oldparser`` option.

@@ -907,6 +913,15 @@ conflict.

    .. versionadded:: 3.7

+.. envvar:: PYTHONWARNDEFAULTENCODING
+
+   If this environment variable is set to a non-empty string, issue a
+   :class:`EncodingWarning` when the locale-specific default encoding is used.
+
+   See :ref:`io-encoding-warning` for details.
+
+   .. versionadded:: 3.10
+

 Debug-mode variables
 ~~~~~~~~~~~~~~~~~~~~

Doc/whatsnew/3.10.rst

+24
@@ -454,6 +454,30 @@ For the full specification see :pep:`634`. Motivation and rationale
 are in :pep:`635`, and a longer tutorial is in :pep:`636`.


+.. _whatsnew310-pep597:
+
+Optional ``EncodingWarning`` and ``encoding="locale"`` option
+-------------------------------------------------------------
+
+The default encoding of :class:`TextIOWrapper` and :func:`open` is
+platform and locale dependent. Since UTF-8 is used on most Unix
+platforms, omitting ``encoding`` option when opening UTF-8 files
+(e.g. JSON, YAML, TOML, Markdown) is a very common bug. For example::
+
+    # BUG: "rb" mode or encoding="utf-8" should be used.
+    with open("data.json") as f:
+        data = json.load(f)
+
+To find this type of bug, optional ``EncodingWarning`` is added.
+It is emitted when :data:`sys.flags.warn_default_encoding <sys.flags>`
+is true and locale-specific default encoding is used.
+
+``-X warn_default_encoding`` option and :envvar:`PYTHONWARNDEFAULTENCODING`
+are added to enable the warning.
+
+See :ref:`io-text-encoding` for more information.
+
+
 New Features Related to Type Annotations
 ========================================
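
The two fixes named in the `# BUG` comment of the What's New example can be sketched as below. The function names and the sample file are hypothetical; the only library behaviour relied on is that `json.load()` accepts both text and binary file objects.

```python
import json
import os
import tempfile


def load_json_text(path):
    # Text mode with an explicit encoding, independent of the locale.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_json_binary(path):
    # Binary mode: json decodes the UTF-8 bytes itself.
    with open(path, "rb") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w", encoding="utf-8") as f:
        # Keep the non-ASCII character in the file instead of escaping it.
        json.dump({"name": "caf\u00e9"}, f, ensure_ascii=False)
    print(load_json_text(path) == load_json_binary(path))
```

Either form avoids the locale-dependent default, so neither should trigger `EncodingWarning` when the opt-in flag is enabled.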

Include/cpython/initconfig.h

+1
@@ -153,6 +153,7 @@ typedef struct PyConfig {
     PyWideStringList warnoptions;
     int site_import;
     int bytes_warning;
+    int warn_default_encoding;
     int inspect;
     int interactive;
     int optimization_level;

Include/internal/pycore_initconfig.h

+1
@@ -102,6 +102,7 @@ typedef struct {
     int isolated; /* -I option */
     int use_environment; /* -E option */
     int dev_mode; /* -X dev and PYTHONDEVMODE */
+    int warn_default_encoding; /* -X warn_default_encoding and PYTHONWARNDEFAULTENCODING */
 } _PyPreCmdline;

 #define _PyPreCmdline_INIT \

Include/pyerrors.h

+1
@@ -146,6 +146,7 @@ PyAPI_DATA(PyObject *) PyExc_FutureWarning;
 PyAPI_DATA(PyObject *) PyExc_ImportWarning;
 PyAPI_DATA(PyObject *) PyExc_UnicodeWarning;
 PyAPI_DATA(PyObject *) PyExc_BytesWarning;
+PyAPI_DATA(PyObject *) PyExc_EncodingWarning;
 PyAPI_DATA(PyObject *) PyExc_ResourceWarning;


Lib/_pyio.py

+37-10
@@ -40,6 +40,29 @@
 _CHECK_ERRORS = _IOBASE_EMITS_UNRAISABLE


+def text_encoding(encoding, stacklevel=2):
+    """
+    A helper function to choose the text encoding.
+
+    When encoding is not None, just return it.
+    Otherwise, return the default text encoding (i.e. "locale").
+
+    This function emits an EncodingWarning if *encoding* is None and
+    sys.flags.warn_default_encoding is true.
+
+    This can be used in APIs with an encoding=None parameter
+    that pass it to TextIOWrapper or open.
+    However, please consider using encoding="utf-8" for new APIs.
+    """
+    if encoding is None:
+        encoding = "locale"
+        if sys.flags.warn_default_encoding:
+            import warnings
+            warnings.warn("'encoding' argument not specified.",
+                          EncodingWarning, stacklevel + 1)
+    return encoding
+
+
 def open(file, mode="r", buffering=-1, encoding=None, errors=None,
          newline=None, closefd=True, opener=None):
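
The `stacklevel + 1` in `text_encoding()` is what attributes the warning to the caller of the API function rather than to the API function itself. The sketch below illustrates that with a hypothetical stand-in, `demo_text_encoding`, which always warns and uses `UserWarning`, because `sys.flags.warn_default_encoding` cannot be switched on inside a running interpreter.

```python
import warnings


def demo_text_encoding(encoding, stacklevel=2):
    """Hypothetical stand-in for text_encoding() that always warns on None."""
    if encoding is None:
        encoding = "locale"
        # The extra +1 skips this helper's own frame.
        warnings.warn("'encoding' argument not specified.",
                      UserWarning, stacklevel + 1)
    return encoding


def api_function(encoding=None):
    # An API that forwards its encoding parameter (stacklevel defaults to 2).
    return demo_text_encoding(encoding)


def caller():
    return api_function()  # the warning is attributed to this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    caller()

print(caught[0].category.__name__, "reported at line", caught[0].lineno)
```

The recorded line number should be the line inside `caller()` that invokes `api_function()`, which mirrors how a missing `encoding` argument is reported at the call site of the wrapping API.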

@@ -248,6 +271,7 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
         result = buffer
         if binary:
             return result
+        encoding = text_encoding(encoding)
         text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
         result = text
         text.mode = mode
@@ -2004,19 +2028,22 @@ class TextIOWrapper(TextIOBase):
     def __init__(self, buffer, encoding=None, errors=None, newline=None,
                  line_buffering=False, write_through=False):
         self._check_newline(newline)
-        if encoding is None:
+        encoding = text_encoding(encoding)
+
+        if encoding == "locale":
             try:
-                encoding = os.device_encoding(buffer.fileno())
+                encoding = os.device_encoding(buffer.fileno()) or "locale"
             except (AttributeError, UnsupportedOperation):
                 pass
-            if encoding is None:
-                try:
-                    import locale
-                except ImportError:
-                    # Importing locale may fail if Python is being built
-                    encoding = "ascii"
-                else:
-                    encoding = locale.getpreferredencoding(False)
+
+        if encoding == "locale":
+            try:
+                import locale
+            except ImportError:
+                # Importing locale may fail if Python is being built
+                encoding = "utf-8"
+            else:
+                encoding = locale.getpreferredencoding(False)

         if not isinstance(encoding, str):
             raise ValueError("invalid encoding: %r" % encoding)

Lib/bz2.py

+1
@@ -311,6 +311,7 @@ def open(filename, mode="rb", compresslevel=9,
         binary_file = BZ2File(filename, bz_mode, compresslevel=compresslevel)

     if "t" in mode:
+        encoding = io.text_encoding(encoding)
         return io.TextIOWrapper(binary_file, encoding, errors, newline)
     else:
         return binary_file

Lib/configparser.py

+1
@@ -690,6 +690,7 @@ def read(self, filenames, encoding=None):
         """
         if isinstance(filenames, (str, bytes, os.PathLike)):
             filenames = [filenames]
+        encoding = io.text_encoding(encoding)
         read_ok = []
         for filename in filenames:
             try:

Lib/gzip.py

+1
@@ -62,6 +62,7 @@ def open(filename, mode="rb", compresslevel=_COMPRESS_LEVEL_BEST,
         raise TypeError("filename must be a str or bytes object, or a file")

     if "t" in mode:
+        encoding = io.text_encoding(encoding)
         return io.TextIOWrapper(binary_file, encoding, errors, newline)
     else:
         return binary_file

Lib/io.py

+1-1
@@ -54,7 +54,7 @@
 from _io import (DEFAULT_BUFFER_SIZE, BlockingIOError, UnsupportedOperation,
                  open, open_code, FileIO, BytesIO, StringIO, BufferedReader,
                  BufferedWriter, BufferedRWPair, BufferedRandom,
-                 IncrementalNewlineDecoder, TextIOWrapper)
+                 IncrementalNewlineDecoder, text_encoding, TextIOWrapper)

 OpenWrapper = _io.open # for compatibility with _pyio

Lib/lzma.py

+1
@@ -302,6 +302,7 @@ def open(filename, mode="rb", *,
                            preset=preset, filters=filters)

     if "t" in mode:
+        encoding = io.text_encoding(encoding)
         return io.TextIOWrapper(binary_file, encoding, errors, newline)
     else:
         return binary_file

Lib/pathlib.py

+4
@@ -1241,6 +1241,8 @@ def open(self, mode='r', buffering=-1, encoding=None,
         Open the file pointed by this path and return a file object, as
         the built-in open() function does.
         """
+        if "b" not in mode:
+            encoding = io.text_encoding(encoding)
         return io.open(self, mode, buffering, encoding, errors, newline,
                        opener=self._opener)

@@ -1255,6 +1257,7 @@ def read_text(self, encoding=None, errors=None):
         """
         Open the file in text mode, read it, and close the file.
         """
+        encoding = io.text_encoding(encoding)
         with self.open(mode='r', encoding=encoding, errors=errors) as f:
             return f.read()

@@ -1274,6 +1277,7 @@ def write_text(self, data, encoding=None, errors=None, newline=None):
         if not isinstance(data, str):
             raise TypeError('data must be str, not %s' %
                             data.__class__.__name__)
+        encoding = io.text_encoding(encoding)
         with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
             return f.write(data)
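
From the caller's side, the `pathlib` changes mean that `Path.read_text()` and `Path.write_text()` report a missing `encoding` at the call site when the warning is enabled. A small sketch of the explicit form, with a hypothetical helper name and file name:

```python
import tempfile
from pathlib import Path


def save_and_reload(directory, text):
    """Write text to a file under directory and read it back as UTF-8."""
    path = Path(directory) / "example.txt"
    path.write_text(text, encoding="utf-8")
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = "gr\u00fc\u00df dich\n"
    print(save_and_reload(tmp, sample) == sample)
```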

Lib/site.py

+3-1
@@ -170,7 +170,9 @@ def addpackage(sitedir, name, known_paths):
     fullname = os.path.join(sitedir, name)
     _trace(f"Processing .pth file: {fullname!r}")
     try:
-        f = io.TextIOWrapper(io.open_code(fullname))
+        # locale encoding is not ideal especially on Windows. But we have used
+        # it for a long time. setuptools uses the locale encoding too.
+        f = io.TextIOWrapper(io.open_code(fullname), encoding="locale")
     except OSError:
         return
     with f:
