Commit 1c4f01a

[3.14] gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (GH-92900) (#135548)
gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (GH-92900)

* gh-67022: Document bytes/str inconsistency in email.header.decode_header()

This function's possible return types have been surprising and error-prone
for the entirety of its Python 3.x history. It can return either:

1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]` of length >1
2. or `typing.List[typing.Tuple[str, None]]`, of length exactly 1

This means that any user of this function must be prepared to accept either
`bytes` or `str` for the first member of the 2-tuples it returns, which is a
very surprising behavior in Python 3.x, particularly given that the second
member of the tuple is supposed to represent the charset/encoding of the
first member.

This patch documents the behavior of this function, and adds test cases
to demonstrate it.

As discussed in bpo-22833, this cannot be changed in a backwards-compatible
way, and some users of this function depend precisely on the existing
behavior.

Add warnings about obsolescence of 'email.header.decode_header' and
'email.header.make_header' functions.

Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested
in #92900 (comment)

(cherry picked from commit 60181f4)

Co-authored-by: Dan Lenski <[email protected]>
1 parent 7fd8857 commit 1c4f01a
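
As an illustration of the two return shapes the commit message describes, here is a minimal sketch of a caller that accepts both. The helper name `decode_header_to_str` and its `fallback` parameter are hypothetical and not part of this patch; the sketch also assumes that every charset reported by `decode_header()` is one Python's codecs recognise.

```python
from email.header import decode_header


def decode_header_to_str(value, fallback="ascii"):
    """Join the parts returned by decode_header() into a single str.

    Hypothetical helper: it tolerates both documented return shapes,
    a list of (bytes, charset) pairs or a single (str, None) pair.
    """
    pieces = []
    for part, charset in decode_header(value):
        if isinstance(part, bytes):
            # Headers containing encoded words come back as bytes parts;
            # charset is None for the unencoded parts among them.
            pieces.append(part.decode(charset or fallback, errors="replace"))
        else:
            # A header without encoded words comes back as one str part.
            pieces.append(part)
    return "".join(pieces)
```

With the inputs used in the documentation examples of this commit, `decode_header_to_str('unencoded_string')` returns the input unchanged, and `decode_header_to_str('bar =?utf-8?B?ZsOzbw==?=')` returns `'bar fóo'`.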

3 files changed: +54 -9 lines changed

Doc/library/email.header.rst

Lines changed: 29 additions & 5 deletions
@@ -178,16 +178,36 @@ The :mod:`email.header` module also provides the following convenient functions.
    Decode a message header value without converting the character set. The header
    value is in *header*.

-   This function returns a list of ``(decoded_string, charset)`` pairs containing
-   each of the decoded parts of the header. *charset* is ``None`` for non-encoded
-   parts of the header, otherwise a lower case string containing the name of the
-   character set specified in the encoded string.
+   For historical reasons, this function may return either:

-   Here's an example::
+   1. A list of pairs containing each of the decoded parts of the header,
+      ``(decoded_bytes, charset)``, where *decoded_bytes* is always an instance of
+      :class:`bytes`, and *charset* is either:
+
+        - A lower case string containing the name of the character set specified.
+
+        - ``None`` for non-encoded parts of the header.
+
+   2. A list of length 1 containing a pair ``(string, None)``, where
+      *string* is always an instance of :class:`str`.
+
+   An :exc:`email.errors.HeaderParseError` may be raised when certain decoding
+   errors occur (e.g. a base64 decoding exception).
+
+   Here are examples:

       >>> from email.header import decode_header
       >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
       [(b'p\xf6stal', 'iso-8859-1')]
+      >>> decode_header('unencoded_string')
+      [('unencoded_string', None)]
+      >>> decode_header('bar =?utf-8?B?ZsOzbw==?=')
+      [(b'bar ', None), (b'f\xc3\xb3o', 'utf-8')]
+
+   .. note::
+
+      This function exists for backwards compatibility only. For
+      new code, we recommend using :class:`email.headerregistry.HeaderRegistry`.


 .. function:: make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ')
@@ -203,3 +223,7 @@ The :mod:`email.header` module also provides the following convenient functions.
    :class:`Header` instance. Optional *maxlinelen*, *header_name*, and
    *continuation_ws* are as in the :class:`Header` constructor.

+   .. note::
+
+      This function exists for backwards compatibility only, and is
+      not recommended for use in new code.
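
The note added above points new code at `email.headerregistry.HeaderRegistry`. A minimal sketch of what that could look like follows; the wrapper name `parse_header` is hypothetical, and the sketch assumes the default registry configuration is suitable for the header being parsed.

```python
from email.headerregistry import HeaderRegistry


def parse_header(name, value):
    """Build a header object for (name, value) with a default HeaderRegistry.

    Hypothetical wrapper, not part of this patch.
    """
    registry = HeaderRegistry()   # default registry with the standard header classes
    return registry(name, value)  # calling the registry returns a header object


subject = parse_header("Subject", "bar =?utf-8?B?ZsOzbw==?=")
decoded = str(subject)  # always a str; RFC 2047 encoded words are decoded while parsing
```

Unlike `decode_header()`, the resulting header object is a `str` subclass, so callers do not need to branch on `bytes` versus `str`.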

Lib/email/header.py

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -59,16 +59,22 @@
5959
def decode_header(header):
6060
"""Decode a message header value without converting charset.
6161
62-
Returns a list of (string, charset) pairs containing each of the decoded
63-
parts of the header. Charset is None for non-encoded parts of the header,
64-
otherwise a lower-case string containing the name of the character set
65-
specified in the encoded string.
62+
For historical reasons, this function may return either:
63+
64+
1. A list of length 1 containing a pair (str, None).
65+
2. A list of (bytes, charset) pairs containing each of the decoded
66+
parts of the header. Charset is None for non-encoded parts of the header,
67+
otherwise a lower-case string containing the name of the character set
68+
specified in the encoded string.
6669
6770
header may be a string that may or may not contain RFC2047 encoded words,
6871
or it may be a Header object.
6972
7073
An email.errors.HeaderParseError may be raised when certain decoding error
7174
occurs (e.g. a base64 decoding exception).
75+
76+
This function exists for backwards compatibility only. For new code, we
77+
recommend using email.headerregistry.HeaderRegistry instead.
7278
"""
7379
# If it is a Header object, we can just return the encoded chunks.
7480
if hasattr(header, '_chunks'):
@@ -161,6 +167,9 @@ def make_header(decoded_seq, maxlinelen=None, header_name=None,
161167
This function takes one of those sequence of pairs and returns a Header
162168
instance. Optional maxlinelen, header_name, and continuation_ws are as in
163169
the Header constructor.
170+
171+
This function exists for backwards compatibility only, and is not
172+
recommended for use in new code.
164173
"""
165174
h = Header(maxlinelen=maxlinelen, header_name=header_name,
166175
continuation_ws=continuation_ws)
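
For existing code that still relies on these two functions, the round trip exercised by the test suite (`str(make_header(decode_header(s)))`) can be sketched as below. The function name `roundtrip` is hypothetical; the sketch only shows that `make_header()` accepts either return shape of `decode_header()`.

```python
from email.header import decode_header, make_header


def roundtrip(value):
    """Decode a raw header value and rebuild it as text.

    Hypothetical helper: decode_header() may yield (bytes, charset) pairs
    or a single (str, None) pair, and make_header() accepts both.
    """
    return str(make_header(decode_header(value)))
```

For example, `roundtrip('=?iso-8859-1?q?p=F6stal?=')` returns `'pöstal'`, and a header with no encoded words is returned unchanged.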

Lib/test/test_email/test_email.py

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2568,6 +2568,18 @@ def test_multiline_header(self):
25682568
self.assertEqual(str(make_header(decode_header(s))),
25692569
'"Müller T" <[email protected]>')
25702570

2571+
def test_unencoded_ascii(self):
2572+
# bpo-22833/gh-67022: returns [(str, None)] rather than [(bytes, None)]
2573+
s = 'header without encoded words'
2574+
self.assertEqual(decode_header(s),
2575+
[('header without encoded words', None)])
2576+
2577+
def test_unencoded_utf8(self):
2578+
# bpo-22833/gh-67022: returns [(str, None)] rather than [(bytes, None)]
2579+
s = 'header with unexpected non ASCII caract\xe8res'
2580+
self.assertEqual(decode_header(s),
2581+
[('header with unexpected non ASCII caract\xe8res', None)])
2582+
25712583

25722584
# Test the MIMEMessage class
25732585
class TestMIMEMessage(TestEmailBase):
