Description
Bug report
Bug description:
The email library preserves the folding whitespace, including the CRLF, in the header value when a header line is folded.
from email import message_from_string
from email.policy import default
message = message_from_string("Message-id:\r\n\t<[email protected]>\r\n\r\n")
assert message['message-id'] == '<[email protected]>', \
    f"Expected '<[email protected]>' but got {message['message-id']!r}"
The failure message says:
Traceback (most recent call last):
  File "/Users/myself/work/email-cpython-fork/repronnnn.py", line 5, in <module>
    assert message['message-id'] == '<[email protected]>', \
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Expected '<[email protected]>' but got '\r\n\t<[email protected]>'
To quote RFC 5322 section 3.2.2, 'any CRLF that appears in FWS is semantically "invisible"' (where "FWS" means "folding whitespace"). Later on, the section concludes, 'Runs of FWS, comment, or CFWS that occur between lexical tokens in a structured header field are semantically interpreted as a single space character' (where "CFWS" means runs of parenthesized comments and/or FWS). The parsed value should therefore not contain the CRLF, and the leading tab should not be significant.
(The Message-Id header is a structured field. Elsewhere in the RFC, the unstructured Subject: header is used in an example to illustrate folding, though I still have to verify whether the RFC leaves a gap in specifying the semantics of non-CRLF FWS in unstructured header values.)
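For illustration, here is a minimal sketch of the unfolding that the quoted RFC text describes, applied to a raw value like the one in the failure message. The helper names unfold and normalize_structured are hypothetical and not part of the email package, the message id is a placeholder, and trimming whitespace at the ends of the value is an assumption on my part. The sketch ignores parenthesized comments entirely.

```python
import re

# Hypothetical helpers for illustration only; not part of the email package.
_FOLD = re.compile(r"\r\n(?=[ \t])")   # a CRLF immediately followed by WSP
_WSP_RUN = re.compile(r"[ \t]+")       # a run of spaces and/or tabs


def unfold(raw: str) -> str:
    """Drop every CRLF that is followed by WSP, keeping the WSP itself."""
    return _FOLD.sub("", raw)


def normalize_structured(raw: str) -> str:
    """Unfold, collapse each WSP run to one space, and trim the ends.

    Assumption: leading and trailing whitespace is not significant in a
    structured field. Comments (the "C" in CFWS) are not handled here.
    """
    return _WSP_RUN.sub(" ", unfold(raw)).strip(" \t")


# Placeholder value shaped like the one in the report.
folded = "\r\n\t<id@example.invalid>"
print(repr(unfold(folded)))                # '\t<id@example.invalid>'
print(repr(normalize_structured(folded)))  # '<id@example.invalid>'
```

Under these assumptions, the second result is what I would expect message['message-id'] to return for the folded header.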
CPython versions tested on:
3.9, 3.11, CPython main branch
Operating systems tested on:
macOS