Email parser preserves leading whitespace at beginning of wrapped header value #109252

@fsc-eriker

Bug report

Bug description:

The email parser keeps the folding whitespace, CRLF included, at the start of a header value when the header is folded immediately after the colon.

from email import message_from_string
from email.policy import default

message = message_from_string("Message-id:\r\n\t<[email protected]>\r\n\r\n")
assert message['message-id'] == '<[email protected]>', \
    f"Expected '<[email protected]>' but got {message['message-id']!r}"

The assertion fails with:

Traceback (most recent call last):
  File "/Users/myself/work/email-cpython-fork/repronnnn.py", line 5, in <module>
    assert message['message-id'] == '<[email protected]>', \
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Expected '<[email protected]>' but got '\r\n\t<[email protected]>'

To quote RFC 5322 section 3.2.2, 'any CRLF that appears in FWS is semantically "invisible"' (where "FWS" means "folding white space"). The section concludes: 'Runs of FWS, comment, or CFWS that occur between lexical tokens in a structured header field are semantically interpreted as a single space character' (where "CFWS" means runs of parenthesized comments and/or FWS). Under that reading, the CRLF and tab before the message ID carry no meaning and should not appear in the parsed value.
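For illustration only, here is a rough sketch of the unfolding rule quoted above, applied after parsing. The helper names `unfold` and `normalize_structured` are hypothetical and not part of the `email` package, and the address is a placeholder because the one in the report was redacted by the page. The whitespace collapsing is deliberately naive: it ignores comments and quoted strings, so it is not a full RFC 5322 implementation.

```python
import re
from email import message_from_string

# Placeholder address (assumption): the original one is not recoverable.
RAW = "Message-id:\r\n\t<id@example.com>\r\n\r\n"


def unfold(value: str) -> str:
    """Remove each CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", value)


def normalize_structured(value: str) -> str:
    """Unfold, collapse runs of spaces/tabs to one space, trim both ends."""
    return re.sub(r"[ \t]+", " ", unfold(value)).strip()


message = message_from_string(RAW)
print(repr(normalize_structured(message["message-id"])))
```

Applying the helper to the value returned by the parser could serve as a stopgap in calling code until the parser itself drops the leading folding whitespace.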

(The Message-Id header is a structured field. Elsewhere in the RFC, the Subject: header, which is unstructured, is used in an example to illustrate this mechanism, though I'd still have to verify whether there is a gap in the RFC when it comes to specifying the semantics of non-CRLF FWS in unstructured header values.)

CPython versions tested on:

3.9, 3.11, CPython main branch

Operating systems tested on:

macOS


Labels: stdlib, topic-email, type-bug
