mboxMessage.get_payload throws TypeError on malformed content type #80361

Closed
enrico mannequin opened this issue Mar 4, 2019 · 2 comments
Labels: 3.7 (EOL), 3.8 (EOL), stdlib, topic-email, type-bug

Comments

@enrico

enrico mannequin commented Mar 4, 2019

BPO 36180
Nosy @warsaw, @bitdancer, @mapreri, @tirkarthi
Files
  • broken.zip

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-03-04.11:03:37.379>
    labels = ['3.8', 'type-bug', '3.7', 'expert-email']
    title = 'mboxMessage.get_payload throws TypeError on malformed content type'
    updated_at = <Date 2019-04-14.06:40:20.881>
    user = 'https://bugs.python.org/enrico'

    bugs.python.org fields:

    activity = <Date 2019-04-14.06:40:20.881>
    actor = 'xtreak'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['email']
    creation = <Date 2019-03-04.11:03:37.379>
    creator = 'enrico'
    dependencies = []
    files = ['48184']
    hgrepos = []
    issue_num = 36180
    keywords = []
    message_count = 2.0
    messages = ['337091', '340187']
    nosy_count = 5.0
    nosy_names = ['barry', 'r.david.murray', 'enrico', 'mapreri', 'xtreak']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'test needed'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue36180'
    versions = ['Python 3.7', 'Python 3.8']

    enrico mannequin commented Mar 4, 2019

    This simple code:

    import mailbox
    
    mbox = mailbox.mbox("broken.mbox")
    for msg in mbox:
        msg.get_payload()
    

    Fails rather unexpectedly:

    $ python3 broken.py 
    Traceback (most recent call last):
      File "broken.py", line 5, in <module>
        msg.get_payload()
      File "/usr/lib/python3.7/email/message.py", line 267, in get_payload
        payload = bpayload.decode(self.get_param('charset', 'ascii'), 'replace')
    TypeError: decode() argument 1 must be str, not tuple
    

    (I'm attaching a zip with code and mailbox)

    I would have expected either that the part past text/plain is ignored if it doesn't make sense, or that content-type is completely ignored.

    I have to process a large mailbox archive, and this is how I currently work around the issue. The workaround forces me to skip email content that would otherwise be reasonably accessible:

    https://salsa.debian.org/nm-team/echelon/commit/617ce935a31f6256257ffb24e11a5666306406c3
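
One possible shape for such a workaround, as a minimal sketch and not the code in the linked commit: fetch the payload as bytes with get_payload(decode=True), which does not consult the charset parameter, then decode it with the string returned by get_content_charset(). The helper name payload_text and the sample message are hypothetical.

```python
import email
from email.message import Message
from typing import Optional


def payload_text(msg: Message) -> Optional[str]:
    """Hypothetical helper: decode a non-multipart payload to str.

    Unlike get_payload() with no arguments, this also undoes any
    Content-Transfer-Encoding (base64, quoted-printable).
    """
    raw = msg.get_payload(decode=True)  # bytes, or None for multipart messages
    if raw is None:
        return None
    # get_content_charset() collapses an RFC 2231 tuple into a str, or
    # returns the fallback when the charset is missing or not ASCII.
    charset = msg.get_content_charset("ascii")
    try:
        return raw.decode(charset, "replace")
    except LookupError:
        # The charset name is not a codec Python knows about.
        return raw.decode("ascii", "replace")


# Made-up sample message with an RFC 2231 encoded charset parameter.
raw_message = (
    b"Content-Type: text/plain; charset*=ansi-x3.4-1968''utf-8\n"
    b"\n"
    b"tr\xc3\xa8s int\xc3\xa9ress\xc3\xa9\n"
)
print(payload_text(email.message_from_bytes(raw_message)))
```

Because decode=True also reverses the transfer encoding, the result can differ from what get_payload() alone would return for base64 or quoted-printable parts.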

    SilentGhost mannequin added the topic-email, 3.7 (EOL), 3.8 (EOL), and type-bug labels Mar 4, 2019
    @tirkarthi
    Member

    tirkarthi commented Apr 14, 2019

    A simplified reproducer is below. The tuple is returned from _unquotevalue(value) in email/message.py, which is perhaps an untested code path? The charset gets a 3-tuple value of the form (charset, language, value): ('utf-8��', '', '"utf-8Â\xa0"').

    import mailbox
    import tempfile
    
    broken_message = """
    From [email protected] Wed Sep 24 01:22:15 2003
    Date: Wed, 24 Sep 2003 07:05:50 +0200
    From: Test test <[email protected]>
    To: [email protected]
    Subject: Re: Test
    Mime-Version: 1.0
    Content-Type: text/plain; charset*=utf-8†''utf-8%C2%A0
    
    trés intéressé
    """
    
    with tempfile.NamedTemporaryFile() as f:
        f.write(broken_message.encode())
        f.seek(0)
        msg = mailbox.mbox(f.name)
        for m in msg:
            print(m.get_payload())

    $ ../cpython/python.exe bpo36180.py
    Traceback (most recent call last):
      File "bpo36180.py", line 21, in <module>
        print(m.get_payload())
      File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/email/message.py", line 267, in get_payload
        payload = bpayload.decode(self.get_param('charset', 'ascii'), 'replace')
    TypeError: decode() argument 1 must be str, not tuple
    sys:1: ResourceWarning: unclosed file <_io.BufferedRandom name='/var/folders/2b/mhgtnnpx4z943t4cc9yvw4qw0000gn/T/tmp4ddavb6g'>
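
To see where the tuple comes from: Message.get_param() is documented to return a 3-tuple (charset, language, value) when a parameter is RFC 2231 encoded, and email.utils.collapse_rfc2231_value() turns either form into a string. A minimal sketch, assuming a well-formed encoded charset parameter and not the malformed one in the reproducer:

```python
import email
from email.utils import collapse_rfc2231_value

# Made-up message whose charset parameter uses the RFC 2231 "name*=" form.
raw = b"Content-Type: text/plain; charset*=ansi-x3.4-1968''utf-8\n\nbody\n"
msg = email.message_from_bytes(raw)

# For an RFC 2231 encoded parameter this is a tuple, not a str.
charset = msg.get_param("charset", "ascii")
print(type(charset).__name__, charset)

# collapse_rfc2231_value() accepts a str or a 3-tuple and returns a str.
collapsed = collapse_rfc2231_value(charset)
print(collapsed)
```

Passing the tuple straight to bytes.decode() is what produces the TypeError in the traceback above.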

    ezio-melotti transferred this issue from another repository Apr 10, 2022
    iritkatriel added the stdlib label Nov 23, 2023
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Apr 17, 2024
    It was raised when the charset is rfc2231 encoded, e.g.:
    
       Content-Type: text/plain; charset*=ansi-x3.4-1968''utf-8
    serhiy-storchaka added a commit that referenced this issue Apr 17, 2024
    It was raised when the charset is rfc2231 encoded, e.g.:
    
       Content-Type: text/plain; charset*=ansi-x3.4-1968''utf-8
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Apr 17, 2024
    …H-117994)
    
    It was raised when the charset is rfc2231 encoded, e.g.:
    
       Content-Type: text/plain; charset*=ansi-x3.4-1968''utf-8
    (cherry picked from commit deaecb8)
    
    Co-authored-by: Serhiy Storchaka <[email protected]>
    serhiy-storchaka added a commit that referenced this issue Apr 17, 2024
    ) (GH-117998)
    
    It was raised when the charset is rfc2231 encoded, e.g.:
    
       Content-Type: text/plain; charset*=ansi-x3.4-1968''utf-8
    (cherry picked from commit deaecb8)
    
    Co-authored-by: Serhiy Storchaka <[email protected]>
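
A quick way to check whether a given interpreter still has the bug, sketched with the header from the commit message. The function name is_affected is hypothetical and the body text is made-up sample data:

```python
import email


def is_affected() -> bool:
    """Return True if get_payload() raises TypeError on this interpreter."""
    raw = (
        b"Content-Type: text/plain; charset*=ansi-x3.4-1968''utf-8\n"
        b"\n"
        # The body must contain non-ASCII bytes: get_payload() only looks
        # up the charset when it has to re-decode such a payload.
        b"tr\xc3\xa8s\n"
    )
    msg = email.message_from_bytes(raw)
    try:
        msg.get_payload()
    except TypeError:
        return True
    return False


print("affected" if is_affected() else "not affected")
```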