
read1 and readline of http.client.HTTPResponse do not raise IncompleteRead #115997


Open
illia-v opened this issue Feb 27, 2024 · 6 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@illia-v
Contributor

illia-v commented Feb 27, 2024

Bug report

Bug description:

Unlike http.client.HTTPResponse.read, read1 and readline do not raise IncompleteRead when the content length is known and the connection is closed before the whole body has been read.

FYI, these two methods have shared a common issue in the past (#113199).
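
The discrepancy can be demonstrated with a minimal loopback reproducer (a sketch of my own; the helper names serve and fetch are not from the issue). The server advertises Content-Length: 10 but sends only 5 bytes before closing:

```python
# Hypothetical reproducer: the server claims 10 body bytes but sends 5.
import http.client
import socket
import threading

def serve(listener):
    conn, _ = listener.accept()
    conn.recv(65536)  # consume the request so close() sends a clean FIN
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nhello")
    conn.close()

def fetch(method_name):
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    t = threading.Thread(target=serve, args=(listener,))
    t.start()
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        if method_name == "read":
            resp.read()  # raises IncompleteRead (got 5 of 10 bytes)
        else:
            # read1/readline historically just return b"" at truncation
            while getattr(resp, method_name)():
                pass
        return "no exception"
    except http.client.IncompleteRead:
        return "IncompleteRead"
    finally:
        conn.close()
        t.join()
        listener.close()

print(fetch("read"))      # IncompleteRead
print(fetch("read1"))     # "no exception" on affected versions
print(fetch("readline"))  # "no exception" on affected versions
```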

CPython versions tested on:

3.8, 3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux, macOS, Windows

Linked PRs

@illia-v illia-v added the type-bug An unexpected behavior, bug, or error label Feb 27, 2024
illia-v added a commit to illia-v/cpython that referenced this issue Feb 27, 2024
illia-v added a commit to illia-v/cpython that referenced this issue Feb 27, 2024
illia-v added a commit to illia-v/cpython that referenced this issue Mar 27, 2024
illia-v added a commit to illia-v/cpython that referenced this issue Jun 6, 2024
@serhiy-storchaka
Member

I am not sure that they should raise IncompleteRead.

AFAIK, only read() without an argument raises IncompleteRead, because it needs to read the whole content. read1() and readline() without an argument have different semantics: they do not read to the end of the stream. readline() reads to the first end of line, and read1() returns an amount of data that depends on buffering. Raising IncompleteRead with the second argument equal to self.length would be incorrect, because we cannot guarantee that read1() and readline() would have returned the specified number of bytes even if the connection had not been closed.
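
This point about read1()/readline() semantics is not HTTP-specific; a plain buffered stream shows the same behaviour (a generic io illustration, not code from the issue):

```python
import io

# A short return from read1()/readline() is normal even on a healthy
# stream: readline() stops at the first newline, and read1() performs
# at most one lower-level read, regardless of how much data remains.
buf = io.BufferedReader(io.BytesIO(b"first line\nsecond part"), buffer_size=8)
line = buf.readline()  # b"first line\n" -- stops at the newline
chunk = buf.read1()    # some prefix of b"second part", not necessarily all of it
print(line, len(chunk))
```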

@vadmium
Member

vadmium commented Jan 30, 2025

Illia described the proposal as “Make [read1 and readline] raise IncompleteRead instead of returning zero bytes if a connection is closed before an expected number of bytes has been read.”

I’m in favour of raising an exception. If the transport is shut down before the end of a non-chunked HTTP/1.1 message with Content-Length specified, it looks like read1 and readline will eventually return an empty byte string, which normally indicates EOF or end-of-stream. Since the HTTP protocol has broken down and the end of the HTTP message has not been received, it is better to raise an exception.

There are at least two more cases that should raise an exception. (There may be others; I haven’t looked at every case.)

  • readline where an incomplete line is returned (zero or more bytes not ending in a newline) and the HTTP transport has shut down before Content-Length bytes have been received. readline should only return an incomplete line if it has read to the true end of the HTTP message (or it has reached the size limit passed to the method).
  • read(amt) for a specified number of bytes. It should only return a short result at the true end of the message.

I believe all these cases already raise an exception when reading a chunked message, so it is inconsistent that an exception cannot also be relied on to validate a non-chunked response. This is especially annoying since I presume HTTPSConnection is still vulnerable to a TLS truncation attack (#72002).

@electricworry

I'd like to chime in with my two pence. I was alerted to this issue by @vadmium yesterday when I was troubleshooting issue #129264.

What brought me here was Ansible ansible.builtin.get_url tasks failing to receive the whole file when the source file is served over HTTPS. Because Ansible is using urllib, truncated file transfers do not cause an exception and Ansible assumes that the task has succeeded when it should fail. I made a quick script to confirm the problem occurs when using urllib + shutil (which is what Ansible uses).

Ultimately I found that I've got problems with my ISP; some sort of connection optimisation platform sits between me and the server and closes connections before all of the data has been transmitted. I need to work with the ISP to get these issues resolved (though I'm not convinced I'll have much luck there!), but I fundamentally hold that a truncated HTTP response should not be treated as complete and should raise some sort of exception.
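
Until a fix lands, one caller-side workaround (my own sketch, not Ansible's code; download_checked is a hypothetical helper) is to verify the byte count against the Content-Length header after copyfileobj():

```python
import shutil
from http.client import IncompleteRead
from urllib.request import urlopen

def download_checked(url, path):
    # Defensive download sketch: copyfileobj() stops silently at a
    # truncated stream, so compare the bytes written with Content-Length.
    with urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
        expected = resp.headers.get("Content-Length")
        if expected is not None and out.tell() != int(expected):
            raise IncompleteRead(b"", int(expected) - out.tell())
```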

@illia-v
Contributor Author

illia-v commented Jan 31, 2025

Thank you for the feedback!

The case with readline returning incomplete lines does look like something that should be included in the PR.


  • read(amt) for a specified number of bytes. It should only return a short result at the true end of the message.

Somebody considered that this could break compatibility; I wonder whether such a change is acceptable for a feature release:

cpython/Lib/http/client.py

Lines 480 to 483 in d89a5f6

if not s and amt:
    # Ideally, we would raise IncompleteRead if the content-length
    # wasn't satisfied, but it might break compatibility.
    self._close_conn()

@picnixz picnixz added the stdlib Python modules in the Lib dir label Jan 31, 2025
@electricworry

FYI, I gave this PR a test, and it doesn't help in the code path Ansible takes: a truncated read still does not result in IncompleteRead.

Having never submitted an issue or PR to Python before, what would you suggest I do here? Should I propose a further change to this PR, or should I submit my own? (I'm not sure of the efficacy of either, since this PR is nearly a year old.)

@H2CO3

H2CO3 commented Feb 9, 2025

I experienced a related issue today, similar to #129264, which is now unfortunately closed. I too am trying to download a file via HTTP POST by doing shutil.copyfileobj(urlstream, filestream). When I do this, the last 542 bytes (out of approximately 1.5 MB) of the file are lost.

If I instead do filestream.write(urlstream.read()), the problem goes away.

I can reliably reproduce this error on both my home network (optical until Wi-Fi AP) and broadband mobile connection (5G, tethered via my cell phone), corresponding to different ISPs/operators.

Tested with Python 3.12.8 and 3.13.0
OS: macOS Sequoia (15.2, arm64).

Minimal reproducer below:

import json
import shutil

from urllib.request import Request, urlopen
from tempfile import NamedTemporaryFile
from zipfile import ZipFile

USE_COPYFILEOBJ_BUG = True # set to False to see the difference

headers = {
    'Accept': 'application/zip',
    'Content-Type': 'application/json; charset=UTF-8',
}
body = {
    "accessions": ["GCF_000010285.1", "GCF_007677595.1"],
    "include_annotation_type": ["GENOME_FASTA"]
}
data = json.dumps(body, indent=2).encode("UTF-8")
req = Request("https://api.ncbi.nlm.nih.gov/datasets/v2/genome/download", headers=headers, data=data)

with NamedTemporaryFile() as tempfile, urlopen(req) as urlstream, open(tempfile.name, "wb") as filestream:
    if USE_COPYFILEOBJ_BUG:
        shutil.copyfileobj(urlstream, filestream)
    else:
        filestream.write(urlstream.read())

    with open(tempfile.name, 'rb') as tempstream:
        print(f'{len(tempstream.read())=}') # prints either 1572864 (wrong) or 1573406 (correct)

    with ZipFile(tempfile.name) as zipfile: # this raises when using copyfileobj
        pass
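
The difference between copyfileobj() and a bare read() follows from copyfileobj's loop, sketched here in simplified form (the real shutil implementation adds more options; copyfileobj_sketch is my own name):

```python
import io

def copyfileobj_sketch(fsrc, fdst, length=64 * 1024):
    # Simplified model of shutil.copyfileobj: it calls fsrc.read(length)
    # repeatedly and treats the first empty chunk as end-of-stream. An
    # HTTPResponse.read(amt) on a truncated connection returns a short or
    # empty result instead of raising, so the copy ends early and silently.
    while chunk := fsrc.read(length):
        fdst.write(chunk)

src, dst = io.BytesIO(b"a" * 100), io.BytesIO()
copyfileobj_sketch(src, dst, length=16)
print(len(dst.getvalue()))  # 100
```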

illia-v added a commit to illia-v/cpython that referenced this issue Feb 10, 2025
illia-v added a commit to illia-v/cpython that referenced this issue Mar 23, 2025
illia-v added a commit to illia-v/cpython that referenced this issue Apr 1, 2025
illia-v added a commit to illia-v/cpython that referenced this issue Apr 10, 2025
illia-v added a commit to illia-v/cpython that referenced this issue Apr 19, 2025
6 participants