
read1 and readline of http.client.HTTPResponse do not raise IncompleteRead #115997


Open
illia-v opened this issue Feb 27, 2024 · 6 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@illia-v
Contributor

illia-v commented Feb 27, 2024

Bug report

Bug description:

Unlike http.client.HTTPResponse.read, read1 and readline do not raise IncompleteRead when the content length is known and the connection is closed before the whole body has been read.

FYI, these two methods have shared a common issue in the past (#113199).
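
The discrepancy can be demonstrated with a minimal loopback reproducer (a sketch of my own; the helper names serve and fetch are not from the issue). The server advertises Content-Length: 10 but sends only 5 bytes before closing:

```python
# Hypothetical reproducer: the server claims 10 body bytes but sends 5.
import http.client
import socket
import threading

def serve(listener):
    conn, _ = listener.accept()
    conn.recv(65536)  # consume the request so close() sends a clean FIN
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nhello")
    conn.close()

def fetch(method_name):
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    t = threading.Thread(target=serve, args=(listener,))
    t.start()
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        if method_name == "read":
            resp.read()  # raises IncompleteRead (got 5 of 10 bytes)
        else:
            # read1/readline historically just return b"" at truncation
            while getattr(resp, method_name)():
                pass
        return "no exception"
    except http.client.IncompleteRead:
        return "IncompleteRead"
    finally:
        conn.close()
        t.join()
        listener.close()

print(fetch("read"))      # IncompleteRead
print(fetch("read1"))     # "no exception" on affected versions
print(fetch("readline"))  # "no exception" on affected versions
```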

CPython versions tested on:

3.8, 3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux, macOS, Windows

Linked PRs

@illia-v illia-v added the type-bug An unexpected behavior, bug, or error label Feb 27, 2024
illia-v added a commit to illia-v/cpython that referenced this issue Feb 27, 2024
illia-v added a commit to illia-v/cpython that referenced this issue Feb 27, 2024
illia-v added a commit to illia-v/cpython that referenced this issue Mar 27, 2024
illia-v added a commit to illia-v/cpython that referenced this issue Jun 6, 2024
@serhiy-storchaka
Member

I am not sure that they should raise IncompleteRead.

AFAIK, only read() without an argument raises IncompleteRead, because it needs to read the whole content. read1() and readline() without an argument have different semantics: they do not read to the end of the stream. readline() reads to the first end of line, and read1() returns an amount of data that depends on buffering. Raising IncompleteRead with the second argument equal to self.length would be incorrect, because we cannot guarantee that read1() and readline() would have returned the specified number of bytes even if the connection had not been closed.
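
This point about read1()/readline() semantics is not HTTP-specific; a plain buffered stream shows the same behaviour (a generic io illustration, not code from the issue):

```python
import io

# A short return from read1()/readline() is normal even on a healthy
# stream: readline() stops at the first newline, and read1() performs
# at most one lower-level read, regardless of how much data remains.
buf = io.BufferedReader(io.BytesIO(b"first line\nsecond part"), buffer_size=8)
line = buf.readline()  # b"first line\n" -- stops at the newline
chunk = buf.read1()    # some prefix of b"second part", not necessarily all of it
print(line, len(chunk))
```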

@vadmium
Member

vadmium commented Jan 30, 2025

Illia described the proposal as “Make [read1 and readline] raise IncompleteRead instead of returning zero bytes if a connection is closed before an expected number of bytes has been read.”

I’m in favour of raising an exception. If the transport is shut down before the end of a non-chunked HTTP/1.1 message with Content-Length specified, it looks like read1 and readline will eventually return an empty byte string, which normally indicates EOF or end-of-stream. Since the HTTP protocol has broken down and the end of the HTTP message has not been received, it is better to raise an exception.

There are at least two more cases that should raise an exception. (There may be others; I haven’t looked at every case.)

  • readline where an incomplete line is returned (zero or more bytes not ending in a newline) and the HTTP transport has shut down before Content-Length bytes have been received. readline should only return an incomplete line if it has read to the true end of the HTTP message (or it has reached the size limit passed to the method).
  • read(amt) for a specified number of bytes. It should only return a short result at the true end of the message.

I believe all these cases already raise an exception when reading a chunked message, so it is inconsistent that an exception cannot also be relied on to validate a non-chunked response. This is especially annoying since I presume HTTPSConnection is still vulnerable to a TLS truncation attack (#72002).

@electricworry

I'd like to chime in with my two pence. I was alerted to this issue by @vadmium yesterday when I was troubleshooting issue #129264.

What brought me here was Ansible ansible.builtin.get_url tasks failing to receive the whole file when the source file is served over HTTPS. Because Ansible is using urllib, truncated file transfers do not cause an exception and Ansible assumes that the task has succeeded when it should fail. I made a quick script to confirm the problem occurs when using urllib + shutil (which is what Ansible uses).

Ultimately I found that I've got problems with my ISP; some sort of connection optimisation platform sits between me and the server and closes connections before all of the data has been transmitted. I need to work with the ISP to get these issues resolved (though I'm not convinced I'll have much luck there!), but I fundamentally hold that a truncated HTTP response should not be treated as complete and should raise some sort of exception.
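
Until a fix lands, one caller-side workaround (my own sketch, not Ansible's code; download_checked is a hypothetical helper) is to verify the byte count against the Content-Length header after copyfileobj():

```python
import shutil
from http.client import IncompleteRead
from urllib.request import urlopen

def download_checked(url, path):
    # Defensive download sketch: copyfileobj() stops silently at a
    # truncated stream, so compare the bytes written with Content-Length.
    with urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
        expected = resp.headers.get("Content-Length")
        if expected is not None and out.tell() != int(expected):
            raise IncompleteRead(b"", int(expected) - out.tell())
```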

@illia-v
Contributor Author

illia-v commented Jan 31, 2025

Thank you for the feedback!

The case with readline returning incomplete lines does look like something that should be included in the PR.


  • read(amt) for a specified number of bytes. It should only return a short result at the true end of the message.

Somebody considered that this could break compatibility; I wonder whether such a change is acceptable for a feature release:

cpython/Lib/http/client.py

Lines 480 to 483 in d89a5f6

if not s and amt:
    # Ideally, we would raise IncompleteRead if the content-length
    # wasn't satisfied, but it might break compatibility.
    self._close_conn()

@picnixz picnixz added the stdlib Python modules in the Lib dir label Jan 31, 2025
@electricworry

FYI, I gave this PR a test, and it doesn't help in the code path Ansible takes: a truncated read still does not result in IncompleteRead.

Having never submitted an issue or PR to Python before, what would you suggest I do here? Should I propose a further change to this PR, or should I submit my own? (I'm not sure of the efficacy of either, since this PR is nearly a year old.)

@H2CO3

H2CO3 commented Feb 9, 2025

I experienced a related issue today, similar to #129264, which is now unfortunately closed. I too am trying to download a file via HTTP POST by doing shutil.copyfileobj(urlstream, filestream). When I do this, the last 542 bytes (out of approximately 1.5 MB) of the file are lost.

If I instead do filestream.write(urlstream.read()), the problem goes away.

I can reliably reproduce this error on both my home network (optical until Wi-Fi AP) and broadband mobile connection (5G, tethered via my cell phone), corresponding to different ISPs/operators.

Tested with Python 3.12.8 and 3.13.0
OS: macOS Sequoia (15.2, arm64).

Minimal reproducer below:

import json
import shutil

from urllib.request import Request, urlopen
from tempfile import NamedTemporaryFile
from zipfile import ZipFile

USE_COPYFILEOBJ_BUG = True # set to False to see the difference

headers = {
    'Accept': 'application/zip',
    'Content-Type': 'application/json; charset=UTF-8',
}
body = {
    "accessions": ["GCF_000010285.1", "GCF_007677595.1"],
    "include_annotation_type": ["GENOME_FASTA"]
}
data = json.dumps(body, indent=2).encode("UTF-8")
req = Request("https://api.ncbi.nlm.nih.gov/datasets/v2/genome/download", headers=headers, data=data)

with NamedTemporaryFile() as tempfile, urlopen(req) as urlstream, open(tempfile.name, "wb") as filestream:
    if USE_COPYFILEOBJ_BUG:
        shutil.copyfileobj(urlstream, filestream)
    else:
        filestream.write(urlstream.read())

    with open(tempfile.name, 'rb') as tempstream:
        print(f'{len(tempstream.read())=}') # prints either 1572864 (wrong) or 1573406 (correct)

    with ZipFile(tempfile.name) as zipfile: # this raises when using copyfileobj
        pass
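
The difference between copyfileobj() and a bare read() follows from copyfileobj's loop, sketched here in simplified form (the real shutil implementation adds more options; copyfileobj_sketch is my own name):

```python
import io

def copyfileobj_sketch(fsrc, fdst, length=64 * 1024):
    # Simplified model of shutil.copyfileobj: it calls fsrc.read(length)
    # repeatedly and treats the first empty chunk as end-of-stream. An
    # HTTPResponse.read(amt) on a truncated connection returns a short or
    # empty result instead of raising, so the copy ends early and silently.
    while chunk := fsrc.read(length):
        fdst.write(chunk)

src, dst = io.BytesIO(b"a" * 100), io.BytesIO()
copyfileobj_sketch(src, dst, length=16)
print(len(dst.getvalue()))  # 100
```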

illia-v added a commit to illia-v/cpython that referenced this issue Feb 10, 2025
illia-v added a commit to illia-v/cpython that referenced this issue Mar 23, 2025
illia-v added a commit to illia-v/cpython that referenced this issue Apr 1, 2025
illia-v added a commit to illia-v/cpython that referenced this issue Apr 10, 2025
illia-v added a commit to illia-v/cpython that referenced this issue Apr 19, 2025
6 participants