-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
Enhance HTTPResponse.readline() performance #63209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Some applications require reading http response data in "long" polls as it becomes available. This is used, e.g. to receive "notifications" over a HTTP stream. |
This sounds ok on the principle. I suppose one can't simply wrap the "fp" inside a BufferedReader? |
The problem is that self.fp is already a Buffered stream, and such streams are documented to have their read() and readinto() calls make multiple system calls to fullfill the request. My original goal was actually to make response.read(amt) not try to make multiple read() calls, so that one could have other delimiters than newline. It is simple for the chunked case, but I don't know how to bypass it for response.fp, since it is already a buffered file that has no guaranteed such behaviour.... wait, one can simply call peek() on it and know how much one can get :) But anyway, the use case I have actually uses newlines as record separators in the http stream, so this enhancement seems less intrusive. I'll add unit tests. There ought to be ready-made HTTPResponse tests already that I can use as templates. |
Or you can use read1()? |
See also bpo-13541? |
Intersting. I didn't know about that. My excuse is that I never use 3.x except when I'm porting some CCP enhancements for cpython. Here's a thought: HTTPResponse inherits from RawIOBase. Only the BufferedIO classes have read1() and are documented to have more than one system calls on their read() methods. Yet, HTTPResponse will happily behave in that way. |
I should add, I fully support the use case that response.read(amt=None) needs to read to the end of the response. It is only the read(amt=bufsize) use case I'm thinking of, and that could be handled with a read1() method. |
Ok, I'm adding a more comprehensive patch. It adds support for The real work is done by the _read1_or_peek_chunked() method. the code for both cases is almost identical so a dual purpose method is warranted. Added test cases to test this. The test cases don't attempt to test that we don't block on partial resopnses, though. I'm not sure how I would write such a test. I'd probably have to extend FakeSocket to work on some stream that can be appended to... |
After adding read1() and peek() what stop us from inheriting HTTPResponse from BufferedIOBase? I suggest split _read1_or_peek_chunked() by two parts. First part calculates n bounded by chunk_left (it can read the next chunk size and close a connection if needed). Perhaps it can be reused in _readall_chunked() and _readinto_chunked(() too. Second part which is controlled by boolean parameter should be moved out in read1() and peek(). |
Ok, I have refactored this a bit. I'm not sure what you mean by inheriting from the buffered class. Do we gain anything by doing that, would it change the code? or would it merely be for the benefit of ABC? Note that the chunk protocol was wrong and I fixed the unittests: The final chunk is a _valid_ zero length chunk, i.e. 0\r\n\r\n. It contains two eol tokens like other chunks, one at the end of the length, the other at the end of the(null) payload. |
Good. It is even better than I expected.
Yes, it for the benefit of ABC. Some code tests isinstance(file, io.BufferedIOBase) and wraps stream in BufferedReader if it is false.
How new code processes invalid final chunk 0\r\n? We need test to check that there is no degradation. Perhaps it relates to behavior change which I mention in the comment on Rietveld. |
Ok, I can change the base class inheritance and see if it changes things. Note for the different interpretation of the final chunk: |
I think what Serhiy is proposing is that we also succeed when the |
Ok, I can make it resilient. I was just pointing out that resilience in the face of RFC violations can be a bad thing. E.g. Internet explorer and how they allowed internet servers of the world to be lax in how they served their data. |
I'm afraid the ship has sailed a long time ago on that. Poor HTTP
Well, the unit tests already exist, which is why I think we shouldn't |
Very well, let's support both then :) |
Ok, Here is a new patch. I was wrong about the final chunk, it is in fact 0\r\n. A final \r\n is then added to signal the end of the optional trailers. Added a bunch of tests for chunked encoding, plus tests to verify http synchronization for keepalive connections. |
Kristján, did you notice my comments on Rietveld? |
Ah no actually. Thanks . I'll respond soon. |
A new patch set, in response to Serhiy's comments on Rietveld. |
LGTM. |
New changeset 49017c391564 by Kristján Valur Jónsson in branch 'default': |
If this is committed, should the issue be closed? |
Sure. If there are issues we'll just reopen. Closing. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: