-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Cache final file from resume retry process #13441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache final file from resume retry process #13441
Conversation
d81fb05
to
17f3703
Compare
Ah, this doesn't look that bad. Thanks for working on this! The main limitation with this approach is that we load the entire file in memory before writing it to the cache. This used to be an annoying issue (I would link examples, but I'm on a phone), but a few years ago, the caching implementation was rewritten to stream the response body. It'd be preferable to avoid reintroducing the memory usage issue, which means the synthetic response needs to be a streaming response. I can take a look at retooling this in a few days if you don't get to it. |
I'll take another look, my initial naive approach didn't work, which is why I rewrote it to read the whole file at once. But if I don't respond with anything in the next couple of days feel free to take a go. |
5076580
to
ba1ee52
Compare
Okay, I added streaming support by extending the functionality of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking quite good! Thank you so much for working on this, especially with such a quick turn-around!
I have some suggestions to harden the implementation among other changes.
tests/unit/test_network_download.py
Outdated
# Mock the session's adapters to have a cache controller | ||
mock_adapter = MagicMock(spec=CacheControlAdapter) | ||
mock_controller = MagicMock() | ||
mock_adapter.controller = mock_controller | ||
mock_controller.cache_url = MagicMock(return_value="cache_key") | ||
mock_controller.serializer = MagicMock() | ||
mock_controller.serializer.dumps = MagicMock(return_value=b"serialized_data") | ||
|
||
# Mock the cache to be a SafeFileCache | ||
mock_cache = MagicMock(spec=SafeFileCache) | ||
mock_adapter.cache = mock_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you just not create a pip session with caching already enabled? PipSession(cache=str(tmpdir))
I get this is an unit test, but practically, I'd prefer if the we tested the actual implementation to avoid mismatches that hide issues down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm now using a PipSession
object to test here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@ichard26 Thanks for your quick review! I ran some more verification scenarios locally (such as starting with no internet connection and then losing it intermitently during the download), and I couldn't find any edge case it failed on. Going to merge it now rather than waiting any longer. |
Fixes #13440
Not sure how elegant this fix is, I don't have a lot of expertise here, but it was fairly easy to put together.