-
Notifications
You must be signed in to change notification settings - Fork 73
Do more robust handling of download failures. #631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
2501ca4
d9656f0
b5a1e3b
a812e4c
72ca9cc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,7 +7,11 @@ | |
| import pytest | ||
| from vcr_config import my_vcr | ||
|
|
||
| from ricecooker.utils.pipeline.context import FileMetadata | ||
| from ricecooker.utils.pipeline.exceptions import InvalidFileException | ||
| from ricecooker.utils.pipeline.file_handler import FileHandler | ||
| from ricecooker.utils.pipeline.transfer import DiskResourceHandler | ||
| from ricecooker.utils.pipeline.transfer import DownloadStageHandler | ||
| from ricecooker.utils.pipeline.transfer import ( | ||
| get_filename_from_content_disposition_header, | ||
| ) | ||
|
|
@@ -291,3 +295,36 @@ def test_disk_transfer_non_file_protocol(): | |
| "os.path.exists", return_value=False | ||
| ): # Ensure it doesn't try to check a web URL | ||
| assert not handler.should_handle(path), "Handler should not handle HTTP URLs" | ||
|
|
||
|
|
||
| class DummyPassthroughHandler(FileHandler): | ||
| """A dummy handler that passes through the original path without transferring to storage. | ||
|
|
||
| This simulates the bug where a download handler fails to actually download/transfer | ||
| the file but returns the original URL as the path. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this means. Previously, the bug was what the file wasn't getting transferred/downloaded actually, and the bug was that it seemed like it was, because there was a return value that was a path? but that the path was not the local path download location, but rather the original URL?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think that was what was actually happening for me - but because of the way that the handlers work, it is possible that this would cause an issue - basically, this just makes sure that download handlers always return a local file path in storage after they have finished, because otherwise they could cause issues for every other handler. |
||
| """ | ||
|
|
||
| def should_handle(self, path: str) -> bool: | ||
| return path.startswith("http://dummy-test-url.com") | ||
|
|
||
| def handle_file(self, path, **kwargs): | ||
| # Intentionally don't use write_file context manager | ||
| # This simulates a handler that fails to transfer the file to storage | ||
| return FileMetadata(original_filename="test.txt") | ||
|
|
||
|
|
||
| def test_download_stage_handler_catches_failed_transfer(): | ||
| """Test that DownloadStageHandler catches when files aren't transferred to storage. | ||
|
|
||
| This is a regression test for the issue where download handlers would sometimes | ||
| log "saved to [original URL]" instead of the actual storage path, indicating | ||
| that the file wasn't actually transferred to storage. | ||
| """ | ||
| # Create a DownloadStageHandler with our dummy passthrough handler | ||
| download_handler = DownloadStageHandler(children=[DummyPassthroughHandler()]) | ||
|
|
||
| dummy_url = "http://dummy-test-url.com/test.txt" | ||
|
|
||
| # The handler should raise an InvalidFileException when the file isn't transferred to storage | ||
| with pytest.raises(InvalidFileException, match="failed to transfer to storage"): | ||
| download_handler.execute(dummy_url) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly a curiosity question -- how did you decide what timeout values to use here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't - Claude chose them, and they seemed fine to me!