-
Notifications
You must be signed in to change notification settings - Fork 72
Do more robust handling of download failures. #631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do more robust handling of download failures. #631
Conversation
16a0b26 to
d9656f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No concerns just two small questions, then i'll do a re-read and approve.
| """A dummy handler that passes through the original path without transferring to storage. | ||
| This simulates the bug where a download handler fails to actually download/transfer | ||
| the file but returns the original URL as the path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but returns the original URL as the path
I'm not sure what this means. Previously, the bug was what the file wasn't getting transferred/downloaded actually, and the bug was that it seemed like it was, because there was a return value that was a path? but that the path was not the local path download location, but rather the original URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that was what was actually happening for me - but because of the way that the handlers work, it is possible that this would cause an issue - basically, this just makes sure that download handlers always return a local file path in storage after they have finished, because otherwise they could cause issues for every other handler.
| # Use explicit timeout to prevent hanging downloads | ||
| # (connection_timeout, read_timeout) - connection timeout for establishing connection, | ||
| # read timeout for time between receiving data chunks (prevents stuck downloads) | ||
| r = config.DOWNLOAD_SESSION.get(path, stream=True, timeout=(30, 60)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly a curiosity question -- how did you decide what timeout values to use here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't - Claude chose them, and they seemed fine to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions resolved! Thanks, Richard
Summary
During testing of 0.8 with the KA integration script, I was still seeing cases where a download would fail, but not trigger an error from the file pipeline.
Adds a check that any output file from the pipeline must be in the ricecooker storage, as otherwise, the pipeline has failed to execute properly.