Do more robust handling of download failures. #631

rtibbles · 2025-09-25T16:34:33Z

Summary

During testing of 0.8 with the KA integration script, I was still seeing cases where a download would fail but not trigger an error from the file pipeline.
Adds a check that any output file from the pipeline must be in the ricecooker storage; otherwise, the pipeline has failed to execute properly.
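
A minimal sketch of the kind of check described above, not the code from this PR. The helper name `ensure_in_storage`, the `storage_dir` argument, and the `OutputNotInStorageError` exception are all hypothetical, chosen only for illustration:

```python
import os


class OutputNotInStorageError(Exception):
    """Hypothetical error: a pipeline output does not live in storage."""


def ensure_in_storage(output_path, storage_dir):
    """Return output_path if it is an existing file under storage_dir.

    Raises OutputNotInStorageError otherwise, e.g. when a handler handed
    back the original URL instead of a local storage path.
    """
    storage_root = os.path.realpath(storage_dir)
    candidate = os.path.realpath(output_path)
    try:
        inside = os.path.commonpath([storage_root, candidate]) == storage_root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    if not (inside and os.path.isfile(candidate)):
        raise OutputNotInStorageError(
            "Pipeline output {!r} is not a file in storage".format(output_path)
        )
    return output_path
```

With a check like this at the end of the pipeline, a URL that was passed through untouched fails loudly instead of being treated as a successful download.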

marcellamaki

No concerns, just two small questions; then I'll do a re-read and approve.

marcellamaki · 2025-10-10T14:10:30Z

tests/pipeline/test_transfer.py

+    """A dummy handler that passes through the original path without transferring to storage.
+
+    This simulates the bug where a download handler fails to actually download/transfer
+    the file but returns the original URL as the path.


but returns the original URL as the path

I'm not sure what this means. Previously, was the bug that the file wasn't actually getting transferred/downloaded, but it seemed like it was, because a path was returned? And that path was not the local download location, but rather the original URL?

rtibbles · Oct 22, 2025

I don't think that was what was actually happening for me. But because of the way the handlers work, it is possible that this would cause an issue. Basically, this just makes sure that download handlers always return a local file path in storage after they have finished, because otherwise they could cause issues for every other handler.

marcellamaki · 2025-10-10T14:11:30Z

ricecooker/utils/pipeline/transfer.py

+        # Use explicit timeout to prevent hanging downloads
+        # (connection_timeout, read_timeout) - connection timeout for establishing connection,
+        # read timeout for time between receiving data chunks (prevents stuck downloads)
+        r = config.DOWNLOAD_SESSION.get(path, stream=True, timeout=(30, 60))


Mostly a curiosity question: how did you decide what timeout values to use here?

rtibbles · Oct 22, 2025

I didn't. Claude chose them, and they seemed fine to me!
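
For context on what the two timeout values do, here is a hedged sketch, not the PR's implementation. It assumes a `requests.Session`-like object is passed in as `session`. The helper name `stream_to_file` and the partial-file cleanup are assumptions made for illustration:

```python
import os

CONNECT_TIMEOUT = 30  # seconds allowed to establish the connection
READ_TIMEOUT = 60  # seconds allowed between received chunks of data


def stream_to_file(session, url, dest_path, chunk_size=8192):
    """Stream url to dest_path; remove the partial file if the transfer fails."""
    # requests accepts a (connect, read) tuple for the timeout argument.
    response = session.get(
        url, stream=True, timeout=(CONNECT_TIMEOUT, READ_TIMEOUT)
    )
    response.raise_for_status()
    try:
        with open(dest_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    except Exception:
        # Do not leave a truncated file behind that could look like a success.
        if os.path.exists(dest_path):
            os.remove(dest_path)
        raise
    return dest_path
```

The read timeout is not a cap on the total download time: it only bounds how long the client waits for the next bytes, so a large file that keeps sending data is unaffected while a stalled connection fails after 60 seconds.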

marcellamaki

Questions resolved! Thanks, Richard

rtibbles added 2 commits September 25, 2025 15:11

Ensure all file downloads actually put a file in storage.

2501ca4

Fail faster on downloads, add timeout handling.

d9656f0

rtibbles force-pushed the output_path_required branch from 16a0b26 to d9656f0 September 25, 2025 22:11

rtibbles added 3 commits September 26, 2025 06:49

Add regression test and fix for threaded race condition in file pipel…

b5a1e3b

…ine.

Update youtube cassette for latest yt-dlp release.

a812e4c

Update StudioFile to always return a file dict.

72ca9cc

rtibbles assigned marcellamaki Oct 7, 2025

marcellamaki reviewed Oct 10, 2025

marcellamaki approved these changes Oct 23, 2025

marcellamaki merged commit 75dd992 into learningequality:develop Oct 23, 2025
20 checks passed
