Skip to content

Conversation

@liamhuber
Copy link
Member

@liamhuber liamhuber commented Jul 11, 2025

To guarantee the fulfillment of the executor contract: shutdown will not complete while waiting for futures to finish.

This closes #709, but I'm not confident it's the right or best solution.

The tests passed locally, but we'll see how it goes here. It would be good to test this, too, maybe in test_cache_fileexecutor_serial.py, but I'm not yet familiar with direct instantiation of the scheduler classes and the test class from #709 isn't in the code base yet.

Summary by CodeRabbit

  • New Features

    • Improved task management by tracking all submitted tasks, allowing for collective cancellation and waiting during shutdown.
  • Bug Fixes

    • Ensured all submitted tasks are properly handled and cleaned up during shutdown, reducing the risk of orphaned or incomplete tasks.

To guarantee the fulfillment of the executor contract: shutdown will not complete while waiting for futures to finish.

Signed-off-by: liamhuber <[email protected]>
Signed-off-by: liamhuber <[email protected]>
@liamhuber liamhuber requested a review from jan-janssen as a code owner July 11, 2025 18:13
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 11, 2025

Walkthrough

The TaskSchedulerBase class now tracks all submitted Future objects in a set. The submit method adds each new Future to this set, and the shutdown method can now cancel or wait for all tracked futures before clearing the set, ensuring proper management and completion of all tasks during shutdown.

Changes

File(s) Change Summary
executorlib/task_scheduler/base.py Added _futures set to track submitted futures; updated submit to add futures; updated shutdown to cancel/wait on futures and clear the set.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TaskSchedulerBase

    User->>TaskSchedulerBase: submit(fn, *args, **kwargs)
    TaskSchedulerBase->>TaskSchedulerBase: Create Future
    TaskSchedulerBase->>TaskSchedulerBase: Add Future to _futures
    TaskSchedulerBase->>TaskSchedulerBase: Enqueue task

    User->>TaskSchedulerBase: shutdown(wait, cancel_futures)
    alt cancel_futures == True
        TaskSchedulerBase->>TaskSchedulerBase: Cancel all futures in _futures
    end
    alt wait == True
        TaskSchedulerBase->>TaskSchedulerBase: Wait for all futures in _futures
    end
    TaskSchedulerBase->>TaskSchedulerBase: Clear _futures
Loading

Assessment against linked issues

Objective Addressed Explanation
Ensure all submitted futures are tracked and properly cancelled or awaited during shutdown (#709)
Prevent hanging or indefinite waiting on futures after shutdown for file-based execution (#709)

Poem

A rabbit tracks each future’s fate,
In sets they gather, never late.
On shutdown’s call, they’re canceled, cleared—
No hanging tasks to be feared!
With every hop, the code’s more sure,
🐇✨ Now futures’ endings are secure!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 957170b and c8b932a.

📒 Files selected for processing (1)
  • executorlib/task_scheduler/base.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unittest_win
  • GitHub Check: notebooks
🔇 Additional comments (5)
executorlib/task_scheduler/base.py (5)

9-11: LGTM! Clean import with appropriate aliasing.

The import of wait with alias wait_for_futures correctly avoids naming conflicts with the wait parameter in the shutdown method.


131-131: LGTM! Proper future tracking implementation.

The future is correctly added to the tracking set, ensuring all created futures are monitored for shutdown purposes.


161-161: LGTM! Proper cleanup prevents memory leaks.

The futures set is correctly cleared after shutdown operations complete, preventing memory leaks by removing references to completed futures.


37-37: Python 3.9+ Support Confirmed

The project’s pyproject.toml specifies
• requires-python = ">=3.9, <3.14"
• Classifiers include Python 3.9–3.13

Since the minimum supported Python version is 3.9, using the built-in generic set[Future] annotation is fully compatible. No changes required.


151-155: Shutdown logic is correct; consider adding timeouts and synchronizing concurrent calls

File: executorlib/task_scheduler/base.py Lines 151–155

    if cancel_futures:
        for f in self._futures:
            f.cancel()
    if wait:
        wait_for_futures(self._futures)

The implementation properly cancels all pending futures and then waits for their completion before cleaning up. A couple of optional improvements to keep in mind:

  • Indefinite wait:
    wait_for_futures(self._futures) will block indefinitely if any future never completes. You may want to introduce a timeout and handle any remaining futures (e.g. log or force-cancel them) to avoid a potential hang.

  • Race condition with concurrent submissions:
    If submit is called concurrently with shutdown, newly added futures could escape cancellation or waiting. Consider protecting submit/shutdown with a lock or preventing new submissions once shutdown begins.

No changes required for core functionality, but please verify that this behavior aligns with your requirements and consider adding timeouts or synchronization as needed.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov
Copy link

codecov bot commented Jul 11, 2025

Codecov Report

Attention: Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.

Project coverage is 96.76%. Comparing base (957170b) to head (c8b932a).

Files with missing lines Patch % Lines
executorlib/task_scheduler/base.py 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #710      +/-   ##
==========================================
- Coverage   96.89%   96.76%   -0.13%     
==========================================
  Files          29       29              
  Lines        1320     1329       +9     
==========================================
+ Hits         1279     1286       +7     
- Misses         41       43       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@jan-janssen jan-janssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queue.join() can only be executed when the execution of the future finished. So I would like to fix this issue in the file based executor rather than the Base executor, especially as you report it works fine for the SingleNodeExecutor:

There is something I'm not understanding deeply enough though, because it works fine for the SingleNodeExecutor even when it's using the filesystem cache.

@liamhuber
Copy link
Member Author

The queue.join() can only be executed when the execution of the future finished.

Then I suppose somehow we are triggering an __exit__ before the queue is fully properly populated?

So I would like to fix this issue in the file based executor rather than the Base executor, especially as you report it works fine for the SingleNodeExecutor:

There is something I'm not understanding deeply enough though, because it works fine for the SingleNodeExecutor even when it's using the filesystem cache.

I have two objections to fixing it outside of BaseExecutor. The first is that this appears to be being caused by slowness in the file-based approach this time, but fundamentally the BaseExecutor is not guaranteeing the contract of concurrent.futures._base.Executor.shutdown(wait=True) that shutdown will be delayed until all .submit futures are resolved. It's not my call to make, but I advise that it's preferable to resolve the promise with brutality in BaseExecutor than to cover over this particular realization of the problem elsewhere.

The second is that I don't know how 😂 so go for it, but I'm afraid we'll have to close this and try elsewhere. FileTaskScheduler doesn't touch submit or shutdown, but I know enough about how things are operating here to follow it over to the Thread(target=execute_tasks_h5. But I don't see anything obvious like a future_queue.put call following some particularly expensive operation, and I otherwise don't know how to read it well enough to find other problems. Similarly, I don't see anything in TaskSchedulerBase.submit that would behave differently when FileTaskScheduler is inheriting and invoking it. So I'm afraid a more specialized solution will need to come from you rather than me.

@liamhuber
Copy link
Member Author

#712 (comment)

@liamhuber liamhuber closed this Jul 11, 2025
@jan-janssen jan-janssen deleted the wait_for_futures branch July 13, 2025 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] file-based execution violates futures contract

3 participants