Skip to content

[Bug] file-based execution violates futures contract #709

@liamhuber

Description

@liamhuber

I'm still working on getting stuff integrated with pyiron_workflow, and ran into a case where the SlurmClusterExecutor was hanging indefinitely. I was able to reproduce one of my problems locally using LocalFileExecutor:

import os
import time
from concurrent.futures import ThreadPoolExecutor

from executorlib import SingleNodeExecutor
from pyiron_workflow.executors.wrapped_executorlib import LocalFileExecutor

def foo(x):
    time.sleep(x)
    return x

for executor_class, kwargs in (
    (ThreadPoolExecutor, {}),
    (SingleNodeExecutor, {"cache_directory": "cache_sne"}),
    (LocalFileExecutor, {"cache_directory": "cache_lfe"}),
):
    print(executor_class.__name__)
    with executor_class(**kwargs) as executor:
        future = executor.submit(foo, 1)
        print(future)
    print(future.result())
    print(future, "\n")

Gives

ThreadPoolExecutor
<Future at 0x108cc0860 state=running>
1
<Future at 0x108cc0860 state=finished returned int> 

SingleNodeExecutor
<Future at 0x108c71bb0 state=pending>
1
<Future at 0x108c71bb0 state=finished returned int> 

LocalFileExecutor
<Future at 0x10fecc440 state=pending>

And then hangs indefinitely. I can see that cache_lfe/..._i.h5 gets written, it's just not running. This can be avoided if the relevant function is fast enough, or adding a long enough sleep to the with body clause, or (obviously) by moving the future.result() into the with clause. I believe what's happening is that the file-based approach takes some extra time somewhere and gets in a race with TaskSchedulerBase.shutdown such that the clauses checking on self._future_queue and self._process are not behaving appropriately. There is something I'm not understanding deeply enough though, because it works fine for the SingleNodeExecutor even when it's using the filesystem cache.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions