@bwhitt7
Contributor

@bwhitt7 bwhitt7 commented Aug 28, 2025

This PR fixes #109, addressing the issue that tmp_path is not thread-safe. The changes patch tmp_path (and tmpdir) by creating a sub-directory for each thread/iteration inside the original path returned by tmp_path. The tmp_path fixture is then set to the new path, so tests using the fixture get a fresh directory for each thread and iteration. This should prevent threads from reading and writing the same files and causing conflicts.

This PR also changes the closure function in wrap_function_parallel, adding functions that handle setup for patching fixtures. The thread_index and iteration_index code was moved into these setup functions, which also handle the patching for tmp_path/tmpdir.
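
A minimal sketch of the idea described above, not the plugin's actual code: each worker thread gets its own sub-directory under the path that the `tmp_path` fixture returned. The helpers `per_thread_path` and `run_in_threads` are hypothetical names, and the `iteration_{index}` directory naming is an assumption made for illustration.

```python
import tempfile
import threading
from pathlib import Path


def per_thread_path(base: Path, thread_index: int, iteration_index: int) -> Path:
    """Create and return a directory unique to one thread and one iteration."""
    path = base / f"thread_{thread_index}" / f"iteration_{iteration_index}"
    path.mkdir(parents=True, exist_ok=True)
    return path


def run_in_threads(test_func, base: Path, n_threads: int, n_iterations: int) -> None:
    """Call test_func(tmp_path=...) in each thread with an isolated directory."""

    def closure(thread_index: int) -> None:
        for iteration_index in range(n_iterations):
            test_func(tmp_path=per_thread_path(base, thread_index, iteration_index))

    threads = [threading.Thread(target=closure, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def example_test(tmp_path: Path) -> None:
    # Every call writes the same file name, but never into the same directory.
    target = tmp_path / "data.txt"
    assert not target.exists()
    target.write_text("hello")


with tempfile.TemporaryDirectory() as tmp:
    run_in_threads(example_test, Path(tmp), n_threads=4, n_iterations=3)
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("data.txt"))
```

With 4 threads and 3 iterations, `created` holds 12 distinct files, one per thread/iteration pair, so no two calls ever touch the same file.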

@ngoldbaum
Collaborator

Awesome. @lysnikolaou assuming you're back next week, mind looking this over? Since you haven't been involved in the discussions around this.

Member

@lysnikolaou lysnikolaou left a comment

This looks great! Good work @bwhitt7!

I've left some fairly minor comments inline.

@bwhitt7
Contributor Author

bwhitt7 commented Sep 5, 2025

@lysnikolaou Apologies for the late push, hopefully this should address the changes you suggested! Decided to return the values that were originally in the data var as a tuple. Saving things in kwargs can get a little tricky since pytest gets upset when you have extra kwargs that don't match with anything in the test function.

Also, I tested out if directories still get cleaned up with the tmp_path value modified, and it looks like everything gets handled normally. You can look into the pytest temporary directories and see that the thread directories get created and deleted properly.

@ngoldbaum
Collaborator

If you edit the PR description to include the string “fixes #109”, that issue will auto-close when this PR is merged. Just an FYI for future PRs that fix an issue.

@lysnikolaou
Member

The changes look good @bwhitt7! Thanks!

However, because the changes affect not only collection time but also runtime, I benchmarked using SciPy. I ran the tests in scipy/io/tests, and here are the results:

With pytest-run-parallel v0.6.1 (release on NumPy) I'm getting two test failures with 376/561 items running in parallel. Averaged over 5 runs, the whole test suite took 83.994s. With this branch I'm getting no test failures (🎉) with 376/561 items running in parallel (same number of collected tests), but the test suite takes significantly longer to run, averaging 93.542s. That's a runtime increase of 11.37%, which is significant, especially for large test suites.

I think we should spend more time trying to benchmark and optimize this. @ngoldbaum What do you think?
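
For reference, the 11.37% figure follows directly from the two averages quoted above. The variable names below are only illustrative:

```python
baseline_s = 83.994  # average runtime with pytest-run-parallel v0.6.1
branch_s = 93.542    # average runtime with this branch

# Relative increase of the branch over the baseline, in percent.
increase_pct = (branch_s - baseline_s) / baseline_s * 100
print(f"{increase_pct:.2f}%")
```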

@ngoldbaum
Collaborator

> I think we should spend more time trying to benchmark and optimize this. @ngoldbaum What do you think?

Seems reasonable. I'll circle back with @bwhitt7. Thanks for checking!

@ngoldbaum
Collaborator

@lysnikolaou I tried to reproduce that - I see on this PR that the scipy/io/tests tests run slightly faster with --parallel-threads=auto, finishing in 3.49 seconds vs 3.53 seconds using pytest-run-parallel 0.6.1.

Can you share a little bit more about how you set up your test? Maybe something specific about your test machine?

Maybe also if you're on Linux you can get a samply profile comparing this PR vs pytest-run-parallel 0.6.1, happy to help out explaining how to do that. Any kind of profiling you can do on your setup where you're seeing a really big slowdown would help us understand what's going on.

@lysnikolaou
Member

> Can you share a little bit more about how you set up your test?

The tests above are with a debug build of CPython 3.13.3t. I just tested this with a 3.14.0rc2t build as well (non-debug) and the regression is even more acute at ~31%.

My machine is a MacBook Pro with an M1 Pro with 10 CPU cores. I'm running macOS Sequoia 15.6.1. The test invocation I'm using is --parallel-threads=10 --iterations=20. That's probably the difference we're seeing, as I'm seeing the same results as you (a very slight speedup) without the --iterations part. I think using --iterations is much closer to real-world usage, and because wrap_setup_iteration is called inside the --iterations loop, the slowdown grows linearly with the number of iterations. Can you confirm that? If not, I can try to get a profile to delve deeper into this.

@ngoldbaum
Collaborator

> Can you confirm that? If not, I can try to get a profile to delve deeper into this.

Thanks for the hint about --iterations. I can reproduce the slowdown now.

@bwhitt7
Contributor Author

bwhitt7 commented Sep 10, 2025

Thanks for the feedback @lysnikolaou! Since the drastic slowdown happens with --iterations, Nathan and I agreed that it would probably be a good idea to remove iteration support for tmp_path for now, since it adds a large number of temporary directories to create and remove, but I wanted to know your opinion on this. I suppose if users want to run iterations with tmp_path, they can edit their tests to be iteration-safe.
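
To put a rough number on "a large number of temporary directories": assuming one directory is created per thread/iteration pair (an assumption about the layout, not something measured here), the benchmark invocation quoted earlier in the thread gives the following per-test counts.

```python
parallel_threads = 10  # from --parallel-threads=10
iterations = 20        # from --iterations=20

# Assumed layout: one directory per (thread, iteration) pair.
dirs_with_iterations = parallel_threads * iterations

# With iteration support removed: one directory per thread.
dirs_without_iterations = parallel_threads

print(dirs_with_iterations, dirs_without_iterations)
```

Under that assumption, each test using tmp_path creates and removes 200 directories instead of 10.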

@lysnikolaou
Member

Can we try to create all of the directories before the actual --iterations loop and benchmark that as well? It feels like that might be much closer to our current performance.

And let's also try to inline both functions. Function calls in Python do add some overhead, though I'm not sure how much of a difference that will make, especially if it's once per thread.

@bwhitt7
Contributor Author

bwhitt7 commented Sep 12, 2025

Made the functions inline. With iterations enabled, it looks like creating the directories outside of the iterations loop doesn't improve performance; it still results in a significant performance cost similar to what you saw in your testing, @lysnikolaou. I think the cost comes mainly from creating and destroying so many directories anyway.

Member

@lysnikolaou lysnikolaou left a comment

LGTM! Great work @bwhitt7! Thanks for all the patience with the reviews!

I'm approving, but I left one minor comment. After that's addressed, it's certainly good to go.


@pytest.mark.parametrize("parallel, passing", parallel_threads)
def test_tmp_path_tmpdir(pytester: pytest.Pytester, parallel, passing):
    # ensures we can delete files in each tmpdir
Member

This comment is not relevant, should probably be deleted.

Contributor Author

Haha yep, forgot to change that. Just pushed a commit that fixes that.
And no worries!

@bwhitt7
Contributor Author

bwhitt7 commented Sep 16, 2025

Hmmmm looks like the tests are failing now, seems to be related to that test_tmp_path_tmpdir test. Everything runs fine locally for me, will look into this.

@lysnikolaou
Member

> Hmmmm looks like the tests are failing now, seems to be related to that test_tmp_path_tmpdir test. Everything runs fine locally for me, will look into this.

This seems to be the relevant part of the exception:

E           and: '    | Traceback (most recent call last):'
E           and: '    |   File "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/threading.py", line 1043, in _bootstrap_inner'
E           and: '    |     self.run()'
E           and: '    |     ~~~~~~~~^^'
E           and: '    |   File "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/threading.py", line 994, in run'
E           and: '    |     self._target(*self._args, **self._kwargs)'
E           and: '    |     ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^'
E           and: '    |   File "/home/runner/work/pytest-run-parallel/pytest-run-parallel/.tox/3.13/lib/python3.13/site-packages/pytest_run_parallel/plugin.py", line 70, in closure'
E           and: '    |     kwargs["tmpdir"] = kwargs["tmpdir"].mkdir('
E           and: '    |                        ~~~~~~~~~~~~~~~~~~~~~~^'
E           and: '    |         f"thread_{thread_index!s}"'
E           and: '    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^'
E           and: '    |     )'
E           and: '    |     ^'
E           and: '    |   File "/home/runner/work/pytest-run-parallel/pytest-run-parallel/.tox/3.13/lib/python3.13/site-packages/_pytest/_py/path.py", line 889, in mkdir'
E           and: '    |     error.checked_call(os.mkdir, os.fspath(p))'
E           and: '    |     ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^'
E           and: '    |   File "/home/runner/work/pytest-run-parallel/pytest-run-parallel/.tox/3.13/lib/python3.13/site-packages/_pytest/_py/error.py", line 111, in checked_call'
E           and: '    |     raise cls(f"{func.__name__}{args!r}")'
E           and: "    | py.error.EEXIST: [File exists]: mkdir('/tmp/pytest-of-runner/pytest-0/basetemp/test_both0/thread_1',)"
E           and: '    | '

If both tmp_path and tmpdir are in the fixtures and they both point to the same directory, we try to create the same thread-specific dir twice.
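
A small sketch of that failure mode and one way to tolerate it, using `pathlib` as a stand-in for both fixtures. The helper `thread_dir` is a hypothetical name, and this is not necessarily how the plugin resolves the problem.

```python
import tempfile
from pathlib import Path


def thread_dir(base: Path, thread_index: int) -> Path:
    """Return the per-thread directory, creating it only if it is missing."""
    path = base / f"thread_{thread_index}"
    path.mkdir(exist_ok=True)  # no error when another fixture already created it
    return path


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Strict creation fails the second time, mirroring the EEXIST error above.
    (base / "thread_0").mkdir()
    try:
        (base / "thread_0").mkdir()
        second_strict_call_failed = False
    except FileExistsError:
        second_strict_call_failed = True

    # Tolerant creation: both lookups resolve to the same existing directory.
    from_tmp_path = thread_dir(base, 1)
    from_tmpdir = thread_dir(base, 1)
    same_directory = from_tmp_path == from_tmpdir
```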

@ngoldbaum
Collaborator

This is happening because we merged PR #126 and no one noticed in earlier rounds of review that one of the tests you added raises an exception. Pytest treats unhandled exceptions in a thread as a warning, so it wasn't failing the tests until #126 got merged.
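
The underlying Python behavior can be shown in isolation: an exception raised inside a `threading.Thread` is not re-raised in the thread that calls `join()`, so the caller carries on as if nothing failed. The sketch below uses the standard `threading.excepthook` to record the error; the names `record_exception` and `failing_worker` are illustrative, and pytest's own warning machinery is not reproduced here.

```python
import threading

captured = []


def record_exception(args) -> None:
    # threading.excepthook receives details of exceptions left uncaught in a thread.
    captured.append(args.exc_type)


def failing_worker() -> None:
    raise FileExistsError("thread_1 already exists")


previous_hook = threading.excepthook
threading.excepthook = record_exception
try:
    worker = threading.Thread(target=failing_worker)
    worker.start()
    worker.join()  # returns normally; the exception is not re-raised here
    main_thread_saw_error = False
except FileExistsError:
    main_thread_saw_error = True
finally:
    threading.excepthook = previous_hook
```

Here `captured` ends up containing `FileExistsError` while `main_thread_saw_error` stays `False`, which is why a failure like this can go unnoticed unless thread exceptions are explicitly surfaced.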

@bwhitt7
Contributor Author

bwhitt7 commented Sep 16, 2025

Made it so if tmp_path and tmpdir are used at the same time, they won't raise warnings if they create the same directory. Also made the test_tmp_path_tmpdir test a little more useful hehe. Looks like this fixed the test failures!

@ngoldbaum
Collaborator

Thanks @bwhitt7!

@ngoldbaum ngoldbaum merged commit 3d60fde into Quansight-Labs:main Sep 16, 2025
10 checks passed

Development

Successfully merging this pull request may close these issues.

tmp_path-fixtured tests generally do not expect neither parallelism nor repetition

3 participants