Make short test runs faster #1073

Closed
JukkaL opened this issue Dec 13, 2015 · 8 comments

@JukkaL
Collaborator

JukkaL commented Dec 13, 2015

Currently we create a new Python process for each unit test suite, which is wasteful when running just a small number of test cases, as the interpreter startup overhead has to be paid many times. Instead, we could use long-running processes that run multiple unit test tasks that get passed via a pipe or something from the main test runner process.

The goal is to make something like this run faster:

$ time ./runtests.py unit-test -a "missingtestname"
PARALLEL 8
SUMMARY  13 tasks selected
SUMMARY  all 13 tasks and 0 tests passed
*** OK ***

real    0m0.584s
user    0m2.556s
sys     0m0.187s

If we can get this to ~0.3s on typical hardware I'd be happy.
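
To make the long-running worker idea concrete, here is a minimal sketch, not the actual `runtests.py` design. It assumes a line-based protocol over stdin/stdout pipes; `WORKER_SOURCE`, `Worker`, and `run_tasks` are hypothetical names, and the worker only echoes the task name where a real one would import and run the test suite. For brevity the main process waits for each result before sending the next task, so it shows the reuse of interpreters rather than parallelism.

```python
import subprocess
import sys

# Code executed inside each long-lived worker (hypothetical protocol):
# read one task name per line from stdin, report one result line on stdout.
WORKER_SOURCE = r"""
import sys
for line in sys.stdin:
    task = line.strip()
    if not task:
        continue
    # A real worker would import and run the named test suite here.
    print(f"ok {task}", flush=True)
"""


class Worker:
    """One long-running Python process that accepts tasks over a pipe."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def run(self, task: str) -> str:
        # Send one task, then block until its single result line arrives.
        self.proc.stdin.write(task + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self) -> None:
        # Closing stdin ends the worker's read loop, so the process exits.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


def run_tasks(tasks: list[str], num_workers: int = 2) -> list[str]:
    workers = [Worker() for _ in range(num_workers)]
    try:
        # Round-robin: interpreter startup is paid once per worker,
        # not once per task.
        return [workers[i % num_workers].run(t) for i, t in enumerate(tasks)]
    finally:
        for worker in workers:
            worker.close()
```

With 13 tasks and 8 workers this would start 8 interpreters instead of 13; whether that is enough to reach the target is something only a measurement can tell.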

@JukkaL JukkaL added the feature label Dec 13, 2015
@ddfisher ddfisher added this to the Future milestone Mar 2, 2016
@gvanrossum gvanrossum removed this from the Future milestone Mar 29, 2017
@elazarg
Contributor

elazarg commented Aug 21, 2017

pytest is also slow for zero tests (4 seconds on my machine). Most of the time is spent on collecting tests. The problem is that collecting the tests involves setting them up, including creation of temporary files. This can be fixed (taking it down to 0.5 seconds) but requires refactoring.

@emmatyping
Member

@elazarg do you have suggestions for how this could be accomplished? I often find myself running just a few tests, so trimming down the startup overhead would be great.

@elazarg
Contributor

elazarg commented Aug 6, 2018

The collection process needs to do only one thing: find the data and the names of all the tests. Instead, it currently opens files, parses them, opens other files, writes new files, etc. All this work should be done at the time of test setup, not at test collection.
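
A minimal sketch of that split, assuming a simplified version of the `[case testName]` data-file format: collection records only each test's name and raw text, and the section parsing (plus, in a real runner, any file creation) is deferred to `setup()`. `LazyCase` and `collect` are hypothetical names, not the existing mypy test framework.

```python
import re
from dataclasses import dataclass, field

# Simplified header patterns (assumption: names are plain identifiers).
CASE_HEADER = re.compile(r"^\[case (\w+)\]$", re.MULTILINE)
SECTION_HEADER = re.compile(r"^\[(\w+)\]$")


@dataclass
class LazyCase:
    name: str
    raw: str  # unparsed body, kept as-is until setup
    sections: dict[str, str] = field(default_factory=dict)
    is_set_up: bool = False

    def setup(self) -> None:
        # The expensive work happens here, only for selected tests.
        current = "main"
        parts: dict[str, list[str]] = {current: []}
        for line in self.raw.splitlines():
            m = SECTION_HEADER.match(line)
            if m:
                current = m.group(1)
                parts[current] = []
            else:
                parts[current].append(line)
        self.sections = {k: "\n".join(v) for k, v in parts.items()}
        self.is_set_up = True


def collect(text: str) -> list[LazyCase]:
    # Collection only locates the headers and slices the text between them.
    matches = list(CASE_HEADER.finditer(text))
    cases = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        cases.append(LazyCase(m.group(1), text[m.end():end].strip("\n")))
    return cases
```

Running a single test then costs one cheap scan of the file plus one `setup()` call, instead of a full setup for every case in it.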

I have implemented it several times; strangely enough, even though it worked very well in the past (in an attempt I abandoned due to the high code churn), it did not work when I retried it recently.

Another thing that I'm not sure about is why we need the temporary files at all. Almost everything should be possible to perform using StringIO.


One can also imagine having a cache of name->testdata, which would make the collection trivial when testing the same feature repeatedly (effectively bypassing the decision to put several tests in the same file instead of storing them as different files in the same folder), but that's probably overkill.
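
For illustration only, such a name->testdata cache could look roughly like the sketch below. It assumes a JSON cache file invalidated by the data file's modification time; `build_index`, `load_index`, and the cache layout are hypothetical, and the header regex is a simplification of the real format.

```python
import json
import os
import re

CASE_HEADER = re.compile(r"^\[case (\w+)\]$", re.MULTILINE)


def build_index(text: str) -> dict[str, str]:
    # Map each test name to the raw text between its header and the next one.
    matches = list(CASE_HEADER.finditer(text))
    index = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        index[m.group(1)] = text[m.end():end].strip("\n")
    return index


def load_index(data_path: str, cache_path: str) -> dict[str, str]:
    mtime = os.stat(data_path).st_mtime_ns
    try:
        with open(cache_path, encoding="utf-8") as f:
            cached = json.load(f)
        if cached["mtime_ns"] == mtime:
            return cached["index"]  # cache hit: no parsing at all
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or unreadable cache: fall through and rebuild
    with open(data_path, encoding="utf-8") as f:
        index = build_index(f.read())
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump({"mtime_ns": mtime, "index": index}, f)
    return index
```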

@elazarg
Contributor

elazarg commented Aug 6, 2018

One difference between my first and second implementations is that in the first I used itertools.groupby() and regexes for the initial parsing of the files. Perhaps I should try doing that again.
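
The original implementation is not shown in this thread, so the following is only a guess at what a groupby-plus-regex split might look like. `HEADER` and `split_cases` are hypothetical names, and the header pattern is simplified.

```python
import re
from itertools import groupby

HEADER = re.compile(r"^\[case (\w+)\]$")


def split_cases(text: str) -> dict[str, list[str]]:
    case_no = 0

    def key(line: str) -> int:
        # Bump the counter on every header, so a header line and the
        # lines following it share one group key.
        nonlocal case_no
        if HEADER.match(line):
            case_no += 1
        return case_no

    cases = {}
    for number, group in groupby(text.splitlines(), key=key):
        if number == 0:
            continue  # preamble before the first header
        header, *body = group
        cases[HEADER.match(header).group(1)] = body
    return cases
```

This relies on `groupby` calling the key function once per line, in order, which is how a stateful key can mark group boundaries.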

@emmatyping
Member

> Another thing that I'm not sure about is why we need the temporary files at all. Almost everything should be possible to perform using StringIO.

Mostly due to the way mypy is structured. I've wanted to allow passing a StringIO to the API for a while, though that requires a bit of restructuring of the build process, and figuring out what that means (for example, it won't have a file name, which we usually assume sources have; we will likely need to translate that StringIO into a BuildSource, etc.).

@Michael0x2a
Collaborator

One idea might be to cleanly divorce the filesystem interaction code from the actual build process entirely -- e.g. pull out all of the IO logic into some sort of "filesystem" object that understands how to map file names to module names and vice versa and returns IO objects on request.

We could then swap that object with something that uses StringIO instead of reading temp files for the tests.

Probably the trickiest part would be making sure this abstraction works cleanly with both incremental and daemon mode?

I guess I'm also not entirely sure if there actually is a one-to-one correspondence between the module name and the file name -- you could have files that have the same name living in different places, for example.
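
As a rough sketch of that "filesystem" object, assuming a one-to-one mapping from dotted module names to paths (the very assumption questioned above): all names here (`FileSystem`, `DiskFileSystem`, `InMemoryFileSystem`, `read_module`) are hypothetical and not part of mypy, and `read_module` merely stands in for whatever the build process does with a source file.

```python
import io
import os
from typing import Protocol, TextIO


class FileSystem(Protocol):
    def path_for_module(self, module: str) -> str: ...
    def open_text(self, path: str) -> TextIO: ...


class DiskFileSystem:
    """Reads real files below a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def path_for_module(self, module: str) -> str:
        return os.path.join(self.root, *module.split(".")) + ".py"

    def open_text(self, path: str) -> TextIO:
        return open(path, encoding="utf-8")


class InMemoryFileSystem:
    """Serves sources from a dict, so tests need no temporary files."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files

    def path_for_module(self, module: str) -> str:
        return "/".join(module.split(".")) + ".py"

    def open_text(self, path: str) -> TextIO:
        return io.StringIO(self.files[path])


def read_module(fs: FileSystem, module: str) -> str:
    # Stand-in for the build step: it only ever talks to the abstraction.
    with fs.open_text(fs.path_for_module(module)) as f:
        return f.read()
```

A test could then pass `InMemoryFileSystem({"foo/bar/baz.py": "x = 1\n"})` where production code passes a `DiskFileSystem`. How this would interact with incremental and daemon mode is not addressed by the sketch.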

@emmatyping
Member

> I guess I'm also not entirely sure if there actually is a one-to-one correspondence between the module name and the file name

AFAIK there is always a one-to-one mapping from the fully resolved module name to the file path, so I can have both foo.bar.baz and foo.rab.baz.

I think a refactoring of the build would be great. It would also make it easier for editors to interface with mypy.

There's already #4365, so we should probably continue over there.

@elazarg
Contributor

elazarg commented Aug 12, 2018

I managed to get it from 2.88 to 1.4 seconds. The collection itself still seems to take much of the time; I think we'll need caching to get it all the way down to 0.3.

msullivan pushed a commit that referenced this issue Sep 26, 2018
Parse tests on collection only enough to find the name, so that a small number of tests runs faster

On my machine, `pytest -n0 -k testAttrsSimple` takes at least 2.24 seconds to finish on master, and at most 0.95 seconds to finish with this PR.

- 'skip-cache' is changed to 'only_when_nocache', and 'skip-nocache' is renamed similarly
- I have replaced while loops with "if True" to make the diff simpler. Further cleanup and optimizations are also possible

Fixes #1073