Skip to content
This repository was archived by the owner on Sep 9, 2020. It is now read-only.

Run structured tests in parallel #362

Closed
sdboyer opened this issue Apr 1, 2017 · 20 comments
Closed

Run structured tests in parallel #362

sdboyer opened this issue Apr 1, 2017 · 20 comments

Comments

@sdboyer
Copy link
Member

sdboyer commented Apr 1, 2017

It seems like we should be able to run our declarative disk-driven tests in parallel. This would cut down considerably on overall test running time, especially if they all reuse a SourceManager between them - that way, reused remote repos will only need to be cloned down once. (Note that it might not be quite safe to do this until sdboyer/gps#196 is in and merged to dep)

When I quickly tried adding a t.Parallel() call to TestIntegration(), though, it panicked on a missing temp dir, suggesting that temp dir state is being shared between tests. We should be able to work on fixing that before gps is ready for prime-time concurrency.

@sdboyer sdboyer changed the title Add parallelism to structured tests Run structured tests in parallel Apr 1, 2017
@JackyChiu
Copy link
Contributor

I'd like to take a look into this!

@sdboyer
Copy link
Member Author

sdboyer commented Apr 6, 2017

@JackyChiu awesome! Anything I can do to help you get started?

@JackyChiu
Copy link
Contributor

JackyChiu commented Apr 6, 2017

@sdboyer I'm quite new to the repo so I'll have to look around more, I'll let you know if I have any questions then, thanks!

@JackyChiu
Copy link
Contributor

JackyChiu commented Apr 7, 2017

@sdboyer Okay I think I have a better grasp of what's going on, I'll start off by looking into this following issue and trying to solve it:

When I quickly tried adding a t.Parallel() call to TestIntegration(), though, it panicked on a missing temp dir, suggesting that temp dir state is being shared between tests.

Just to clarify, the temp dir's shouldn't be shared across each testcase correct?

@sdboyer
Copy link
Member Author

sdboyer commented Apr 7, 2017

Correct - each test case should get its own temp dir.

@JackyChiu
Copy link
Contributor

@sdboyer The source of the issue I see is that when running in parallel after a few tests, in NewTestCase() the working directory (wd) is left still in one of the temp dirs (it's supposed to return to the dir where the tests are being run). Then followed by trying to read a <wd>/testdata/harness_test/<name>/testcase.json there is where it causes a panic.

Any ideas for where to start looking this occurring condidtion?

@sdboyer
Copy link
Member Author

sdboyer commented Apr 7, 2017

@JackyChiu i suspect @tro3 could probably give a better pointer there than i 😄

@JackyChiu
Copy link
Contributor

@sdboyer Okay thanks!

@tro3
Copy link
Contributor

tro3 commented Apr 7, 2017

Guys - on vacation right now without access to code except thru smartphone.

The temp dirs should certainly be decoupled between tests. I wonder if the issue is he copying of the initial files - there are a lot of ignored errors in the suite, where the files are known to exist. But throw in concurrency, and maybe ignoring the errors is no longer valid?

@tro3
Copy link
Contributor

tro3 commented Apr 11, 2017

Okay - the issue here is the finding of the initial setup files from os.Getwd(). (integration_testcase.go:38). When running in parallel, that wd is leaking between cases - when a previously-started case cd's to the testcase directory, then next case uses that temp directory to look for the setup files, rather than (root)/testdata.

While a patch for this particaular error would be to record the wd before the beginning of the parallelism, the fact that this is happening at all means that the parallel threads will be operating in differing working directories. I'll poke around, but adding thread-safety with all these external git processes being lauched during tests sounds daunting. Maybe an initial wd recording will apply to those as well.

@tro3
Copy link
Contributor

tro3 commented Apr 11, 2017

In fact, poking around further, I find test/test.go:109 which talks about this explicitly. (Before my time - I hadn't seen this function, nor had I ever set h.parallel.)

@tro3
Copy link
Contributor

tro3 commented Apr 12, 2017

Guys - I took a shot at getting parallelism running today. I got the dep tests themselves thread-safe, but the use of git still breaks the thread safety. Tests all pass individually, but when run in parallel, one starts to get fails due to incorrect repos being downloaded. Most likely, multiple git processes running in parallel is a bad thing. Don't know if we can get to integration test parallelism without an in-memory mock SourceManager in gps.

@sdboyer
Copy link
Member Author

sdboyer commented Apr 14, 2017

To be clear, these failures are arising not from any of the direct git invocations by the dep testing harness (e.g. setting up GOPATH), but rather by the git calls being made from the SourceManager?

If so, then I think those issues should probably go away once we update gps again. Famous last words, but sdboyer/gps#196 should have resolved the parallelism issues.

@tro3
Copy link
Contributor

tro3 commented Apr 14, 2017

Can't tell if it is the git processes, the go get processes, or the SourceManagers. All I know is, when I modify the test harness to run in parallel, it no longer panics and all temp directories are created and cleaned up in parallel properly. But the tests fail with incorrect repos being populated. Different failures every time I run, and if run individually (with the -run option), each test passes.

We could keep the mods to allow parallelism, but not add the t.Parallel() call, I suppose. That would at least allow for future parallelism if someone digs further.

@tro3
Copy link
Contributor

tro3 commented Apr 14, 2017

Looking at the logs, I get the following errors when running in parallel:

  • git submodule update --init --recursive package github.com/sdboyer/deptest: exit status 1
  • panic: canary - why is checkout/whatever failing: ... (Generated by gps/source.go:160)
  • And sometimes there is no error - just differing results from the non-parallel case.

Does that provide a clue? Still sounds like external processes behaving badly to me.

@sdboyer
Copy link
Member Author

sdboyer commented Apr 14, 2017

Ah yeah, if you're getting that panic, then it's almost certainly the SourceManager calls, which are what'll be fixed by the new stuff.

@tro3
Copy link
Contributor

tro3 commented Apr 14, 2017 via email

@sdboyer
Copy link
Member Author

sdboyer commented Apr 14, 2017

@tro3 sure, sounds reasonable.

@JackyChiu sorry, I know you were hoping to work on this one. Please don't be dispirited - there's plenty of other issues that could use work 😄

@JackyChiu
Copy link
Contributor

No worries at all! I was busy with exams, I'll look around for other issues to contribute to!

@sdboyer
Copy link
Member Author

sdboyer commented May 13, 2017

oh whoops this is super done :D

@sdboyer sdboyer closed this as completed May 13, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants