Skip to content

NSTestLock: Obtain lock before waking up main thread for test #1261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 12, 2017

Conversation

spevans
Copy link
Contributor

@spevans spevans commented Oct 12, 2017

This should prevent a race where the main thread completes its tests before the 2nd thread has obtained the lock.

@spevans
Copy link
Contributor Author

spevans commented Oct 12, 2017

@swift-ci please test

@spevans
Copy link
Contributor Author

spevans commented Oct 12, 2017

@swift-ci please test and merge

@alblue
Copy link
Contributor

alblue commented Oct 12, 2017

@spevans can you investigate? Looks like it's deadlocked:

Test Case 'TestNSLock.test_threadsAndLocks' started at 2017-10-12 12:49:06.229

https://ci.swift.org/job/swift-corelibs-foundation-PR-Linux/254/console

@swift-ci swift-ci merged commit 6d862d6 into swiftlang:master Oct 12, 2017
@spevans
Copy link
Contributor Author

spevans commented Oct 12, 2017

@alblue Looks like it recovered:

Test Case 'TestNSLock.test_threadsAndLocks' started at 2017-10-12 12:49:06.229
Test Case 'TestNSLock.test_threadsAndLocks' passed (1642.673 seconds)

That test creates 1000 threads and has them sleep for random intervals upto 1/10th second so if there was a high load it may have caused it to take the extra time, although Ive never seen it take that long before.

@alblue
Copy link
Contributor

alblue commented Oct 12, 2017

If we see a similarly high delay in subsequent builds we'll need to come back and change these values. It was in this section for almost half an hour :(

Is there a way of having a time limit, such that if the limit is passed then we kill off threads without failing the test?

@spevans
Copy link
Contributor Author

spevans commented Oct 12, 2017

It currently creates 1000 threads, might be possible to modify it to stop creating threads after 5 seconds from the start of the test. Could also make the sleep a fixed small amount rather than a random number. The idea with the Thread.sleep was to try and stop the load spiking if 1000 threads were just started and then exited.

I think the bigger problem could be that its hard to recreate the issue as it doesn't occur that often. In which case simply trimming the threads down to say 50, and making the sleep 0.01sec may be the simplest fix.

@spevans
Copy link
Contributor Author

spevans commented Oct 12, 2017

@alblue Are there any graphs or other stats that show the cpu load, men usage etc of the CI servers over time? Might help to correlate and debug these issues in the future.

@alblue
Copy link
Contributor

alblue commented Oct 12, 2017

@shahmishal would be the person who would be able to answer that question.

Doesn't this line sleep for 8 seconds Thread.sleep(forTimeInterval: 8):

https://github.com/apple/swift-corelibs-foundation/blob/6d862d694b4ea6f14ac5ffea0c094acc7a7010e2/TestFoundation/TestNSLock.swift#L52

The concern I have is that the build agents are on meaty, but loaded, machines. If we spin up 1000 threads, and they're all going to sleep, there's every possibility that they'll be starved from running on the box due to conflicting demands.

One possibility is that we have the threads set small (e.g. 10) for running in CI builds, and then use a larger number when we're doing a more deep burn-in test (say, not running in CI). I'm not sure if there's a way of identifying whether the build process is running in a CI instance or not - but @shahmishal should know if there is.

@spevans
Copy link
Contributor Author

spevans commented Oct 12, 2017

The 8 second sleep is for testing the lock(before:) methods, unfortunately testing timeouts is always tricky.

The 1000 thread test, which caused the 20 minute delay, is a separate one and it should be possible to do a check for environment variables to see if its running under Jenkins or perhaps parse /proc/loadavg and not use so many threads if the load > 2 or so.

@shahmishal
Copy link
Member

Are you able access environment variable? This is not the best solution but checking for ‘JENKINS_URL’ might show its running on CI.

@alblue
Copy link
Contributor

alblue commented Oct 12, 2017

@spevans The total CPU time expected to run the test is going to be ~ 0.5 * 10 * 1000 * 100 / 100 seconds, right? On a single-core CPU that would be 83 minutes. Isn't that a bit excessive anyway? Can't we bring them down for the average machine to deal with? Even assuming a 4-core machine, it's still going to be in the order of 20 minutes run-time (which might be how we ended up with being here).

@parkera
Copy link
Contributor

parkera commented Oct 12, 2017

I agree with @alblue ; creating 1000 threads may not even work on all platforms we want to support.

@spevans
Copy link
Contributor Author

spevans commented Oct 12, 2017

I'll do a test PR and see if I can detect CI reliably and cap it to 10 threads. Btw, they should all run in parallel so it should never be 83minutes!

@alblue
Copy link
Contributor

alblue commented Oct 12, 2017

I'm pretty sure that even on non-CI environments we don't want to spawn 1000 threads. And note that even if they run concurrently, there's still a lot of processing work to be done (83 mins in CPU time). In comparison, the libdispatch tests run in the order of seconds.

Can we change this to something like 10 threads, and then permit a debug flag or environment variable to override that default, for example?

@spevans
Copy link
Contributor Author

spevans commented Oct 12, 2017

@alblue Ive opened #1262 to lower the 1000 threads to 10. btw, it never took 83minutes of CPU unless you had a very broken sleep implementation :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants