NSTestLock: Obtain lock before waking up main thread for test #1261

spevans · 2017-10-12T11:37:23Z

This should prevent a race where the main thread completes its tests before the 2nd thread has obtained the lock.

spevans · 2017-10-12T11:37:45Z

spevans · 2017-10-12T12:17:58Z

@swift-ci please test and merge

alblue · 2017-10-12T13:01:27Z

@spevans can you investigate? Looks like it's deadlocked:

Test Case 'TestNSLock.test_threadsAndLocks' started at 2017-10-12 12:49:06.229

https://ci.swift.org/job/swift-corelibs-foundation-PR-Linux/254/console

spevans · 2017-10-12T14:02:46Z

@alblue Looks like it recovered:

Test Case 'TestNSLock.test_threadsAndLocks' started at 2017-10-12 12:49:06.229
Test Case 'TestNSLock.test_threadsAndLocks' passed (1642.673 seconds)

That test creates 1000 threads and has them sleep for random intervals upto 1/10th second so if there was a high load it may have caused it to take the extra time, although Ive never seen it take that long before.

alblue · 2017-10-12T15:45:31Z

If we see a similarly high delay in subsequent builds we'll need to come back and change these values. It was in this section for almost half an hour :(

Is there a way of having a time limit, such that if the limit is passed then we kill off threads without failing the test?

spevans · 2017-10-12T15:52:00Z

It currently creates 1000 threads, might be possible to modify it to stop creating threads after 5 seconds from the start of the test. Could also make the sleep a fixed small amount rather than a random number. The idea with the Thread.sleep was to try and stop the load spiking if 1000 threads were just started and then exited.

I think the bigger problem could be that its hard to recreate the issue as it doesn't occur that often. In which case simply trimming the threads down to say 50, and making the sleep 0.01sec may be the simplest fix.

spevans · 2017-10-12T15:55:07Z

@alblue Are there any graphs or other stats that show the cpu load, men usage etc of the CI servers over time? Might help to correlate and debug these issues in the future.

alblue · 2017-10-12T16:10:02Z

@shahmishal would be the person who would be able to answer that question.

Doesn't this line sleep for 8 seconds Thread.sleep(forTimeInterval: 8):

https://github.com/apple/swift-corelibs-foundation/blob/6d862d694b4ea6f14ac5ffea0c094acc7a7010e2/TestFoundation/TestNSLock.swift#L52

The concern I have is that the build agents are on meaty, but loaded, machines. If we spin up 1000 threads, and they're all going to sleep, there's every possibility that they'll be starved from running on the box due to conflicting demands.

One possibility is that we have the threads set small (e.g. 10) for running in CI builds, and then use a larger number when we're doing a more deep burn-in test (say, not running in CI). I'm not sure if there's a way of identifying whether the build process is running in a CI instance or not - but @shahmishal should know if there is.

spevans · 2017-10-12T16:18:56Z

The 8 second sleep is for testing the lock(before:) methods, unfortunately testing timeouts is always tricky.

The 1000 thread test, which caused the 20 minute delay, is a separate one and it should be possible to do a check for environment variables to see if its running under Jenkins or perhaps parse /proc/loadavg and not use so many threads if the load > 2 or so.

shahmishal · 2017-10-12T16:24:31Z

Are you able access environment variable? This is not the best solution but checking for ‘JENKINS_URL’ might show its running on CI.

alblue · 2017-10-12T16:28:36Z

@spevans The total CPU time expected to run the test is going to be ~ 0.5 * 10 * 1000 * 100 / 100 seconds, right? On a single-core CPU that would be 83 minutes. Isn't that a bit excessive anyway? Can't we bring them down for the average machine to deal with? Even assuming a 4-core machine, it's still going to be in the order of 20 minutes run-time (which might be how we ended up with being here).

parkera · 2017-10-12T17:18:12Z

I agree with @alblue ; creating 1000 threads may not even work on all platforms we want to support.

spevans · 2017-10-12T17:27:33Z

I'll do a test PR and see if I can detect CI reliably and cap it to 10 threads. Btw, they should all run in parallel so it should never be 83minutes!

alblue · 2017-10-12T18:04:05Z

I'm pretty sure that even on non-CI environments we don't want to spawn 1000 threads. And note that even if they run concurrently, there's still a lot of processing work to be done (83 mins in CPU time). In comparison, the libdispatch tests run in the order of seconds.

Can we change this to something like 10 threads, and then permit a debug flag or environment variable to override that default, for example?

spevans · 2017-10-12T18:24:00Z

@alblue Ive opened #1262 to lower the 1000 threads to 10. btw, it never took 83minutes of CPU unless you had a very broken sleep implementation :)

NSTestLock: Obtain lock before waking up main thread for test

5e392a4

spevans mentioned this pull request Oct 12, 2017

Remove previous hack for SR-5206 #1259

Merged

swift-ci merged commit 6d862d6 into swiftlang:master Oct 12, 2017

alblue mentioned this pull request Oct 12, 2017

Added build-android script #1249

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

NSTestLock: Obtain lock before waking up main thread for test #1261

NSTestLock: Obtain lock before waking up main thread for test #1261

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

shahmishal commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017 •

edited

Loading

Uh oh!

parkera commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

Uh oh!

NSTestLock: Obtain lock before waking up main thread for test #1261

NSTestLock: Obtain lock before waking up main thread for test #1261

Uh oh!

Conversation

spevans commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

shahmishal commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

parkera commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

alblue commented Oct 12, 2017

Uh oh!

spevans commented Oct 12, 2017

Uh oh!

Uh oh!

alblue commented Oct 12, 2017 •

edited

Loading