NSTestLock: Obtain lock before waking up main thread for test #1261
Conversation
@swift-ci please test
@swift-ci please test and merge
@spevans can you investigate? Looks like it's deadlocked:
https://ci.swift.org/job/swift-corelibs-foundation-PR-Linux/254/console
@alblue Looks like it recovered.
That test creates 1000 threads and has them sleep for random intervals of up to 1/10th of a second, so if there was a high load it may have caused it to take the extra time, although I've never seen it take that long before.
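For reference, the pattern being described looks roughly like this (a minimal sketch, assuming the workers contend on a single shared NSLock; the names and structure are illustrative, not the actual test source):

```swift
import Foundation

let lock = NSLock()
let finished = NSCondition()
var remaining = 1000

for _ in 0..<1000 {
    Thread {
        // Sleep for a random interval of up to 1/10th of a second,
        // then briefly contend on the shared lock.
        Thread.sleep(forTimeInterval: Double.random(in: 0...0.1))
        lock.lock()
        lock.unlock()

        // Tell the main thread this worker is done.
        finished.lock()
        remaining -= 1
        finished.signal()
        finished.unlock()
    }.start()
}

// The main thread blocks until every worker has checked in, which is
// where a loaded CI box can appear to hang for a long time.
finished.lock()
while remaining > 0 { finished.wait() }
finished.unlock()
```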
If we see a similarly high delay in subsequent builds we'll need to come back and change these values. It was stuck in this section for almost half an hour :( Is there a way of setting a time limit, such that if the limit is exceeded we kill off the threads without failing the test?
It currently creates 1000 threads; it might be possible to modify it to stop creating threads 5 seconds after the start of the test. Could also make the sleep a fixed small amount rather than a random number. I think the bigger problem is that it's hard to recreate the issue, as it doesn't occur that often. In which case simply trimming the threads down to, say, 50 and making the sleep 0.01 sec may be the simplest fix.
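A rough sketch of that suggestion (illustrative only, not the actual patch): the spawning loop checks a deadline before creating each thread, and the sleep becomes a fixed small amount.

```swift
import Foundation

let start = Date()
let spawnDeadline: TimeInterval = 5.0

for _ in 0..<1000 {
    // Stop creating new threads once 5 seconds have elapsed since
    // the start of the test.
    guard Date().timeIntervalSince(start) < spawnDeadline else { break }

    Thread {
        // A fixed small sleep rather than a random one.
        Thread.sleep(forTimeInterval: 0.01)
        // ... exercise the lock as before ...
    }.start()
}
```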
@alblue Are there any graphs or other stats that show the CPU load, memory usage, etc. of the CI servers over time? Might help to correlate and debug these issues in the future.
@shahmishal would be the person who would be able to answer that question.

Doesn't this line sleep for 8 seconds?

The concern I have is that the build agents are on meaty, but loaded, machines. If we spin up 1000 threads and they all go to sleep, there's every possibility that they'll be starved of run time on the box due to conflicting demands. One possibility is that we keep the thread count small (e.g. 10) for CI builds, and then use a larger number when we're doing a deeper burn-in test (say, when not running in CI). I'm not sure if there's a way of identifying whether the build process is running in a CI instance or not - but @shahmishal should know if there is.
The 8 second sleep is for testing the timed lock. The 1000 thread test, which caused the 20 minute delay, is a separate one, and it should be possible to check for environment variables to see if it's running under Jenkins.
Are you able to access environment variables? This is not the best solution, but checking for `JENKINS_URL` might show it's running on CI.
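That check could look like the following sketch, treating the mere presence of `JENKINS_URL` as "running on CI" (the thread counts are the ones discussed above):

```swift
import Foundation

// If JENKINS_URL is set, we are almost certainly running under
// Jenkins, so cap the thread count accordingly.
let runningOnCI = ProcessInfo.processInfo.environment["JENKINS_URL"] != nil
let threadCount = runningOnCI ? 10 : 1000
```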
@spevans The total CPU time expected to run the test is going to be ~0.5 * 10 * 1000 * 100 / 100 ≈ 5000 seconds, right? On a single-core CPU that would be 83 minutes. Isn't that a bit excessive anyway? Can't we bring the numbers down to something the average machine can deal with? Even assuming a 4-core machine, it's still going to be on the order of 20 minutes of run time (which might be how we ended up here).
I agree with @alblue; creating 1000 threads may not even work on all platforms we want to support.
I'll do a test PR and see if I can detect CI reliably and cap it to 10 threads. BTW, they should all run in parallel, so it should never be 83 minutes!
I'm pretty sure that even in non-CI environments we don't want to spawn 1000 threads. And note that even if they run concurrently, there's still a lot of processing work to be done (83 mins of CPU time). In comparison, the libdispatch tests run in the order of seconds. Can we change this to something like 10 threads, and then permit a debug flag or environment variable to override that default, for example?
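As a sketch of that override (the variable name `TEST_STRESS_THREADS` is hypothetical, used only for illustration):

```swift
import Foundation

// Default to a small thread count; allow an environment variable to
// raise it for deliberate burn-in runs. TEST_STRESS_THREADS is a
// hypothetical name, not something the project defines.
let threadCount = ProcessInfo.processInfo.environment["TEST_STRESS_THREADS"]
    .flatMap { Int($0) } ?? 10
```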
This should prevent a race where the main thread completes its tests before the 2nd thread has obtained the lock.
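To illustrate the race and the fix (a minimal sketch of the pattern, not the patch itself): the helper thread takes the lock before signalling, so by the time the main thread resumes, the lock is guaranteed to be held.

```swift
import Foundation

let lock = NSLock()
let ready = NSCondition()
var helperHoldsLock = false

Thread {
    // Take the lock FIRST...
    lock.lock()
    // ...and only then wake the main thread. Signalling before
    // lock() is the race being fixed: the main thread could run its
    // assertions while the lock is still free.
    ready.lock()
    helperHoldsLock = true
    ready.signal()
    ready.unlock()

    Thread.sleep(forTimeInterval: 1.0)
    lock.unlock()
}.start()

// Main thread: block until the helper confirms it holds the lock.
ready.lock()
while !helperHoldsLock { ready.wait() }
ready.unlock()

// Any assertion of the form "the lock is currently held" is now safe.
```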