Annotate test cases which are timing sensitive #130363
Comments
Sounds like it could be a new resource, see …
I think it makes sense to hide the tests that clearly depend on timings (e.g. …)
Agreed - we've a short list of known flaky tests, and need to triage it to identify the ones which are genuinely sensitive vs those with subtle race conditions which should be fixed. I will prototype a new resource and see how it works.
If it helps, to add a new resource, you just need to add it here: cpython/Lib/test/libregrtest/utils.py, lines 35 to 36 (at d8ce092).
And then you can annotate the tests with …
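To make that concrete, here is a minimal sketch (not the actual patch). It assumes the resource ends up being named "idle", as proposed later in this thread, and that tests are annotated with the existing test.support.requires_resource decorator; the exact contents of the resource tuple vary by CPython version, and the IdleOnlyTests class is purely illustrative.

```python
# Sketch only: in Lib/test/libregrtest/utils.py, the tuple of known
# resources would gain the new name (exact contents vary by version).
ALL_RESOURCES = ('audio', 'curses', 'largefile', 'network', 'decimal',
                 'cpu', 'subprocess', 'urlfetch', 'gui', 'walltime', 'idle')

# A timing-sensitive test can then opt in via the existing decorator
# from test.support; the class and method names here are hypothetical.
import unittest
from test import support

class IdleOnlyTests(unittest.TestCase):
    @support.requires_resource('idle')
    def test_timeout_fires_within_tolerance(self):
        # Placeholder body: the decorator skips this test unless the
        # "idle" resource has been enabled when running the suite.
        ...
```

Running the suite with something like ./python -m test -u idle (or -u all) would then opt in to the annotated tests.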
How about a …
I suggest adding …
I like the …
Let's open a separate issue for the flaky tests as it could be orthogonal to tests that require an idle system due to stats gathering.
I suggested "stats" because I don't think we should have a test that doesn't check some hypothesis against a population. However, I'm not sure we have any test that checks something other than timings like this. Memory can't really be flaky unless the kernel is doing something weird, and network can fail, but that's generally a timeout issue, not a data issue. So …
Issue's here: #130474
Yeah, I'm not sure if we assert anything else besides timings that is similarly stochastic... maybe we can stick with some time-related name for now and adjust it if something else comes up?
…system Some tests are very sensitive to timing and will fail on a loaded system, typically because there are small windows for timeouts to trigger in. Some of these are poorly implemented tests which can be improved, but others may genuinely have strict timing requirements. Add a test resource so that these tests can be marked as such, and only run when the system is known to be idle.
Posted a minimal PR: #130508. This just adds an "idle" resource.
I'm not comfortable with this issue. "Skip tests because they fail on our CI" sounds too specific to me. IMO a "skip list" for a specific CI is perfectly fine. Your patches use the description:
Well, yes, the cause should be determined. 0001-test_storlines-skip-due-to-load-variability.patch: it would be interesting to investigate why this specific test "intermittently fails on the Yocto AB when a worker is under heavy load". We already have the "walltime" resource for "slow tests". You might start by skipping this resource?
Slow tests aren't the problem; it's tests that take 1s but have precise timing restrictions that are.
And I will absolutely review every skip we have to identify why it fails under load before tagging it with 'idle'. |
In general, tests should not fail if they take longer than a maximum timing: tests should not have a maximum duration, so that heavily loaded systems are handled well. TimerfdTests, which has a hardcoded precision, deserves its own solution. I dislike blindly skipping TimerfdTests because the CI may be too slow.
What would the solution for TimerfdTests be? Another recent failure that we're not yet patching out is …
For example, add a command-line option to specify the timer precision. Or change the test to only use a precision of 1 second. I'm not sure what the best solution is here.
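As one illustration of the configurable-precision idea (this is not the actual TimerfdTests code), the accepted slack could come from an environment variable or a command-line option rather than being hardcoded; the PYTHON_TEST_TIMER_SLACK name below is made up.

```python
import os
import time
import unittest

# Hypothetical knob: a loaded CI could export PYTHON_TEST_TIMER_SLACK=1.0
# instead of patching the test out entirely. The 1 ms default mirrors the
# kind of hardcoded precision discussed above.
TIMER_SLACK = float(os.environ.get("PYTHON_TEST_TIMER_SLACK", "0.001"))

class TimerSlackExample(unittest.TestCase):
    def test_sleep_duration(self):
        expected = 0.1
        start = time.monotonic()
        time.sleep(expected)
        elapsed = time.monotonic() - start
        # The lower bound is safe to assert even on a busy machine ...
        self.assertGreaterEqual(elapsed, expected)
        # ... while the upper bound uses the configurable slack rather
        # than a fixed 1 ms tolerance.
        self.assertLess(elapsed, expected + TIMER_SLACK)

if __name__ == "__main__":
    unittest.main()
```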
From experience, when you say "let's just make this timeout longer" you're actually just saying "let's just have this test fail less frequently". A heavily loaded system will surprise you with exceptional delays at the most annoying moments. |
I'm open to reworking the timerfd tests to remove the "maximum timing" tests.
Another example that our CI just failed on:
Given that the test code contains a hardcoded short-duration sleep:
I'm guessing the problem here is that the test assumes one second is enough for the forked process to be ready, and it normally is, but under heavy load at just the right time it isn't. What would the right solution be here? Make the …
A solution for subprocess synchronization would be to create a pipe and read/write into this pipe to synchronize. For example, the child process can write when it's ready. See Lib/test/_test_eintr.py for examples. |
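A minimal sketch of that handshake, assuming a POSIX platform (it uses os.fork()); this is not the code from Lib/test/_test_eintr.py, just the pattern: the parent blocks on a pipe read until the child reports readiness, so there is no fixed sleep for a loaded machine to outrun.

```python
import os

def wait_for_child_ready():
    # The parent keeps the read end; the child inherits the write end.
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # --- child ---
        os.close(rfd)
        # ... perform whatever setup the test needs here ...
        os.write(wfd, b"x")   # tell the parent we are ready
        os.close(wfd)
        # ... the child's real work goes here ...
        os._exit(0)
    # --- parent ---
    os.close(wfd)
    os.read(rfd, 1)           # blocks until the child reports readiness
    os.close(rfd)
    # From this point the child is known to be set up; no time.sleep(1).
    os.waitpid(pid, 0)

if __name__ == "__main__":
    if hasattr(os, "fork"):   # POSIX only
        wait_for_child_ready()
```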
Feature or enhancement
Proposal:
Many tests in the cpython test suite are sensitive to timing and assume an entirely unloaded system. For example, the TimerfdTests check for 1ms accuracy in expected durations. Some people (e.g. people building and testing Python) may be running the test suite on build machines which are potentially heavily loaded, so this sort of expectation isn't feasible. It seems like a good solution to this would be to annotate the tests which are timing sensitive in some way, so they can be skipped easily.
For reference, we're manually patching Python to skip tests which are failing under load, but we're now having to rebase these patches on upgrades:
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response
Linked PRs