test: fix flaky test-fs-stat-bigint #58043
Conversation
The current allowable delta values cause some flakiness in the test-fs-stat-bigint tests, so the changes here increment those values by an arbitrary number to fix the flakiness.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #58043      +/-   ##
==========================================
- Coverage   90.28%   90.26%   -0.02%
==========================================
  Files         630      630
  Lines      186150   186328     +178
  Branches    36468    36503      +35
==========================================
+ Hits       168059   168186     +127
- Misses      10975    11013      +38
- Partials     7116     7129      +13
```
RaisinTen left a comment:
Adding a magic number feels more like a mitigation than a fix.
Logically, it does make sense to expect the numbers to be between the start and end times.
I'm curious if this was flaky for all of these subtests or if it can be isolated to only some of the following:
- fs calls like stat / fstat / lstat
- conditions like sync / promise / callback
- stat object properties like atime / mtime / ctime / birthtime
Investigating that might help in figuring out what's really causing this flakiness.
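To make that isolation concrete, here is a rough, hypothetical sketch (not code from the test or this PR) of how a single combination, sync `lstat` with only the `atime` comparison, could be exercised in a tight loop; the temp-file setup, iteration count, and slack value are placeholder assumptions.

```js
'use strict';
// Hypothetical isolation harness: repeatedly compare the atimeMs reported by
// synchronous lstat in bigint and number form for a single temp file.
const assert = require('assert');
const fs = require('fs');
const os = require('os');
const path = require('path');

const file = path.join(os.tmpdir(), `stat-bigint-isolate-${process.pid}`);
fs.writeFileSync(file, 'x');

for (let i = 0; i < 300; i++) {
  const startTime = process.hrtime.bigint();
  const bigintStats = fs.lstatSync(file, { bigint: true });
  const numStats = fs.lstatSync(file);
  const endTime = process.hrtime.bigint();

  // Same shape of check as the test: the two atimeMs values should agree
  // within the time it took to run both lstat calls, plus 1 ms of slack.
  const allowableDelta = Math.ceil(Number(endTime - startTime) / 1e6) + 1;
  const delta = Math.abs(Number(bigintStats.atimeMs) - numStats.atimeMs);
  assert(delta <= allowableDelta,
         `iteration ${i}: atimeMs delta ${delta} exceeds ${allowableDelta}`);
}
```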
Referenced diff from the PR:

```js
// this number is computed by taking the difference between the start and end
// times of the stat calls, converting it from nanoseconds to milliseconds,
// and adding a small buffer to account for potential timing variations
const allowableTimeDelta = Math.ceil(Number(endTime - startTime) / 1e6) + 50;
```
The magic number needs to be passed through common.platformTimeout() to translate to equivalent numbers on other platforms
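For illustration, a minimal sketch of that suggestion, assuming `startTime`/`endTime` are `process.hrtime.bigint()` samples taken around the stat calls as in the existing test, and that the file sits under `test/parallel/` so `require('../common')` resolves:

```js
'use strict';
// Sketch only: route the extra buffer through common.platformTimeout() so the
// allowance scales on slower platforms instead of being a raw magic number.
const common = require('../common');
const fs = require('fs');

const startTime = process.hrtime.bigint();
const numStats = fs.statSync(__filename);
const bigintStats = fs.statSync(__filename, { bigint: true });
const endTime = process.hrtime.bigint();

const allowableTimeDelta =
  Math.ceil(Number(endTime - startTime) / 1e6) + common.platformTimeout(50);
console.log(numStats.atimeMs, Number(bigintStats.atimeMs), allowableTimeDelta);
```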
RaisinTen left a comment:
Can you check if measuring the start time before file creation (the getFilename() call creates the file) gets rid of the flakiness?
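A rough sketch of what this suggestion could look like, with a plain `writeFileSync()` standing in for the test's `getFilename()` helper (the temp path is an assumption, not the test's code):

```js
'use strict';
// Sketch of the suggestion: sample startTime before the file is created, so a
// timestamp assigned at creation cannot fall before startTime.
const fs = require('fs');
const os = require('os');
const path = require('path');

const startTime = process.hrtime.bigint();
const filename = path.join(os.tmpdir(), `stat-bigint-sketch-${process.pid}`);
fs.writeFileSync(filename, 'hello'); // file creation now happens inside the window
const bigintStats = fs.statSync(filename, { bigint: true });
const numStats = fs.statSync(filename);
const endTime = process.hrtime.bigint();

// Any creation-time timestamp now lies inside [startTime, endTime].
console.log(Number(endTime - startTime) / 1e6, 'ms window',
            Number(bigintStats.atimeMs), numStats.atimeMs);
```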
Sure, I've just checked 👍 (on the sync test); it does not seem to help with the flakiness 🤔
Thanks for the helpful suggestion 🙂, however I've tested all the isolated possibilities above and I could not find any version that does not include flakiness 😓 I've run subsets of the tests/checks (only the promise tests, only the callback test, only the stat tests, only the fstat tests, only the atime checks, only the mtime checks, etc.) with a high repetition count (300). I did not run combinations of them (only atime checks in fstat promise tests, only mtime checks in lstat callback tests, etc.), since given the above I don't imagine that combinations would yield any different result 😕 I've also not measured the level of flakiness of each isolated version, however by running them and manually looking at the results it feels to me like there is not much difference between the flakiness of each version (or of the full test file as a whole).
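For context, one way to get that kind of repetition count is a small driver like the hypothetical sketch below (not necessarily how the runs above were performed; the test path assumes a nodejs/node checkout as the working directory):

```js
'use strict';
// Hypothetical repetition harness: run the test 300 times and count failures.
const { spawnSync } = require('child_process');

let failures = 0;
for (let i = 0; i < 300; i++) {
  const result = spawnSync(process.execPath,
                           ['test/parallel/test-fs-stat-bigint.js'],
                           { stdio: 'ignore' });
  if (result.status !== 0) failures++;
}
console.log(`${failures}/300 runs failed`);
```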
@RaisinTen, could we query the Jenkins results to understand if the flakiness is present for all platforms or just a subset? (or if it's even failing in CI)
The Jenkins results are logged in the reliability repo but I wasn't able to find anything useful there because the linked CI logs are pretty old, so the data is gone. However, there are some interesting bits of information about this in the main issue tracker. Check this out: #45612 and the discussions on James' attempted fix, which is similar to yours. Based on the data there, the time-based flakiness in this test is known to repro only on M1 macOS due to the unexpectedly large difference in the 2 `atime` values.
Ah, sorry, I forgot to check if there was an issue already tracking this 😓 Also, sorry, I made a mistake in my testing earlier: it does seem that the flakiness only applies to `atime`.

My testing: basically, with this change:

```diff
 function verifyStats(bigintStats, numStats, allowableDelta) {
   // allowableDelta: It's possible that the file stats are updated between the
   // two stat() calls so allow for a small difference.
   for (const key of Object.keys(numStats)) {
     const val = numStats[key];
+    if (key === 'atimeMs') continue;
     if (isDate(val)) {
       const time = val.getTime();
       const time2 = bigintStats[key].getTime();
       assert(
```

I can see that the flakiness goes away.
One solution here could be to check if the key is `atimeMs` and skip the comparison for it. But again, that might be seen as a mitigation rather than an actual fix.
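As a purely hypothetical illustration of that kind of special-casing (none of these names or values come from the PR), the check could instead give the access-time keys a wider allowance rather than skipping them outright:

```js
// Hypothetical helper: keep the strict delta for every field but give
// atime-related keys a larger, assumed margin.
const ATIME_EXTRA_MS = 50; // assumed value, purely illustrative

function allowanceFor(key, allowableDelta) {
  return key.startsWith('atime') ? allowableDelta + ATIME_EXTRA_MS
                                 : allowableDelta;
}

// e.g. allowanceFor('atimeMs', 2) === 52, allowanceFor('mtimeMs', 2) === 2
```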
I feel like this might just be highly dependent on the OS/platform scheduling and something outside of the control of node 😕
Based on the docs (I hope it's the same on your M1 macOS too), there are only a few documented causes for the atime to change.

Can you check what's causing the atime to go outside the expected window? You can try to wildly simplify the test locally, make sure the flake repros, and use commands like fs_usage, dtruss, etc. to see what's causing this change.
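For illustration, a wildly simplified repro along those lines might look like the sketch below (the file name, iteration count, and 2 ms threshold are arbitrary assumptions, not values from the test); it could be left running while a tool like fs_usage or dtruss watches the file in another terminal.

```js
'use strict';
// Simplified repro sketch: stat the same file twice per iteration and report
// how far atimeMs drifts between the bigint and number stat results.
const fs = require('fs');
const os = require('os');
const path = require('path');

const file = path.join(os.tmpdir(), `atime-repro-${process.pid}`);
fs.writeFileSync(file, 'x');

for (let i = 0; i < 300; i++) {
  const bigintStats = fs.statSync(file, { bigint: true });
  const numStats = fs.statSync(file);
  const drift = Math.abs(Number(bigintStats.atimeMs) - numStats.atimeMs);
  if (drift > 2) {
    console.log(`iteration ${i}: atimeMs drifted by ${drift} ms`);
  }
}
```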
Thank you so very much for the fantastic suggestion @RaisinTen 🫶 Of course, I will look into this as soon as I can. Anyway, what do you think of this current PR? Should I maybe close it and keep the conversation in #45612? (I'm asking just to make sure that this PR doesn't create unnecessary noise in the repo 🙂)
Since the current approach of this PR is similar to James' approach in #45925 which was closed, I don't think this should land in its current state. The investigation happening here is very useful and I think we can figure out the root cause and solve this. I don't have any strong opinions on whether the conversation should happen here or in #45612, as long as it happens somewhere. So I'm fine with whichever channel you prefer. Thanks a lot for looking into this. :)
Yes, I think that the content of this PR would likely change significantly anyway, so I might as well close it and reopen one with the correct/agreed approach when we come up with one.
Ok 🙂 If it's all OK with you, then I'll just close this one and continue the conversation in #45612 when I have updates 🙂
It's my pleasure 😄 Thank you so very much for all the support and helpful suggestions! 🫶
As I've been running the repo's tests I noticed that on my machine[^1] `test-fs-stat-bigint` seems rather flaky, and such flakiness is caused by the allowable deltas in the test, which generally seem to be very low (`1` or `2`), so here I'm proposing to increment such values by `50`[^2]. Besides the above I've applied some minor code refactoring.

Note
I'd like to point out that I am adding this `50` magic number; previously a magic number of `5` was being used, but it was removed in #31726. Naturally I do agree that magic numbers are not great, but I'd argue that the presence of a magic number is preferable to the presence of flakiness.

Footnotes

[^1]: macOS Sequoia 15.4; Darwin XXX 24.4.0 Darwin Kernel Version 24.4.0: Wed Mar 19 21:16:34 PDT 2025; root:xnu-11417.101.15~1/RELEASE_ARM64_T6000 arm64

[^2]: `50` is a completely arbitrary number. I have tried various other numbers, such as `5` and `10`; they significantly reduced but did not completely get rid of the flakiness, so I figured the number might as well be increased to eliminate the flakiness entirely. My guess is that, in the grand scheme of things, something like a 50 millisecond delta is very small and acceptable anyway, but this is really a wild guess of mine; I'm waiting to see if anyone pushes back on it.