Skip to content

Fix intermittency in test_timer::alarm_fires #1626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 2, 2022

Conversation

asomers
Copy link
Member

@asomers asomers commented Jan 2, 2022

The test was disabling the signal handler before disabling the timer.
Fix intermittent failures but reversing the cleanup order.

Also, speed up the timer to make the test suite complete faster.

@asomers asomers requested a review from rtzoeller January 2, 2022 21:37
@asomers
Copy link
Member Author

asomers commented Jan 2, 2022

cc @blt

@asomers asomers force-pushed the alarm_fires_intermittency branch from 31f09c9 to 5786b3b Compare January 2, 2022 21:38
// 2) Replace the old signal handler now that we've completed the test. If
// the test fails this process panics, so the fact we might not get here
// is okay.
drop(timer);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to address the issue on my NetBSD VM. I'm not entirely sure why, but I think it might be related to the following section of the timer_delete man page:

Pending notification events (signals) may or may not be delivered.

Copy link
Collaborator

@rtzoeller rtzoeller Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a 10ms delay after dropping but before removing the signal handler seems to provide enough time for things to resolve on my VM. I'm unsure if there is a more deterministic way to do this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. From the FreeBSD man page: "Pending signals for the deleted timer are cleared." and from OpenGroup: "The disposition of pending signals for the deleted timer is unspecified." So I guess OpenGroup is the lowest common denominator. How do we ensure that all pending signals are dealt with?

Copy link
Collaborator

@rtzoeller rtzoeller Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried thread::yield_now() without success, so I think your current timing solution is as good as we'll get for now.

I'm going to leave the tests running for a bit, and if they don't fail with the current timeout for a few minutes I'll merge this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, hmm. I suppose this implementation could destroy the timer, replace the old signal handler and then call sigpending. If the result of sigpending turns up empty the previously deleted timer didn't leave any trailing signals behind on systems that don't clean up. I'm not totally convinced that the OpenGroup wording doesn't allow for some unexpected ordering of operations in that proposed implementation, that said. The implementation that was merged up here makes good sense to me.

Apologies for not noticing this myself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think sigpending will work, because of this line from the man page:

Signals may be pending because they are currently
masked, or transiently before delivery (although the latter case is not
normally detectable).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ha, I believe you are correct.

The test was disabling the signal handler before disabling the timer.
Fix intermittent failures by:
* Reversing the cleanup order.
* Sleeping for a while before removing the signal handler, since POSIX
  does not guarantee that timer_delete will clear pending signals.

Also, speed up the timer to make the test suite complete faster.
@asomers asomers force-pushed the alarm_fires_intermittency branch from 5786b3b to e5b9b97 Compare January 2, 2022 22:07
@rtzoeller
Copy link
Collaborator

bors r+

@bors bors bot merged commit 5359b8e into nix-rust:master Jan 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants