Fix flakiness on StochasticDepth test #4758
Conversation
💊 CI failures summary and remediations

As of commit 59c1c8b (more details on the Dr. CI page):

💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Adding some highlights on the changes:
@pytest.mark.parametrize("p", [0.2, 0.5, 0.8]) | ||
@pytest.mark.parametrize("mode", ["batch", "row"]) | ||
def test_stochastic_depth(self, mode, p): | ||
def test_stochastic_depth_random(self, seed, mode, p): | ||
torch.manual_seed(seed) |
We keep the original test because it lets us check that the different mode values operate as expected. Using p values in the open interval (0, 1) is critical, because for p=0 and p=1 the two modes behave identically. The mitigation for the flakiness here is setting the seed.
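A minimal sketch of what the seeded randomized test could look like (the input shape and the final assertion are assumptions for illustration; only the parametrization, the rename, and the torch.manual_seed call come from the diff above):

```python
import pytest
import torch
from torchvision.ops import stochastic_depth


@pytest.mark.parametrize("seed", range(10))
@pytest.mark.parametrize("p", [0.2, 0.5, 0.8])
@pytest.mark.parametrize("mode", ["batch", "row"])
def test_stochastic_depth_random(seed, mode, p):
    # Fixing the seed makes the Bernoulli noise reproducible, so the same
    # parametrizations are exercised on every CI run.
    torch.manual_seed(seed)
    x = torch.ones(4, 3, 8, 8)  # hypothetical input shape
    out = stochastic_depth(x, p=p, mode=mode, training=True)
    # "batch" drops/keeps the whole tensor; "row" decides per sample (dim 0).
    assert out.shape == x.shape
```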
On my laptop the test passed for 97 out of 100 seeds, so there's a very small chance it will fail in the future. (First failure was on the 11th seed, lol.)
I agree it's fine to leave as-is for now
That's because I increased the p-value threshold to 1%. You'd expect false positives at roughly that rate now; hopefully we won't see them thanks to the fixed seeds.
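A quick back-of-the-envelope check of that intuition (my own arithmetic, not from the PR), assuming each seed acts as an independent trial at a 1% significance level:

```python
# Probability that at least one of k independent seeds trips a 0.01 threshold.
for k in (10, 100):
    print(k, 1 - 0.99 ** k)  # ~0.10 for 10 seeds, ~0.63 for 100 seeds
```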
@@ -1173,7 +1175,22 @@ def test_stochastic_depth(self, mode, p):
     num_samples += batch_size

     p_value = stats.binom_test(counts, num_samples, p=p)
-    assert p_value > 0.0001
+    assert p_value > 0.01
Significantly increased the threshold since we now only check 10 fixed seeds. We can reduce it if flakiness continues.
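For context, a sketch of the statistical check being relaxed here (the counting loop is abbreviated and the totals are hypothetical; only the stats.binom_test call and the new 0.01 threshold come from the diff):

```python
from scipy import stats

# counts / num_samples would be accumulated over repeated forward passes,
# counting how many rows (or batches) stochastic_depth zeroed out.
counts, num_samples = 210, 1000  # hypothetical totals for p = 0.2

p_value = stats.binom_test(counts, num_samples, p=0.2)
assert p_value > 0.01  # relaxed from 0.0001 now that only 10 fixed seeds run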
@pytest.mark.parametrize("seed", range(10)) | ||
@pytest.mark.parametrize("p", (0, 1)) | ||
@pytest.mark.parametrize("mode", ["batch", "row"]) | ||
def test_stochastic_depth(self, seed, mode, p): |
Adding an additional test with p=0 and p=1 to confirm it works as expected for the extreme values.
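Roughly, that deterministic check can look like the sketch below (the test name, input shape, and exact assertions are assumptions): at the extremes no statistics are needed, since p=0 must return the input unchanged and p=1 must zero it out.

```python
import pytest
import torch
from torchvision.ops import stochastic_depth


@pytest.mark.parametrize("seed", range(10))
@pytest.mark.parametrize("p", (0, 1))
@pytest.mark.parametrize("mode", ["batch", "row"])
def test_stochastic_depth_extremes(seed, mode, p):
    torch.manual_seed(seed)
    x = torch.rand(4, 3, 8, 8)  # hypothetical input shape
    out = stochastic_depth(x, p=p, mode=mode, training=True)
    if p == 0:
        assert torch.equal(out, x)  # nothing is dropped or rescaled
    else:
        assert torch.equal(out, torch.zeros_like(x))  # everything is dropped
```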
-noise = noise.bernoulli_(survival_rate).div_(survival_rate)
+noise = noise.bernoulli_(survival_rate)
+if survival_rate > 0.0:
+    noise.div_(survival_rate)
This was actually a bug; the previous code produced NaNs! Though it isn't something users will typically face: setting p=1 on the operator means you always drop the block (set it to 0). I'm not sure many users would want to do that, but it's worth fixing anyway.
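The essence of the fix, simplified (the helper name and the per-row noise shape are illustrative; the guarded division is what the diff above adds): with p=1 the survival rate is 0, so dividing the all-zero Bernoulli noise by 0 produced 0/0 = NaN.

```python
import torch


def _apply_stochastic_depth_noise(input: torch.Tensor, p: float) -> torch.Tensor:
    survival_rate = 1.0 - p
    size = [input.shape[0]] + [1] * (input.dim() - 1)  # per-row ("row" mode) noise
    noise = torch.empty(size, dtype=input.dtype, device=input.device)
    noise = noise.bernoulli_(survival_rate)
    if survival_rate > 0.0:
        # Rescale so the expected value matches the input; skipping this when
        # survival_rate == 0 avoids the 0/0 = NaN of the old code.
        noise.div_(survival_rate)
    return input * noise
```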
Thanks @datumbox!
Summary:
* Fix flakiness on the TestStochasticDepth test.
* Fix minor bug when p=1.0.
* Remove device and dtype setting.

Reviewed By: datumbox

Differential Revision: D32064694

fbshipit-source-id: 4107800cb6f8e56bcd85db31176afae394b86a21
Partially resolves #4506
cc @pmeier