-
Notifications
You must be signed in to change notification settings - Fork 261
WIP: figure out appveyor issue #367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It's a Win32 precision issue, I believe caused by the upgrade to NumPy 1.10. It does have messages, which is a big sequence of tuples of failing conditions. I suspect updating this line with a more lenient threshold could resolve the issue. Failing that, updating the print statements to be more informative might be a good step. |
👍 will work on that, thanks for the tips! |
So it seems to be this case that's actually the problem. Nice cleanup. Still don't have ideas on how to fix it, but it does look more tractable, now. |
On my machine, whenever I can't figure out where |
Actually, that's not always true, sorry. |
Slope and intercept will be coming from |
Added frequency information.
Absolute fails 2/10000 times, so we should be able to look at individual cases. Relative fails ~75% of the time, but it's a very small error. Possibly picking an arbitrary case or two to print more details of would be instructive? |
Rebase issue? |
So the last commit seems to solve the issue. It's hard to know if that's the right fix--I don't know the new threshhold value is ( Besides that, I'm not sure how much of the test code changes we want to keep. If that's the bug, just two lines really needed to be changed. The others are all printing / cleanup. Which I think is good, except I'm not 100% sure I understood the code well enough to be sure those changes are right. I think they improve the code, but perhaps I've just misunderstood. @matthew-brett If you have a moment to look things over, that'd be great! |
Just linking to the old commit (83cbfbf) discussion's been happening on. Do you think this is going to be resolved soon? Since it seems to be a numpy 1.10 issue, it would be good to have it in for 2.0.2. |
I will take another look at this, this morning. I've downloaded a Windows 32-bit virtualbox image, and am downloading Python now. |
So... I was able to reproduce this locally. I played around a bit, but ran out of time before really debugging if nibabel can do any of the scaling computations differently to reduce the error (I was poking around in |
Thanks, Ben. @matthew-brett How do you feel about blocking 2.0.2 on this PR? I think this is my only block, at the moment. |
Failure is a bit puzzling - we don't seem to be getting it on the buildbots... |
But then again - if the failure is due to numpy 1.10, we wouldn't get it. I'll try to get to this tomorrow. |
OK - I can't replicate on a Windows 7 64-bit bit numpy 1.10.1 from Christoph Gohlke, for the 2.0.2 branch, so this seems to be specific to conda. Is that possible? |
It's the 32 bit test that's failing. (Apologies if I'm missing something and uselessly stating the obvious.) |
OK - the 32-bit test does fail with Christoph's numpy 1.10.1 wheel - I'll investigate. |
@matthew-brett Looks like the latest commit on master resolved the appveyor error. Should we close this PR and backport to 2.0.2? |
Unfortunately, that commit is not a fix, just a cleanup, and it fixes the failure by accident - just pumping another series of random numbers through. There still seems to be an issue on 32 bit - still looking at it... |
Still exploring, it's slow I'm afraid. |
OK - I think I know what's going on now. I don't know why numpy-1.10.1 exposes it, but what is happening is that the values being tested are right at the extreme end of values that can be saved as np.uint64 after subtracting the minimum. The 32-bit builds we are using are built with MSVC, and they have a bug in converting to np.uint64, worked around here : https://github.com/nipy/nibabel/blob/master/nibabel/casting.py#L17 In practice what this means is the maximum np.uint64 value we can generate from a float is a lot smaller on these systems than on any other system, and we are more likely to hit this threshold with our random numbers. At the moment I'm checking whether we can get away with this kind of intercept-only scaling here : https://github.com/nipy/nibabel/blob/master/nibabel/arraywriters.py#L562 I need to think a little whether I should relax the threshold or make the check tighter to flip to full slope and intercept scaling in this situation. Will try and do that tomorrow. |
No, sorry, I was wrong - it's a numpy + MKL + Windows bug: So, I think we can close this. |
Thanks for figuring this out @matthew-brett; I agree. |
Chris - Ben - would you mind sending me your public ssh keys over email? I want to give you access to the buildbot machines to make it easier for you to debug if you need to. |
We have a test failure in appveyor. The test is within a loop, has no message, and doesn't reproduce locally, so it's hard to debug.
I pushed this change to my repo (use
yield
for the test, add a message to be clear what test(s) are failing), and it seems to pass. Bizarre. Submitting a PR as a placeholder for exploration / discussion, and so we can solve this issue in appveyor.