WIP: figure out appveyor issue #367

bcipolli · 2015-10-26T14:35:01Z

We have a test failure in appveyor. The test is within a loop, has no message, and doesn't reproduce locally, so it's hard to debug.

I pushed this change to my repo (use yield for the test, add a message to be clear what test(s) are failing), and it seems to pass. Bizarre. Submitting a PR as a placeholder for exploration / discussion, and so we can solve this issue in appveyor.

effigies · 2015-10-26T14:45:13Z

It's a Win32 precision issue, I believe caused by the upgrade to NumPy 1.10. It does have messages, which is a big sequence of tuples of failing conditions.

I suspect updating this line with a more lenient threshold could resolve the issue. Failing that, updating the print statements to be more informative might be a good step.

bcipolli · 2015-10-26T15:00:10Z

👍 will work on that, thanks for the tips!

bcipolli · 2015-10-26T18:13:42Z

Here's the updated output.

I'm not sure changing the ulp is going to matter here, as error is expected to be zero for ints, it seems.

effigies · 2015-10-26T18:23:08Z

So it seems to be this case that's actually the problem. Nice cleanup.

Still don't have ideas on how to fix it, but it does look more tractable, now.

bcipolli · 2015-10-26T18:36:17Z

On my machine, whenever slope=1, inter=0. However, in this example, inter looks like it's underflowed somehow (it's huge).

I can't figure out where dataobj.slope and dataobj.inter are coming from, however.

bcipolli · 2015-10-26T18:37:27Z

On my machine, whenever slope=1, inter=0. However, in this example, inter looks like it's underflowed somehow (it's huge).

Actually, that's not always true, sorry.

effigies · 2015-10-26T18:58:33Z

Slope and intercept will be coming from AnalyzeImage.to_file_map, I think.

bcipolli · 2015-10-26T20:14:21Z

I tweaked the output; it didn't make much sense to me (seemed to be querying absolute differences on relative failure cases, and vice verse).

This seems more interpretable to me, but still not making sense.

effigies · 2015-10-29T16:52:29Z

Added frequency information.

Test ID: 0.05; in_type=<i8, out_type=<u8
    slope=1.00000e+00, inter=-3.82645e+18
    ABS FAIL: exp_abs_mx_e=5.12000e+02 < abs_mx_e=6.40000e+02; max_diff=1.28000e+02; num=2/10000
    REL FAIL: rel_thresh  =0.00000e+00 < rel_mx_e=7.51984e-13; max_diff=7.51984e-13; num=7438/10000

Absolute fails 2/10000 times, so we should be able to look at individual cases. Relative fails ~75% of the time, but it's a very small error. Possibly picking an arbitrary case or two to print more details of would be instructive?

effigies · 2015-11-05T18:01:07Z

Rebase issue?

bcipolli · 2015-11-05T18:06:37Z

So the last commit seems to solve the issue. It's hard to know if that's the right fix--I don't know the new threshhold value is (ulp(scaling_type(inter))), but I'm pretty sure it's a bug in the test code.

Besides that, I'm not sure how much of the test code changes we want to keep. If that's the bug, just two lines really needed to be changed. The others are all printing / cleanup. Which I think is good, except I'm not 100% sure I understood the code well enough to be sure those changes are right. I think they improve the code, but perhaps I've just misunderstood.

@matthew-brett If you have a moment to look things over, that'd be great!

effigies · 2015-11-06T14:18:07Z

Just linking to the old commit (83cbfbf) discussion's been happening on.

Do you think this is going to be resolved soon? Since it seems to be a numpy 1.10 issue, it would be good to have it in for 2.0.2.

bcipolli · 2015-11-09T13:23:59Z

I will take another look at this, this morning. I've downloaded a Windows 32-bit virtualbox image, and am downloading Python now.

bcipolli · 2015-11-09T17:52:13Z

So... I was able to reproduce this locally. I played around a bit, but ran out of time before really debugging if nibabel can do any of the scaling computations differently to reduce the error (I was poking around in arraywriters.py). Not sure when I can get back to it; I'm off to New Orleans in the morning until next week.

effigies · 2015-11-09T18:28:50Z

Thanks, Ben.

@matthew-brett How do you feel about blocking 2.0.2 on this PR? I think this is my only block, at the moment.

matthew-brett · 2015-11-10T03:06:11Z

Failure is a bit puzzling - we don't seem to be getting it on the buildbots...

matthew-brett · 2015-11-10T07:31:57Z

But then again - if the failure is due to numpy 1.10, we wouldn't get it. I'll try to get to this tomorrow.

matthew-brett · 2015-11-11T02:44:22Z

OK - I can't replicate on a Windows 7 64-bit bit numpy 1.10.1 from Christoph Gohlke, for the 2.0.2 branch, so this seems to be specific to conda. Is that possible?

effigies · 2015-11-11T02:51:46Z

It's the 32 bit test that's failing. (Apologies if I'm missing something and uselessly stating the obvious.)

matthew-brett · 2015-11-11T03:18:43Z

OK - the 32-bit test does fail with Christoph's numpy 1.10.1 wheel - I'll investigate.

effigies · 2015-11-11T21:02:00Z

@matthew-brett Looks like the latest commit on master resolved the appveyor error. Should we close this PR and backport to 2.0.2?

matthew-brett · 2015-11-12T00:54:19Z

Unfortunately, that commit is not a fix, just a cleanup, and it fixes the failure by accident - just pumping another series of random numbers through. There still seems to be an issue on 32 bit - still looking at it...

matthew-brett · 2015-11-12T09:02:27Z

Still exploring, it's slow I'm afraid.

matthew-brett · 2015-11-13T05:17:40Z

OK - I think I know what's going on now. I don't know why numpy-1.10.1 exposes it, but what is happening is that the values being tested are right at the extreme end of values that can be saved as np.uint64 after subtracting the minimum. The 32-bit builds we are using are built with MSVC, and they have a bug in converting to np.uint64, worked around here : https://github.com/nipy/nibabel/blob/master/nibabel/casting.py#L17

In practice what this means is the maximum np.uint64 value we can generate from a float is a lot smaller on these systems than on any other system, and we are more likely to hit this threshold with our random numbers.

At the moment I'm checking whether we can get away with this kind of intercept-only scaling here : https://github.com/nipy/nibabel/blob/master/nibabel/arraywriters.py#L562

I need to think a little whether I should relax the threshold or make the check tighter to flip to full slope and intercept scaling in this situation. Will try and do that tomorrow.

matthew-brett · 2015-11-14T04:59:14Z

No, sorry, I was wrong - it's a numpy + MKL + Windows bug:

numpy/numpy#6685

So, I think we can close this.

bcipolli · 2015-11-14T05:18:44Z

Thanks for figuring this out @matthew-brett; I agree.

matthew-brett · 2015-11-14T05:23:51Z

Chris - Ben - would you mind sending me your public ssh keys over email? I want to give you access to the buildbot machines to make it easier for you to debug if you need to.

effigies mentioned this pull request Nov 3, 2015

Bug fix release #373

Closed

Ben Cipollini and others added 6 commits November 5, 2015 10:02

figure out appveyor issue

38c2251

Add error frequency information

3d6e7f6

STY: minor code cleanup.

4fd1682

RF: put bad parameter checks into check_params.

556c6f8

TST: Print extra test failure info.

25db687

Relative threshhold should not be zero!

ff8c46e

bcipolli closed this Nov 14, 2015

bcipolli deleted the appveyor branch November 14, 2015 05:18

effigies mentioned this pull request Nov 14, 2015

MRG: Release candidate 2.0.2 #377

Merged

matthew-brett mentioned this pull request Apr 20, 2016

TST: Add testing on Python 3.5 on Appveyor. #446

Merged

WIP: figure out appveyor issue #367

WIP: figure out appveyor issue #367

Uh oh!

Conversation

bcipolli commented Oct 26, 2015

Uh oh!

effigies commented Oct 26, 2015

Uh oh!

bcipolli commented Oct 26, 2015

Uh oh!

bcipolli commented Oct 26, 2015

Uh oh!

effigies commented Oct 26, 2015

Uh oh!

bcipolli commented Oct 26, 2015

Uh oh!

bcipolli commented Oct 26, 2015

Uh oh!

effigies commented Oct 26, 2015

Uh oh!

bcipolli commented Oct 26, 2015

Uh oh!

effigies commented Oct 29, 2015

Uh oh!

effigies commented Nov 5, 2015

Uh oh!

bcipolli commented Nov 5, 2015

Uh oh!

effigies commented Nov 6, 2015

Uh oh!

bcipolli commented Nov 9, 2015

Uh oh!

bcipolli commented Nov 9, 2015

Uh oh!

effigies commented Nov 9, 2015

Uh oh!

matthew-brett commented Nov 10, 2015

Uh oh!

matthew-brett commented Nov 10, 2015

Uh oh!

matthew-brett commented Nov 11, 2015

Uh oh!

effigies commented Nov 11, 2015

Uh oh!

matthew-brett commented Nov 11, 2015

Uh oh!

effigies commented Nov 11, 2015

Uh oh!

matthew-brett commented Nov 12, 2015

Uh oh!

matthew-brett commented Nov 12, 2015

Uh oh!

matthew-brett commented Nov 13, 2015

Uh oh!

matthew-brett commented Nov 14, 2015

Uh oh!

bcipolli commented Nov 14, 2015

Uh oh!

matthew-brett commented Nov 14, 2015

Uh oh!

Uh oh!