NF nib-diff #617

chrispycheng · 2018-04-05T15:50:34Z

Initial commits are duplicated with PR: RF nib-ls move which has since been merged

Example:

(dev) Chriss-MacBook:20170901 chrischeng$ nib-diff -t spmT_0001.nii.gz bold_1slice.nii.gz
Field      spmT_0001.nii.gz                             bold_1slice.nii.gz                           
dim        [3, 121, 145, 121, 1, 1, 1, 1]               [4, 40, 20, 1, 121, 1, 1, 1]                 
datatype   16                                           4                                            
bitpix     32                                           16                                           
pixdim     [-1.0, 1.5, 1.5, 1.5, 1.0, 1.0, 1.0, 1.0]    [-1.0, 3.0999999, 3.75, 3.75, 2.5, 0.0, 0.0, 0.0]
cal_max    0.0                                          2623.0                                       
descrip    b'SPM{T_[1066.0]} - contrast 1: AA-GG'       b'FSL5.0'                                    
qform_code 2                                            1                                            
sform_code 2                                            1                                            
qoffset_x  90.0                                         60.44999694824219                            
qoffset_y  -126.0                                       -35.625                                      
qoffset_z  -72.0                                        0.0                                          
srow_x     [-1.5, 0.0, 0.0, 90.0]                       [-3.0999999, 0.0, 0.0, 60.449997]            
srow_y     [0.0, 1.5, 0.0, -126.0]                      [0.0, 3.75, 0.0, -35.625]                    
srow_z     [0.0, 0.0, 1.5, -72.0]                       [0.0, 0.0, 3.75, 0.0]

TODOs:

robust tests

Future diff (to be done in separate PR)

allow for numeric differences in the values (of header fields, or data)

yarikoptic

I left some comments around, but I think we need to step back a little: now that you have been coding this logic for a while, try to take a fresh look, since I think the code could be greatly improved to avoid

ambiguity (all the try/except for Attribute/Value Errors)
redundancy (comment about diff_headers and then diff_header_fields)
and also make it shorter, cleaner, and pretty ;-)

Let's start with the "core" function diff_values. The task -- make it cuter

it should accept not just 2 values but any number of them (diff_values(*values))
you build your logic using any/all and map, generators, or list comprehensions, while converting the following human-language statements into Python
- value0 is the first value
- if any of the types differs from the type of value0, there is a difference
- if value0 is an array, then all should be equal to it in every element of the array
- if value0 is not an array, then all should be equal to it
If tests pass now, they should pass after this refactoring
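The statements above could translate into something like the sketch below. It assumes NumPy arrays are the only array type involved and reuses the name diff_values from the comment; the shape check is an added assumption so that arrays of different sizes count as different instead of raising.

```python
import numpy as np


def diff_values(*values):
    """Return True if any value differs from the first one (sketch)."""
    value0 = values[0]  # value0 is the first value
    others = values[1:]

    # If any type differs from the type of value0, there is a difference
    if any(type(value) is not type(value0) for value in others):
        return True

    # If value0 is an array, all others must equal it in every element
    if isinstance(value0, np.ndarray):
        return any(
            value.shape != value0.shape or bool(np.any(value != value0))
            for value in others
        )

    # Otherwise plain equality against value0 is enough
    return any(value != value0 for value in others)
```

Because the function accepts any number of values, the same call works for two files or for ten.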

After you are done with that, try to express in Python the ultimate diff_headers which would

get a set of all header fields across all dicts provided to it
return a dict mapping field to a list of values if diff_values given those values returned True

Then given such a dict you would do all the string conversions only for print out, not for logic of comparison (like done now in diff_header_fields)

Makes sense? ;)
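A minimal sketch of that diff_headers idea follows. It assumes headers are plain dicts holding plain Python values; values_differ is a simplified stand-in for diff_values, and _MISSING and format_difference are hypothetical names introduced only for this illustration. Array-valued fields would need the array-aware comparison instead.

```python
from collections import OrderedDict


def values_differ(*values):
    """Simplified stand-in for diff_values: plain Python values only."""
    return any(
        type(value) is not type(values[0]) or value != values[0]
        for value in values[1:]
    )


_MISSING = object()  # hypothetical sentinel for a field absent from a header


def diff_headers(*headers):
    """Map each differing field to the list of its values, one per header."""
    # Set of all header fields across all dicts provided
    fields = sorted(set().union(*(header.keys() for header in headers)))

    difference = OrderedDict()
    for field in fields:
        values = [header.get(field, _MISSING) for header in headers]
        if values_differ(*values):
            difference[field] = values
    return difference


def format_difference(difference):
    """String conversion happens only here, for print-out."""
    return "\n".join(
        "{}: {}".format(
            field,
            ", ".join("<missing>" if v is _MISSING else str(v) for v in values),
        )
        for field, values in difference.items()
    )
```

Keeping the comparison on raw values and the formatting in a separate function means a field present in one header but absent in another is reported as a difference rather than silently skipped.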

yarikoptic · 2018-07-05T18:56:37Z

nibabel/cmdline/diff.py

+                    keyed_inputs.append(list(field_value))
+
+        except UnboundLocalError:
+            continue


this doesn't look "kosher" -- try to make code explicit to not have some undefined local variables used

yarikoptic · 2018-07-05T18:57:51Z

nibabel/cmdline/diff.py

+            continue
+
+    for i in range(len(keyed_inputs)):
+        keyed_inputs[i] = str(keyed_inputs[i])


here and in general in Python try to avoid explicit indexing. Eg. here it is just a list comprehension (or even a map):

keyed_inputs = [str(x) for x in keyed_inputs]

or

# list() because map in PY3 returns an iterator rather than a list
keyed_inputs = list(map(str, keyed_inputs))

yarikoptic · 2018-07-05T19:00:01Z

nibabel/cmdline/diff.py

+
+                if data_diff:
+                    break
+            except ValueError:


I have no idea why ValueError could/should happen here (no comment etc.), but I do not think it should be considered as "there is no difference"

yarikoptic · 2018-07-05T19:03:49Z

nibabel/cmdline/diff.py

+        return headers
+
+
+def diff_header_fields(header_field, files):


since it is just a single header_field, better rename function to correspond ;)

You also have get_headers_diff, which is just the one which goes over multiple header fields.
So, please make names and signature consistent between them and more descriptive, also while renaming files into something closer to the current state of affairs, e.g. diff_header_fields(headers, fields) and diff_header_field(headers, field) ?

adjust also docstring to correspond. e.g.

headers: list of dict
    Header records to be compared
field: str
    Name of the header field to compare

In PyCharm you could easily rename functions using Refactor -> Rename function.

But then I also spotted diff_headers above ;-) so it seems that there is duplicate functionality:

diff_headers now just returns header fields which differ

then you go again and get the values for those fields

Why not make get_diff_headers do what diff_headers does now?
what you need is just to go through dicts and see which fields differ (or present in one but absent in another) and return a dict with those fields which differ.

yarikoptic · 2018-07-05T19:18:41Z

nibabel/cmdline/diff.py

+                        other_field = other_field.tolist()
+
+                except AttributeError:
+                    continue


why continue? if one misses smth when others don't -- they differ

chrispycheng · 2018-07-12T01:59:57Z

Travis should be clear after this commit. AppVeyor has problems relating to its Python environments - some tests that weren't touched by this pull request started crapping out. Otherwise, aside from maybe a few more tests, hopefully this function is ready to be pulled into NiBabel.

yarikoptic

Let's polish it up, and hopefully @matthew-brett could chime in as well!
Then IMHO could be merged and improved upon later when/if needed/desired

yarikoptic · 2018-07-13T15:47:04Z

nibabel/cmdline/tests/test_utils.py

+from os.path import (dirname, join as pjoin, abspath)
+
+
+DATA_PATH = abspath(pjoin(dirname(__file__), '../../tests/data'))


there is already one defined, just use

from nibabel.testing import data_path

note that in above '/' within the path might be *nix specific and might (?) not work on Windows. That is why we use all the path.join

yarikoptic · 2018-07-13T15:55:09Z

nibabel/cmdline/tests/test_utils.py

+    assert_equal(actual_difference["qoffset_z"], expected_difference["qoffset_z"])
+    np.testing.assert_array_equal(actual_difference["srow_x"], expected_difference["srow_x"])
+    np.testing.assert_array_equal(actual_difference["srow_y"], expected_difference["srow_y"])
+    np.testing.assert_array_equal(actual_difference["srow_z"], expected_difference["srow_z"])


although I like when tests are "explicit" like this, I hate code duplication more ;-) and apparently (thanks google) we could just use np.testing.assert_array_equal(actual_difference, expected_difference)

yarikoptic · 2018-07-13T15:56:36Z

nibabel/tests/test_scripts.py

+              for f in ('example4d.nii.gz', 'example4d.nii.gz')]
+    code, stdout, stderr = run_command(['nib-diff'] + fnames2, check_code=False)
+    assert_equal(stdout, "These files are identical.")
+


what about comparing different files now? at least a basic test using those nibabel/tests/data/spmT_0001*.nii.gz ?

Don't you think that testing that get_headers_diff works is enough? My concern is that testing text output here has traditionally really messed up with Travis and AppVeyor because their Python environments for some reason print fewer significant figures than my computer does. What do you think?

in principle, you don't have to check output for 1-to-1 matching, you could check for the important segments to be there in the output (that fields you expect are listed, etc)

ideally we should have output consistent across machines.

as I've outlined in a separate email - if all the printing is moved to another function, then may be it should be tested in a dedicated to that function test. Here we just would need to check that script is indeed running and exiting as expected when we give the identical or different files, with only minimal testing of output (to verify that it is there)
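One way to check for the important segments instead of a 1-to-1 match is sketched below. assert_fields_reported is a hypothetical test helper; it assumes each output row starts with the field name, as in the example table in the PR description, and it ignores the numeric columns whose precision varies between machines.

```python
def assert_fields_reported(stdout, expected_fields):
    """Hypothetical test helper: every expected field starts some output line."""
    # Only the first token of each non-empty line is inspected
    reported = {line.split()[0] for line in stdout.splitlines() if line.strip()}
    missing = [field for field in expected_fields if field not in reported]
    assert not missing, "fields missing from output: {}".format(missing)
```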

raamana · 2018-07-14T14:00:11Z

nibabel/cmdline/diff.py

+        print("{:<15}".format('Field'), end="")
+
+        for f in files:
+            output = ""


Is output simply the file name or something more complicated? If not, using os.path.basename might be easier, readable and generic across OSes to obtain the filename?

raamana · 2018-07-14T14:08:57Z

nibabel/cmdline/diff.py

+        print()
+
+        for key, value in diff.items():
+            print("{:<15}".format(key), end="")


I would suggest capturing string lengths (15, 55) into variables, as hardcoding them is not a good idea, and having them in variables allows us to adapt for different environments. For example, for terminals with smaller widths etc..

And eventually the table Formatter which was used by nib-ls could be used to make it all dynamic without hardcoding much... And then eventually pyout by @kyleam could be used for the most flexible tabulator ;-)
But we thought here to get it to some viable state first and then improve upon
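As an interim step before a table formatter, the widths could live in named variables. The sketch below is hypothetical (format_row, FIELD_WIDTH and VALUE_WIDTH are illustrative names) and simply mirrors the 15 and 55 mentioned above as defaults.

```python
FIELD_WIDTH = 15  # hypothetical defaults, mirroring the hardcoded 15 and 55
VALUE_WIDTH = 55


def format_row(label, values, field_width=FIELD_WIDTH, value_width=VALUE_WIDTH):
    """Left-align the label and each value into fixed-width columns."""
    cells = ["{:<{width}}".format(label, width=field_width)]
    cells += ["{:<{width}}".format(str(v), width=value_width) for v in values]
    return "".join(cells)
```

A caller on a narrow terminal could then pass smaller widths without touching the formatting code.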

raamana · 2018-07-14T14:09:57Z

nibabel/cmdline/diff.py

+                item_str = str(item)
+                # Value might start/end with some invisible spacing characters so we
+                # would "condition" it on both ends a bit
+                item_str = re.sub('^[ \t]+', '<', item_str)


how is this different from using str.strip()?

Place those indicators only if anything was replaced at any end
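The difference from str.strip() can be illustrated with a small sketch. Only the leading-whitespace substitution appears in the diff above; the trailing pattern and the '>' marker are assumptions made for symmetry, and condition_ends is a hypothetical name.

```python
import re


def condition_ends(text):
    """Replace leading/trailing spaces or tabs with '<' / '>' markers."""
    # re.sub only substitutes when the pattern matches, so the markers
    # appear only where whitespace was actually present
    text = re.sub(r'^[ \t]+', '<', text)
    text = re.sub(r'[ \t]+$', '>', text)
    return text
```

Unlike strip(), which silently drops the whitespace, this keeps a visible hint that two values differed only in padding.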

raamana · 2018-07-14T14:11:51Z

nibabel/cmdline/diff.py

+        for key, value in diff.items():
+            print("{:<15}".format(key), end="")
+
+            for item in value:


Another idea that might be interesting to consider is laying out differences vertically, which can scale better with the number of images. The horizontal layout looks nice for 2 or 3 images and can be the default, but for 10 images it might be easier to print each differing field as a separate block of values from all images.

Indeed, but most frequent case is actually two files and many fields differing

yarikoptic

do not catch SystemExit
remove test specific handling, just use assert_raises outside

yarikoptic · 2018-07-24T15:18:49Z

nibabel/cmdline/diff.py

+
+    except SystemExit:
+        opts = None
+        files = None


Why do you expect SystemExit to be raised anywhere in the above code... by --help etc? I do not think that this function should keep going if that happens... it would also cause breakage (e.g. files here becomes None but later you test len(files)).
Anyways -- I do not think you want to catch anything at this point

yarikoptic · 2018-07-24T15:25:19Z

nibabel/cmdline/diff.py

+            sys.stdout.write(display_diff(files, diff))
+            raise SystemExit(1)
+        else:
+            return diff  # this functionality specifically for testing main


ideally there should be no test specific code in the main code.
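A sketch of how the exit could be tested from the outside, with no test-only branch in main, is shown below. The main signature here is hypothetical (the diff is passed in only to keep the example short), and the try/except stands in for whatever assert_raises helper the test suite provides.

```python
import io


def main(files, diff, out):
    """Sketch of an entry point with no test-specific return value."""
    if diff:
        out.write("{} field(s) differ\n".format(len(diff)))
        raise SystemExit(1)
    out.write("These files are identical.\n")


def test_main_exits_on_difference():
    out = io.StringIO()
    # The SystemExit handling lives in the test, not in main()
    try:
        main(["a.nii", "b.nii"], {"datatype": [16, 4]}, out)
    except SystemExit as exc:
        assert exc.code == 1
    else:
        raise AssertionError("SystemExit was not raised")
    assert out.getvalue() == "1 field(s) differ\n"
```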

chrispycheng · 2018-07-27T17:39:49Z

OK so a (-0.004%) decrease in coverage is probably not going to mean the end of the world. nib-diff in its current stage is at a good starting point for functionality in NiBabel and I think is ready for merge. @yarikoptic Moving forwards, there are several functionalities to implement that could improve it:

Layout across different terminals
TODOs listed in the code itself
Table formatting

... and more!

yarikoptic

almost there! but see comments ~~~above~~~ below and:

old comment "what about comparing different files now? at least a basic test using those nibabel/tests/data/spmT_0001*.nii.gz ?" if you aren't using them -- remove.
also add a test where you have more than 2 files, I don't think you have any. E.g. one with the same file specified 3 times (no changes), and then one where 3rd is different. See that you get correct result overall

that "dropped" test coverage, even though small, shows that some critical pieces were never run, and that there is a good chance they do not work correctly/as expected... that would lead to even more frustration if we had to fix it up later. so let's push just a bit more in tuning/extending tests

yarikoptic · 2018-07-27T19:24:36Z

nibabel/cmdline/diff.py

+    return difference
+
+
+def get_data_md5sums(files):


I know that you would hate continued review @chrispycheng but, by seeing the function name I got confused why below it returns an empty list if there is only one unique value. So, please

rename to e.g. get_data_diff

add a docstring describing the output like you nicely did for the above get_header_diff

yarikoptic · 2018-07-27T19:27:32Z

nibabel/cmdline/diff.py

+            if np.any(value0 != value):  # special test for ndarray
+                return True
+            else:
+                return False


you can just return np.any(...) here. no need for all the return True/False dance since np.any returns a single bool here

yarikoptic · 2018-07-27T19:28:44Z

nibabel/cmdline/diff.py

+    ]
+
+    if len(set(md5sums)) == 1:
+        return []


that might be one contributor to your .000?% coverage miss (do you have codecov extension to the browser installed to see what lines aren't covered?). Apparently there is no test which verifies that you do get empty list in output whenever two files have the same data? you could make a dedicated test for this function and feed it
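A dedicated test for the identical-data branch might look like the following sketch. To stay self-contained it hashes bytes objects instead of loading images, so get_data_diff_sketch is a hypothetical stand-in and not the function in this PR.

```python
import hashlib


def get_data_diff_sketch(payloads):
    """Return md5 hex digests, or [] when all payloads hash identically.

    `payloads` is a list of bytes objects standing in for image data.
    """
    md5sums = [hashlib.md5(payload).hexdigest() for payload in payloads]
    if len(set(md5sums)) == 1:
        return []
    return md5sums


def test_identical_data_gives_empty_list():
    assert get_data_diff_sketch([b"abc", b"abc"]) == []
    assert get_data_diff_sketch([b"abc", b"abc", b"abc"]) == []
```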

yarikoptic · 2018-07-27T19:30:01Z

nibabel/cmdline/diff.py

+
+    file_headers = [nib.load(f).header for f in files]
+
+    if opts:  # will almost always have a header field


I think you don't need this if opts (so no else is needed either) since they always will be there!

yarikoptic · 2018-07-27T19:30:20Z

nibabel/cmdline/diff.py

+            # TODO: header fields might vary across file types, thus prior sensing would be needed
+            header_fields = file_headers[0].keys()
+        else:
+            header_fields = opts.header_fields.split(',')


apparently has no test case to test this!

How do you test an intermediary if/else statement within a function?

invoke the command/function with options where you specify your list of fields to be used for comparison

So that would be the command parser itself? I don't think any other NiBabel function has a test for that, maybe that could be a separate PR?
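Exercising both branches does not require a running script; the parser can be fed an argument list directly. In the sketch below the -H/--header-fields flag, its "all" default, and the helper names are assumptions made for illustration, since the condition of the if statement is not shown in the diff.

```python
from optparse import OptionParser, Option


def get_opt_parser():
    """Hypothetical parser with a single comma-separated field option."""
    parser = OptionParser(usage="%prog [OPTIONS] files")
    parser.add_option(Option("-H", "--header-fields", dest="header_fields",
                             default="all"))
    return parser


def select_fields(opts, available_fields):
    """Mirror of the if/else above: all fields, or the ones the user listed."""
    if opts.header_fields == "all":
        return list(available_fields)
    return opts.header_fields.split(",")
```

Calling get_opt_parser().parse_args(["-H", "dim,pixdim", "a.nii", "b.nii"]) returns the options and the remaining file names, which is enough to drive the else branch in a unit test.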

yarikoptic · 2018-07-27T19:30:34Z

nibabel/cmdline/diff.py

+        raise SystemExit(1)
+
+    else:
+        out.write("These files are identical.\n")


and again no test to test this?

simple/integration test: provide two identical files for comparison!

"advanced"/unit/- logic- test: mock out the diff calls so they return no difference, and then invoke the function to get here

In my latest commit I added the simple/integration test. I'm not sure how I would go about the advanced logic test though?

chrispycheng · 2018-07-28T12:58:06Z

I'd like to state for the record that the AppVeyor fail was because of some other file! It wasn't me!

yarikoptic · 2018-08-01T19:23:23Z

@matthew-brett @effigies any final remarks to let this PR finally get in to mature further in the real world of nibabel? ;)

effigies

I made a quick pass, and no serious objections. A couple quibbly questions/comments.

effigies · 2018-07-06T13:04:24Z

nibabel/cmdline/diff.py

+import re
+import sys
+from collections import OrderedDict
+from optparse import OptionParser, Option


I wouldn't hold this PR up for this, but just FYI optparse has been deprecated, and argparse is the supported argument parser.

yeah... we should convert all the cmdline tools which still use optparse in some one PR ;)

effigies · 2018-08-01T19:30:11Z

nibabel/cmdline/diff.py

+    md5sums = [
+        hashlib.md5(np.ascontiguousarray(nib.load(f).get_data(), dtype=np.float32)).hexdigest()
+        for f in files
+    ]


Is there a reason you're using MD5 and not something more collision-resistant such as SHA256?

since MD5 is sufficient and shorter. It is unlikely that in our lifetime we would see any user who would run into a collision in this use case ;-)

effigies · 2018-08-01T19:36:19Z

nibabel/cmdline/diff.py

+
+    if diff:
+        out.write(display_diff(files, diff))
+        raise SystemExit(1)


Is this now preferred to sys.exit (or just returning a return code and having the entry point exit with that code)?

I guess it is where python violated its own Zen:

In [4]: import this
The Zen of Python, by Tim Peters
...
There should be one-- and preferably only one --obvious way to do it.
...
In [2]: sys.exit?
Docstring:
exit([status])
Exit the interpreter by raising SystemExit(status).

so seems to be exactly the same thing... but it is easier to explain that "we will raise exception and then test in the test that it was raised" ;)

effigies · 2018-08-02T13:11:37Z

Okay. Since nothing's likely to depend on two separate versions of nib-diff, if we ever decide to use a stronger hash, that shouldn't cause problems.

👍 to merge.

yarikoptic · 2018-08-02T13:58:58Z

thanks! Let's merge then!
FWIW -- checksum there was just a proxy to visualize which files differ in data -- since nib-diff could consume more than 2 files, there is no easy way (besides adding a full matrix of pairs of files) to show which are actually different

yarikoptic and others added 30 commits November 3, 2017 11:56

RF: moved all the functionality from nib-ls under nibabel.cmdline

328d3bb

ENH: added a skeleton for nib-diff command

d293a20

TODO1 attempt 1: processed data type, data shape, and data offset of …

5491af4

…two different headers

tweaked to remove AnalyzeHeader but currently still has problems

949762c

added nib-diff to setup.py

93b5e09

first attempt at nib-diff that doesnt work

22804f1

removed incorrectly committed changes

f81a78b

BK: stab at the test_dict_diff

fe9c052

first attempt at diff_dicts method and diff testing file

5eb4477

Merge branch 'nf-nib-diff' of https://github.com/yarikoptic/nibabel i…

e2defb0

…nto nf-nib-diff

tried something else with header fields

5e3a767

latest attempt: restructured diff_dicts() method, troubleshooting ens…

7febf65

…uing errors

corrected misplacement of cmdline files and latest attempt at diff_di…

23a43ba

…cts method

progress! tweaked bugs, corrected rookie mistakes like cmdline placem…

fae491d

…ent, implemented yarik suggestions

got rid of proc file and function works at a basic level

3e87d81

tweaked diff_dicts to be compatible with tests

a3b35d9

got rid of None, troubleshot tests

1491c61

introduced hypothesis to use for testing with pretty sexy results

397bc03

noted hypothesis need for tests, refactored diff_dicts name

f192f65

attempt at TODO#2: allowing specification of header fields

92553a2

now functional for several header files.

7a70d56

tweak to make hypothesis work with a list, but problem above has not …

f5e930d

…been addressed

tweaked names and code as suggested!

df82a51

bug fix

6d706f5

cosmetic tweaks

774ce3b

cleaned up code

911d781

promoted generic programming and got test to work again

0458694

tried to clean code but couldnt get comprehension going

fed70e9

comment and docstring

0b59dfb

added options for text, json, yaml but still have to implement

2920abf

chrispycheng added 2 commits July 5, 2018 00:15

hypothesis: all my problems were due to that one test

10c2c42

whoops missed this

7989563

yarikoptic requested changes Jul 5, 2018

chrispycheng added 5 commits July 11, 2018 03:54

i think im going to cry. code cleaned and made more pythonic

ae74339

added and fixed tests

82b1457

fixing up appveyor and travis problems

45d0edf

fixed a fringe use case

c1f553f

style tweak for travis

2f89242

yarikoptic requested changes Jul 13, 2018

raamana reviewed Jul 14, 2018

chrispycheng added 3 commits July 15, 2018 22:35

removed duplication, made things more generic

6613522

moved functionality outside to test and increase coverage

a311d7b

boosting coverage by testing main

2cd69b5

yarikoptic requested changes Jul 24, 2018

chrispycheng added 3 commits July 27, 2018 04:44

main test corrected for max coverage

414da00

imported StringIO from six instead of io

59006b0

added a test for cmdline function

672661e

yarikoptic requested changes Jul 27, 2018

changes per Yariks comments!

baf6cdc

effigies approved these changes Aug 1, 2018

yarikoptic merged commit 0408783 into nipy:master Aug 2, 2018
