Skip to content

NF nib-diff #617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 79 commits into from
Aug 2, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
79 commits
Select commit Hold shift + click to select a range
328d3bb
RF: moved all the functionality from nib-ls under nibabel.cmdline
yarikoptic Nov 3, 2017
d293a20
ENH: added a skeleton for nib-diff command
yarikoptic Nov 3, 2017
5491af4
TODO1 attempt 1: processed data type, data shape, and data offset of …
chrispycheng Nov 6, 2017
949762c
tweaked to remove AnalyzeHeader but currently still has problems
chrispycheng Nov 16, 2017
93b5e09
added nib-diff to setup.py
chrispycheng Nov 17, 2017
22804f1
first attempt at nib-diff that doesnt work
chrispycheng Nov 17, 2017
f81a78b
removed incorrectly committed changes
chrispycheng Nov 17, 2017
fe9c052
BK: stab at the test_dict_diff
yarikoptic Nov 17, 2017
5eb4477
first attempt at diff_dicts method and diff testing file
chrispycheng Nov 28, 2017
e2defb0
Merge branch 'nf-nib-diff' of https://github.com/yarikoptic/nibabel i…
chrispycheng Nov 28, 2017
5e3a767
tried something else with header fields
chrispycheng Nov 30, 2017
7febf65
latest attempt: restructured diff_dicts() method, troubleshooting ens…
chrispycheng Nov 30, 2017
23a43ba
corrected misplacement of cmdline files and latest attempt at diff_di…
chrispycheng Dec 1, 2017
fae491d
progress! tweaked bugs, corrected rookie mistakes like cmdline placem…
chrispycheng Dec 1, 2017
3e87d81
got rid of proc file and function works at a basic level
chrispycheng Dec 2, 2017
a3b35d9
tweaked diff_dicts to be compatible with tests
chrispycheng Dec 3, 2017
1491c61
got rid of None, troubleshot tests
chrispycheng Dec 9, 2017
397bc03
introduced hypothesis to use for testing with pretty sexy results
chrispycheng Dec 15, 2017
f192f65
noted hypothesis need for tests, refactored diff_dicts name
chrispycheng Dec 16, 2017
92553a2
attempt at TODO#2: allowing specification of header fields
chrispycheng Dec 18, 2017
7a70d56
now functional for several header files.
chrispycheng Dec 21, 2017
f5e930d
tweak to make hypothesis work with a list, but problem above has not …
chrispycheng Dec 23, 2017
df82a51
tweaked names and code as suggested!
chrispycheng Dec 24, 2017
6d706f5
bug fix
chrispycheng Dec 27, 2017
774ce3b
cosmetic tweaks
chrispycheng Jan 5, 2018
911d781
cleaned up code
chrispycheng Jan 7, 2018
0458694
promoted generic programming and got test to work again
chrispycheng Jan 13, 2018
fed70e9
tried to clean code but couldnt get comprehension going
chrispycheng Jan 13, 2018
0b59dfb
comment and docstring
chrispycheng Jan 15, 2018
2920abf
added options for text, json, yaml but still have to implement
chrispycheng Jan 21, 2018
1e57409
work in progress with all outputs
chrispycheng Feb 3, 2018
fd6c474
simplified code but now bizarrely doesnt run with two files
chrispycheng Mar 14, 2018
497ad2a
beautified text output, next up json and yaml
chrispycheng Mar 16, 2018
df0aa79
modified the tests for the new diff_values
chrispycheng Mar 17, 2018
c23143c
commenting out json and yaml, cosmetic mod to ls.py
chrispycheng Mar 18, 2018
db16d85
BF: all those flags are just boolean flags and no need to repeat thei…
yarikoptic Mar 26, 2018
feca439
progress towards table formatting
chrispycheng Mar 29, 2018
acf667b
table now works and looks p good
chrispycheng Mar 29, 2018
bb3fbf0
slight tweak to table formatting
chrispycheng Mar 29, 2018
8a92010
very inefficient but successfully removed dtype when its identical
chrispycheng Apr 4, 2018
a9a572a
removed extra blank line lol
chrispycheng Apr 5, 2018
3290a66
boosting coverage
chrispycheng Apr 5, 2018
92e4ed0
Merge remote-tracking branch 'nibabel/master' into nf-nib-diff
chrispycheng Apr 5, 2018
06e8dd7
data comparison function added
chrispycheng Apr 12, 2018
8fd6995
implemented test-nib-diff
chrispycheng Apr 24, 2018
df8bc04
added another test
chrispycheng Apr 25, 2018
45d3fbf
changed travis appveyor utils per test errors
chrispycheng Apr 25, 2018
1cbf5b3
test files
chrispycheng Apr 26, 2018
45bdf64
wasnt included in last commit by accident
chrispycheng Apr 26, 2018
c600746
maybe this will work instead
chrispycheng Apr 26, 2018
e26adb5
maybe this will work instead
chrispycheng Apr 26, 2018
3802919
test fixed
chrispycheng May 5, 2018
0ce86df
Merge branch 'nf-nib-diff' of https://github.com/yarikoptic/nibabel i…
chrispycheng May 5, 2018
72bc800
fixed to be more generic
chrispycheng May 6, 2018
0cf2a8c
fixed file name issue in nib diff
chrispycheng May 24, 2018
5db2654
fixed changes as yarik requested
chrispycheng May 24, 2018
50a480e
fixed problems noted by Travis CI
chrispycheng May 31, 2018
9e155df
building in difflib to see whats going on
chrispycheng May 31, 2018
3c0c90c
Merge branch 'master' into nf-nib-diff
effigies Jun 3, 2018
f8c32b8
latest attempt at compliance changes dict to list
chrispycheng Jun 26, 2018
51733b0
Merge branch 'nf-nib-diff' of https://github.com/yarikoptic/nibabel i…
chrispycheng Jun 26, 2018
41caade
Remove unrelated to PR difference
yarikoptic Jun 27, 2018
f1cee5f
RF+BF: report that diff if any header of data differs, return dict of…
yarikoptic Jun 27, 2018
676ac70
combined into one line
chrispycheng Jun 28, 2018
f476c48
new structure, names, tests
chrispycheng Jul 4, 2018
10c2c42
hypothesis: all my problems were due to that one test
chrispycheng Jul 5, 2018
7989563
whoops missed this
chrispycheng Jul 5, 2018
ae74339
i think im going to cry. code cleaned and made more pythonic
chrispycheng Jul 11, 2018
82b1457
added and fixed tests
chrispycheng Jul 11, 2018
45d0edf
fixing up appveyor and travis problems
chrispycheng Jul 11, 2018
c1f553f
fixed a fringe use case
chrispycheng Jul 12, 2018
2f89242
style tweak for travis
chrispycheng Jul 12, 2018
6613522
removed duplication, made things more generic
chrispycheng Jul 16, 2018
a311d7b
moved functionality outside to test and increase coverage
chrispycheng Jul 17, 2018
2cd69b5
boosting coverage by testing main
chrispycheng Jul 24, 2018
414da00
main test corrected for max coverage
chrispycheng Jul 27, 2018
59006b0
imported StringIO from six instead of io
chrispycheng Jul 27, 2018
672661e
added a test for cmdline function
chrispycheng Jul 27, 2018
baf6cdc
changes per Yariks comments!
chrispycheng Jul 28, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ cache:
- $HOME/.cache/pip
env:
global:
- DEPENDS="six numpy scipy matplotlib h5py pillow pydicom"
- DEPENDS="six numpy scipy matplotlib h5py pillow pydicom hypothesis"
- OPTIONAL_DEPENDS=""
- INSTALL_TYPE="setup"
- EXTRA_WHEELS="https://5cf40426d9f06eb7461d-6fe47d9331aba7cd62fc36c7196769e4.ssl.cf2.rackcdn.com"
Expand Down Expand Up @@ -95,7 +95,7 @@ before_install:
- source venv/bin/activate
- python --version # just to check
- pip install -U pip wheel # needed at one point
- retry pip install nose flake8 mock # always
- retry pip install nose flake8 mock hypothesis # always
- pip install $EXTRA_PIP_FLAGS $DEPENDS $OPTIONAL_DEPENDS
- if [ "${COVERAGE}" == "1" ]; then
pip install coverage;
Expand Down
3 changes: 1 addition & 2 deletions appveyor.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,7 @@ install:
- SET PATH=%PYTHON%;%PYTHON%\Scripts;%PATH%

# Install the dependencies of the project.
- pip install numpy scipy matplotlib nose h5py mock
- pip install pydicom
- pip install numpy scipy matplotlib nose h5py mock hypothesis pydicom
- pip install .
- SET NIBABEL_DATA_DIR=%CD%\nibabel-data

Expand Down
17 changes: 17 additions & 0 deletions bin/nib-diff
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
#!python
# emacs: -*- mode: python-mode; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
#
# See COPYING file distributed along with the NiBabel package for the
# copyright and license terms.
#
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
"""
Quick diff summary for a set of neuroimaging files
"""

from nibabel.cmdline.diff import main

if __name__ == '__main__':
main()
1 change: 1 addition & 0 deletions dev-requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
-r requirements.txt
nose
mock
hypothesis
207 changes: 207 additions & 0 deletions nibabel/cmdline/diff.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
#!python
# emacs: -*- mode: python-mode; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
#
# See COPYING file distributed along with the NiBabel package for the
# copyright and license terms.
#
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
"""
Quick summary of the differences among a set of neuroimaging files
"""
from __future__ import division, print_function, absolute_import

import re
import sys
from collections import OrderedDict
from optparse import OptionParser, Option
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't hold this PR up for this, but just FYI optparse has been deprecated, and argparse is the supported argument parser.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah... we should convert all the cmdline tools which still use optparse in some one PR ;)


import numpy as np

import nibabel as nib
import nibabel.cmdline.utils
import hashlib
import os


def get_opt_parser():
# use module docstring for help output
p = OptionParser(
usage="%s [OPTIONS] [FILE ...]\n\n" % sys.argv[0] + __doc__,
version="%prog " + nib.__version__)

p.add_options([
Option("-v", "--verbose", action="count",
dest="verbose", default=0,
help="Make more noise. Could be specified multiple times"),

Option("-H", "--header-fields",
dest="header_fields", default='all',
help="Header fields (comma separated) to be printed as well (if present)"),
])

return p


def are_values_different(*values):
"""Generically compares values, returns true if different"""
value0 = values[0]
values = values[1:] # to ensure that the first value isn't compared with itself

for value in values:
try: # we sometimes don't want NaN values
if np.any(np.isnan(value0)) and np.any(np.isnan(value)): # if they're both NaN
break
elif np.any(np.isnan(value0)) or np.any(np.isnan(value)): # if only 1 is NaN
return True

except TypeError:
pass

if type(value0) != type(value): # if types are different, then we consider them different
return True
elif isinstance(value0, np.ndarray):
return np.any(value0 != value)

elif value0 != value:
return True

return False


def get_headers_diff(file_headers, names=None):
"""Get difference between headers

Parameters
----------
file_headers: list of actual headers (dicts) from files
names: list of header fields to test

Returns
-------
dict
str: list for each header field which differs, return list of
values per each file
"""
difference = OrderedDict()
fields = names

if names is None:
fields = file_headers[0].keys()

# for each header field
for field in fields:
values = [header.get(field) for header in file_headers] # get corresponding value

# if these values are different, store them in a dictionary
if are_values_different(*values):
difference[field] = values

return difference


def get_data_diff(files):
"""Get difference between md5 values

Parameters
----------
files: list of actual files

Returns
-------
list
np.array: md5 values of respective files
"""

md5sums = [
hashlib.md5(np.ascontiguousarray(nib.load(f).get_data(), dtype=np.float32)).hexdigest()
for f in files
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you're using MD5 and not something more collision-resistant such as SHA256?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since MD5 is sufficient and shorter. It is unlikely that in our lifetime we would see any user who would run into a collision in this use case ;-)


if len(set(md5sums)) == 1:
return []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that might be one contributor to your .000?% coverage miss (do you have codecov extension to the browser installed to see what lines aren't covered?). Apparently there is no test which verifies that you do get empty list in output whenever two files have the same data? you could make a dedicated test for this function and feed it


return md5sums


def display_diff(files, diff):
"""Format header differences into a nice string

Parameters
----------
files: list of files that were compared so we can print their names
diff: dict of different valued header fields

Returns
-------
str
string-formatted table of differences
"""
output = ""
field_width = "{:<15}"
value_width = "{:<55}"

output += "These files are different.\n"
output += field_width.format('Field')

for f in files:
output += value_width.format(os.path.basename(f))

output += "\n"

for key, value in diff.items():
output += field_width.format(key)

for item in value:
item_str = str(item)
# Value might start/end with some invisible spacing characters so we
# would "condition" it on both ends a bit
item_str = re.sub('^[ \t]+', '<', item_str)
item_str = re.sub('[ \t]+$', '>', item_str)
# and also replace some other invisible symbols with a question
# mark
item_str = re.sub('[\x00]', '?', item_str)
output += value_width.format(item_str)

output += "\n"

return output


def main(args=None, out=None):
"""Getting the show on the road"""
out = out or sys.stdout
parser = get_opt_parser()
(opts, files) = parser.parse_args(args)

nibabel.cmdline.utils.verbose_level = opts.verbose

if nibabel.cmdline.utils.verbose_level < 3:
# suppress nibabel format-compliance warnings
nib.imageglobals.logger.level = 50

assert len(files) >= 2, "Please enter at least two files"

file_headers = [nib.load(f).header for f in files]

# signals "all fields"
if opts.header_fields == 'all':
# TODO: header fields might vary across file types, thus prior sensing would be needed
header_fields = file_headers[0].keys()
else:
header_fields = opts.header_fields.split(',')

diff = get_headers_diff(file_headers, header_fields)
data_diff = get_data_diff(files)

if data_diff:
diff['DATA(md5)'] = data_diff

if diff:
out.write(display_diff(files, diff))
raise SystemExit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this now preferred to sys.exit (or just returning a return code and having the entry point exit with that code)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is where python violated its own Zen:

In [4]: import this
The Zen of Python, by Tim Peters
...
There should be one-- and preferably only one --obvious way to do it.
...

In [2]: sys.exit?
Docstring:
exit([status])

Exit the interpreter by raising SystemExit(status).

so seems to be exactly the same thing... but it is easier to explain that "we will raise exception and then test in the test that it was raised" ;)


else:
out.write("These files are identical.\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and again no test to test this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • simple/integration test: provide two identical files for comparison!
  • "advanced"/unit/- logic- test: mock out the diff calls so they return no difference, and then invoke the function to get here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my latest commit I added the simple/integration test. I'm not sure how I would go about the advanced logic test though?

raise SystemExit(0)
Loading