
Streamlines API #243


Closed
wants to merge 56 commits into from

Conversation

MarcCote
Contributor

This is an attempt at providing a general API to manage different streamline file formats. This is related to BIAP5.

Right now, only the TRK format is supported, but I can easily add support for both TCK and VTK. One difference you might find with nibabel.trackvis is that it loads the streamlines as they were saved. There is no points_space argument; instead, functions should be used to change the space in which streamlines live: change_space(streamlines, ref=affine/nifti), trackvis2mrtrix(streamlines, ref=affine/nifti), etc.
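For illustration, a minimal sketch of what such a space-changing helper might look like. Only the name change_space comes from the proposal above; the signature and the body here are assumptions, not the PR's actual code:

```python
import numpy as np

def change_space(points, affine):
    """Map an (N, 3) array of streamline points through a 4x4 affine.

    Hypothetical helper mirroring the proposed change_space API; the
    real function in the PR may look quite different.
    """
    points = np.asarray(points, dtype=float)
    # Homogeneous transform: rotate/scale with the 3x3 part, then translate.
    return points @ affine[:3, :3].T + affine[:3, 3]

# Example: translate all points by one unit along x.
affine = np.eye(4)
affine[0, 3] = 1.0
moved = change_space([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]], affine)
# moved is [[1, 0, 0], [2, 2, 3]]
```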

I can't wait to know what you guys think about this. At some point, I really see the TractConverter being part of NiBabel.

@coveralls

Coverage Status

Coverage decreased (-1.55%) when pulling b55e6c8 on MarcCote:streamlines into 8261397 on nipy:master.

@matthew-brett
Member

@MrBago - any thoughts on this?

@Garyfallidis
Member

Hello nibabel devs. Any feedback on this? It's been here since July. The way we save trk files now is ugly, and the current API does not fit with the rest of nibabel. I think this PR is a great improvement. I hope you can have a look at it soon. Dipy would greatly benefit from this, as we use the trk format all the time. Thx!

@matthew-brett
Member

Sorry - I was hoping for some feedback from streamlines users. @MrBago - any thoughts?

I will have a look in the next few days.

@MrBago

MrBago commented Sep 16, 2014

This looks good to me overall and we've needed to improve this for some time now so thanks for working on this.

In my opinion, the hardest part of doing this right is dealing with the coordinate systems. Specifically, I think that if a user saves a '.trk' file, for example, they expect the points to be remapped so that TrackVis will display the streamlines correctly with respect to their background images. I know this is a PITA, and I agree that this is a flaw in the '.trk' format (the format should use the affine information in the header to figure out the coordinate system, but that's just not how this format is used). I don't think points need to get remapped when they are read, as long as there is a consistent way of getting the coordinate information, like the get_affine method provided by nibabel's images (nibabel provides get_affine for all image types, AFAIK, even if they don't have an affine field in the header).

I guess my point is that I'd like to push the abstraction here just a little bit more, so that users can save and load files without knowing anything about the implementation of the specific file format they are using. I think you're already 90% of the way there, but we should really get to this tipping point because it'll make the code that much better.

@MrBago

MrBago commented Sep 29, 2014

@MarcCote, what's the status of this PR? Can I help in some way?

@MarcCote
Contributor Author

I'm not convinced yet that '.trk' files should always be saved as TrackVis wants them. Maybe for the 'trk' and 'tck' formats it is clear they are associated with TrackVis and MRtrix, but some software saves its streamlines using the 'vtk' format (e.g. ExploreDTI), and it is not clear what space those streamlines live in.

That is why I preferred loading/saving streamlines as-is and was planning on using the name of the software to create transformation functions for streamlines: trackvis2mrtrix, exploredti2trackvis, etc. I agree that it would be more user-friendly to do it automatically instead of having to load, transform, then save the streamlines manually. For example, we could add different loading/saving functions: load_trackvis, load_mrtrix, save_trackvis, save_mrtrix, etc. But in any case, those functions would require passing some sort of reference image from which to get the affine/dimensions/size.

As proposed by @jchoude, maybe it is better to modify the streamlines.load and streamlines.save functions so the user is forced to specify where the streamlines come from or will be sent to, using the software's name, an affine/nifti, or the keyword "as-is".

@MrBago what do you think about the last approach?

@MrBago

MrBago commented Nov 22, 2014

I think the goal in supporting existing file formats (especially file formats that seem kind of half-baked to begin with) is interoperability with other software packages. To that end, I think we should strive to create files that conform to the written specs and unwritten norms that have formed for the given extension types.

For example if we choose to write ".tck" files, I think those files should be usable in any software that knows about and can use ".tck" files. We don't have to support all the features of a file format, but I don't think we should write files which will be misinterpreted by other software tools.

Also, when reading streamlines, I think a user should be able to use the streamlines without knowing what type of file they came from, much the way that nibabel currently allows me to read an Analyze, NIfTI, or MGH image and then use the image object pretty much the same way.

Please let me know if I've misunderstood what you're proposing.

@Garyfallidis
Member

Knock! Knock!

@MrBago

MrBago commented Jan 9, 2015

On Mon, Sep 29, 2014 at 7:50 AM, Marc-Alexandre Côté wrote:

> I'm not convinced yet that '.trk' files should always be saved as TrackVis wants them. Maybe for the 'trk' and 'tck' formats it is clear they are associated with TrackVis and MRtrix, but some software saves its streamlines using the 'vtk' format (e.g. ExploreDTI), and it is not clear what space those streamlines live in.

I'm not sure what you mean here. If we save a '.trk' file that TrackVis cannot properly interpret, presumably anyone else reading it will also misinterpret it. When there is a well-defined standard, we should comply with that standard. If there is no standard, we should ask ourselves why we're supporting that given file type and go from there. In the latter case, I believe the answer will almost always be "so that we can pass data between dipy and other software packages". If that's the case, we need to write files those other packages can read properly.

> That is why I preferred loading/saving streamlines as-is and was planning on using the name of the software to create transformation functions for streamlines: trackvis2mrtrix, exploredti2trackvis, etc. I agree that it would be more user-friendly to do it automatically instead of having to load, transform, then save the streamlines manually. For example, we could add different loading/saving functions: load_trackvis, load_mrtrix, save_trackvis, save_mrtrix, etc. But in any case, those functions would require passing some sort of reference image from which to get the affine/dimensions/size.

I have to disagree here. This type of thing becomes very hard to support long term. We've seen this recently with MRtrix, when they changed their SH basis between mrtrix2 and mrtrix3. If there are N software packages, you could need up to N**2 conversion functions, and every time one of them changes how it does things, you need to go back and deal with it. I think a much better approach is to have one consistent internal dipy/nibabel representation and convert in and out of that representation on read/write.

Sorry it took me so long to get back to you.

@MarcCote
Contributor Author

Sorry, I didn't realize you were waiting for my reply.

I agree with what you said. Using a consistent internal representation is the approach I used when implementing the TractConverter. It required specifying a reference point, either as an affine matrix, a nifti image, or a convention's name (e.g. LPS), so we can project the streamlines into a predetermined space. Some users found it a little bit confusing because, depending on the format, you don't have to specify a reference point (TRK vs TCK).

We should determine and use an internal representation. What is the most convenient representation for manipulating streamlines in Dipy? Voxel or mm? RAS or something else? Does the coordinate (0,0,0) represent the center of a voxel or one of its corners?

@MrBago

MrBago commented Jan 12, 2015

In dipy we've been using an affine to represent the mapping between voxel indices and streamline points. I like this because it gives us maximum flexibility in terms of supporting different coordinate systems and such. It also means that we can save and load streamlines "as-is" as long as we can encode/decode the affine on read/write. For formats that don't support encoding the affine, I think we should map the points into some space that the format does support when saving.

Bago
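The convention described above can be sketched in a few lines. The affine values here are invented for illustration (2 mm isotropic voxels with an arbitrary RAS-like translation):

```python
import numpy as np

# A 4x4 affine maps voxel indices to streamline-point coordinates.
affine = np.array([[2., 0., 0.,  -90.],
                   [0., 2., 0., -126.],
                   [0., 0., 2.,  -72.],
                   [0., 0., 0.,    1.]])

voxel_index = np.array([10., 20., 30., 1.])  # homogeneous voxel index
point = affine @ voxel_index                 # where that voxel lands
back = np.linalg.inv(affine) @ point         # decode again on read
# point[:3] is [-70, -86, -12]; back[:3] recovers [10, 20, 30]
```

Formats that can store the affine allow a lossless round trip; formats that cannot would get the points pre-mapped on save, as suggested above.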

@Jan-Schreiber

This sounds similar to reading and writing csv files with different encodings, delimiters, etc.
What about using a standard interface and defining dialects for different applications as well as other parameters like coordinate systems, comments, ascii/ binary, etc.?
nib.load_streamlines()
nib.save_streamlines()

Similar to loading volume data one could define functions like get_affine(), get_vertex_values(), get_streamline_values(), etc. whatever the file formats support.

I guess when writing streamline files we can specify compatibility with some common software packages, and when reading files there might be guesswork. If the file format does not tell us which coordinate system it uses, one has to guess or follow a given dialect. And it would be great to have a standard way to transform coordinates after loading.

Personally, for vtk and tck files I would prefer the world coordinate system (because this is what e.g. ParaView and mrview use) but internally we should get voxel coordinates for best interaction with volumes.

@coveralls

Coverage Status

Coverage decreased (-1.27%) to 93.05% when pulling 229859a on MarcCote:streamlines into 96d474c on nipy:master.

@MarcCote
Contributor Author

MarcCote commented Mar 4, 2015

Following @MrBago's advice, I modified the streamlines API so Streamlines objects use one consistent internal representation and carry an affine matrix to bring them back to world space (RAS+).

Specifically, when loading from a file, streamlines will be projected in voxel space. I agree with @Jan-Schreiber that interactions with volumes are simpler that way (which is probably the main reason someone would want to load streamlines using NiBabel).

It's worth mentioning that in voxel space, the X coordinate of any streamline point will lie in the interval [-0.5, width-0.5) (and similarly for the Y and Z coordinates), since a coordinate with integer values, e.g. (0,0,0), represents the centre of a voxel. This means that one has to add 0.5 to every coordinate and then cast to int before interacting with volumes. If I'm not mistaken, functions in dipy already take that into account, though.
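A sketch of that convention with invented values (np.floor is used rather than a bare int cast, since int() truncates toward zero):

```python
import numpy as np

# With (0, 0, 0) at the centre of the first voxel, coordinates lie in
# [-0.5, width - 0.5); shifting by 0.5 and flooring gives voxel indices.
pts = np.array([[-0.4, 0.0, 0.49],
                [ 1.5, 2.7, 3.2]])
indices = np.floor(pts + 0.5).astype(int)
# indices is [[0, 0, 0], [2, 3, 3]]
```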

On another point, I'm starting to add support for other streamline file formats (e.g. TCK), and I was wondering what metadata should be kept when converting streamlines from one format to another. Right now, a Streamlines object has a header property (a Python dict) which is filled with the available metadata. I tried to standardize the keys used in header for common metadata (you can see the current list here). I guess my question is simply: should we transfer metadata that is specific to a format? This would allow us not to lose information if we decide to convert a streamlines file back to its original format.

@MrBago

MrBago commented Mar 4, 2015

Just to let you guys know how things work in dipy: streamlines are allowed to be represented in an arbitrary coordinate system. Dipy functions that require streamlines to interact with volumes take the streamlines and the affine which maps the volume voxels to the streamlines' coordinate system.

For example, if the streamlines are in the (RAS+) coordinate system of a nifti file, then the affine of the nifti file should be used.

Dipy tracking classes also take a data volume and an affine. They return the streamlines in the affine's coordinate system, i.e. a point in the center of voxel [1, 2, 3] would be at dot(affine, [1, 2, 3, 1]).

I think on reading returning the streamlines in voxel-coordinates is ok, but I think returning them in any other coordinate system would also be fine as long as the affine is also returned.

Does this PR require that streamlines be moved to voxel coordinates before they are given to the save functions?

@MarcCote
Contributor Author

MarcCote commented Mar 4, 2015

Yes, I agree with you: returning streamlines in any coordinate system would be fine as long as you provide the affine matrix (volume voxel coordinates -> streamline coordinates). I just think having them already in voxel coordinates would be simpler for interacting with volumes (no need to apply the inverse affine matrix). Right now, nib.streamlines.load has a reference-space parameter (either a matrix or a nifti file) that is used to project the streamlines back to voxel space if needed (depending on the file format).

Also, streamlines can be created from a list of points if desired (they don't need to be saved). For example, creating 10 random streamlines, each having 20 points in 3D, would look like this:

>>> s = Streamlines(points=np.random.rand(10, 20, 3))
>>> s.header[Field.VOXEL_TO_WORLD]
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

>>> len(s)
10

So Streamlines objects are not forced to be in voxel space, and they contain an affine matrix in their header (the default is the identity matrix). However, the nib.streamlines.save function assumes Streamlines objects are in voxel space, and it has a parameter allowing one to specify a reference space (either a matrix or a nifti file).

Whenever you have time, I would appreciate some feedback on the two main files, base_format.py and header.py:
https://github.com/nipy/nibabel/pull/243/files#diff-555096850fc1b3fb1dfe6aa5644e3809
https://github.com/nipy/nibabel/pull/243/files#diff-3088f83d47b55090ba45b250ca5639af

Do you think it is a good approach to have a Streamlines class grouping all information about streamlines (scalars, properties, affine matrix, etc.)?
Once we reach a consensus, I think it will be easier to support other formats.

@MrBago

MrBago commented Mar 5, 2015

Will take a closer look next week. It's a pretty hefty PR, 2000+ lines, but I'll do my best :)

Bago


@MarcCote
Contributor Author

@MrBago do you know if TrackVis (and the diffusion toolkit) considers the coordinate (0,0,0) of a streamline's point to be in the middle of the voxel or the corner? I know that MRtrix uses the former convention.

@MrBago

MrBago commented Mar 20, 2015

TrackVis uses it as the corner, i.e. all negative points are outside the image volume.

Bago


@MarcCote MarcCote changed the title WIP: Streamlines API Streamlines API Apr 17, 2015
@coveralls

Coverage Status

Coverage decreased (-1.68%) to 92.64% when pulling 7dfd83e on MarcCote:streamlines into 96d474c on nipy:master.

@coveralls

Coverage Status

Coverage decreased (-1.77%) to 92.56% when pulling cc26ae6 on MarcCote:streamlines into 96d474c on nipy:master.

@MarcCote
Contributor Author

@MrBago I refactored a lot of code and added a lot of comments. I'm sure it is easier to read as it is now.

Now, a Streamlines object s contains the affine matrix (s.header.to_world_space) needed to send the streamlines back to world space. The convention I follow is still to have the streamlines in voxel space when loaded from a file. Afterwards, if one projects the streamlines into another space, the affine matrix will be updated, as long as one uses the method made for that, e.g. s2 = s.transform(affine, lazy=False). So, unless modified explicitly, the affine will contain the information needed to bring the streamlines back to world space.
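A minimal sketch of the bookkeeping this implies (a toy model, not the PR's actual class; names and signatures are assumed): composing the current world affine with the inverse of the applied transform keeps world coordinates recoverable.

```python
import numpy as np

class StreamlinesSketch:
    """Toy streamlines container that tracks its way back to world space."""

    def __init__(self, points, to_world_space=None):
        self.points = np.asarray(points, dtype=float)
        self.to_world_space = np.eye(4) if to_world_space is None else to_world_space

    def transform(self, affine):
        # Move the points, and compose the world affine with inv(affine)
        # so that world == to_world_space @ inv(affine) @ (affine @ voxel).
        moved = self.points @ affine[:3, :3].T + affine[:3, 3]
        return StreamlinesSketch(moved, self.to_world_space @ np.linalg.inv(affine))

# Scaling by 2 moves the points but leaves world coordinates unchanged.
scale = np.diag([2.0, 2.0, 2.0, 1.0])
s = StreamlinesSketch([[1.0, 2.0, 3.0]])
s2 = s.transform(scale)
# s2.points is [[2, 4, 6]], and mapping back through
# s2.to_world_space recovers [[1, 2, 3]]
```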

@MarcCote
Contributor Author

@matthew-brett how can I see exactly which lines of coverage I'm missing? When I click on the details link for "coverage/coverall", it brings me to a page where I can see that the file ...ython2.7/site-packages/nibabel/testing/__init__.py has decreased in coverage. However, when I click on it, it says source not available!

@larsoner
Contributor

@matthew-brett I struggled with that problem for a while, trying symlinks and other solutions; the only fix seemed to be running from the package source directory. So for mne-python we run all tests except one Travis run from the source directory, and for that last run we don't do a coverage report, but it checks to make sure everything still works as an installed package.

@MarcCote you should be able to do something like nosetests --with-coverage from the nibabel root directory, and it will print out a coverage report. You can also look into other options like --cover-html (I think?) for a more nicely formatted version.

# voxel whereas streamlines returned assume (0,0,0) to be the
# center of the voxel. Thus, streamlines are shifted by half
# a voxel.
pts -= np.array(self.header[Field.VOXEL_SIZES])/2.


Is this right? After the points have been divided by VOXEL_SIZES, don't you just want to subtract 0.5?

I think you want (pts - voxel_size / 2) / voxel_size, or (pts / voxel_size - .5).
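To see the difference numerically (invented values, 2 mm isotropic voxels):

```python
import numpy as np

voxel_sizes = np.array([2.0, 2.0, 2.0])
pts_voxelmm = np.array([[1.0, 1.0, 1.0]])  # corner-origin, in mm

# Order of operations in the hunk above: divide, then subtract half a
# voxel *size* (in mm) from values that are already in voxel units.
buggy = pts_voxelmm / voxel_sizes - voxel_sizes / 2.0
# Suggested fix: divide by the voxel size, then shift by half a voxel (0.5).
fixed = pts_voxelmm / voxel_sizes - 0.5

# (1, 1, 1) mm is the centre of the first 2 mm voxel, so the
# centre-origin voxel coordinate should be (0, 0, 0):
# fixed gives [[0, 0, 0]], while buggy gives [[-0.5, -0.5, -0.5]].
```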

Contributor Author


Yep, good catch, you are definitely right. Clearly, it is missing unit tests!
Thanks.



For IO stuff I like to include a write-read test, to at least make sure that I can recreate in python what I started with :).

On Thu, May 14, 2015, Marc-Alexandre Côté wrote:

> Yep, you are definitely right. Clearly, it is missing unit tests!
>
> I'll go with the -.5 fix then.

Contributor Author


Yes. I have that test, but with an identity as the affine, so np.array(self.header[Field.VOXEL_SIZES])/2 == 0.5.

@matthew-brett
Member

Guys - where are we on this one? Bago - would you be interested in taking charge of this one, and merging when you think it's ready?

@MarcCote
Contributor Author

I've been improving and refactoring the API a lot, and I'm afraid this PR is becoming too big for a thorough review. I propose to split it in two: 1) Tractogram, 2) TractogramFile.

Tractogram
This part has the code to handle streamlines in memory using the Python class Tractogram. This class provides a mechanism to keep data along the points of every streamline. It can also keep data along each streamline. Internally, it uses what I call a CompactList to store the streamlines' points efficiently. This was needed because @Garyfallidis and I realized that a list of numpy arrays has too much overhead in terms of memory when handling >1M streamlines (loading a tractogram of 2.5 GB could take up to 5 GB of RAM!).
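The idea behind such a structure might look like the following sketch (the PR's actual CompactList surely differs): one contiguous array of points plus per-streamline offsets and lengths, instead of one numpy array object per streamline.

```python
import numpy as np

class CompactListSketch:
    """Store many variable-length (N_i, 3) arrays in one contiguous block."""

    def __init__(self, arrays):
        self._lengths = [len(a) for a in arrays]
        # Offset of each streamline inside the shared data block.
        self._offsets = np.concatenate(([0], np.cumsum(self._lengths)[:-1]))
        self._data = np.concatenate(arrays)

    def __len__(self):
        return len(self._lengths)

    def __getitem__(self, i):
        off = int(self._offsets[i])
        return self._data[off:off + self._lengths[i]]

streamlines = [np.zeros((2, 3)), np.ones((3, 3))]
clist = CompactListSketch(streamlines)
# len(clist) is 2, and clist[1] is a view on the (3, 3) block of ones
```

With millions of short streamlines, this avoids the per-object overhead of a Python list of small arrays.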

TractogramFile
This part has the code to save and load a Tractogram on disk. Right now, as a proof of concept, only the TRK file format is supported.

def _read():
    for pts, scals, props in trk_reader:
        data_for_points = {k: scals[:, v] for k, v in data_per_point_slice.items()}
        data_for_streamline = {k: props[v] for k, v in data_per_streamline_slice.items()}
Member


For Python 2.6 compatibility:

data_for_points = dict((k, scals[:, v]) for k, v in data_per_point_slice.items())
data_for_streamline = dict((k, props[v]) for k, v in data_per_streamline_slice.items())

@Garyfallidis
Member

@MarcCote this is excellent work. But I think it is not a good idea to push this completely new design into this older PR, because it has now changed so much that most of the previous discussions are no longer relevant. That makes reviewing this PR hard.

I think that, to help the reviewers, it would be better if you made a new PR (and closed this one) and wrote in the description the main new features of the PR and the specific design decisions, each with a short explanation. For example, we now go to RAS+ by default; why? Also, we introduce the concept of a Tractogram, which is a higher-level representation of streamlines where you can store more info along the streamlines, but where we don't keep header information; that goes in the TractogramFile, etc. But why do we do such a thing? This will help a lot. Stay laconic but give the correct info, please.

This PR is absolutely crucial for the work we do in dMRI, and it will be used by many people. It also solves an important memory/speed/management issue in Dipy. So please, @jchoude, @mdesco, @gabknight, @MrBago, @Jan-Schreiber, @martcous and @arokem, try this PR and give some nice reviews asap, so we can finally have a stable way of reading and saving streamline-related files. We really need to get better at this ...

Also, about splitting it in two, maybe that's not a good idea, because it will make it difficult to try this with real data. The PR is indeed quite large, but I don't see an easy way of splitting it while still trying it with real data.

Finally, @matthew-brett, we worked together with @MarcCote to make this PR, and we tried to make the API look as close as possible to what has been done for the NIfTI files, so that the API looks homogeneous. This PR should be merged before the 0.11 Dipy release, because many new PRs depend on this new API, which, as I wrote above, simplifies many current issues, both with file formats for tractography and with internal Dipy design. So I would hope that it can be prioritized and merged relatively soon. Of course, with some quality reviews first. But as you can see for yourself, @MarcCote has already done some excellent work here. The day when loading/saving streamlines is as easy as loading/saving nifti files is closer than ever :) 👍

except:
    return False

return True
Member


This will consume generators:

>>> def f():
...    for i in range(5):
...        yield i
...
>>> x = f()
>>> isiterable(x)
True
>>> list(x)
[]

What about:

def isiterable(val):
    try:
        iter(val)
    except TypeError:
        return False
    return True

Edit: Apparently you can also just import collections and use isinstance(val, collections.Iterable).

Contributor Author


You are right. However, isinstance(val, collections.Iterable) simply checks whether the object has an __iter__ method, instead of testing that you can actually iterate through it. The same goes for testing that calling iter does not fail.

I agree that isiterable might not be the right name for this function, but I lack a better one right now (if you have a suggestion, let me know). I use this isiterable function to loop through all elements of a given object, catching exceptions if any (that could happen for my objects).

Member


I see. I didn't look deeply into how it was being used. Maybe something like check_iteration?

Contributor Author


I like it. I replaced isiterable with check_iteration.
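A sketch of what such a helper does (details assumed): it exhausts the iterable and catches any error raised mid-iteration, which neither a bare iter() call nor an isinstance check can detect.

```python
def check_iteration(val):
    """Return True only if *every* element of val can be produced."""
    try:
        for _ in val:
            pass
    except Exception:
        return False
    return True

def flaky():
    # A generator with a working __iter__ that still fails partway through.
    yield 1
    raise ValueError("fails after the first element")

# check_iteration([1, 2, 3]) is True, check_iteration(5) is False,
# and check_iteration(flaky()) is also False.
```

Note that, like the original isiterable, this consumes the iterable it checks, so it only makes sense for objects that can be re-iterated or rebuilt.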

@samuelstjean
Contributor

Ah yes, playing with the different streamline formats is indeed a horrible experience (and I just started playing with them), so prioritising this instead of playing with a ton of converters is a good idea.

@MarcCote MarcCote mentioned this pull request Dec 3, 2015
@MarcCote
Contributor Author

MarcCote commented Dec 3, 2015

Agreeing with @Garyfallidis, I'm closing this one. See PR #391 instead.

> ... it is not a good idea to push this completely new design into this older PR, because it has now changed so much that most of the previous discussions are no longer relevant. That makes reviewing this PR hard.

@MarcCote MarcCote closed this Dec 3, 2015
@MarcCote MarcCote deleted the streamlines branch December 3, 2015 17:30