Selecting on `data_per_streamline` #729

arokem · 2019-02-26T21:45:05Z

I am trying to use the data_per_streamline attribute to select streamlines from within a Tractogram object. I am running into the following behavior.

I initialize a very simple tractogram object. The first streamline has the data_per_streamline foo key set to 0 and the other two streamlines have this set to 1:

import numpy as np 
import nibabel.streamlines as sl 
  
tgram = sl.Tractogram([np.array([[0, 0, 0], [1, 1, 1]]), 
                       np.array([[2, 2, 2], [3, 3, 3], [4, 4, 4]]), 
                       np.array([[5, 5, 5], [6, 6, 6]])], 
                       data_per_streamline={'foo': [[0],[1],[1]]})

I can find the right streamlines using np.where:

idx = np.where(tgram.data_per_streamline['foo'] == 1)[0] 
idx
array([1, 2])

It even looks like it's doing the right thing when I use this for indexing:

tgram[idx].streamlines                                                   
ArraySequence([array([[2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]]), array([[5, 5, 5],
       [6, 6, 6]])])

But under the hood:

tgram[idx].streamlines._data                                             
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4],
       [5, 5, 5],
       [6, 6, 6]])

Why is it still holding onto all of the data? Is there some way to drop the data for the first (not selected) streamline?

The text was updated successfully, but these errors were encountered:

arokem · 2019-02-26T21:46:49Z

In case you are wondering, also:

tgram.streamlines[idx]._data   
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4],
       [5, 5, 5],
       [6, 6, 6]])

arokem · 2019-02-26T22:03:02Z

Surely, I am not supposed to do the following?

new_tgram = sl.Tractogram((tgram.streamlines._data[tgram.streamlines._offsets[i]:tgram.streamlines._offsets[i]+tgram.streamlines._offsets[i]+1] for i in idx),
                           data_per_streamline={k: [tgram.data_per_streamline[k][i] for i in idx] for k in tgram.data_per_streamline.keys()})

Though it does seem to have the expected behavior (at least in this case).

See also nipy/nibabel#729

effigies · 2019-02-27T15:00:18Z

Just to be clear, it looks like you're in the situation where you load a large number of streamlines, want to make a copy of a subset, and release the memory for all of the rest? From a quick glance through the code, this use case wasn't really coded for. Perhaps it would be best to add a method to allow for making a pruned copy?

I'll note that the ArraySequence.data property simply returns _data without applying offsets.

nibabel/nibabel/streamlines/array_sequence.py

Lines 120 to 122 in ad0b13f

    
           def data(self): 
        
               """ Elements in this array sequence. """ 
        
               return self._data

If that's a bug, it may be that, after fixing, you could do something like the following:

new_tgram = sl.Tractogram(tgram.streamlines.data)

I'll rope @MarcCote in again, in hopes he can shine some light here. I'm only familiar with this API from a few recent PRs.

MarcCote · 2019-02-27T17:41:14Z

That was the intended behavior but I agree it might be unexpected when looking under the hood. The advanced indexing on ArraySequence is a lazy operation to avoid creating more memory if we can. If you want to prune the data, you should call .copy() on the ArraySequence, like so

In [5]: tgram[idx].streamlines.copy().data
Out[5]: 
array([[2, 2, 2],
       [3, 3, 3],
       [4, 4, 4],
       [5, 5, 5],
       [6, 6, 6]])

I think we want with that approach (i.e. explicitly calling .copy()) so users are aware that operation will be creating additional memory. I'm open to suggestions to make it clearer to the user?

arokem · 2019-02-28T22:14:18Z

Thanks! That actually seems rather reasonable.

More generally, I think that a documentation tutorial on streamlines would be useful to have. Maybe that's something I could do at the DIPY workshop in a couple of weeks.

MarcCote · 2019-03-01T12:38:15Z

That would be awesome. I'd be happy to review it. Sadly, I can't make the Dipy workshop but maybe I can set some time aside and join remotely.

arokem added a commit to arokem/pyAFQ-legacy that referenced this issue Feb 26, 2019

BF: Indexing into streamline objects is ... weird.

c831510

See also nipy/nibabel#729

arokem mentioned this issue Feb 28, 2019

Documentation/tutorial on streamlines #731

Open

arokem closed this as completed Feb 28, 2019

arokem mentioned this issue May 2, 2019

ENH: Statefull tractogram, robust spatial handling and IO dipy/dipy#1812

Merged

MarcCote mentioned this issue Sep 5, 2019

stateful_tractogram_post_1.0_fixes dipy/dipy#1956

Merged

MarcCote mentioned this issue Sep 16, 2019

Fixing ArraySequence functionalities #811

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Selecting on `data_per_streamline` #729

Selecting on `data_per_streamline` #729

arokem commented Feb 26, 2019

arokem commented Feb 26, 2019

Uh oh!

arokem commented Feb 26, 2019

Uh oh!

effigies commented Feb 27, 2019

Uh oh!

MarcCote commented Feb 27, 2019

Uh oh!

arokem commented Feb 28, 2019

Uh oh!

MarcCote commented Mar 1, 2019

Uh oh!

Selecting on data_per_streamline #729

Selecting on data_per_streamline #729

Comments

arokem commented Feb 26, 2019

arokem commented Feb 26, 2019

Uh oh!

arokem commented Feb 26, 2019

Uh oh!

effigies commented Feb 27, 2019

Uh oh!

MarcCote commented Feb 27, 2019

Uh oh!

arokem commented Feb 28, 2019

Uh oh!

MarcCote commented Mar 1, 2019

Uh oh!

Selecting on `data_per_streamline` #729

Selecting on `data_per_streamline` #729