-
Notifications
You must be signed in to change notification settings - Fork 262
Fixing ArraySequence functionalities #811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
3b3131a
1642ba0
5345631
7400811
1b1f516
1ab8d58
b37af09
d21a980
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,8 @@ | |
|
||
import numpy as np | ||
|
||
from ..deprecated import deprecate_with_version | ||
|
||
MEGABYTE = 1024 * 1024 | ||
|
||
|
||
|
@@ -53,6 +55,37 @@ def update_seq(self, arr_seq): | |
arr_seq._lengths = np.array(self.lengths) | ||
|
||
|
||
def _define_operators(cls): | ||
""" Decorator which adds support for some Python operators. """ | ||
def _wrap(cls, op, inplace=False, unary=False): | ||
|
||
def fn_unary_op(self): | ||
return self._op(op) | ||
|
||
def fn_binary_op(self, value): | ||
return self._op(op, value, inplace=inplace) | ||
|
||
setattr(cls, op, fn_unary_op if unary else fn_binary_op) | ||
fn = getattr(cls, op) | ||
fn.__name__ = op | ||
fn.__doc__ = getattr(np.ndarray, op).__doc__ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Beautiful! I love this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes! It turned out nicely. |
||
|
||
for op in ["__add__", "__sub__", "__mul__", "__mod__", "__pow__", | ||
"__floordiv__", "__truediv__", "__lshift__", "__rshift__", | ||
"__or__", "__and__", "__xor__"]: | ||
_wrap(cls, op=op, inplace=False) | ||
_wrap(cls, op="__i{}__".format(op.strip("_")), inplace=True) | ||
|
||
for op in ["__eq__", "__ne__", "__lt__", "__le__", "__gt__", "__ge__"]: | ||
_wrap(cls, op) | ||
|
||
for op in ["__neg__", "__abs__", "__invert__"]: | ||
_wrap(cls, op, unary=True) | ||
|
||
return cls | ||
|
||
|
||
@_define_operators | ||
class ArraySequence(object): | ||
""" Sequence of ndarrays having variable first dimension sizes. | ||
|
||
|
@@ -116,9 +149,42 @@ def total_nb_rows(self): | |
return np.sum(self._lengths) | ||
|
||
@property | ||
@deprecate_with_version("'ArraySequence.data' property is deprecated.\n" | ||
"Please use the 'ArraySequence.get_data()' method instead", | ||
'3.0', '4.0') | ||
def data(self): | ||
""" Elements in this array sequence. """ | ||
return self._data | ||
view = self._data.view() | ||
view.setflags(write=False) | ||
return view | ||
|
||
def get_data(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are there cases where someone might want read-only access to the original data, without performing a copy? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since it's not always possible to return a view on just the data belonging to a sliced/indexed I know it is bad practice but I'm expecting Dipy's methods to deal directly with That said, maybe there exists a solution but I fail to see it. I'm happy to consider alternatives. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree, I do no think that a lot of users might want a read-only access to the original data |
||
""" Returns a *copy* of the elements in this array sequence. | ||
|
||
Notes | ||
----- | ||
To modify the data on this array sequence, one can use | ||
in-place mathematical operators (e.g., `seq += ...`) or the use | ||
assignment operator (i.e, `seq[...] = value`). | ||
""" | ||
return self.copy()._data | ||
|
||
def _check_shape(self, arrseq): | ||
""" Check whether this array sequence is compatible with another. """ | ||
msg = "cannot perform operation - array sequences have different" | ||
if len(self._lengths) != len(arrseq._lengths): | ||
msg += " lengths: {} vs. {}." | ||
raise ValueError(msg.format(len(self._lengths), len(arrseq._lengths))) | ||
|
||
if self.total_nb_rows != arrseq.total_nb_rows: | ||
msg += " amount of data: {} vs. {}." | ||
raise ValueError(msg.format(self.total_nb_rows, arrseq.total_nb_rows)) | ||
|
||
if self.common_shape != arrseq.common_shape: | ||
msg += " common shape: {} vs. {}." | ||
raise ValueError(msg.format(self.common_shape, arrseq.common_shape)) | ||
|
||
return True | ||
|
||
def _get_next_offset(self): | ||
""" Offset in ``self._data`` at which to write next rowelement """ | ||
|
@@ -320,7 +386,7 @@ def __getitem__(self, idx): | |
seq._lengths = self._lengths[off_idx] | ||
return seq | ||
|
||
if isinstance(off_idx, list) or is_ndarray_of_int_or_bool(off_idx): | ||
if isinstance(off_idx, (list, range)) or is_ndarray_of_int_or_bool(off_idx): | ||
# Fancy indexing | ||
seq._offsets = self._offsets[off_idx] | ||
seq._lengths = self._lengths[off_idx] | ||
|
@@ -329,6 +395,116 @@ def __getitem__(self, idx): | |
raise TypeError("Index must be either an int, a slice, a list of int" | ||
" or a ndarray of bool! Not " + str(type(idx))) | ||
|
||
def __setitem__(self, idx, elements): | ||
""" Set sequence(s) through standard or advanced numpy indexing. | ||
|
||
Parameters | ||
---------- | ||
idx : int or slice or list or ndarray | ||
If int, index of the element to retrieve. | ||
If slice, use slicing to retrieve elements. | ||
If list, indices of the elements to retrieve. | ||
If ndarray with dtype int, indices of the elements to retrieve. | ||
If ndarray with dtype bool, only retrieve selected elements. | ||
elements: ndarray or :class:`ArraySequence` | ||
Data that will overwrite selected sequences. | ||
If `idx` is an int, `elements` is expected to be a ndarray. | ||
Otherwise, `elements` is expected a :class:`ArraySequence` object. | ||
""" | ||
if isinstance(idx, (numbers.Integral, np.integer)): | ||
start = self._offsets[idx] | ||
self._data[start:start + self._lengths[idx]] = elements | ||
return | ||
|
||
if isinstance(idx, tuple): | ||
MarcCote marked this conversation as resolved.
Show resolved
Hide resolved
|
||
off_idx = idx[0] | ||
data = self._data.__getitem__((slice(None),) + idx[1:]) | ||
else: | ||
off_idx = idx | ||
data = self._data | ||
|
||
if isinstance(off_idx, slice): # Standard list slicing | ||
offsets = self._offsets[off_idx] | ||
lengths = self._lengths[off_idx] | ||
|
||
elif isinstance(off_idx, (list, range)) or is_ndarray_of_int_or_bool(off_idx): | ||
# Fancy indexing | ||
offsets = self._offsets[off_idx] | ||
lengths = self._lengths[off_idx] | ||
|
||
else: | ||
raise TypeError("Index must be either an int, a slice, a list of int" | ||
" or a ndarray of bool! Not " + str(type(idx))) | ||
|
||
if is_array_sequence(elements): | ||
if len(lengths) != len(elements): | ||
msg = "Trying to set {} sequences with {} sequences." | ||
raise ValueError(msg.format(len(lengths), len(elements))) | ||
|
||
if sum(lengths) != elements.total_nb_rows: | ||
msg = "Trying to set {} points with {} points." | ||
raise ValueError(msg.format(sum(lengths), elements.total_nb_rows)) | ||
|
||
for o1, l1, o2, l2 in zip(offsets, lengths, elements._offsets, elements._lengths): | ||
data[o1:o1 + l1] = elements._data[o2:o2 + l2] | ||
|
||
elif isinstance(elements, numbers.Number): | ||
for o1, l1 in zip(offsets, lengths): | ||
data[o1:o1 + l1] = elements | ||
|
||
else: # Try to iterate over it. | ||
effigies marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for o1, l1, element in zip(offsets, lengths, elements): | ||
data[o1:o1 + l1] = element | ||
|
||
def _op(self, op, value=None, inplace=False): | ||
""" Applies some operator to this arraysequence. | ||
|
||
This handles both unary and binary operators with a scalar or another | ||
array sequence. Operations are performed directly on the underlying | ||
data, or a copy of it, which depends on the value of `inplace`. | ||
|
||
Parameters | ||
---------- | ||
op : str | ||
Name of the Python operator (e.g., `"__add__"`). | ||
value : scalar or :class:`ArraySequence`, optional | ||
If None, the operator is assumed to be unary. | ||
Otherwise, that value is used in the binary operation. | ||
inplace: bool, optional | ||
If False, the operation is done on a copy of this array sequence. | ||
Otherwise, this array sequence gets modified directly. | ||
""" | ||
seq = self if inplace else self.copy() | ||
|
||
if is_array_sequence(value) and seq._check_shape(value): | ||
elements = zip(seq._offsets, seq._lengths, | ||
self._offsets, self._lengths, | ||
value._offsets, value._lengths) | ||
|
||
# Change seq.dtype to match the operation resulting type. | ||
o0, l0, o1, l1, o2, l2 = next(elements) | ||
tmp = getattr(self._data[o1:o1 + l1], op)(value._data[o2:o2 + l2]) | ||
seq._data = seq._data.astype(tmp.dtype) | ||
seq._data[o0:o0 + l0] = tmp | ||
|
||
for o0, l0, o1, l1, o2, l2 in elements: | ||
seq._data[o0:o0 + l0] = getattr(self._data[o1:o1 + l1], op)(value._data[o2:o2 + l2]) | ||
|
||
else: | ||
args = [] if value is None else [value] # Dealing with unary and binary ops. | ||
elements = zip(seq._offsets, seq._lengths, self._offsets, self._lengths) | ||
|
||
# Change seq.dtype to match the operation resulting type. | ||
o0, l0, o1, l1 = next(elements) | ||
tmp = getattr(self._data[o1:o1 + l1], op)(*args) | ||
seq._data = seq._data.astype(tmp.dtype) | ||
seq._data[o0:o0 + l0] = tmp | ||
|
||
for o0, l0, o1, l1 in elements: | ||
seq._data[o0:o0 + l0] = getattr(self._data[o1:o1 + l1], op)(*args) | ||
|
||
return seq | ||
|
||
def __iter__(self): | ||
if len(self._lengths) != len(self._offsets): | ||
raise ValueError("ArraySequence object corrupted:" | ||
|
@@ -371,7 +547,7 @@ def load(cls, filename): | |
return seq | ||
|
||
|
||
def create_arraysequences_from_generator(gen, n): | ||
def create_arraysequences_from_generator(gen, n, buffer_sizes=None): | ||
""" Creates :class:`ArraySequence` objects from a generator yielding tuples | ||
|
||
Parameters | ||
|
@@ -381,8 +557,13 @@ def create_arraysequences_from_generator(gen, n): | |
array sequences. | ||
n : int | ||
Number of :class:`ArraySequences` object to create. | ||
buffer_sizes : list of float, optional | ||
Sizes (in Mb) for each ArraySequence's buffer. | ||
""" | ||
seqs = [ArraySequence() for _ in range(n)] | ||
if buffer_sizes is None: | ||
buffer_sizes = [4] * n | ||
|
||
seqs = [ArraySequence(buffer_size=size) for size in buffer_sizes] | ||
for data in gen: | ||
for i, seq in enumerate(seqs): | ||
if data[i].nbytes > 0: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to make this an abstract class, instead of a decorator? Right now this produces a pretty uninformative
pydoc
entry:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I can do it with an abstract class exactly. Do you mean defining all the methods one by one in the abstract, then make
ArraySequence
subclass it? If that's the case, I might as well as defining them directly inArraySequence
. Also, I was trying to keep the number of new lines to a minimum.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess if it's only ever one class, that's one thing. Perhaps we can change them to proper methods and give them reasonable docstrings.
I've looked around at numbers and numpy to see if there was a base class we could use, but doesn't look like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@effigies So, I'm going to remove the decorator in favor of explicitly listing all the operators. Sounds good?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine with me.
To be clear, though, what I was suggesting in my last comment was replacing the
lambda
with a full function, solambda self: self.op(op)
would becomedef fn(self): return self._op(op)
, and then you could add a docstring withfn.__doc__ = ...
before assigning it.But whatever's the least work is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for using
def fn(self): return self._op(op)
. Not a big fan of lambda, but it is a personal preference...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, gotcha. That makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@effigies done. Here's what it looks like now.