Support for jagged array #1482

Closed · mitar opened this issue Jul 19, 2017 · 11 comments

@mitar

mitar commented Jul 19, 2017

Maybe I am misunderstanding something, but does xarray not support jagged arrays? I would like to store images as arrays inside a multi-dimensional array, but every image is potentially of a different size. So along one dimension, every value can be a differently sized array. Is this supported by xarray?

@fmaussion
Member

"Supported", yes, in the sense that you can create a DataArray for each of your differently sized arrays without any problem. If you want to store them all in a Dataset, you'll have to give a different dimension name for each new dimension, which can be clumsy.

However, it is true that xarray shines at handling more structured data, and most examples in the docs involve dataset variables sharing similar dimensions. What kind of "support" exactly were you thinking of?

@mitar
Author

mitar commented Jul 19, 2017

If you want to store them all in a Dataset, you'll have to give a different dimension name for each new dimension, which can be clumsy.

But I cannot combine multiple dimensions into the same Variable, no? So if I have a dataset with multiple variables, it seems each variable has to have uniform dimensions for all its values? Maybe I am misunderstanding the dimensions concept.

What kind of "support" exactly were you thinking of?

Maybe examples of how to create such a jagged dataset? For example, how to have a variable which stores 2D images of different sizes.

If I understand correctly, I could batch all images of the same size into their own dimension? That might also be acceptable.

@fujiisoup
Member

fujiisoup commented Jul 19, 2017

I have a similar use case and I often use a MultiIndex,
which (partly) enables handling hierarchical data structures.

For example,

In [1]: import xarray as xr
   ...: import numpy as np
   ...:
   ...: # image 0, size [3, 4]
   ...: data0 = xr.DataArray(np.arange(12).reshape(3, 4), dims=['x', 'y'],
   ...:                      coords={'x': np.linspace(0, 1, 3), 
   ...:                              'y': np.linspace(0, 1, 4),
   ...:                              'image_index': 0})
   ...: # image 1, size [4, 5]
   ...: data1 = xr.DataArray(np.arange(20).reshape(4, 5), dims=['x', 'y'],
   ...:                      coords={'x': np.linspace(0, 1, 4), 
   ...:                              'y': np.linspace(0, 1, 5),
   ...:                              'image_index': 1})
   ...: 
   ...: data = xr.concat([data0.expand_dims('image_index').stack(xy=['x', 'y', 'image_index']),
   ...:                   data1.expand_dims('image_index').stack(xy=['x', 'y', 'image_index'])],
   ...:                  dim='xy')

In [2]: data
Out[2]: 
<xarray.DataArray (xy: 32)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,  0,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
Coordinates:
  * xy           (xy) MultiIndex
  - x            (xy) float64 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 1.0 1.0 1.0 ...
  - y            (xy) float64 0.0 0.3333 0.6667 1.0 0.0 0.3333 0.6667 1.0 ...
  - image_index  (xy) int64 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 ...

In [3]:  data.sel(image_index=0)  # gives data0
Out[3]: 
<xarray.DataArray (xy: 12)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Coordinates:
  * xy       (xy) MultiIndex
  - x        (xy) float64 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0
  - y        (xy) float64 0.0 0.3333 0.6667 1.0 0.0 0.3333 0.6667 1.0 0.0 ...

In [4]: data.sel(x=0.0)  # x==0.0 for both images
Out[4]: 
<xarray.DataArray (xy: 9)>
array([0, 1, 2, 3, 0, 1, 2, 3, 4])
Coordinates:
  * xy           (xy) MultiIndex
  - y            (xy) float64 0.0 0.3333 0.6667 1.0 0.0 0.25 0.5 0.75 1.0
  - image_index  (xy) int64 0 0 0 0 1 1 1 1 1

I think the above solution is essentially equivalent to

all images of the same size into their own dimension

EDIT:
I didn't understand the comment correctly.
The above corresponds to all the images being flattened out and combined along one large dimension.

@darothen

The problem is that these sorts of arrays break the common data model on top of which xarray (and NetCDF) is built.

If I understand correctly, I could batch all images of the same size into their own dimension? That might also be acceptable.

Yes, if you can pre-process all the images and align them on some common set of dimensions (maybe just xi and yi, denoting integer index in the x and y directions), and pad unused space for each image with NaNs, then you could concatenate everything into a Dataset.
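A rough sketch of that padding approach, assuming two images of different sizes (the dimension names yi/xi and the dummy data are just illustrative):

import numpy as np
import xarray as xr

# Two hypothetical images of different sizes.
images = [np.arange(12.).reshape(3, 4), np.arange(20.).reshape(4, 5)]

# Pad every image with NaNs up to the largest shape, then concatenate
# along a new "image" dimension with shared integer-index dims yi and xi.
max_y = max(im.shape[0] for im in images)
max_x = max(im.shape[1] for im in images)
padded = [
    xr.DataArray(
        np.pad(im, ((0, max_y - im.shape[0]), (0, max_x - im.shape[1])),
               constant_values=np.nan),
        dims=['yi', 'xi'])
    for im in images
]
data = xr.concat(padded, dim='image')
print(data.shape)  # (2, 4, 5); unused slots are NaN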

@mitar
Author

mitar commented Jul 19, 2017

Hm, padding might use a lot of extra space, no?

@darothen

@mitar it depends on your data/application, right? But that information would also be helpful in figuring out alternative pathways. If you're always going to process the images individually or sequentially, then what advantage is there (aside from convenience) in dumping them into some giant array with forced dimensions/shape per slice?

@mitar
Author

mitar commented Aug 8, 2017

then what advantage is there (aside from convenience) in dumping them into some giant array with forced dimensions/shape per slice?

I was mostly thinking of using xarray as a basic data format for reusable code. So if I build ML pipelines out of reusable components, I have to pass data around. Initially the data might be in jagged arrays, and then, with various preprocessing steps before training a model, I can get it into a more suitable format where images are of the same size, so that it is easier to train on. I hoped I could use the same format in all of these places where I need to pass data around.

@shoyer
Member

shoyer commented Aug 8, 2017

I understand why this could be useful, but I don't see how we could possibly make it work.

The notion of "fixed dimension size" is fundamental to both NumPy arrays (upon which xarray is based) and the xarray Dataset/DataArray. There are various workarounds (e.g., using padding or a MultiIndex), but first-class support for jagged arrays would break our existing data model too severely.

@fmfreeze

As I am not aware of the implementation details, I am not sure there is a useful link, but maybe progress in #3213 on supporting sparse arrays can also solve the jagged array issue.

A long time ago I asked a question there about how xarray supports sparse arrays.
But what I actually meant was "jagged arrays"; I just was not aware of that term and only stumbled over it for the first time a few days ago.

@Material-Scientist

As I am not aware of the implementation details, I am not sure there is a useful link, but maybe progress in #3213 on supporting sparse arrays can also solve the jagged array issue.

A long time ago I asked a question there about how xarray supports sparse arrays. But what I actually meant was "jagged arrays"; I just was not aware of that term and only stumbled over it for the first time a few days ago.

I also recently came across awkward/jagged/ragged arrays, and that's exactly how I would like to operate on multi-dimensional (2D in the referenced case) sparse data:

[screenshot omitted]

Instead of allocating memory with NaNs, empty slots are simply not materialized, by using the pd.SparseDtype("float", np.nan) dtype.

You basically create a dense duck array from sparse dtypes, as the Pandas sparse user guide shows:
[screenshot from the pandas sparse user guide omitted]
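A minimal sketch of that idea, assuming a small ragged example padded with NaN and then stored with a sparse dtype so the padding is not materialized:

import numpy as np
import pandas as pd

# Two "rows" of different lengths, padded with NaN, then converted to a
# sparse dtype so the NaN fill values are not stored in memory.
dense = pd.DataFrame([[0.0, 1.0, 2.0, np.nan, np.nan],
                      [0.0, 1.0, 2.0, 3.0, 4.0]])
sparse_df = dense.astype(pd.SparseDtype("float", np.nan))
print(sparse_df.dtypes)          # Sparse[float64, nan] for each column
print(sparse_df.sparse.density)  # fraction of values actually stored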

So, all the shape, dtype, and ndim requirements are satisfied, and xarray could implement this as a duck array.

And while you can already wrap sparse duck arrays with xr.Variable, I'm not sure if the wrapper maintains the dtype:
[screenshot omitted]

@dcherian
Contributor

dcherian commented Mar 7, 2023

@Material-Scientist We have decent support for pydata/sparse arrays. It seems like these would work for you.

We do not support the pandas extension arrays at the moment.
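For reference, a minimal sketch of wrapping a pydata/sparse array in xarray (assuming the sparse package is installed; the coordinates and values are just illustrative):

import sparse
import xarray as xr

# A mostly-empty 2D array stored as a pydata/sparse COO array; only the
# three explicitly listed entries are materialized.
coo = sparse.COO(coords=[[0, 0, 1], [0, 2, 3]],
                 data=[1.0, 2.0, 3.0], shape=(4, 5))
arr = xr.DataArray(coo, dims=['y', 'x'])
print(arr.data.nnz)  # 3 stored values out of a 4 x 5 array
print(arr)           # a DataArray backed by the sparse duck array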
