Can we simplify the code by always keeping the data in one data type (e.g. xr.DataArray) per modality? #209

In the past, `nowcasting_dataset` was designed to feed data on-the-fly into a PyTorch model.  This meant that, as the data flowed through `nowcasting_dataset`, it would change type.  For example, satellite data would start as an `xr.DataArray`, then get turned into a numpy array (because PyTorch doesn't know what to do with an `xr.DataArray`), and then get turned into a `torch.Tensor`.
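To make the type-hopping concrete, here's a rough sketch of the first hop.  The array shape, dimension names and timestamps are all made up for illustration; they're not taken from the real code.  The point is just that the labels disappear as soon as we leave xarray:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up satellite data: 3 timesteps of a 2x2 image.
sat_data = xr.DataArray(
    np.zeros((3, 2, 2), dtype=np.float32),
    dims=("time", "y", "x"),
    coords={"time": pd.date_range("2021-01-01", periods=3, freq="5min")},
)

# Hop 1: xr.DataArray -> numpy array.  Dimension names and timestamps are gone.
as_numpy = sat_data.values

# Hop 2 would be torch.from_numpy(as_numpy), left out here so the sketch
# runs without PyTorch installed.

print(type(sat_data).__name__, sat_data.dims)
print(type(as_numpy).__name__, as_numpy.shape)
```

After the first hop, any code downstream has to remember which axis is which.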

But, I think we can safely say now that `nowcasting_dataset` is _just_ for pre-preparing batches (_not_ for loading data on-the-fly).  As such, we can probably simplify the code by keeping data in a single container type per modality.  For example, satellite data could _always_ live in an `xr.DataArray` for its entire life while flowing through `nowcasting_dataset`.

Sorry, I really should've thought of this earlier!  But, yeah, I think this could simplify the code quite a lot.

I haven't fully thought through the implications of this, but some changes might be:

* In the Pydantic models, each field can be just a _single_ type (instead of a `Union` of types).  So, for example, instead of `sat_data: Array = Field(...` we can just do `sat_data: xr.DataArray = Field(...`
* We can get rid of the `to_numpy` function.
* For all the modalities which use xarray or pandas data types, we can use dimension _names_ instead of _indexes_.  e.g. `seq_length = sat_data.shape[-4]` becomes `seq_length = len(sat_data.time)`
* Saving and loading data to/from disk becomes super-simple.
* We'd no longer need the `to_xr_dataset` and `from_xr_dataset` methods (which are quite fiddly).
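
As a rough sketch of the first and third bullets together: the `Satellite` class name, the dimension order and the array sizes below are assumptions for illustration, and the real Pydantic models will look different.  I'm also assuming the class-based `Config` with `arbitrary_types_allowed`, which as far as I know is what Pydantic needs in order to accept an `xr.DataArray` field.

```python
import numpy as np
import pandas as pd
import xarray as xr
from pydantic import BaseModel, Field


class Satellite(BaseModel):
    """Hypothetical model, not the real one from nowcasting_dataset."""

    # One concrete type instead of a Union of array types.
    sat_data: xr.DataArray = Field(..., description="Satellite imagery")

    class Config:
        # Lets Pydantic accept non-Pydantic classes such as xr.DataArray.
        arbitrary_types_allowed = True


# Made-up example data; the dimension order is an assumption.
sat_data = xr.DataArray(
    np.zeros((6, 4, 4, 2), dtype=np.float32),
    dims=("time", "y", "x", "channels"),
    coords={"time": pd.date_range("2021-01-01", periods=6, freq="5min")},
)
example = Satellite(sat_data=sat_data)

# Positional: only correct while time happens to be the 4th axis from the end.
seq_length_by_index = example.sat_data.shape[-4]

# Named: keeps working if the axes get reordered.
seq_length_by_name = len(example.sat_data.time)
```

`example.sat_data.sizes["time"]` would give the same number, and might be a bit more explicit than going via the coordinate.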
