This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Can we simplify the code by always keeping the data in one data type (e.g. xr.DataArray) per modality? #209

@JackKelly

In the past, nowcasting_dataset was designed to feed data on-the-fly into a PyTorch model. This meant that, as the data flowed through nowcasting_dataset, it would change type. For example, satellite data would start as an xr.DataArray, then get turned into a numpy array (because PyTorch doesn't know what to do with an xr.DataArray), and then get turned into a torch.Tensor.

But, I think we can safely say now that nowcasting_dataset is just for pre-preparing batches (not for loading data on-the-fly). As such, we can probably simplify the code by keeping data in a single container type per modality. For example, satellite data could always live in an xr.DataArray for its entire life while flowing through nowcasting_dataset.
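
As a rough sketch of what "one container type per modality" could look like, here is a processing step that takes an xr.DataArray and returns an xr.DataArray, so the labelled dimensions survive the whole way through. The function name `normalise`, the dimension names and the synthetic data are all made up for illustration; they are not taken from the nowcasting_dataset code.

```python
import numpy as np
import xarray as xr


def normalise(sat_data: xr.DataArray) -> xr.DataArray:
    """Hypothetical processing step: DataArray in, DataArray out."""
    return (sat_data - sat_data.mean()) / sat_data.std()


# Synthetic stand-in for satellite data (assumed layout: time, x, y).
sat_data = xr.DataArray(
    np.arange(24, dtype=np.float32).reshape(6, 2, 2),
    dims=("time", "x", "y"),
    coords={"time": np.arange(6)},
)

# Chain steps without ever leaving xarray: normalise, then keep the
# first four timesteps by dimension name.
processed = normalise(sat_data).isel(time=slice(0, 4))
```

Because nothing is converted to a numpy array or a torch.Tensor along the way, `processed` still knows which axis is `time`.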

Sorry, I really should've thought of this earlier! But, yeah, I think this could simplify the code quite a lot.

I haven't fully thought through the implications of this, but some changes might be:

  • In the Pydantic models, each field can be just a single type (instead of a Union of types). So, for example, instead of sat_data: Array = Field(... we can just do sat_data: xr.DataArray = Field(...
  • We can get rid of the to_numpy function.
  • For all the modalities which use xarray or pandas data types, we can use dimension names instead of indexes. e.g. seq_length = len(sat_data[-4]) becomes seq_length = len(sat_data.time)
  • Saving and loading data to/from disk becomes super-simple.
  • We'd no longer need the to_xr_dataset and from_xr_dataset methods (which are quite fiddly).
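
To illustrate the dimension-names point from the list above, here is a minimal sketch on a small synthetic array. The dimension names (`example`, `time`, `x`, `y`) and the shape are assumptions for illustration, not the actual layout used in nowcasting_dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical layout: (example, time, x, y). The real dimension names
# and order in nowcasting_dataset may differ.
sat_data = xr.DataArray(
    np.zeros((2, 6, 4, 4), dtype=np.float32),
    dims=("example", "time", "x", "y"),
    coords={"time": np.arange(6)},
)

# Positional lookup: depends on knowing that time is axis 1.
seq_length_by_position = sat_data.shape[1]

# Named lookup: independent of the axis order.
seq_length_by_name = len(sat_data.time)

# Reordering the axes changes what axis 1 means, but the named lookup
# still returns the number of timesteps.
transposed = sat_data.transpose("time", "example", "x", "y")
seq_length_after_transpose = len(transposed.time)
```

The named version keeps working if the axis order changes later, which is where the positional version tends to go wrong quietly.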
