I realised that the in-progress ManifestStore refactor would actually allow us to separate concerns so much that we could potentially make xarray an optional dependency, where you only need xarray installed if you want to use its API to manipulate virtual zarr stores (e.g. by concatenating them).
The result could work like this:
# use a virtual reader directly - no xarray needed
ms: ManifestStore = manifeststore_from_hdf('file.nc')
# write to some virtual references format directly - no xarray needed
# this would use `IcechunkStore.set_virtual_refs()` as it currently does
ms.to_icechunk(icechunkstore)

or if you want to work in xarray space you can move to it:
# xarray required to convert to virtual dataset representation
vds: xr.Dataset = ms.to_virtual_dataset(loadable_variables=...)
# (or just go straight there using our existing API)
vds: xr.Dataset = vz.open_virtual_dataset('file.nc', reader=manifeststore_from_hdf, loadable_variables=...)
# xarray required to do any manipulation in xarray space
vds_combined: xr.Dataset = xr.concat([vds1, vds2, ...], dim=...)
# write to some virtual references format - xarray required to write the non-virtual variables
# this could convert the virtual variables to a `ManifestStore` first as well as using `Dataset.to_zarr(icechunkstore)` for the loadable variables as it currently does
vds.to_icechunk(icechunkstore)

Advantages:
- Total separation of concerns between virtualizing files into the zarr data model and manipulating them using the xarray data model (this would probably be helpful for `fill_value` and CF-related stuff too).
- Can create virtual zarr references for data that xarray cannot even represent (e.g. multiple arrays with non-alignable dimensions in the same group).
Disadvantages:
- Might be less clear for non-expert users, because there are now two ways to read and write references. I still think we would present the xarray interface as the standard UI, and just mention that this is possible in a developers section of the docs, as `ManifestStore` is only supposed to be developer API anyway.
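
For what it's worth, here is a rough sketch of how the optional import could be deferred until the xarray data model is actually needed. The names `require_optional` and `ManifestStoreSketch` are hypothetical stand-ins made up for illustration, not the real `ManifestStore` or any existing helper in the codebase:

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency on first use, with a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required to {purpose}; install it to use this feature."
        ) from err


class ManifestStoreSketch:
    """Hypothetical stand-in for ManifestStore (assumption, not the real class)."""

    def __init__(self, refs: dict[str, str]):
        # Building and holding references needs no xarray at all.
        self.refs = refs

    def to_virtual_dataset(self):
        # xarray is imported only here, where its data model is actually used.
        xr = require_optional("xarray", "convert a store to a virtual dataset")
        return xr.Dataset(attrs={"n_refs": len(self.refs)})
```

With something like this, constructing the store and writing references would work in an environment without xarray, and only `to_virtual_dataset()` would raise an `ImportError` explaining what to install.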