Description
When combining along a new dimension using coo_map, MultiZarrToZarr fails to concatenate dimension coordinate variables, despite concatenating other (non-dimension) coordinate variables just fine.
Minimal example:
import xarray as xr
from kerchunk.combine import MultiZarrToZarr
from kerchunk.hdf import SingleHdf5ToZarr
# Set up some fake netCDF files containing the dimension coordinate 'time'
time1 = xr.Dataset(coords={'time': ('time', [1, 2, 3])})
time2 = xr.Dataset(coords={'time': ('time', [4, 5, 6])})
time1.to_netcdf('test1.nc')
time2.to_netcdf('test2.nc')
# open both files using kerchunk
single_jsons = [SingleHdf5ToZarr(filepath, inline_threshold=300).translate() for filepath in ['./test1.nc', './test2.nc']]
# combine along new dimension 'id'
mzz = MultiZarrToZarr(
    single_jsons,
    concat_dims=["id"],
    coo_map={'id': [10, 20]},
)
combined_test_json = mzz.translate()
# open with xarray to see what the result was
combined_test = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "storage_options": {
            "fo": combined_test_json,
        },
        "consolidated": False,
    },
)
combined_test
This is not what I expected: the time variable should have dimensions (time, id), but we've lost half the time values. The variable time should have been concatenated along id, because I did not specify it in identical_dims.
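To see which variables were actually concatenated without going through xarray, the combined references can be inspected directly. The following is only a minimal sketch: summarize_refs is a hypothetical helper (not part of kerchunk), and it assumes the reference dict follows the usual layout, with a "refs" mapping whose "<name>/.zarray" and "<name>/.zattrs" entries are JSON strings and whose dimension names are stored under "_ARRAY_DIMENSIONS".

```python
import json


def _load(value):
    """Decode a metadata entry that may be a JSON string/bytes or an already-parsed dict."""
    if isinstance(value, (str, bytes)):
        return json.loads(value)
    return value


def summarize_refs(reference_set):
    """Map each array name to (shape, dimension names).

    Hypothetical helper, not a kerchunk API. Assumes a kerchunk-style
    reference dict: either {"version": 1, "refs": {...}} or the bare
    refs mapping itself.
    """
    refs = reference_set.get("refs", reference_set)
    summary = {}
    for key, value in refs.items():
        # Only array metadata entries describe a variable's shape
        if not key.endswith("/.zarray"):
            continue
        name = key[: -len("/.zarray")]
        zarray = _load(value)
        zattrs = _load(refs.get(f"{name}/.zattrs", "{}"))
        dims = tuple(zattrs.get("_ARRAY_DIMENSIONS", ()))
        summary[name] = (tuple(zarray["shape"]), dims)
    return summary
```

Calling summarize_refs(combined_test_json) on the result above should list the shape and dimension names recorded for time and id, which makes it easier to tell whether the problem lies in the combined references themselves or in how xarray decodes them.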
What's weird is that this works as expected for coordinate variables, just not for dimension coordinates. In other words, if I rename the time variable to time_renamed, but have it still be a function of a dimension named time, then the concatenation happens as expected:
time1 = xr.Dataset(coords={'time_renamed': ('time', [1, 2, 3])})
time2 = xr.Dataset(coords={'time_renamed': ('time', [4, 5, 6])})