
[BUG] MultiZarrToZarr fails to concatenate dimension coordinate variables #386

Open
@TomNicholas


When combining along a new dimension using `coo_map`, `MultiZarrToZarr` fails to concatenate dimension coordinate variables, even though it concatenates non-dimension coordinate variables correctly.

Minimal example:

```python
import xarray as xr
from kerchunk.combine import MultiZarrToZarr
from kerchunk.hdf import SingleHdf5ToZarr

# Set up some fake netCDF files containing the dimension coordinate 'time'.
# SingleHdf5ToZarr reads HDF5-based (netCDF4) files, so a netCDF4-capable
# backend (netCDF4 or h5netcdf) must be installed for to_netcdf.
time1 = xr.Dataset(coords={'time': ('time', [1, 2, 3])})
time2 = xr.Dataset(coords={'time': ('time', [4, 5, 6])})
time1.to_netcdf('test1.nc')
time2.to_netcdf('test2.nc')

# Open both files using kerchunk
single_jsons = [
    SingleHdf5ToZarr(filepath, inline_threshold=300).translate()
    for filepath in ['./test1.nc', './test2.nc']
]

# Combine along the new dimension 'id'
mzz = MultiZarrToZarr(
    single_jsons,
    concat_dims=["id"],
    coo_map={'id': [10, 20]},
)
combined_test_json = mzz.translate()

# Open with xarray to see what the result was
combined_test = xr.open_dataset(
    "reference://", engine="zarr",
    backend_kwargs={
        "storage_options": {
            "fo": combined_test_json,
        },
        "consolidated": False,
    },
)
print(combined_test)
```

[Screenshot from 2023-11-01 16-55-25: repr of `combined_test`]

This is not what I expected. The `time` variable should have dimensions `(time, id)`, but instead we've lost half of the time values. `time` should have been concatenated along `id`, because I did not list it in `identical_dims`.
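The expected behaviour can be stated as a simple rule: every variable not listed in `identical_dims` gains the new `id` axis, and the values from all input files are kept. Below is a minimal sketch of that rule on plain Python lists, assuming 1-D variables and placing the new dimension last to match the `(time, id)` layout described above. `combine_along_new_dim` is a hypothetical helper for illustration only; it is not kerchunk's implementation or API.

```python
# Hypothetical sketch of the expected combine semantics (NOT kerchunk code).
def combine_along_new_dim(per_file_vars, identical_dims=()):
    """Combine per-file 1-D variables along a new trailing dimension.

    per_file_vars: list of dicts, one per input file, mapping a variable
        name to its list of values along the existing dimension.
    identical_dims: names of variables assumed equal in every file; these
        are taken from the first file instead of being concatenated.
    """
    combined = {}
    for name in per_file_vars[0]:
        columns = [file_vars[name] for file_vars in per_file_vars]
        if name in identical_dims:
            # Assumed identical across files: keep a single copy.
            combined[name] = list(columns[0])
        else:
            # zip(*columns) yields one row per position along the existing
            # dimension, with one entry per input file (the new dimension).
            combined[name] = [list(row) for row in zip(*columns)]
    return combined


files = [{"time": [1, 2, 3]}, {"time": [4, 5, 6]}]
print(combine_along_new_dim(files))
# {'time': [[1, 4], [2, 5], [3, 6]]}
print(combine_along_new_dim(files, identical_dims=["time"]))
# {'time': [1, 2, 3]}
```

Under this rule all six time values survive unless `time` is explicitly listed in `identical_dims`.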

What's weird is that this works as expected for non-dimension coordinate variables; only dimension coordinates are affected. If I rename the `time` variable to `time_renamed`, but keep it as a function of a dimension named `time`, then the concatenation happens as expected:

```python
time1 = xr.Dataset(coords={'time_renamed': ('time', [1, 2, 3])})
time2 = xr.Dataset(coords={'time_renamed': ('time', [4, 5, 6])})
```

[Screenshot from 2023-11-01 16-59-50: repr of the combined dataset after renaming to `time_renamed`]
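To check what `MultiZarrToZarr` actually wrote for each variable without going through xarray, one can read the array metadata straight out of the combined reference dict. Below is a minimal sketch, assuming the version-1 reference layout returned by `translate()` (`{"version": 1, "refs": {...}}`), where each `<var>/.zarray` and `<var>/.zattrs` entry holds Zarr v2 JSON and xarray stores dimension names in the `_ARRAY_DIMENSIONS` attribute. `describe_variables` is a hypothetical helper, not a kerchunk API, and the `example` dict is hand-written for illustration. Applied to `combined_test_json`, it would report the stored shape of `time` and whether it is a dimension coordinate, which is exactly the distinction between the two cases above.

```python
import json


# Hypothetical diagnostic helper (not part of kerchunk).
def describe_variables(reference):
    """Summarise dims/shape of every array in a kerchunk reference dict."""
    # Accept either {"version": 1, "refs": {...}} or a flat refs dict.
    refs = reference.get("refs", reference)

    def load(value):
        # Metadata entries are usually JSON strings; tolerate parsed dicts.
        return json.loads(value) if isinstance(value, (str, bytes)) else value

    summary = {}
    for key, value in refs.items():
        if not key.endswith("/.zarray"):
            continue
        name = key[: -len("/.zarray")]
        shape = tuple(load(value)["shape"])
        attrs = load(refs.get(f"{name}/.zattrs", "{}"))
        dims = tuple(attrs.get("_ARRAY_DIMENSIONS", ()))
        summary[name] = {
            "dims": dims,
            "shape": shape,
            # A dimension coordinate is a variable named after one of its dims.
            "dimension_coordinate": name in dims,
        }
    return summary


# Hand-written example mimicking a single-file reference for 'time'.
example = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "time/.zarray": json.dumps({"shape": [3], "chunks": [3]}),
        "time/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["time"]}),
    },
}
print(describe_variables(example))
# {'time': {'dims': ('time',), 'shape': (3,), 'dimension_coordinate': True}}
```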
