Description
Here I am creating an array and specifying the fill_value as the raw bytes b'X':
import zarr
fv = b'X'
a = zarr.create(shape=10, dtype=bytes, zarr_version=2, fill_value=fv)
ad = a.metadata.to_dict()
print(ad)
# -> {'shape': (10,), 'fill_value': 'WA==', 'attributes': {}, 'zarr_format': 2, 'order': 'C', 'filters': None, 'dimension_separator': '.', 'compressor': None, 'chunks': (10,), 'dtype': '|S0'}
b = zarr.create(shape=10, dtype=bytes, zarr_version=3, fill_value=fv)
bd = b.metadata.to_dict()
print(bd)
# -> {'shape': (10,), 'fill_value': (88,), 'chunk_grid': {'name': 'regular', 'configuration': {'chunk_shape': (10,)}}, 'attributes': {}, 'zarr_format': 3, 'data_type': <DataType.bytes: 'bytes'>, 'chunk_key_encoding': {'name': 'default', 'configuration': {'separator': '/'}}, 'codecs': ({'name': 'vlen-bytes', 'configuration': {}},), 'node_type': 'array', 'storage_transformers': ()}
assert zarr.core.metadata.v2.ArrayV2Metadata.from_dict(ad).fill_value == fv
assert zarr.core.metadata.v3.ArrayV3Metadata.from_dict(bd).fill_value == fv
As we can see, the way this fill value is encoded differs considerably between the two formats: v2 stores a base64 string ('WA=='), while v3 stores a tuple of ints ((88,)). Remarkably, it gets translated back to something reasonable in both cases.
In both cases, the bytes are going through this path:
zarr-python/src/zarr/abc/metadata.py
Lines 33 to 34 in aa46b45
elif isinstance(value, Sequence):
    out_dict[key] = tuple(v.to_dict() if isinstance(v, Metadata) else v for v in value)
This converts the bytes to a tuple of ints.
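A minimal standalone sketch of why that happens, using only the standard library rather than zarr itself (the variable names are just for illustration): bytes counts as a Sequence, and iterating over it yields one int per byte.

```python
from collections.abc import Sequence

fv = b"X"

# bytes is registered as a Sequence, so a generic
# `isinstance(value, Sequence)` branch also matches it.
assert isinstance(fv, Sequence)

# Iterating over bytes yields ints, one per byte, so building
# a tuple from it produces a tuple of ints.
as_tuple = tuple(v for v in fv)
print(as_tuple)
```

Since ord('X') is 88, this gives the (88,) seen in the v3 metadata above.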
However, for v2, #2286 added this additional special handling for fill_value:
zarr-python/src/zarr/core/metadata/v2.py
Lines 146 to 150 in aa46b45
if dtype.kind in "SV":
    fill_value_encoded = _data.get("fill_value")
    if fill_value_encoded is not None:
        fill_value = base64.standard_b64decode(fill_value_encoded)
        _data["fill_value"] = fill_value
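For reference, the base64 round trip on its own looks roughly like this. This is a sketch using only the standard library, not the zarr code path itself; it assumes the encoded value is stored as an ASCII string, as the 'WA==' in the v2 output suggests.

```python
import base64

fv = b"X"

# Encode the raw bytes to a base64 string, as seen in the v2 metadata.
encoded = base64.standard_b64encode(fv).decode("ascii")
print(encoded)

# Decoding the string recovers the original bytes.
decoded = base64.standard_b64decode(encoded)
print(decoded)
```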
According to the V3 spec:
Raw data types (r<N>)
An array of integers, with length equal to <N>, where each integer is in the range [0, 255].
This seems in line with what is happening in the v3 case, where b'X' is encoded as (88,).
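To illustrate the reverse direction under that reading of the spec, here is a small sketch that turns such an integer array back into bytes. `raw_fill_value_to_bytes` is a hypothetical helper written for this issue, not a zarr-python function.

```python
def raw_fill_value_to_bytes(values):
    """Hypothetical helper: convert a sequence of ints in [0, 255] to bytes."""
    for v in values:
        # Reject anything that is not an integer in the range the spec allows.
        if not isinstance(v, int) or not 0 <= v <= 255:
            raise ValueError(f"invalid raw fill value element: {v!r}")
    return bytes(values)


print(raw_fill_value_to_bytes((88,)))
```

With the (88,) from the v3 metadata this returns the original b'X', which matches the second assert in the reproduction above.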
This is relevant to pydata/xarray#5475