Description
We've encountered a situation using zarr on GCS via gcsfs where an attempt to retrieve a chunk object fails for some transient reason, but the fsspec gcsfs mapping raises a KeyError, which zarr interprets as meaning the object doesn't exist. Zarr's default behaviour in that case is to fill the chunk entirely with the fill value set for the array.
What this means is that any transient errors in GCS can result in chunks of an array being returned as filled with the fill value, rather than the actual data, without any error being raised. This is obviously a serious issue.
Currently the fsspec FSMap class catches any exception within __getitem__ and reraises it as a KeyError.
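As a minimal sketch of the pattern described above (not fsspec's actual source; `FlakyStore`, `CatchAllMap`, and `classify` are hypothetical stand-ins), a catch-all handler makes a transient failure indistinguishable from a missing key:

```python
class FlakyStore:
    """Hypothetical backend: key "0.0" exists but reads of it fail transiently."""

    def cat(self, key):
        if key == "0.0":
            raise ConnectionError("transient network failure")
        raise FileNotFoundError(key)


class CatchAllMap:
    """Hypothetical mapping that converts every backend error into KeyError."""

    def __init__(self, store):
        self.store = store

    def __getitem__(self, key):
        try:
            return self.store.cat(key)
        except Exception as exc:
            # Both "object is missing" and "request failed" end up here.
            raise KeyError(key) from exc


def classify(mapping, key):
    """Return the name of the exception type the caller sees, or "ok"."""
    try:
        mapping[key]
    except Exception as exc:
        return type(exc).__name__
    return "ok"


m = CatchAllMap(FlakyStore())
print(classify(m, "0.0"))  # existing key, transient failure
print(classify(m, "9.9"))  # genuinely missing key
```

Both lookups surface the same exception type, so a caller such as zarr has no way to tell the two cases apart.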
I think this should be modified so that FSMap can distinguish between a genuine "this file/object does not exist" error and any other kind of error. When a "this file/object does not exist" error is encountered, it should be safe to reraise it from __getitem__ as a KeyError. For any other type of error, I suggest it would be safer either to propagate the underlying error as-is or to reraise it as a RuntimeError. Transient errors would then propagate all the way up through zarr, preventing any silent replacement of data with fill values.
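One way the suggested behaviour could look, as a rough sketch rather than a proposed patch: only a configurable set of "missing" exceptions is translated to KeyError, and everything else propagates unchanged. The names `FlakyStore`, `StrictMap`, `missing_exceptions`, and `classify` are assumptions made for this illustration, and which exception types a given backend raises for a missing object is also an assumption.

```python
class FlakyStore:
    """Hypothetical backend: key "0.0" exists but reads of it fail transiently."""

    def cat(self, key):
        if key == "0.0":
            raise ConnectionError("transient network failure")
        raise FileNotFoundError(key)


class StrictMap:
    """Hypothetical mapping that only treats 'not found' errors as KeyError."""

    def __init__(self, store, missing_exceptions=(FileNotFoundError,)):
        self.store = store
        # Exception types that genuinely mean "this object does not exist".
        self.missing_exceptions = missing_exceptions

    def __getitem__(self, key):
        try:
            return self.store.cat(key)
        except self.missing_exceptions as exc:
            raise KeyError(key) from exc
        # Any other exception (timeouts, auth errors, ...) is not caught
        # and propagates to the caller as-is.


def classify(mapping, key):
    """Return the name of the exception type the caller sees, or "ok"."""
    try:
        mapping[key]
    except Exception as exc:
        return type(exc).__name__
    return "ok"


m = StrictMap(FlakyStore())
print(classify(m, "9.9"))  # genuinely missing key
print(classify(m, "0.0"))  # transient failure is no longer masked
```

With this split, a missing chunk still yields KeyError (so fill values are used where intended), while a failed read surfaces as the underlying error.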
For the time being we have worked around this with a wrapper class that transforms every KeyError into a RuntimeError, as described here. Note that this only works for the case where we know ahead of time that all chunks of an array should exist, and therefore we never expect to experience a KeyError for any reason.
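A wrapper of this kind might look roughly like the following. This is an illustrative sketch, not the wrapper referred to above; the class name `NoMissingKeysStore` is hypothetical, and a plain dict stands in for the underlying store.

```python
from collections.abc import MutableMapping


class NoMissingKeysStore(MutableMapping):
    """Hypothetical wrapper: a failed lookup is always an error, never 'missing'."""

    def __init__(self, inner):
        self.inner = inner

    def __getitem__(self, key):
        try:
            return self.inner[key]
        except KeyError as exc:
            # Assumes every requested key should exist, so a KeyError
            # can only mean that the read itself went wrong.
            raise RuntimeError(f"failed to read {key!r}") from exc

    def __contains__(self, key):
        # Delegate explicitly: the default Mapping.__contains__ relies on
        # __getitem__ raising KeyError, which this class no longer does.
        return key in self.inner

    def __setitem__(self, key, value):
        self.inner[key] = value

    def __delitem__(self, key):
        del self.inner[key]

    def __iter__(self):
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


store = NoMissingKeysStore({"0.0": b"chunk-bytes"})
print(store["0.0"])
```

Because the wrapper cannot tell a missing key from a failed read either, it is only appropriate under the same assumption stated above: every key that is requested is expected to exist.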