Description
Is there a way of preventing Zarr from returning NaNs if a chunk is missing?
Background of my question: We're seeing problems either with copying data to GCS or with GCS failing to reliably serve all chunks of a Zarr store.
In arr below, there are two types of NaN-filled chunks returned by Zarr.
from dask import array as darr
import numpy as np
arr = darr.from_zarr("gs://pangeo-data/eNATL60-BLBT02X-ssh/sossheig/")
First, there's a chunk that is completely flagged as missing in the data (the chunk is over land in an ocean dataset) but present on GCS (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.0.0), and Zarr correctly finds all items marked as invalid:
np.isnan(arr.blocks[0, 0, 0]).mean().compute()
# -> 1.0
Then, there's a chunk (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.7.3) that is not present (at the time of writing this, I get a "load failed" and a tracking id from GCS) and Zarr returns all items marked invalid as well:
np.isnan(arr.blocks[0, 7, 3]).mean().compute()
# -> 1.0
How do I make Zarr raise an Exception on the latter?
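For reference, here is a minimal sketch of how the two cases could be told apart by hand. It assumes the store behaves like a Python mapping keyed by chunk names such as 0.7.3 (dot-separated chunk indices, as in the URLs above). The helper name find_missing_chunks and the plain dict standing in for the GCS store are hypothetical, for illustration only; this is not a Zarr API.

```python
from itertools import product


def find_missing_chunks(store, chunk_grid_shape, prefix=""):
    """Return the chunk keys that are absent from a mapping-like store.

    store: any object supporting `key in store` (assumption: chunk keys
        are dot-separated indices such as "0.7.3").
    chunk_grid_shape: number of chunks along each dimension.
    prefix: optional path prefix in front of each chunk key.
    """
    missing = []
    for idx in product(*(range(n) for n in chunk_grid_shape)):
        key = prefix + ".".join(str(i) for i in idx)
        if key not in store:
            missing.append(key)
    return missing


# A plain dict stands in for the real store (hypothetical example data).
fake_store = {"0.0.0": b"...", "0.0.1": b"..."}
missing = find_missing_chunks(fake_store, (1, 1, 3))
if missing:
    print(f"missing chunks: {missing}")
```

This only detects absent keys up front; it does not make reads fail, which is what I'm actually after.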
cc: @auraoupa
related: pangeo-data/pangeo#691