How to prevent Zarr from returning NaN for missing chunks? #486

Open
@willirath

Is there a way of preventing Zarr from returning NaNs if a chunk is missing?

Background: we're seeing problems either with copying data to GCS or with GCS failing to reliably serve all chunks of a Zarr store.

In arr below, Zarr returns two kinds of NaN-filled chunks.

from dask import array as darr
import numpy as np

arr = darr.from_zarr("gs://pangeo-data/eNATL60-BLBT02X-ssh/sossheig/")

First, there's a chunk that is completely flagged as missing in the data (the chunk lies over land in an ocean dataset) but is present on GCS (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.0.0), and Zarr correctly finds all items marked as invalid:

np.isnan(arr.blocks[0, 0, 0]).mean().compute()
# -> 1.0

Second, there's a chunk (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.7.3) that is not present (at the time of writing, I get a "load failed" message and a tracking ID from GCS), and Zarr returns all items marked as invalid here as well:

np.isnan(arr.blocks[0, 7, 3]).mean().compute()
# -> 1.0

How do I make Zarr raise an Exception on the latter?
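
Until there is a built-in answer, one possible workaround is to check up front that every expected chunk key exists in the store. The following is only a minimal sketch, not a Zarr feature: the helper names (expected_chunk_keys, find_missing_chunks, assert_no_missing_chunks) are hypothetical, and it assumes "."-separated chunk keys like the 0.0.0 and 0.7.3 objects above, a store that behaves like a Python mapping, and a writer that stores every chunk (Zarr itself permits absent chunks and fills them with the fill value).

```python
import itertools
import math


def expected_chunk_keys(shape, chunks, prefix="", separator="."):
    """Yield the key of every chunk a fully written array would contain.

    Assumes chunk keys are the chunk indices joined by `separator`,
    e.g. "0.7.3" for a three-dimensional array.
    """
    # Number of chunks along each dimension (ceiling division).
    counts = [math.ceil(size / chunk) for size, chunk in zip(shape, chunks)]
    for index in itertools.product(*(range(n) for n in counts)):
        yield prefix + separator.join(str(i) for i in index)


def find_missing_chunks(store, shape, chunks, prefix=""):
    """Return the expected chunk keys that are absent from a mapping-like store."""
    return [
        key
        for key in expected_chunk_keys(shape, chunks, prefix=prefix)
        if key not in store
    ]


def assert_no_missing_chunks(store, shape, chunks, prefix=""):
    """Raise KeyError if any expected chunk key is absent from the store."""
    missing = find_missing_chunks(store, shape, chunks, prefix=prefix)
    if missing:
        raise KeyError(f"{len(missing)} chunk(s) missing, e.g. {missing[:5]}")


# Hypothetical stand-in for a real store: a dict with one chunk absent.
demo_store = {".zarray": b"{}", "0.0": b"x", "0.1": b"x", "1.0": b"x"}
print(find_missing_chunks(demo_store, shape=(4, 4), chunks=(2, 2)))
```

With an opened Zarr array z, the inputs would presumably be z.store, z.shape and z.chunks. On a remote store each membership test is likely to cost one request, and a transient failure may surface either as an absent key or as an error, so this is a diagnostic aid rather than a guarantee.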

cc: @auraoupa
related: pangeo-data/pangeo#691
