Zarr consolidated #2559
Changes from all commits
9cc0550
7eda4cc
0af5abd
bb6f9c2
95ac3b9
00a0efe
cfa0a08
9b4a8aa
c00ef82
f063f18
e3579a8
6ef6d63
fa9cc41
09eee44
95829f0
fe4af34
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,8 @@ Breaking changes | |
Enhancements | ||
~~~~~~~~~~~~ | ||
|
||
- Ability to read and write consolidated metadata in zarr stores (:issue:`2558`). | ||
By `Ryan Abernathey <https://github.com/rabernat>`_. | ||
Review comment: Can you reference the issue this is attached to:
||
- :py:class:`CFTimeIndex` uses slicing for string indexing when possible (like | ||
:py:class:`pandas.DatetimeIndex`), which avoids unnecessary copies. | ||
By `Stephan Hoyer <https://github.com/shoyer>`_ | ||
|
@@ -56,15 +58,15 @@ Breaking changes | |
- ``Dataset.T`` has been removed as a shortcut for :py:meth:`Dataset.transpose`. | ||
Call :py:meth:`Dataset.transpose` directly instead. | ||
- Iterating over a ``Dataset`` now includes only data variables, not coordinates. | ||
Similarly, calling ``len`` and ``bool`` on a ``Dataset`` now | ||
Similarly, calling ``len`` and ``bool`` on a ``Dataset`` now | ||
includes only data variables. | ||
- ``DataArray.__contains__`` (used by Python's ``in`` operator) now checks | ||
array data, not coordinates. | ||
array data, not coordinates. | ||
- The old resample syntax from before xarray 0.10, e.g., | ||
``data.resample('1D', dim='time', how='mean')``, is no longer supported and will | ||
raise an error in most cases. You need to use the new resample syntax | ||
instead, e.g., ``data.resample(time='1D').mean()`` or | ||
``data.resample({'time': '1D'}).mean()``. | ||
``data.resample({'time': '1D'}).mean()``. | ||
|
||
|
||
- New deprecations (behavior will be changed in xarray 0.12): | ||
|
@@ -101,13 +103,13 @@ Breaking changes | |
than by default trying to coerce them into ``np.datetime64[ns]`` objects. | ||
A :py:class:`~xarray.CFTimeIndex` will be used for indexing along time | ||
coordinates in these cases. | ||
- A new method :py:meth:`~xarray.CFTimeIndex.to_datetimeindex` has been added | ||
- A new method :py:meth:`~xarray.CFTimeIndex.to_datetimeindex` has been added | ||
to aid in converting from a :py:class:`~xarray.CFTimeIndex` to a | ||
:py:class:`pandas.DatetimeIndex` for the remaining use-cases where | ||
using a :py:class:`~xarray.CFTimeIndex` is still a limitation (e.g. for | ||
resample or plotting). | ||
- Setting the ``enable_cftimeindex`` option is now a no-op and emits a | ||
``FutureWarning``. | ||
``FutureWarning``. | ||
|
||
Enhancements | ||
~~~~~~~~~~~~ | ||
|
@@ -194,7 +196,7 @@ Bug fixes | |
the dates must be encoded using cftime rather than NumPy (:issue:`2272`). | ||
By `Spencer Clark <https://github.com/spencerkclark>`_. | ||
|
||
- Chunked datasets can now roundtrip to Zarr storage continually | ||
- Chunked datasets can now roundtrip to Zarr storage continually | ||
with ``to_zarr`` and ``open_zarr`` (:issue:`2300`). | ||
By `Lily Wang <https://github.com/lilyminium>`_. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -224,7 +224,8 @@ class ZarrStore(AbstractWritableDataStore): | |
""" | ||
|
||
@classmethod | ||
def open_group(cls, store, mode='r', synchronizer=None, group=None): | ||
def open_group(cls, store, mode='r', synchronizer=None, group=None, | ||
consolidated=False, consolidate_on_close=False): | ||
import zarr | ||
min_zarr = '2.2' | ||
|
||
|
@@ -234,15 +235,27 @@ def open_group(cls, store, mode='r', synchronizer=None, group=None): | |
"installation " | ||
"http://zarr.readthedocs.io/en/stable/" | ||
"#installation" % min_zarr) | ||
zarr_group = zarr.open_group(store=store, mode=mode, | ||
synchronizer=synchronizer, path=group) | ||
return cls(zarr_group) | ||
|
||
def __init__(self, zarr_group): | ||
if consolidated or consolidate_on_close: | ||
if LooseVersion(zarr.__version__) <= '2.2.1.dev2': # pragma: no cover | ||
raise NotImplementedError("Zarr version 2.2.1.dev2 or greater " | ||
"is required for consolidated " | ||
"metadata.") | ||
|
||
open_kwargs = dict(mode=mode, synchronizer=synchronizer, path=group) | ||
if consolidated: | ||
# TODO: an option to pass the metadata_key keyword | ||
Review comment: Do we need to consider this TODO here?
Review comment: Anything to do here now?
Review comment: Do we feel that it's important to expose this functionality from within xarray? I don't.
Review comment: I also don't.
Review comment: I propose we just leave these TODO's here as is. If anyone ever needs this feature from the xarray side, this will help guide them on how to implement it.
||
zarr_group = zarr.open_consolidated(store, **open_kwargs) | ||
else: | ||
zarr_group = zarr.open_group(store, **open_kwargs) | ||
return cls(zarr_group, consolidate_on_close) | ||
|
||
def __init__(self, zarr_group, consolidate_on_close=False): | ||
self.ds = zarr_group | ||
self._read_only = self.ds.read_only | ||
self._synchronizer = self.ds.synchronizer | ||
self._group = self.ds.path | ||
self._consolidate_on_close = consolidate_on_close | ||
|
||
def open_store_variable(self, name, zarr_array): | ||
data = indexing.LazilyOuterIndexedArray(ZarrArrayWrapper(name, self)) | ||
|
@@ -333,11 +346,16 @@ def store(self, variables, attributes, *args, **kwargs): | |
def sync(self): | ||
pass | ||
|
||
def close(self): | ||
if self._consolidate_on_close: | ||
import zarr | ||
zarr.consolidate_metadata(self.ds.store) | ||
|
||
|
||
def open_zarr(store, group=None, synchronizer=None, auto_chunk=True, | ||
decode_cf=True, mask_and_scale=True, decode_times=True, | ||
concat_characters=True, decode_coords=True, | ||
drop_variables=None): | ||
drop_variables=None, consolidated=False): | ||
"""Load and decode a dataset from a Zarr store. | ||
|
||
.. note:: Experimental | ||
|
@@ -383,10 +401,13 @@ def open_zarr(store, group=None, synchronizer=None, auto_chunk=True, | |
decode_coords : bool, optional | ||
If True, decode the 'coordinates' attribute to identify coordinates in | ||
the resulting dataset. | ||
drop_variables: string or iterable, optional | ||
drop_variables : string or iterable, optional | ||
A variable or list of variables to exclude from being parsed from the | ||
dataset. This may be useful to drop variables with problems or | ||
inconsistent values. | ||
consolidated : bool, optional | ||
Whether to open the store using zarr's consolidated metadata | ||
capability. Only works for stores that have already been consolidated. | ||
|
||
Returns | ||
------- | ||
|
@@ -423,7 +444,7 @@ def maybe_decode_store(store, lock=False): | |
mode = 'r' | ||
zarr_store = ZarrStore.open_group(store, mode=mode, | ||
synchronizer=synchronizer, | ||
group=group) | ||
group=group, consolidated=consolidated) | ||
ds = maybe_decode_store(zarr_store) | ||
|
||
# auto chunking needs to be here and not in ZarrStore because variable | ||
|
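To make the idea behind the `consolidated` flag concrete, here is a minimal standard-library sketch of what "consolidating metadata" means for a key-value store: every small metadata document is copied into one JSON document, so a reader needs a single fetch instead of one per array. This is an illustration only, not zarr's implementation. The helper names `consolidate_metadata_sketch` and `read_consolidated_sketch` are hypothetical, and the `.zmetadata` key and `zarr_consolidated_format` field are assumptions about the on-disk layout rather than something this diff confirms.

```python
import json

# Assumed names of the per-group and per-array metadata documents.
METADATA_NAMES = (".zgroup", ".zarray", ".zattrs")


def consolidate_metadata_sketch(store, metadata_key=".zmetadata"):
    """Copy every metadata document in ``store`` into one JSON document.

    ``store`` is any mutable mapping from str keys to bytes values.
    ``metadata_key`` is an assumed default for where the combined
    document is written.
    """
    collected = {
        key: json.loads(store[key].decode("utf-8"))
        for key in sorted(store)
        if key.rsplit("/", 1)[-1] in METADATA_NAMES
    }
    document = {"zarr_consolidated_format": 1, "metadata": collected}
    store[metadata_key] = json.dumps(document).encode("utf-8")
    return collected


def read_consolidated_sketch(store, metadata_key=".zmetadata"):
    """Read all metadata with a single lookup in ``store``."""
    return json.loads(store[metadata_key].decode("utf-8"))["metadata"]


# A toy in-memory store: three metadata documents and two data chunks.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "temperature/.zarray": b'{"shape": [4], "chunks": [2], "dtype": "<f8"}',
    "temperature/.zattrs": b'{"units": "K"}',
    "temperature/0": b"\x00" * 16,
    "temperature/1": b"\x00" * 16,
}

consolidate_metadata_sketch(store)
metadata = read_consolidated_sketch(store)
```

In the diff itself, the write side corresponds to `zarr.consolidate_metadata(self.ds.store)` in `ZarrStore.close`, and the read side to `zarr.open_consolidated(store, **open_kwargs)`, which `open_zarr(..., consolidated=True)` reaches through `ZarrStore.open_group`. The benefit is largest on stores where each key lookup is slow, since the chunk keys are never touched when metadata is read.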