Cannot load Zarr store from S3 with square brackets (i.e., [ and ]) in the name #2461

@aliddell

Zarr version

v2.18.3

Numcodecs version

v0.11.0 (Windows), v0.13.1 (Linux)

Python Version

3.10.11 (Windows), 3.11.5 (Linux)

Operating System

Windows and Linux

Installation

pip into conda env (Windows), pip into venv (Linux)

Description

See this message on the Zulip.

I am unable to load a Zarr dataset backed by s3fs.S3Map when the store name contains square brackets. The example tested is test_stream_data_to_s3[version0-None].zarr.

It appears to be unable to find .zgroup:

Traceback (most recent call last):
  File "C:\Users\Alan Liddell\AppData\Roaming\Python\Python39\site-packages\zarr\storage.py", line 2462, in __getitem__
    value = self._values_cache[key]
KeyError: '.zgroup'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Alan Liddell\.conda\envs\napari-env\lib\site-packages\fsspec\mapping.py", line 143, in __getitem__
    result = self.fs.cat(k)
  File "C:\Users\Alan Liddell\.conda\envs\napari-env\lib\site-packages\fsspec\asyn.py", line 115, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "C:\Users\Alan Liddell\.conda\envs\napari-env\lib\site-packages\fsspec\asyn.py", line 100, in sync
    raise return_result
  File "C:\Users\Alan Liddell\.conda\envs\napari-env\lib\site-packages\fsspec\asyn.py", line 55, in _runner
    result[0] = await coro
  File "C:\Users\Alan Liddell\.conda\envs\napari-env\lib\site-packages\fsspec\asyn.py", line 405, in _cat
    paths = await self._expand_path(path, recursive=recursive)
  File "C:\Users\Alan Liddell\.conda\envs\napari-env\lib\site-packages\fsspec\asyn.py", line 758, in _expand_path
    out = await self._expand_path([path], recursive, maxdepth)
  File "C:\Users\Alan Liddell\.conda\envs\napari-env\lib\site-packages\fsspec\asyn.py", line 782, in _expand_path
    raise FileNotFoundError(path)
FileNotFoundError: ['zarr-test/test_stream_data_to_s3[version0-None].zarr/.zgroup']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Alan Liddell\AppData\Roaming\Python\Python39\site-packages\zarr\hierarchy.py", line 164, in __init__
    meta_bytes = store[mkey]
  File "C:\Users\Alan Liddell\AppData\Roaming\Python\Python39\site-packages\zarr\storage.py", line 2470, in __getitem__
    value = self._store[key]
  File "C:\Users\Alan Liddell\AppData\Roaming\Python\Python39\site-packages\zarr\storage.py", line 738, in __getitem__
    return self._mutable_mapping[key]
  File "C:\Users\Alan Liddell\.conda\envs\napari-env\lib\site-packages\fsspec\mapping.py", line 147, in __getitem__
    raise KeyError(key)
KeyError: '.zgroup'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\testing\s3-problems\hist.py", line 37, in <module>
    s3_group = get_group_from_store(get_s3_store())
  File "c:\testing\s3-problems\hist.py", line 31, in get_group_from_store
    return zarr.group(store=cache)
  File "C:\Users\Alan Liddell\AppData\Roaming\Python\Python39\site-packages\zarr\hierarchy.py", line 1358, in group
    return Group(store, read_only=False, chunk_store=chunk_store,
  File "C:\Users\Alan Liddell\AppData\Roaming\Python\Python39\site-packages\zarr\hierarchy.py", line 167, in __init__
    raise GroupNotFoundError(path)
zarr.errors.GroupNotFoundError: group not found at path ''

But we can confirm with s3fs that it's there:

import os, dotenv, s3fs

dotenv.load_dotenv()

endpoint = os.environ["ZARR_S3_ENDPOINT"]
bucket = os.environ["ZARR_S3_BUCKET_NAME"]
key_id = os.environ["ZARR_S3_ACCESS_KEY_ID"]
key_secret = os.environ["ZARR_S3_SECRET_ACCESS_KEY"]

store_path = "test_stream_data_to_s3[version0-None].zarr"

s3 = s3fs.S3FileSystem(
    key=key_id, secret=key_secret, client_kwargs={"endpoint_url": endpoint}
)
s3.ls(f"{bucket}/{store_path}/.zgroup")

Steps to reproduce

import os, dotenv, s3fs, zarr

dotenv.load_dotenv()

endpoint = os.environ["ZARR_S3_ENDPOINT"]
bucket = os.environ["ZARR_S3_BUCKET_NAME"]
key_id = os.environ["ZARR_S3_ACCESS_KEY_ID"]
key_secret = os.environ["ZARR_S3_SECRET_ACCESS_KEY"]

store_path = "test_stream_data_to_s3[version0-None].zarr"

s3 = s3fs.S3FileSystem(
    key=key_id, secret=key_secret, client_kwargs={"endpoint_url": endpoint}
)
# print(s3.ls(f"{bucket}/{store_path}/.zgroup")) # uncomment for a sanity check

store = s3fs.S3Map(root=f"{bucket}/{store_path}", s3=s3)
group = zarr.group(store=store)
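
The traceback shows the lookup failing inside fsspec's `_expand_path`. One possible explanation, which is an assumption and not confirmed in this report, is that the brackets in the store name are interpreted as a glob character class instead of literal characters. The sketch below uses only the Python standard library (not fsspec or s3fs) to show how glob-style matching treats this name:

```python
import fnmatch
import glob

# Store name taken from the report above.
name = "test_stream_data_to_s3[version0-None].zarr"

# Used as a glob pattern, "[version0-None]" is a character class that matches
# exactly one character, so the pattern does not match the literal name.
matches_itself = fnmatch.fnmatchcase(name, name)

# The same pattern does match a name where a single character from the class
# stands in place of the bracketed part.
matches_single_char = fnmatch.fnmatchcase("test_stream_data_to_s3v.zarr", name)

# glob.escape rewrites "[" as "[[]" so that it is matched literally.
escaped = glob.escape(name)
matches_escaped = fnmatch.fnmatchcase(name, escaped)

print(matches_itself, matches_single_char, matches_escaped)
```

Whether escaping the root passed to s3fs.S3Map avoids the error has not been verified here; the snippet only illustrates the general glob behaviour.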

Additional output

No response


Labels

bug (Potential issues with the zarr-python library)
