v3 core spec: Consider to drop /meta prefix, have file at URI #177

@jstriebel

Citing @jbms from #149 (comment):

As discussed at the community meeting, this naming scheme has the drawback that there is no good way to have a path pointing directly to a non-root array. Additionally, it was noted that to integrate better with existing filesystem completion in editors, etc., it would be helpful if the path were a real filesystem path.

A few proposals were made:

  • Instead of "meta/root" + P + ".group.json" as the key for the metadata file, use just P + ".zr3". This would lead to a key of just ".zr3" for the root. It was noted, though, that with a consolidated metadata extension these metadata files would not actually exist. The data could be placed in e.g. "_data" or something similar, where names beginning with an underscore could be reserved to prevent conflicts. This would still require a way to locate the root directory; that could be done by storing the relative path inside the array metadata.
  • Alternatively, we could leave the directory structure alone but require an extension on the directory containing the root, e.g. "foo.zr3", and disallow array or group names ending in ".zr3". Then you could use "path/to/root.zr3/path/to/array" as a pseudo-path to an individual array. The downside is that it may be confusing to use something that looks like a path, where one portion corresponds to a real filesystem path but another portion does not. File completion in editors also wouldn't support this.
  • We could use a special syntax to combine both a path to a root and a path to an array into a single string, e.g. "path/to/root//path/to/array" or "path/to/root#/path/to/array". We would need to carefully choose the syntax to avoid conflicts with e.g. fsspec, and file completion in editors also wouldn't support this.
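To make the three proposals easier to compare, here is a rough Python sketch of how the keys and pseudo-paths could be computed. All function names are hypothetical, and I am assuming that P is the empty string for the root and otherwise starts with "/"; neither is stated above, so treat this as an illustration rather than a proposed API.

```python
def draft_group_key(p: str) -> str:
    """Current draft naming: "meta/root" + P + ".group.json"."""
    return "meta/root" + p + ".group.json"


def proposed_key(p: str) -> str:
    """First proposal: P + ".zr3" (the root gets the key ".zr3")."""
    return p + ".zr3"


def split_by_extension(pseudo_path: str, ext: str = ".zr3") -> tuple[str, str]:
    """Second proposal: the root directory carries a reserved extension.

    Splits at the first path component ending in `ext`. This only works
    because array and group names would be forbidden from ending in `ext`.
    """
    parts = pseudo_path.split("/")
    for i, part in enumerate(parts):
        if part.endswith(ext):
            return "/".join(parts[: i + 1]), "/".join(parts[i + 1:])
    raise ValueError(f"no component ending in {ext!r} in {pseudo_path!r}")


def split_by_separator(combined: str, sep: str = "//") -> tuple[str, str]:
    """Third proposal: a special separator between root and array path.

    Splits at the FIRST occurrence of `sep`, which is naive on purpose.
    """
    root, found, inner = combined.partition(sep)
    if not found:
        raise ValueError(f"separator {sep!r} not found in {combined!r}")
    return root, inner


root, array = split_by_extension("path/to/root.zr3/path/to/array")
print(root, array)
print(split_by_separator("path/to/root//path/to/array"))
print(split_by_separator("path/to/root#/path/to/array", sep="#/"))
```

Note that the naive "//" split misparses a URL such as "s3://bucket/root//array", because the first "//" belongs to the scheme. That is one concrete form of the syntax-conflict concern mentioned in the last proposal.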

Some more comments I remember from the discussion rounds:

  • A .json suffix is useful for getting the correct MIME type by default on many stores, e.g. S3.
  • Given a URI to an array or group (syntax TBD, see v3: Define standard "URL" syntax for referencing a specific array, group, attribute within a zarr repository #132), it would be great to actually find a directory or file at that location. For example, s3://bucket-name/key-name/name-of-the-zarr-path.zarr/hierarchy/path/my-data.array.json could be a URI pointing to the my-data array at the path hierarchy/path/my-data within the zarr hierarchy rooted at s3://bucket-name/key-name/name-of-the-zarr-path.zarr/. (I just made up this URI as an example; feel free to discuss the syntax in #132.) With such a URI scheme and without the /meta prefix, one could find the relevant file directly, at least for filesystem or HTTP stores, or with appropriate clients for other stores.
  • The original motivation for keeping /meta and /data separate was to be able to list all metadata keys efficiently, without also listing the chunk files. If most relevant stores can exclude directories from key listings, then using a prefix only for the chunk files would still give this efficiency, but it's unclear whether that's the case.
  • It might be useful to be able to place chunk-files in arbitrary locations (possibly even other stores). This could be added as an extension, but can also be considered for the core spec.
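The listing trade-off in the third point can be sketched with a toy in-memory key set. The key names, the "_data/" prefix and both helper functions are made up for illustration; a real store would list keys through its own API, and whether it can exclude a prefix on the server side is exactly the open question.

```python
# Hypothetical keys under the current draft layout (separate meta/ and data/).
draft_keys = [
    "meta/root.group.json",
    "meta/root/foo.array.json",
    "data/root/foo/c0/0",
    "data/root/foo/c0/1",
]

# Hypothetical keys if /meta is dropped and only chunks get a prefix.
flat_keys = [
    "foo.array.json",
    "_data/foo/c0/0",
    "_data/foo/c0/1",
]


def list_with_prefix(keys, prefix):
    """Prefix listing: the kind of query most object stores support natively."""
    return sorted(k for k in keys if k.startswith(prefix))


def list_excluding_prefix(keys, excluded):
    """Exclusion listing: here it scans every key, chunk keys included.

    This is only efficient if the store itself can skip the excluded prefix.
    """
    return sorted(k for k in keys if not k.startswith(excluded))


print(list_with_prefix(draft_keys, "meta/"))
print(list_excluding_prefix(flat_keys, "_data/"))
```

In the first layout the metadata keys come from a single prefix query. In the second, the client filters out chunk keys itself unless the store can do the exclusion for it.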

Pinging discussion participants I remember so far: @joshmoore @jbms @rabernat @WardF

Labels: core-protocol-v3.0 (Issue relates to the core protocol version 3.0 spec)

Status: Done
