-
-
Notifications
You must be signed in to change notification settings - Fork 368
[v3] Elevate codec pipeline #1932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
the changes here would address some of the problems described in #1913 |
normanrz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced that the class structure needs to match the json structure one-to-one. That is why we have the overridable from_dict and to_dict classes. We should be pragmatic, in particular, in places where we can make the user API more convenient.
I am fine with the changes to the CodecPipeline and can live with the changed validate signature. I am not happy with forcing the users to supply typesize and shuffle for the BloscCodec.
|
@normanrz i reverted the changes to the blosc codec edit: feel free to ignore this PR until I fix the merge conflicts |
…nto elevate_codec_pipeline
…nto elevate_codec_pipeline
|
the pre-commit action is flagging a mypy error that a) I can't replicate locally and b) is in a file that this PR did not touch, so I'm going to merge this despite the pre-commit failing. |
* v3: (22 commits) [v3] `Buffer` ensure correct subclass based on the `BufferPrototype` argument (zarr-developers#1974) Fix doc build (zarr-developers#1987) Fix doc build warnings (zarr-developers#1985) Automatically generate API reference docs (zarr-developers#1918) Update `RemoteStore.__str__` and add UPath tests (zarr-developers#1964) [v3] Elevate codec pipeline (zarr-developers#1932) 0 dim arrays: indexing (zarr-developers#1980) `parse_shapelike` allows 0 (zarr-developers#1979) Clean up typing and docs for indexing (zarr-developers#1961) add json indentation to config (zarr-developers#1952) chore: update pre-commit hooks (zarr-developers#1973) Bump pypa/gh-action-pypi-publish in the actions group (zarr-developers#1969) chore: update pre-commit hooks (zarr-developers#1957) Update release.rst (zarr-developers#1960) doc: update release notes for 3.0.0.alpha (zarr-developers#1959) Basic working FsspecStore (zarr-developers#1785) Feature: Top level V3 API (zarr-developers#1884) Buffer Prototype Argument (zarr-developers#1910) Create issue-metrics.yml fixes bug in transpose (zarr-developers#1949) ...
This PR moves the
CodecPipelinedata structure off theArrayMetadataclasses and instead localizes it higher in the stack.design for array metadata classes
In
zarr-python, we haveArrayMetadataclasses that are expressly designed to model the contents of zarr metadata, e.g.zarr.jsonor.zarray(and these classes should not do anything else). Designing theArrayMetadataclasses around this goal is very deliberate, largely due to lessons we learned in the v2 codebase, where thezarr.ArrayAPI was mixed together with the a model of the.zarrayJSON document, and this led to a variety of problems.we are breaking that design in v3
With that in mind, note that the v3 spec defines that the
codecsattribute ofzarr.jsonis a JSON array ofCodecobjects. But in the current version of the v3 branch,ArrayV3Metadata.codecshas the typeCodecPipeline, butCodecPipelineis not a collection ofCodecobjects; instead, it's an object with a lot of attributes and methods. Putting such an object under theArrayV3Metadata.codecskey violates the design principle of theArrayMetadataclass.one way to fix it
But the fix is relatively simple: we remove
CodecPipelinefromArrayV3Metadataand instead push the responsibility for creating instances of that class to objects that consumeArrayMetadata. That's what this PR does.Specifically, I make
CodecPipelinean attribute ofAsyncArray; that attribute is created on initialization, by calling a function on themetadata.codecsattribute. Sharding codecs also useCodecPipeline; in this PR, they create theCodecPipelinewhen needed instead of keeping it as an attribute. If we don't like this, we could consider adding a_codec_pipelineattribute to the sharding codecs, as long as it's clear that this attribute is not part of the JSON serialized form of the codec.I made a few other changes in the process, including changing the signature of the
Codec.validatemethod to explicitly take the parameters that it needs (shape, dtype, chunk_shape), instead of anArraySpecobject that includes extra fields (specifically,ArraySpec.attributes) that no codec will ever need for validation. If that's unpopular in this PR we can spin it out into its own PR or just not do it, but I think this change is an improvement.@normanrz this touches a lot of code you authored so I will be very interested in getting your feedback here