Skip to content

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Jul 10, 2025

This PR removes the default chunk encoding parameters from the config.

Prior to the data types refactor, when we inferred the chunk encoding automatically, we categorized the array data type into one of a few categories like "numeric", "string", "bytes". These categories were created before we added datetimes and fixed-length strings, and they don't really make sense after those data types were added:

  • "string" and "bytes" are not real categories, because they do not distinguish between fixed length and variable length bytes / strings. The fixed-length types are basically "numeric", but the variable length types must use the vlen utf8 and vlen bytes codecs.
  • datetime data, fixed-length strings, and structured data types are not obviously "numeric"

It's not really possible to preserve the old categories in the config now that we have more data types that resist categorization. I don't think it's really possible break the data types down into a small number of general categories, and it's less possible / attractive to put this categorization logic in our config.

I think a better solution is for zarr python to choose sane defaults; when users want something other than the default, they specify it in the function call. In practice these sane defaults are the same codecs for all data types except the variable-length data types, which get special treatment (data types with endianness in zarr v3 also get a bytes codec with an endianness parameter). That's what I added in this PR. The default chunk encoding parameters each come from functions that are inaccessible to configuration.

I did a quick github survey to find users who might be using the zarr config to change the default codecs, and I could only find 2: @TomAugspurger uses the config here and @dcangst in here.

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jul 10, 2025
Copy link

codecov bot commented Jul 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.76%. Comparing base (ded59d9) to head (1f8c88b).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3228      +/-   ##
==========================================
- Coverage   94.76%   94.76%   -0.01%     
==========================================
  Files          78       78              
  Lines        8642     8672      +30     
==========================================
+ Hits         8190     8218      +28     
- Misses        452      454       +2     
Files with missing lines Coverage Δ
src/zarr/core/array.py 98.52% <100.00%> (+0.05%) ⬆️
src/zarr/core/config.py 83.33% <100.00%> (-16.67%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jul 10, 2025

closes #3104

@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jul 10, 2025
@d-v-b d-v-b requested review from dstansby and normanrz July 10, 2025 22:04
@d-v-b d-v-b added this to the 3.1.0 milestone Jul 10, 2025
Copy link
Member

@normanrz normanrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense!

@d-v-b d-v-b enabled auto-merge (squash) July 11, 2025 15:34
@d-v-b d-v-b merged commit c0e39af into zarr-developers:main Jul 11, 2025
30 checks passed
@d-v-b d-v-b deleted the deprecate-encoding-config branch July 11, 2025 17:49
@TomAugspurger TomAugspurger mentioned this pull request Jul 14, 2025
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants