- 
                Notifications
    You must be signed in to change notification settings 
- Fork 10
Add json data type #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| 
 100%. I have raised this as well: 
 
 That would be my preferred alternative. But the  FYI in  
 | 
| 
 As the person who implemented this, I agree! 🙃 I was just copying what Zarr V2 did and trying to get to some level of feature parity quickly. But it's not too late to change this. This issue does highlight the challenge of coupling between dtypes and codecs. It also touches on logical types vs. physical types. 
 I would favor this route. But I'd appreciate folks thoughts on how to specify these relationships between dtypes and codecs more formally. Like, is there someway for a dtype extension to say "this dtype must be used in conjunction with the  | 
| 
 This coupling is an interesting problem. I think it is currently the other way around: codecs define what dtypes they can (de)serialize. | 
| The coupling is inherent but not necessarily a problem --- but it would be better to be able to add bidirectional references in both the codec and data type, rather than unidirectional references. If the core codecs were also defined in this repository that would simplify things, though. As far as  The  @LDeakin One potential issue with what you have implemented in zarrs is that there is an advantage in placing the index at the end rather than the beginning, which is that it is then possible to stream out the data.  If the index is at the beginning, then you have to first calculate the size of every element, and e.g. for json if the elements are not already stored encoded it is difficult to avoid buffering or redundant work.  On the other hand I suppose it is easier to do a streaming decode if the index is at the beginning.  We could add an  | 
| 
 Sounds good | 
This defines a
jsondata type and adds it as a supported data type to thevlen-utf8codec.In my opinion it is a bit unfortunate that
vlen-utf8andvlen-byteswere added as separate codecs rather than just usingvlenfor both. In zarr v2 a separate identifier was needed because the data type for both was listed as "O" and therefore the real data type had to be determined based on the codec identifier. In zarr v3 that problem does not exist.In this PR I decided to just allow the
jsondata type in thevlen-utf8codec, because encoded JSON is itself valid UTF-8.Possible alternatives:
vlen-jsoncodec instead.vlencodec that will supportbytes,string, andjsonand makevlen-bytesandvlen-utf8deprecated.