Miscellaneous backlog tickets for the Rivulet workstream (multimodal dataset format).
No due date•1/2 issues closed
Create DeltaCAT V2 APIs, including (1) a native DeltaCAT Catalog implementation, (2) a native DeltaCAT CLI and corresponding Linux-FS-like APIs, and (3) Ray/Daft data source/sink adapters (to enable local/distributed reads/writes of DeltaCAT catalogs). The DeltaCAT Catalog implementation should also include all capabilities in `deltacat/storage/rivulet/dataset.py` (mostly at the table version level), including:
1. Manage (multiple) schemas on a dataset
2. Import data (e.g. from_csv)
3. Export data (e.g. to webdataset)
4. Read and write methods (currently, the DeltaCAT catalog has somewhat different read/write methods from Rivulet)
No due date•2/9 issues closed
The DeltaCAT V2 Metastore is defined by a working implementation of the DeltaCAT Storage Interface, which controls all metadata I/O. This milestone tracks the development of the DeltaCAT V2 Native Storage Implementation, which forms an abstraction layer over all lower-level code in `metafile.py` and `transaction.py` that operates directly on metafiles.
No due date•3/3 issues closed
Use the DeltaCAT Metastore format in Rivulet. Express Rivulet concepts (e.g. multiple schemas) in the DeltaCAT Metastore, and clean up internal Rivulet classes that will no longer be needed.
No due date•1/7 issues closed
Implement all required DeltaCAT storage APIs and make any changes required to integrate LSM-based CDC on Ray with Iceberg. Proposal doc: https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit.
No due date•8/13 issues closed
Milestone to track enhancements to the existing DeltaCAT compactor by creating and better leveraging enhanced primary key indices.
No due date•2/5 issues closed