[Feature] Introduce Archive ability #5510

@JingsongLi

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Introduce an archive capability for tables stored on object stores such as OSS and S3.
Spark SQL:

ALTER TABLE t PARTITION(dt = '20250101') ARCHIVE;
ALTER TABLE t PARTITION(dt = '20250101') COLD ARCHIVE;

ALTER TABLE t PARTITION(dt = '20250101') RESTORE ARCHIVE;
ALTER TABLE t PARTITION(dt = '20250101') UNARCHIVE;

For each of these statements, we should launch a job that performs the archiving in a distributed fashion, processing every file in the partition one by one.
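The per-file fan-out described above can be sketched as follows. This is only an illustration, not the proposed implementation: a local thread pool stands in for the distributed engine, and `PartitionArchiveJob`, `archivePartition`, and the `archiveOneFile` callback are hypothetical names that do not appear in the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: a thread pool plays the role of the distributed job.
class PartitionArchiveJob {

    /**
     * Applies archiveOneFile to every file of a partition in parallel and
     * returns the number of files processed. A failure on any file is
     * rethrown as an IllegalStateException.
     */
    static int archivePartition(List<String> files,
                                Consumer<String> archiveOneFile,
                                int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String file : files) {
                // One task per file, mirroring "for each file" in the proposal.
                pending.add(pool.submit(() -> archiveOneFile.accept(file)));
            }
            for (Future<?> task : pending) {
                task.get(); // wait for completion and surface per-file failures
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Archiving was interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Archiving a file failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real job the callback would call the file system's archive operation, and the file list would come from the partition's manifests rather than from the caller.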

We can introduce methods to FileIO:

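// Move the file to the given storage type; may return a new path.
Optional<Path> archive(Path path, StorageType type);

// Restore an archived file so that it can be read for the given duration.
void restoreArchive(Path path, Duration duration);

// Move the file out of the archive to the given storage type; may return a new path.
Optional<Path> unarchive(Path path, StorageType type);

enum StorageType {
    Standard("Standard"),
    Archive("Archive"),
    ColdArchive("ColdArchive");

    private final String value;

    StorageType(String value) {
        this.value = value;
    }
}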

archive and unarchive may return a new path: some file systems do not support in-place archiving and have to write the file to a different path. In that case, the table may also need a new commit so that it references the new paths.
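A minimal sketch of how a caller might decide whether a commit is needed is shown below. It rests on assumptions the proposal does not spell out: an empty `Optional` is taken to mean "archived in place", paths are plain strings instead of `Path`, and `Archiver`, `InPlaceStore`, `RelocatingStore`, `archiveAll`, and the `archive/` prefix are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; only archive(...) and StorageType come from the proposal.
class ArchiveCommitSketch {

    enum StorageType { Standard, Archive, ColdArchive }

    /** Reduced stand-in for the proposed FileIO method. */
    interface Archiver {
        /** Assumed contract: the new path if the file moved, empty if archived in place. */
        Optional<String> archive(String path, StorageType type);
    }

    /** Store that changes the storage class in place, so the path is unchanged. */
    static class InPlaceStore implements Archiver {
        final Map<String, StorageType> storageClass = new HashMap<>();

        @Override
        public Optional<String> archive(String path, StorageType type) {
            storageClass.put(path, type);
            return Optional.empty();
        }
    }

    /** Store that cannot archive in place and writes under an "archive/" prefix. */
    static class RelocatingStore implements Archiver {
        @Override
        public Optional<String> archive(String path, StorageType type) {
            return Optional.of("archive/" + path);
        }
    }

    /**
     * Archives each file and collects old path -> new path for relocated files.
     * A non-empty result means the table has to be committed with the new paths.
     */
    static Map<String, String> archiveAll(Archiver io, List<String> files, StorageType type) {
        Map<String, String> moved = new LinkedHashMap<>();
        for (String file : files) {
            io.archive(file, type).ifPresent(newPath -> moved.put(file, newPath));
        }
        return moved;
    }
}
```

With `InPlaceStore` the returned map is empty and no commit is required; with `RelocatingStore` every file appears in the map and the table metadata would need to be rewritten to point at the new locations.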

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
