go/adbc/driver/flightsql: support generic ingest #1107

@joellubi

I wanted to ask what it would take to support ingest with the flightsql driver. I understand that each driver is meant to supply its own implementation of ingestion, which is challenging for a flightsql backend because the driver doesn't necessarily know the specifics of the backend's underlying representation or syntax.

I had a few thoughts on how this might be achieved:

  • One option would be to extend the flightsql spec to support ingest natively. This would enable any backend to implement its own ingest logic, and the adbc ingest would simply map to the flightsql ingest. This could be done by adding a new command message type to include in the FlightDescriptor of a DoPut call.
  • Another option, though perhaps longer term and less clear, would be to defer this to substrait. The problem is that a generic flightsql adbc driver does not know which UPDATE, INSERT, or COPY syntax to submit as a query to the backend, but perhaps a substrait plan could abstract the "ingestion plan". I'm not very familiar with the details of the substrait spec and have only seen it used for "SELECT" style queries, so this may not even align with its stated goals. I think the first option is likely a better fit.

Elaborating on how the first option might be implemented, here's an example of how the new message type might look:

message CommandStatementIngest {
  option (experimental) = true;

  enum IngestMode {
    INGEST_MODE_CREATE = 0;
    INGEST_MODE_APPEND = 1;
    INGEST_MODE_REPLACE = 2;
    INGEST_MODE_CREATE_APPEND = 3;
  }

  string target_table = 1;

  IngestMode mode = 2;
}
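To make the client side of this concrete, here is a rough Go sketch of how the driver might translate ADBC ingest options into the proposed message. The `CommandStatementIngest` struct is a hand-written stand-in for the type protobuf would generate, `commandFromOptions` is a hypothetical helper, and the option keys and values reflect my understanding of ADBC's ingest options rather than anything verified against the driver code.

```go
package main

import "fmt"

// IngestMode mirrors the enum in the proposed CommandStatementIngest message.
type IngestMode int32

const (
	IngestModeCreate       IngestMode = 0
	IngestModeAppend       IngestMode = 1
	IngestModeReplace      IngestMode = 2
	IngestModeCreateAppend IngestMode = 3
)

// CommandStatementIngest is a stand-in for the generated protobuf type.
type CommandStatementIngest struct {
	TargetTable string
	Mode        IngestMode
}

// Assumed ADBC option keys; treat these as illustrative.
const (
	optTargetTable = "adbc.ingest.target_table"
	optMode        = "adbc.ingest.mode"
)

// modeByOption maps assumed ADBC option values to the proposed enum.
var modeByOption = map[string]IngestMode{
	"adbc.ingest.mode.create":        IngestModeCreate,
	"adbc.ingest.mode.append":        IngestModeAppend,
	"adbc.ingest.mode.replace":       IngestModeReplace,
	"adbc.ingest.mode.create_append": IngestModeCreateAppend,
}

// commandFromOptions (hypothetical) builds the command that would be
// serialized into the FlightDescriptor of the DoPut call.
func commandFromOptions(opts map[string]string) (CommandStatementIngest, error) {
	table := opts[optTargetTable]
	if table == "" {
		return CommandStatementIngest{}, fmt.Errorf("missing option %q", optTargetTable)
	}
	mode := IngestModeCreate // assumption: create is the default mode
	if v, ok := opts[optMode]; ok {
		m, known := modeByOption[v]
		if !known {
			return CommandStatementIngest{}, fmt.Errorf("unknown ingest mode %q", v)
		}
		mode = m
	}
	return CommandStatementIngest{TargetTable: table, Mode: mode}, nil
}

func main() {
	cmd, err := commandFromOptions(map[string]string{
		optTargetTable: "events",
		optMode:        "adbc.ingest.mode.append",
	})
	fmt.Println(cmd.TargetTable, cmd.Mode, err)
}
```

The point is only that the mapping is mechanical: the driver needs no knowledge of the backend's SQL dialect to fill in the message.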

After receiving a CommandStatementIngest in the FlightDescriptor, the flightsql server may then handle the subsequent stream by whatever means provides the desired throughput to the requested target.
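As an illustration of the server side, the following Go sketch shows how a backend might decide what to do for each mode before consuming the stream. Everything here is hypothetical: `planIngest`, the `Step` names, and the error values are invented for the example, and the semantics assumed for each mode (create fails if the table exists, append fails if it does not, replace drops an existing table, create_append creates only when missing) follow my reading of ADBC's ingest modes.

```go
package main

import (
	"errors"
	"fmt"
)

// IngestMode mirrors the enum in the proposed CommandStatementIngest message.
type IngestMode int32

const (
	IngestModeCreate       IngestMode = 0
	IngestModeAppend       IngestMode = 1
	IngestModeReplace      IngestMode = 2
	IngestModeCreateAppend IngestMode = 3
)

// Step is one action a hypothetical server takes for an ingest request.
type Step string

const (
	StepDropTable   Step = "drop_table"
	StepCreateTable Step = "create_table"
	StepAppendRows  Step = "append_rows"
)

var (
	ErrTableExists  = errors.New("target table already exists")
	ErrTableMissing = errors.New("target table does not exist")
)

// planIngest (hypothetical) turns the requested mode and the current state
// of the target table into an ordered list of steps, or an error.
func planIngest(mode IngestMode, tableExists bool) ([]Step, error) {
	switch mode {
	case IngestModeCreate:
		if tableExists {
			return nil, ErrTableExists
		}
		return []Step{StepCreateTable, StepAppendRows}, nil
	case IngestModeAppend:
		if !tableExists {
			return nil, ErrTableMissing
		}
		return []Step{StepAppendRows}, nil
	case IngestModeReplace:
		if tableExists {
			return []Step{StepDropTable, StepCreateTable, StepAppendRows}, nil
		}
		return []Step{StepCreateTable, StepAppendRows}, nil
	case IngestModeCreateAppend:
		if tableExists {
			return []Step{StepAppendRows}, nil
		}
		return []Step{StepCreateTable, StepAppendRows}, nil
	default:
		return nil, fmt.Errorf("unsupported ingest mode %d", mode)
	}
}

func main() {
	steps, err := planIngest(IngestModeReplace, true)
	fmt.Println(steps, err)
}
```

How each step is executed (bulk loader, COPY, batched INSERT) stays entirely up to the backend, which is the appeal of this option.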

I would appreciate any feedback on this approach, or links to prior context I may have missed. Thanks!
