Provide an example of using a remote catalog #13714

@alamb

Is your feature request related to a problem or challenge?

Quoting from @westonpace on #13582

Many catalogs are remote (and/or disk based) and offer only asynchronous APIs. For example, Polaris, Unity, and Hive. Integrating with these catalogs is impossible, since something like ctx.sql("SELECT * FROM db.schm.tbl") first enters an async context (sql), then a synchronous context (calling the catalog provider to resolve db), and then needs to re-enter an asynchronous context to interact with the catalog. This async -> sync -> async path is generally forbidden.

This also came up in

I believe it is possible to interact with remote catalogs using DataFusion's non-async catalog APIs, but it is not obvious how to do so.

Describe the solution you'd like

I would like a clear, well-documented example of a DataFusion catalog that interacts with a remote catalog.

Describe alternatives you've considered

Another approach, which is the one taken by SessionContext::sql, is:

1. Make an initial pass through the parse tree to find all references (non-async)
2. Fetch all references (can be async)
3. Do the planning (non-async) with all the relevant references

I don't think this is particularly well documented.
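
To make the three phases above concrete, here is a minimal std-only sketch of the pattern. It does not use DataFusion's actual APIs: RemoteCatalog, find_table_references, prefetch, plan, and the toy block_on executor are all hypothetical names invented for illustration, and the token scan is a naive stand-in for walking a real parse tree.

```rust
use std::collections::{BTreeSet, HashMap};
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical remote catalog that only exposes an async API.
struct RemoteCatalog {
    tables: HashMap<String, Vec<String>>,
}

impl RemoteCatalog {
    /// Stand-in for a network call; a real client would do I/O here.
    async fn fetch_columns(&self, table: &str) -> Option<Vec<String>> {
        self.tables.get(table).cloned()
    }
}

/// Phase 1 (sync): find every table the query references.
/// Assumption: naively takes the token after FROM or JOIN instead of
/// walking a parse tree.
fn find_table_references(sql: &str) -> BTreeSet<String> {
    let tokens: Vec<&str> = sql.split_whitespace().collect();
    let mut refs = BTreeSet::new();
    for pair in tokens.windows(2) {
        if pair[0].eq_ignore_ascii_case("from") || pair[0].eq_ignore_ascii_case("join") {
            let name = pair[1].trim_end_matches(|c: char| c == ';' || c == ',');
            refs.insert(name.to_string());
        }
    }
    refs
}

/// Phase 2 (async): resolve all references up front into a local snapshot.
async fn prefetch(
    remote: &RemoteCatalog,
    refs: &BTreeSet<String>,
) -> HashMap<String, Vec<String>> {
    let mut snapshot = HashMap::new();
    for name in refs {
        if let Some(columns) = remote.fetch_columns(name).await {
            snapshot.insert(name.clone(), columns);
        }
    }
    snapshot
}

/// Phase 3 (sync): "plan" using only the snapshot, so no async call is
/// needed from inside synchronous code.
fn plan(sql: &str, snapshot: &HashMap<String, Vec<String>>) -> Result<String, String> {
    let mut scans = Vec::new();
    for name in find_table_references(sql) {
        let columns = snapshot
            .get(&name)
            .ok_or_else(|| format!("table not found: {}", name))?;
        scans.push(format!("Scan({}: {})", name, columns.join(", ")));
    }
    Ok(scans.join(" + "))
}

/// The async entry point only awaits between the two sync phases.
async fn sql(remote: &RemoteCatalog, query: &str) -> Result<String, String> {
    let refs = find_table_references(query);
    let snapshot = prefetch(remote, &refs).await;
    plan(query, &snapshot)
}

/// Toy executor so the sketch runs without an async runtime dependency.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

The point of the design is that plan never needs to call back into the remote catalog: everything it looks up was fetched before planning started, so the async -> sync -> async path never arises. A real example would presumably swap the token scan for DataFusion's parser, the snapshot for an in-memory catalog, and block_on for an async runtime such as Tokio.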

Additional context

No response

Labels: enhancement (New feature or request)
