Query performance optimizations #362

@tswast

Description

This issue tracks the "fast query path" changes for the Python client(s):

  • perf: use jobs.getQueryResults to download result sets #363 -- Update QueryJob to use getQueryResults in RowIterator. Project down to only the needed fields so that RowIterator avoids fetching the schema and other unnecessary job statistics.
  • perf: cache first page of jobs.getQueryResults rows #374 -- Update QueryJob and RowIterator to cache the first page of results, which is already fetched as part of the logic that waits for the job to finish. Discard the cache if maxResults or startIndex is set.
  • perf: use getQueryResults from DB-API #375 -- Update the DB-API to avoid a direct call to list_rows().
  • perf: avoid extra API calls from to_dataframe if all rows are cached #384 -- Update to_dataframe and related methods in RowIterator to skip the BQ Storage API when the cached results are the only page.
  • Update the DB-API to skip the BQ Storage API when the cached results are the only page.
  • Update Client.query to call the jobs.query backend API method when the job_config is compatible with it.
  • (optional?) Avoid the call to jobs.get in certain cases, such as QueryJob.to_dataframe and QueryJob.to_arrow:
    • Add a "reload" argument to QueryJob.result(), defaulting to True.
    • Update RowIterator to call get_job to fetch the destination table ID before attempting to use the BQ Storage API (if the destination table ID isn't available).
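The first-page cache in #374 is the core of this plan: the page of rows fetched while waiting for the job to finish is reused instead of refetched, unless maxResults or startIndex forces a fresh listing. Below is a minimal, self-contained sketch of that decision logic. All names here (FakeBackend, the simplified RowIterator) are illustrative stand-ins, not the actual google-cloud-bigquery internals, and the sketch models only the cache-discard rule, not result limiting itself.

```python
class FakeBackend:
    """Stands in for the jobs.getQueryResults API; counts calls made."""

    def __init__(self, pages):
        self.pages = pages
        self.calls = 0

    def get_query_results(self, page_index):
        self.calls += 1
        return self.pages[page_index]


class RowIterator:
    """Yields rows, reusing a cached first page when it is safe to do so."""

    def __init__(self, backend, first_page=None, max_results=None, start_index=None):
        # Per #374: the cached page is only valid for a plain full scan,
        # so discard it if maxResults or startIndex is set.
        if max_results is not None or start_index is not None:
            first_page = None
        self._backend = backend
        self._first_page = first_page

    def __iter__(self):
        start = 0
        if self._first_page is not None:
            yield from self._first_page  # served from cache, no API call
            start = 1
        for i in range(start, len(self._backend.pages)):
            yield from self._backend.get_query_results(i)


backend = FakeBackend([[1, 2], [3, 4]])
# Waiting for the job already fetched page 0, so pass it in as the cache:
rows = list(RowIterator(backend, first_page=[1, 2]))
print(rows, backend.calls)  # [1, 2, 3, 4] 1 -- only one extra API call
```

This also illustrates why #384 follows naturally: when the cached page is the only page, iteration completes with zero additional API calls, so to_dataframe and the DB-API can skip the BQ Storage API entirely.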

Metadata

Labels

  • api: bigquery -- Issues related to the googleapis/python-bigquery API.
  • type: feature request -- "Nice-to-have" improvement, new feature, or different behavior or design.
