
fix: reduce bigquery table modification via DML for to_gbq #1737


Merged: 2 commits merged on May 16, 2025.


chelsea-lin (Contributor) commented on May 14, 2025:

To avoid exceeding BigQuery's limit of 1,500 table modifications per table per day, to_gbq now prefers INSERT or MERGE DML statements, which do not count toward that limit. The DML path is used when the destination table already exists and has the same schema as the DataFrame, and it supports both replacing and appending data. If the schemas differ, to_gbq falls back to its original table-modification process.
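A minimal sketch of the write-path decision described above, assuming to_gbq-style if_exists values ("fail", "append", "replace"). The helpers choose_write_path and build_dml, the path labels, and the idea of reading from a staging table are hypothetical illustrations, not the bigframes implementation. The replace case uses MERGE ... ON FALSE, a common BigQuery idiom: no source row ever matches, so every existing target row is deleted and every source row is inserted in one DML statement.

```python
from typing import Optional, Sequence


def choose_write_path(if_exists: str, table_exists: bool, schemas_match: bool) -> str:
    """Hypothetical helper: pick how a to_gbq-style write should reach the table."""
    if not table_exists:
        return "create_table"  # nothing exists yet, so a regular create/load is needed
    if if_exists == "fail":
        raise ValueError("destination table already exists")
    if not schemas_match:
        return "table_modification"  # fall back to the original (quota-counted) path
    if if_exists == "append":
        return "insert_dml"
    if if_exists == "replace":
        return "merge_dml"
    raise ValueError(f"unsupported if_exists value: {if_exists!r}")


def build_dml(path: str, destination: str, source: str, columns: Sequence[str]) -> Optional[str]:
    """Hypothetical helper: render the DML statement for a DML write path.

    Assumption: `source` is a staging table that already holds the DataFrame's rows.
    """
    target_cols = ", ".join(f"`{c}`" for c in columns)
    source_cols = ", ".join(f"S.`{c}`" for c in columns)
    if path == "insert_dml":
        # Append: copy every staged row into the destination table
        return (
            f"INSERT INTO `{destination}` ({target_cols}) "
            f"SELECT {source_cols} FROM `{source}` AS S"
        )
    if path == "merge_dml":
        # Replace: ON FALSE never matches, so old rows are deleted and new rows inserted
        return (
            f"MERGE `{destination}` AS T USING `{source}` AS S ON FALSE "
            "WHEN NOT MATCHED BY SOURCE THEN DELETE "
            f"WHEN NOT MATCHED THEN INSERT ({target_cols}) VALUES ({source_cols})"
        )
    return None  # non-DML paths are not rendered by this sketch
```

With if_exists="replace" and a matching schema the sketch selects "merge_dml"; any schema mismatch selects "table_modification" regardless of mode, mirroring the fallback described in the PR.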

Fixes internal issue 409086472 🦕

chelsea-lin requested a review from tswast on May 14, 2025, 23:43.
product-auto-label (bot) added the labels "size: m" (Pull request size is medium.) and "api: bigquery" (Issues related to the googleapis/python-bigquery-dataframes API.) on May 14, 2025.
chelsea-lin force-pushed the main_chelsealin_togbqmerge branch 2 times, most recently from 056c6c8 to 5180abb, on May 15, 2025, 19:53.
product-auto-label (bot) added the label "size: l" (Pull request size is large.) and removed "size: m" on May 15, 2025.
chelsea-lin changed the title from "fix: avoid table modification on to_gbq" to "fix: reduce bigquery table modification via DML for to_gbq" on May 15, 2025.
chelsea-lin force-pushed the main_chelsealin_togbqmerge branch from 5180abb to 1c32d28 on May 15, 2025, 20:04.
product-auto-label (bot) added the label "size: m" (Pull request size is medium.) and removed "size: l" on May 15, 2025.
chelsea-lin marked this pull request as ready for review on May 15, 2025, 20:05.
chelsea-lin requested review from a team as code owners on May 15, 2025, 20:05.
chelsea-lin force-pushed the main_chelsealin_togbqmerge branch from 1c32d28 to d8ce12f on May 16, 2025, 01:21.
```python
for field in table_schema:
    if field.name not in schema.names:
        return False
    if bigframes.dtypes.convert_schema_field(field)[1] != schema.get_type(
```
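The quoted loop checks each destination-table field against the DataFrame's schema by name and by converted dtype. Below is a self-contained sketch of that idea in plain Python; the TableSchema shape, the BQ_TO_DTYPE mapping, and the column-count check are assumptions for illustration, whereas the real code derives dtypes with bigframes.dtypes.convert_schema_field.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the destination table schema: (column name, BigQuery type)
TableSchema = List[Tuple[str, str]]

# Assumed BigQuery-type -> dtype-name mapping; bigframes derives this via convert_schema_field
BQ_TO_DTYPE: Dict[str, str] = {
    "INTEGER": "Int64",
    "FLOAT": "Float64",
    "STRING": "string",
    "BOOLEAN": "boolean",
}


def schema_matches(table_schema: TableSchema, frame_dtypes: Dict[str, str]) -> bool:
    """Return True only when the DataFrame fits the existing table without a schema change."""
    # Assumption: column counts must agree, so the DataFrame has no extra columns
    if len(table_schema) != len(frame_dtypes):
        return False
    for name, bq_type in table_schema:
        if name not in frame_dtypes:
            return False  # a table column is missing from the DataFrame
        if BQ_TO_DTYPE.get(bq_type) != frame_dtypes[name]:
            return False  # same column name, different type
    return True
```

Any False result here corresponds to the "schema discrepancies" case in which to_gbq keeps its original table-modification path.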
Collaborator: Do we need to do anything special here for the duration/timedelta type that we added recently?

Contributor: convert_schema_field() is able to handle timedelta:

```python
elif (
    field.field_type == "INTEGER"
    and field.description is not None
    and field.description.endswith(TIMEDELTA_DESCRIPTION_TAG)
):
    return field.name, TIMEDELTA_DTYPE
```

I believe we are good.
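To see why the timedelta case is covered, here is a stand-alone re-creation of the quoted branch. Field is a hypothetical stand-in for google.cloud.bigquery.SchemaField, the tag string is a placeholder (the real TIMEDELTA_DESCRIPTION_TAG is defined in bigframes and may differ), dtypes are represented as strings, and only INTEGER columns are handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: placeholder for bigframes' TIMEDELTA_DESCRIPTION_TAG
TIMEDELTA_DESCRIPTION_TAG = "#microseconds"


@dataclass
class Field:
    """Hypothetical stand-in for google.cloud.bigquery.SchemaField."""

    name: str
    field_type: str
    description: Optional[str] = None


def convert_field(field: Field) -> Tuple[str, str]:
    """Map a table field to (name, dtype name), mirroring the quoted elif branch."""
    if (
        field.field_type == "INTEGER"
        and field.description is not None
        and field.description.endswith(TIMEDELTA_DESCRIPTION_TAG)
    ):
        # A tagged INTEGER column stores a duration in microseconds
        return field.name, "duration[us][pyarrow]"
    if field.field_type == "INTEGER":
        return field.name, "Int64"  # an untagged INTEGER is a plain nullable integer
    raise NotImplementedError(f"type {field.field_type} is omitted from this sketch")
```

Under these assumptions a tagged INTEGER column converts to the same duration dtype a timedelta DataFrame column carries, so the schema comparison sees equal types and the DML path remains available.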

Contributor (PR author): Thanks!

chelsea-lin merged commit 545cdca into main on May 16, 2025 (24 checks passed).
chelsea-lin deleted the main_chelsealin_togbqmerge branch on May 16, 2025, 17:47.
Labels: "api: bigquery" (Issues related to the googleapis/python-bigquery-dataframes API.), "size: m" (Pull request size is medium.)

4 participants