@antoineeripret
Contributor

Full context: googleapis/google-cloud-python#14488

The objective of this change is to ensure that we can leverage partitioning and clustering when using the pandas_gbq library. It adds the following new arguments to the to_gbq function (documented in the code):

clustering_columns
time_partitioning_column
time_partitioning_type
time_partitioning_expiration_ms
range_partitioning_column
range_partitioning_range

This allows us to execute the following code (for example):

pandas_gbq.to_gbq(
    df, 
    "xxxxx.yyyy", 
    project_id="zzzzz",
    if_exists="replace", 
    credentials=credentials, 
    time_partitioning_column="date", 
    clustering_columns=['country', 'page']
)

This results in the following configuration being applied in BigQuery:

[image]
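As a rough illustration of what the call above asks BigQuery for, the sketch below builds the corresponding fragment of a BigQuery REST table resource (the `clustering` and `timePartitioning` fields). The helper name `sketch_table_options` is hypothetical, the `"DAY"` default is an assumption, and this is not the code added by the PR; it only mirrors the argument names listed above.

```python
from typing import Optional, Sequence


def sketch_table_options(
    clustering_columns: Optional[Sequence[str]] = None,
    time_partitioning_column: Optional[str] = None,
    time_partitioning_type: str = "DAY",  # assumed default, not confirmed by the PR
    time_partitioning_expiration_ms: Optional[int] = None,
) -> dict:
    """Hypothetical helper: map the new arguments onto table resource fields."""
    resource: dict = {}
    if clustering_columns:
        resource["clustering"] = {"fields": list(clustering_columns)}
    if time_partitioning_column:
        partitioning = {
            "type": time_partitioning_type,
            "field": time_partitioning_column,
        }
        if time_partitioning_expiration_ms is not None:
            # The REST API represents int64 values as strings.
            partitioning["expirationMs"] = str(time_partitioning_expiration_ms)
        resource["timePartitioning"] = partitioning
    return resource


options = sketch_table_options(
    clustering_columns=["country", "page"],
    time_partitioning_column="date",
)
print(options)
```

Under these assumptions, the example call yields a table clustered on `country` and `page` and partitioned by day on the `date` column.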

@google-cla

google-cla bot commented Sep 22, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@product-auto-label bot added the size: m and api: bigquery labels Sep 22, 2025
@antoineeripret marked this pull request as ready for review October 2, 2025 12:54
@antoineeripret requested review from a team as code owners October 2, 2025 12:54
@sycai changed the title from "Add partitioning and clustering to the to_gbq function" to "feat: Add partitioning and clustering to the to_gbq function" Oct 2, 2025
@sycai requested a review from tswast October 2, 2025 17:37
@antoineeripret
Contributor Author

antoineeripret commented Oct 22, 2025

@shuoweil, is there anything missing on my side before @tswast can review the PR? Or are we just waiting until he has some free time to review it? Thank you!

@shuoweil

Hi @antoineeripret, it looks like a bunch of tests are failing. We may need to fix them before requesting a review.

@antoineeripret
Contributor Author

Hi @shuoweil, fixed the issues. The remaining failing test is linked to internal approvers.

@tswast added the kokoro:run label Oct 28, 2025
@yoshi-kokoro removed the kokoro:run label Oct 28, 2025
Collaborator

@tswast tswast left a comment

We try to keep unit test coverage high. Would you mind adding a unit test or two or three similar to

def test_to_gbq_w_default_project(mock_bigquery_client):
    """If no project is specified, we should be able to use project from
    default credentials.
    """
    import google.api_core.exceptions
    from google.cloud.bigquery.table import TableReference

    mock_bigquery_client.get_table.side_effect = google.api_core.exceptions.NotFound(
        "my_table"
    )
    gbq.to_gbq(DataFrame(), "my_dataset.my_table")
    mock_bigquery_client.get_table.assert_called_with(
        TableReference.from_string("default-project.my_dataset.my_table")
    )
    mock_bigquery_client.create_table.assert_called_with(mock.ANY)
    table = mock_bigquery_client.create_table.call_args[0][0]
    assert table.project == "default-project"

that confirms that these options are passed through to the create_table call as expected?
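A minimal, dependency-free sketch of the kind of pass-through test being requested might look like the following. It does not use the repository's `mock_bigquery_client` fixture or the real `to_gbq`; `create_table_sketch` is a hypothetical stand-in for the table-creation code path, and a `SimpleNamespace` stands in for the BigQuery table object. Only the assertion pattern (inspecting `create_table.call_args`) is taken from the test quoted above.

```python
from types import SimpleNamespace
from unittest import mock


def create_table_sketch(client, table, clustering_columns=None):
    """Hypothetical stand-in for the code path that creates the table."""
    if clustering_columns:
        table.clustering_fields = list(clustering_columns)
    return client.create_table(table)


def test_clustering_columns_reach_create_table():
    mock_client = mock.MagicMock()
    table = SimpleNamespace(clustering_fields=None)

    create_table_sketch(mock_client, table, clustering_columns=("country", "page"))

    # Same pattern as the existing test: grab the table passed to create_table.
    mock_client.create_table.assert_called_with(mock.ANY)
    created = mock_client.create_table.call_args[0][0]
    assert created.clustering_fields == ["country", "page"]


test_clustering_columns_reach_create_table()
```

A real test in the repository would call `gbq.to_gbq` with the new arguments and make the same kind of assertion on the table handed to `create_table`.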

@tswast added the kokoro:run label Oct 28, 2025
@yoshi-kokoro removed the kokoro:run label Oct 28, 2025
@antoineeripret requested a review from tswast October 28, 2025 21:10
@antoineeripret
Contributor Author

Added in the last commit.

@tswast added the kokoro:force-run label Oct 30, 2025
if clustering_columns:
    table.clustering_fields = list(clustering_columns)

if time_partitioning_column:
Collaborator

Technically someone could enable time partitioning without a column to do ingestion time partitioning. This is less flexible than using a column, though, so I'm not sure we need to worry about it.
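To make the distinction concrete: in the BigQuery REST table resource, a `timePartitioning` entry with a `type` but no `field` partitions the table by ingestion time, while adding `field` partitions it by that column. The sketch below shows a branch that would allow both. The helper name `time_partitioning_fragment` and the `"DAY"` fallback are assumptions for illustration, not the PR's behavior.

```python
from typing import Optional


def time_partitioning_fragment(
    column: Optional[str] = None,
    partitioning_type: Optional[str] = None,
) -> Optional[dict]:
    """Hypothetical helper: build the timePartitioning part of a table resource."""
    if column is None and partitioning_type is None:
        return None  # the table is not time-partitioned
    fragment = {"type": partitioning_type or "DAY"}  # "DAY" fallback is an assumption
    if column is not None:
        # Column-based partitioning; omitting "field" means ingestion-time partitioning.
        fragment["field"] = column
    return fragment


column_based = time_partitioning_fragment(column="date")
ingestion_time = time_partitioning_fragment(partitioning_type="HOUR")
```

Here `ingestion_time` carries no `field` key, which is the case the condition `if time_partitioning_column:` in the diff above does not cover.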

@yoshi-kokoro removed the kokoro:force-run label Oct 30, 2025
tswast previously approved these changes Oct 30, 2025
Collaborator

@tswast tswast left a comment

Thank you!

@antoineeripret
Contributor Author

@tswast: added your suggestion in the last commit, thanks!

@tswast added the kokoro:force-run and owlbot:run labels Oct 31, 2025
@yoshi-kokoro removed the kokoro:force-run label Oct 31, 2025
@gcf-owl-bot bot removed the owlbot:run label Oct 31, 2025
@tswast added the kokoro:force-run label Oct 31, 2025
@yoshi-kokoro removed the kokoro:force-run label Oct 31, 2025
@tswast added the owlbot:run label Oct 31, 2025
@gcf-owl-bot bot removed the owlbot:run label Oct 31, 2025
Collaborator

@tswast tswast left a comment

Thank you!

@tswast enabled auto-merge (squash) October 31, 2025 14:59
@shuoweil self-requested a review October 31, 2025 17:16
@tswast merged commit e7213c7 into googleapis:main Oct 31, 2025
24 of 25 checks passed
Labels

api: bigquery Issues related to the googleapis/python-bigquery-pandas API. size: m Pull request size is medium.

4 participants