feat: Add partitioning and clustering to the to_gbq function #949
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Hi @antoineeripret, it looks like a bunch of tests are failing. We may need to fix them before requesting review.
Hi @shuoweil, I fixed the issues. The remaining failing test is linked to internal approvers.
tswast left a comment
We try to keep unit test coverage high. Would you mind adding a unit test or two or three similar to
python-bigquery-pandas/tests/unit/test_gbq.py
Lines 306 to 323 in 48a91df
```python
def test_to_gbq_w_default_project(mock_bigquery_client):
    """If no project is specified, we should be able to use project from
    default credentials.
    """
    import google.api_core.exceptions
    from google.cloud.bigquery.table import TableReference

    mock_bigquery_client.get_table.side_effect = google.api_core.exceptions.NotFound(
        "my_table"
    )
    gbq.to_gbq(DataFrame(), "my_dataset.my_table")

    mock_bigquery_client.get_table.assert_called_with(
        TableReference.from_string("default-project.my_dataset.my_table")
    )
    mock_bigquery_client.create_table.assert_called_with(mock.ANY)
    table = mock_bigquery_client.create_table.call_args[0][0]
    assert table.project == "default-project"
```
that confirms that these options are passed through to the create_table call as expected?
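Such a test might look roughly like the sketch below. The argument names (time_partitioning_column, clustering_columns) are taken from this PR's diff, the structure mirrors the existing test above, and the final signature and assertions may differ.

```python
def test_to_gbq_w_partitioning_and_clustering(mock_bigquery_client):
    """Partitioning and clustering options should be forwarded to create_table.

    Sketch only: assumes the same module-level imports (gbq, DataFrame, mock)
    as the existing tests in test_gbq.py.
    """
    import google.api_core.exceptions

    mock_bigquery_client.get_table.side_effect = google.api_core.exceptions.NotFound(
        "my_table"
    )
    gbq.to_gbq(
        DataFrame({"created_at": [], "country": []}),
        "my_dataset.my_table",
        time_partitioning_column="created_at",
        clustering_columns=["country"],
    )

    # The options should end up on the Table object passed to create_table.
    table = mock_bigquery_client.create_table.call_args[0][0]
    assert table.time_partitioning.field == "created_at"
    assert table.clustering_fields == ["country"]
```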
Added in the last commit.
```python
if clustering_columns:
    table.clustering_fields = list(clustering_columns)

if time_partitioning_column:
```
Technically someone could enable time partitioning without a column to do ingestion time partitioning. This is less flexible than using a column, though, so I'm not sure we need to worry about it.
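For reference, this is the distinction in the BigQuery client library (a minimal sketch, not part of this PR): ingestion-time partitioning sets time_partitioning without a field, while column-based partitioning names the column.

```python
from google.cloud import bigquery

table = bigquery.Table("my-project.my_dataset.my_table")

# Ingestion-time partitioning: no column; rows are partitioned by load time.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY
)

# Column-based partitioning: partition on a DATE/TIMESTAMP column.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="created_at"
)
```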
tswast left a comment
Thank you!
@tswast: added your suggestion in the last commit, thanks!
tswast left a comment
Thank you!
Full context: googleapis/google-cloud-python#14488
The objective of this change is to ensure that we can leverage partitioning and clustering when using the pandas_gbq library. New arguments have been added to the to_gbq function (and documented in the code). This allows us to execute the following code (for example):
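For illustration, the usage could look like the following sketch; the argument names (time_partitioning_column, clustering_columns) are taken from the diff in this PR, and the project, dataset, and column names are placeholders.

```python
import pandas as pd
import pandas_gbq

df = pd.DataFrame(
    {
        "created_at": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "country": ["FR", "US"],
        "clicks": [10, 20],
    }
)

# Create a day-partitioned, clustered table from the DataFrame.
pandas_gbq.to_gbq(
    df,
    "my_dataset.my_table",
    project_id="my-project",
    if_exists="replace",
    time_partitioning_column="created_at",
    clustering_columns=["country"],
)
```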
This would result in the following configuration being applied in BigQuery:
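For example, the resulting table definition could be checked with the BigQuery client library (a sketch; the expected values correspond to the example above):

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.my_dataset.my_table")

# Expected, given the example above: daily time partitioning on "created_at"
# and clustering on "country".
print(table.time_partitioning)
print(table.clustering_fields)
```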