feat: Add partitioning and clustering to the to_gbq function #949
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Hi @antoineeripret, it looks like a bunch of tests are failing. We may need to fix them before requesting review.
Hi @shuoweil, I fixed the issues. The remaining failing test is linked to internal approvers.
tswast left a comment
We try to keep unit test coverage high. Would you mind adding a unit test or two or three similar to
python-bigquery-pandas/tests/unit/test_gbq.py
Lines 306 to 323 in 48a91df
```python
def test_to_gbq_w_default_project(mock_bigquery_client):
    """If no project is specified, we should be able to use project from
    default credentials.
    """
    import google.api_core.exceptions
    from google.cloud.bigquery.table import TableReference

    mock_bigquery_client.get_table.side_effect = google.api_core.exceptions.NotFound(
        "my_table"
    )
    gbq.to_gbq(DataFrame(), "my_dataset.my_table")

    mock_bigquery_client.get_table.assert_called_with(
        TableReference.from_string("default-project.my_dataset.my_table")
    )
    mock_bigquery_client.create_table.assert_called_with(mock.ANY)
    table = mock_bigquery_client.create_table.call_args[0][0]
    assert table.project == "default-project"
```
that confirms that these options are passed through to the create_table call as expected?
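Such a test might look roughly like the sketch below. The argument names (time_partitioning_column, clustering_columns) are taken from this PR's diff, the structure mirrors the existing test above, and the final signature and assertions may differ.

```python
def test_to_gbq_w_partitioning_and_clustering(mock_bigquery_client):
    """Partitioning and clustering options should be forwarded to create_table.

    Sketch only: assumes the same module-level imports (gbq, DataFrame, mock)
    as the existing tests in test_gbq.py.
    """
    import google.api_core.exceptions

    mock_bigquery_client.get_table.side_effect = google.api_core.exceptions.NotFound(
        "my_table"
    )
    gbq.to_gbq(
        DataFrame({"created_at": [], "country": []}),
        "my_dataset.my_table",
        time_partitioning_column="created_at",
        clustering_columns=["country"],
    )

    # The options should end up on the Table object passed to create_table.
    table = mock_bigquery_client.create_table.call_args[0][0]
    assert table.time_partitioning.field == "created_at"
    assert table.clustering_fields == ["country"]
```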
Added in the last commit.
```python
if clustering_columns:
    table.clustering_fields = list(clustering_columns)

if time_partitioning_column:
```
Technically someone could enable time partitioning without a column to do ingestion time partitioning. This is less flexible than using a column, though, so I'm not sure we need to worry about it.
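For reference, this is the distinction in the BigQuery client library (a minimal sketch, not part of this PR): ingestion-time partitioning sets time_partitioning without a field, while column-based partitioning names the column.

```python
from google.cloud import bigquery

table = bigquery.Table("my-project.my_dataset.my_table")

# Ingestion-time partitioning: no column; rows are partitioned by load time.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY
)

# Column-based partitioning: partition on a DATE/TIMESTAMP column.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="created_at"
)
```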
tswast left a comment
Thank you!
@tswast: added your suggestion in the last commit, thanks!
tswast left a comment
Thank you!
Full context: googleapis/google-cloud-python#14488
The objective of this change is to ensure that we can leverage partitioning and clustering when using the pandas_gbq library. New arguments have been added to the to_gbq function (and documented in the code). This allows us to execute the following code (for example):
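For illustration, the usage could look like the following sketch; the argument names (time_partitioning_column, clustering_columns) are taken from the diff in this PR, and the project, dataset, and column names are placeholders.

```python
import pandas as pd
import pandas_gbq

df = pd.DataFrame(
    {
        "created_at": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "country": ["FR", "US"],
        "clicks": [10, 20],
    }
)

# Create a day-partitioned, clustered table from the DataFrame.
pandas_gbq.to_gbq(
    df,
    "my_dataset.my_table",
    project_id="my-project",
    if_exists="replace",
    time_partitioning_column="created_at",
    clustering_columns=["country"],
)
```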
This would result in the following configuration being applied in BigQuery:
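For example, the resulting table definition could be checked with the BigQuery client library (a sketch; the expected values correspond to the example above):

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.my_dataset.my_table")

# Expected, given the example above: daily time partitioning on "created_at"
# and clustering on "country".
print(table.time_partitioning)
print(table.clustering_fields)
```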