Skip to content

Conversation

@tswast
Copy link
Collaborator

@tswast tswast commented Jan 9, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Requires #720, #721, and googleapis/python-bigquery#1793 to be merged first.

Fixes #710 🦕

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-pandas API. size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Jan 9, 2024
@tswast
Copy link
Collaborator Author

tswast commented Jan 10, 2024

I tested this manually in a local ipython instance to verify that the latency is improved when using this new functionality:

After this change:

# In [1]:
import pandas_gbq

# In [2]: %%timeit -n10 -r10
df = pandas_gbq.read_gbq("SELECT 1", progress_bar_type=None)

# Out [2]:
# 394 ms ± 37.8 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

Before this change:

# In [1]:
import pandas_gbq

# In [2]: %%timeit -n10 -r10
df = pandas_gbq.read_gbq("SELECT 1", progress_bar_type=None)

# Out [2]:
# 1.08 s ± 102 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

This is quite similar to the results I observed when implementing this optimization in the DB-API connector. googleapis/python-bigquery#1747 (comment)

…n available

fix unit tests

fix python 3.7

fix python 3.7

fix python 3.7

fix python 3.7

fix wait_timeout units

boost test coverage

remove dead code

boost a little more coverage
@tswast tswast force-pushed the issue710-query_and_wait_via_client_libary branch from 51fa1a4 to 7739f41 Compare January 16, 2024 21:56
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Jan 16, 2024
@tswast tswast marked this pull request as ready for review January 16, 2024 21:56
@tswast tswast requested review from a team as code owners January 16, 2024 21:56
@tswast tswast requested a review from kiraksi January 16, 2024 21:56
)
tm.assert_frame_equal(df, DataFrame({"valid_string": ["PI"]}))

def test_configuration_without_query(self, project_id):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure why this isn't failing anymore. I guess jobs.query is more lenient about unknown parameters than jobs.insert is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share a link to a failing test log? I wasn't able to find it in test fusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran this test in a debugger. The request body we send to jobs.query looks like:

{
 'useLegacySql': True,
 'formatOptions': {'useInt64Timestamp': True},
 'query': 'SELECT 1',
 'requestId': '8c4508cb-b8e4-463b-bef8-4a8f894d3644'
}

Notice that none of the copy configuration is included. I believe this to be a bug in the query_and_wait implementation. I'll send a PR today to fix this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested locally with googleapis/python-bigquery#1793 and the request body now includes unknown parameters, but surprisingly the REST API seems to ignore these. I'll follow-up on that PR to raise on common errors, such as passing in a configuration object of the wrong type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated googleapis/python-bigquery#1793 with some client-side checks for common mistakes of sending the wrong type for job_config and now this test is passing.

Will need to wait for a google-cloud-bigquery release with that fix before merging this PR.

@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 16, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 16, 2024
@Linchin
Copy link
Contributor

Linchin commented Jan 17, 2024

All looks good except for that I wanted to double check on the removed system test.

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 24, 2024
@tswast tswast requested a review from Linchin January 24, 2024 19:37
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 24, 2024
@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 24, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 24, 2024
@tswast
Copy link
Collaborator Author

tswast commented Jan 25, 2024

@Linchin Tests are passing after the most recent google-cloud-bigquery release. OK for another look.

@Linchin
Copy link
Contributor

Linchin commented Jan 25, 2024

LGTM, thank you!

@Linchin Linchin merged commit ac3ce3f into main Jan 25, 2024
@Linchin Linchin deleted the issue710-query_and_wait_via_client_libary branch January 25, 2024 23:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the googleapis/python-bigquery-pandas API. size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use faster query_and_wait method when available to skip BQ Storage API if all query results are already available

4 participants