Conversation

shobsi
Contributor

@shobsi shobsi commented Mar 25, 2025

This change optimizes operations with small results, which are the more common case. If a result can exceed 10 MB, set bigframes.pandas.options.bigquery.allow_large_results=True explicitly.
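
For anyone who needs the previous behavior, a minimal sketch of opting back in is below. The option path is the one named above; the helper name set_allow_large_results and the guarded import are only illustrative assumptions so the snippet runs even where bigframes is not installed.

```python
# Sketch: restore large-result support for the whole session.
# The guarded import is an assumption for portability, not part of this PR.
try:
    import bigframes.pandas as bpd  # requires the bigframes package
except ImportError:
    bpd = None


def set_allow_large_results(enabled: bool) -> bool:
    """Hypothetical helper: set the option, return False if bigframes is missing."""
    if bpd is None:
        return False
    # Option path taken from the PR description above.
    bpd.options.bigquery.allow_large_results = enabled
    return True


# Set this before running queries whose results may exceed 10 MB.
applied = set_allow_large_results(True)
```

Setting the option once near the top of a script or notebook, before any query runs, keeps the opt-in visible to readers.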

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue 394658588 🦕

@shobsi shobsi requested a review from tswast March 25, 2025 22:24
@shobsi shobsi requested review from a team as code owners March 25, 2025 22:24
@product-auto-label product-auto-label bot added the size: xs and api: bigquery labels Mar 25, 2025
@shobsi shobsi changed the title from "feat!: set allow_large_results=False by default to optimize small r…" to "feat!: set allow_large_results=False by default" Mar 25, 2025
@product-auto-label product-auto-label bot added the size: s label and removed the size: xs label Mar 26, 2025
@tswast
Collaborator

tswast commented Mar 26, 2025

Looks like we need to update the benchmark script. Some metrics aren't available yet with allow_large_results=False. googleapis/python-bigquery#1996 has been closed, but I don't think we've updated bigframes to take advantage of it yet.

nox > python scripts/run_and_publish_benchmark.py --notebook --publish-benchmarks=notebooks/
Traceback (most recent call last):
  File "/tmpfs/src/github/python-bigquery-dataframes/scripts/run_and_publish_benchmark.py", line 486, in <module>
    main()
  File "/tmpfs/src/github/python-bigquery-dataframes/scripts/run_and_publish_benchmark.py", line 447, in main
    benchmark_metrics, error_message = collect_benchmark_result(
  File "/tmpfs/src/github/python-bigquery-dataframes/scripts/run_and_publish_benchmark.py", line 102, in collect_benchmark_result
    raise ValueError(
ValueError: Mismatch in the number of report files for bytes, millis, seconds and query char count.

@shobsi
Contributor Author

shobsi commented Mar 26, 2025

Waiting for #1545 to go in first to fix the benchmarking script failure.

@shobsi shobsi requested a review from Genesis929 March 26, 2025 17:45
@shobsi shobsi merged commit e9fb712 into main Mar 27, 2025
24 checks passed
@shobsi shobsi deleted the shobs-allow_large_results-False branch March 27, 2025 08:20
Labels
api: bigquery, size: s

3 participants