Labels
api: bigquery (Issues related to the googleapis/python-bigquery API.)
Description
Environment details
- OS type and version: Linux (also reproducible in the python:3.9-slim-bullseye Docker container on GKE)
- Python version: 3.9.12
- pip version: 22.0.4
- google-cloud-bigquery version: 3.2.0
Steps to reproduce
- Read from a large table with to_dataframe_iterable(bqstorage_client).
- Memory usage keeps growing until the OOMKiller kicks in.
- Disable bqstorage_client and the problem is gone. EDIT: not entirely sure this is true; I think it still happens, just astronomically slower. Iterating by row is different, though.
Code example
```python
from google.cloud import bigquery_storage

# bigquery_result is the QueryJob returned by an earlier client.query(...) call.

# Runs out of memory:
bqstorage_client = bigquery_storage.BigQueryReadClient()
for df in bigquery_result.result().to_dataframe_iterable(
    bqstorage_client=bqstorage_client, max_queue_size=2
):
    pass

# Works fine:
for row in bigquery_result.result():
    pass
```

Is max_queue_size not propagated, or something like that? The table I'm reading from is 24 GB in size and not partitioned. I've been trying to use tracemalloc etc. to track down what's going on, but haven't been successful. Happy to help add debug information if anyone has any ideas on how to resolve this one.
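For reference, this is roughly the instrumentation I've been running to watch memory climb per page. The project/dataset/table in the query are placeholders for my 24 GB table, and I'm reading peak RSS via resource.getrusage since tracemalloc mostly sees Python-level allocations rather than the Arrow buffers:

```python
import resource

from google.cloud import bigquery, bigquery_storage

client = bigquery.Client()
bqstorage_client = bigquery_storage.BigQueryReadClient()

# Placeholder query standing in for the large, unpartitioned table.
job = client.query("SELECT * FROM `my-project.my_dataset.big_table`")

pages = job.result().to_dataframe_iterable(
    bqstorage_client=bqstorage_client, max_queue_size=2
)
for i, df in enumerate(pages):
    # ru_maxrss is the peak resident set size; on Linux it is reported in KiB.
    rss_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"page {i}: {len(df)} rows, peak RSS {rss_kib / 1024:.0f} MiB")
```

With bqstorage_client enabled, the peak RSS printed here keeps climbing across pages instead of levelling off at roughly max_queue_size worth of pages, which is what I would expect the queue bound to give.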