OOM when running to_dataframe_iterable with bqstorage client #1292

@andaag

Description

Environment details

  • OS type and version: Linux; also reproduces in a python:3.9-slim-bullseye Docker container on GKE.
  • Python version: 3.9.12
  • pip version: 22.0.4
  • google-cloud-bigquery version: 3.2.0

Steps to reproduce

  1. Read from a large table with to_dataframe_iterable(bqstorage_client).
  2. Memory usage keeps growing until the OOM killer terminates the process.
  3. Disable bqstorage_client and the problem is gone. EDIT: not entirely sure this is true; I think it still happens, just astronomically slower. Iterating row by row behaves differently, though.

Code example

from google.cloud import bigquery_storage

# bigquery_result is the QueryJob for the query over the large table.

# Runs out of memory:
bqstorage_client = bigquery_storage.BigQueryReadClient()
for df in bigquery_result.result().to_dataframe_iterable(
    bqstorage_client=bqstorage_client, max_queue_size=2
):
    pass  # each df is one pandas DataFrame chunk

# Works fine:
for row in bigquery_result.result():
    pass
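
One comparison that might help narrow this down (assuming I'm reading the RowIterator docs right and to_arrow_iterable takes the same bqstorage_client / max_queue_size arguments): iterate Arrow record batches instead of DataFrames, to see whether the growth comes from the download queue or from the pandas conversion.

# Same query, but yielding pyarrow.RecordBatch chunks instead of DataFrames:
bqstorage_client = bigquery_storage.BigQueryReadClient()
for batch in bigquery_result.result().to_arrow_iterable(
    bqstorage_client=bqstorage_client, max_queue_size=2
):
    pass  # if memory still grows here, the DataFrame conversion is probably not the culprit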

Is max_queue_size not propagated, or something like that? The table I'm reading from is 24 GB and not partitioned. I've been trying to use tracemalloc etc. to track down what's going on, but haven't been successful. Happy to help add debug information if anyone has ideas on how to resolve this one.
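
For reference, this is roughly the kind of per-chunk instrumentation I can add to produce that debug output (a rough sketch; psutil is an arbitrary choice here, tracemalloc snapshots would work similarly to what I already tried):

import psutil

process = psutil.Process()
bqstorage_client = bigquery_storage.BigQueryReadClient()
rows = bigquery_result.result()
for i, df in enumerate(
    rows.to_dataframe_iterable(bqstorage_client=bqstorage_client, max_queue_size=2)
):
    # If max_queue_size were honoured, RSS should plateau after a few chunks
    # instead of growing on every iteration.
    rss_mib = process.memory_info().rss / (1024 * 1024)
    print(f"chunk {i}: {len(df)} rows, RSS ~= {rss_mib:.0f} MiB")
    del df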
