500 server error when creating table using clustering #242

Closed
@charlielito

Environment details

  • OS type and version: Ubuntu 20 (Pop!_OS)
  • Python version: 3.7.8
  • pip version: 20.1.1
  • google-cloud-bigquery version: 1.27.2

Steps to reproduce

I'm creating a table with several columns, one of which is of type GEOGRAPHY. When I load sample data into the table with clustering enabled, I get a 500 error. The load succeeds if I disable clustering, and it also succeeds with clustering enabled if I leave out the GEOGRAPHY column.
A toy example that reproduces it:

Code example

import time
import pandas as pd
from google.cloud import bigquery
from shapely.geometry import Point

client = bigquery.Client()
PROJECT_ID = ""  # Set this to your own project ID before running.
table_id = f"{PROJECT_ID}.data_capture.toy"

df = pd.DataFrame(
    dict(
        lat=[6.208969] * 100,
        lon=[-75.571696] * 100,
        logged_at=[int(time.time() * 1000) for _ in range(100)],
    )
)
df["point"] = df.apply(lambda row: Point(row["lon"], row["lat"]).wkb_hex, axis=1)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("lon", "FLOAT64", "REQUIRED"),
        bigquery.SchemaField("lat", "FLOAT64", "REQUIRED"),
        bigquery.SchemaField("point", "GEOGRAPHY", "REQUIRED"),
        bigquery.SchemaField("logged_at", "TIMESTAMP", "REQUIRED"),
    ],
    write_disposition="WRITE_TRUNCATE",
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="logged_at",
    ),
    clustering_fields=["logged_at"],
)

job = client.load_table_from_dataframe(
    df, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.
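
To make the three observations above easier to check one at a time, the combinations can be laid out as a small test matrix. This is only a sketch: `build_variant`, `build_matrix`, and the dictionary keys are hypothetical names chosen for illustration, and the outcomes are the ones reported in this issue, not results of running the code. The fourth combination (no GEOGRAPHY column, no clustering) was not tested in the report and is marked as such.

```python
from typing import Dict, List, Optional, Tuple

# Columns from the reproduction script, as (name, type, mode) tuples.
BASE_COLUMNS: List[Tuple[str, str, str]] = [
    ("lon", "FLOAT64", "REQUIRED"),
    ("lat", "FLOAT64", "REQUIRED"),
    ("logged_at", "TIMESTAMP", "REQUIRED"),
]
GEOGRAPHY_COLUMN: Tuple[str, str, str] = ("point", "GEOGRAPHY", "REQUIRED")

# Outcomes as described in the issue text, keyed by
# (include_geography, use_clustering). None means "not reported".
REPORTED_OUTCOMES: Dict[Tuple[bool, bool], Optional[str]] = {
    (True, True): "500 InternalServerError",
    (True, False): "load succeeds",
    (False, True): "load succeeds",
    (False, False): None,
}


def build_variant(include_geography: bool, use_clustering: bool) -> dict:
    """Describe one load configuration (hypothetical helper)."""
    columns = list(BASE_COLUMNS)
    if include_geography:
        columns.append(GEOGRAPHY_COLUMN)
    return {
        "include_geography": include_geography,
        "use_clustering": use_clustering,
        "columns": columns,
        # Same clustering field as in the reproduction script.
        "clustering_fields": ["logged_at"] if use_clustering else None,
        "reported_outcome": REPORTED_OUTCOMES[(include_geography, use_clustering)],
    }


def build_matrix() -> List[dict]:
    """Return all four combinations of the two switches."""
    return [
        build_variant(include_geography, use_clustering)
        for include_geography in (True, False)
        for use_clustering in (True, False)
    ]


if __name__ == "__main__":
    for variant in build_matrix():
        names = [name for name, _type, _mode in variant["columns"]]
        print(names, variant["clustering_fields"], variant["reported_outcome"])
```

Each variant's `columns` could be turned into `bigquery.SchemaField(*column)` entries and passed to `bigquery.LoadJobConfig` together with `clustering_fields`, in the same way as the script above, to confirm which combinations fail.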

Stack trace

Traceback (most recent call last):
  File "test.py", line 108, in <module>
    job.result()  # Wait for the job to complete.
  File "/home/charlie/data/kiwi/data-upload/.venv/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 812, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/charlie/data/kiwi/data-upload/.venv/lib/python3.7/site-packages/google/api_core/future/polling.py", line 130, in result
    raise self._exception
google.api_core.exceptions.InternalServerError: 500 An internal error occurred and the request could not be completed. Error: 3144498

Thank you in advance!

Labels

  • api: bigquery (Issues related to the googleapis/python-bigquery API.)
  • external (This issue is blocked on a bug with the actual product.)
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
