Closed
Labels
api: bigquery — Issues related to the googleapis/python-bigquery API.
Description
Environment details
- OS type and version: macOS 13
- Python version: Python 3.10.7
- pip version: pip 23.2
- google-cloud-bigquery version: 3.11.4
Steps to reproduce
- Create a table with a BIGNUMERIC type column
- Use Client.load_table_from_dataframe() to load a column with the value Decimal("0.12345678901234560000000000000000000000")
Code example
from decimal import Decimal

import pandas as pd
from google.cloud import bigquery

df = pd.DataFrame({"TEST_COL": [Decimal("0.12345678901234560000000000000000000000")]})
with bigquery.Client(project=project, credentials=credentials) as client:
load_job = client.load_table_from_dataframe(df, table_name)
result = load_job.result()
Stack trace
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Rescaling Decimal128 value would cause data loss
Details
I poked around a bit, and I think I see the problem. To determine the column type, you create an arrow array:
arrow_table = pyarrow.array(dataframe.reset_index()[field.name])
detected_type = ARROW_SCALAR_IDS_TO_BQ.get(arrow_table.type.id)
My test decimal value can be represented by an arrow 128-bit type
>>> pyarrow.array(pd.DataFrame({"x": [Decimal('0.12345678901234560000000000000000000000')]})["x"]).type
Decimal128Type(decimal128(38, 38))
But AFAICT that has the same type ID as every other Decimal128Type, so it maps to "NUMERIC" in your lookup table:
pyarrow.decimal128(38, scale=9).id: "NUMERIC",
So when you convert to pyarrow before uploading, it tries to coerce using your hardcoded type for "NUMERIC":
return pyarrow.Array.from_pandas(series, type=arrow_type)
and fails because it needed decimal128(38, 38) to represent the value but is given decimal128(38, 9). I think you will need to detect when a 128-bit type can't fit in your NUMERIC subtype and upgrade those columns to BIGNUMERIC.
Hope that helps!