
client.insert_rows should fail when inserting non-existing fields #151

@simonvanderveldt

Description


Version

google-cloud-bigquery==1.25.0

The client.insert_rows() function doesn't fail when inserting non-existing fields, whereas the BigQuery API does fail with a message like:

{
  "kind": "bigquery#tableDataInsertAllResponse",
  "insertErrors": [
    {
      "index": 0,
      "errors": [
        {
          "reason": "invalid",
          "location": "zap",
          "debugInfo": "",
          "message": "no such field."
        }
      ]
    }
  ]
}

insert_rows() silently drops the additional columns instead.
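For illustration, a minimal reproduction along the lines described above (the project/dataset/table name and its single-field schema are assumptions):

from google.cloud import bigquery

client = bigquery.Client()
# Hypothetical table; assume its schema is a single STRING field "name".
table = client.get_table("my_project.my_dataset.my_table")

rows = [{"name": "foo", "zap": "bar"}]  # "zap" is not in the schema

# insert_rows() converts the rows using the table schema, so the extra
# "zap" key is dropped and an empty error list is returned, even though
# the raw API would report a "no such field." error for it.
errors = client.insert_rows(table, rows)
print(errors)  # []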

This happens because insert_rows() passes the table's schema as the list of fields to _record_field_to_json, which only iterates over the fields it is given and ignores all other fields present in the data.
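A simplified sketch of that pattern (not the library's actual _record_field_to_json, just an illustration): because only the schema's field names are visited, extra keys in the row are never looked at and vanish without an error.

def row_to_json(schema_field_names, row):
    converted = {}
    for name in schema_field_names:      # only schema fields are considered
        if name in row:
            converted[name] = row[name]
    return converted                     # extra keys in `row` are dropped

print(row_to_json(["name"], {"name": "foo", "zap": "bar"}))  # {'name': 'foo'}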

This behavior is the opposite of the BigQuery API's and means we cannot reliably insert data, because we are never made aware of changes to the incoming data: there is simply no failure.
IMHO this behavior is not correct. I think it would be OK if selected_fields were provided, but insert_rows() should not silently use the schema to limit which fields of the input data are processed and ignore the rest.
I can imagine there are cases where one wants to be lenient and ignore all fields that are not part of the table, so this behavior might have to be an option, possibly combined with selected_fields; a rough sketch of such a guard follows below.
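One possible client-side guard in the meantime (purely hypothetical, not part of the library; it assumes dict rows and does not handle nested RECORD fields): validate row keys against the table schema before inserting and raise on unknown fields, with an opt-in flag for the lenient behavior.

from google.cloud import bigquery

def insert_rows_strict(client, table, rows, ignore_unknown_fields=False):
    # `rows` is assumed to be a list of dicts keyed by field name.
    known = {field.name for field in table.schema}
    for index, row in enumerate(rows):
        unknown = set(row) - known
        if unknown and not ignore_unknown_fields:
            raise ValueError(
                f"Row {index} contains fields not in the table schema: {sorted(unknown)}"
            )
    return client.insert_rows(table, rows)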

P.S. By extension this also applies to the client.insert_rows_from_dataframe() function, which uses client.insert_rows().

P.P.S. We initially ran into this when using insert_rows_from_dataframe(), and it took some searching to find where things were going wrong because of the somewhat indirect chain insert_rows_from_dataframe -> insert_rows -> insert_rows_json.
Why was this long route chosen instead of simply using insert_rows_json(table, df.to_dict(orient="records"))? It seems a lot simpler and will probably be the workaround we implement for now (sketched below).
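A rough sketch of that workaround: hand the raw records straight to insert_rows_json(), which surfaces the API's "no such field." errors instead of dropping the column. The table name and DataFrame contents are assumptions, and values such as NaN or timestamps may need extra conversion before they are JSON-serializable.

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my_project.my_dataset.my_table")  # assumed table

df = pd.DataFrame([{"name": "foo", "zap": "bar"}])

# insert_rows_json() sends the records as-is, so the unknown "zap" column
# comes back as an insert error instead of being silently dropped.
errors = client.insert_rows_json(table, df.to_dict(orient="records"))
print(errors)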

Metadata

Labels

api: bigquery — Issues related to the googleapis/python-bigquery API.
priority: p2 — Moderately-important priority. Fix may not be included in next release.
type: bug — Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
