Version: `google-cloud-bigquery==1.25.0`
The `client.insert_rows()` function doesn't fail when inserting non-existent fields, whereas the BigQuery API does fail with a message like:
```json
{
  "kind": "bigquery#tableDataInsertAllResponse",
  "insertErrors": [
    {
      "index": 0,
      "errors": [
        {
          "reason": "invalid",
          "location": "zap",
          "debugInfo": "",
          "message": "no such field."
        }
      ]
    }
  ]
}
```
`insert_rows()` silently drops the additional columns instead.
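For illustration, here is a minimal reproduction sketch of the discrepancy; the project, dataset, table, and column names are hypothetical, and it assumes a table whose schema contains only `id` and `name`:

```python
from google.cloud import bigquery

client = bigquery.Client()
# Hypothetical table with schema: id INT64, name STRING
table = client.get_table("my-project.my_dataset.my_table")

rows = [{"id": 1, "name": "alice", "zap": "not in the schema"}]

# insert_rows() builds the payload from the table's schema, so "zap" is
# dropped before the request is sent and no error is reported.
print(client.insert_rows(table, rows))       # []

# The streaming API itself rejects the unknown field.
print(client.insert_rows_json(table, rows))  # [{'index': 0, 'errors': [{'reason': 'invalid', 'location': 'zap', ...}]}]
```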
This happens because `insert_rows()` uses `_record_field_to_json`, which only iterates over the list of fields it is given and ignores all other fields present in the data, and `insert_rows()` passes the table's schema as that list of fields to `_record_field_to_json`.
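The effect can be sketched in plain Python (this is only an illustration of the filtering behavior, not the library's actual implementation):

```python
# Building the JSON payload by iterating only over the schema's fields means
# any key absent from the schema never makes it into the request.
schema_fields = ["id", "name"]                    # fields taken from the table's schema
row = {"id": 1, "name": "alice", "zap": "extra"}  # incoming data with an extra column

payload = {name: row.get(name) for name in schema_fields}
print(payload)  # {'id': 1, 'name': 'alice'} -- "zap" is silently gone
```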
This behavior is the opposite of the BigQuery API's, and it means we cannot reliably insert data: because nothing fails, we are never made aware of changes to the incoming data.
IMHO this behavior is not correct. I think it would be acceptable if `selected_fields` were provided explicitly, but the schema should not silently be used to limit which fields of the input data are processed while the rest are ignored. I can imagine there are cases where one wants to be lenient and ignore all fields that are not part of the table, so this behavior might have to be an option, possibly combined with `selected_fields`.
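For what it's worth, the streaming API already treats this kind of leniency as an explicit opt-in (`ignoreUnknownValues`), which `insert_rows_json()` forwards via its `ignore_unknown_values` parameter. A rough sketch of that existing behavior, again with hypothetical table and column names:

```python
from google.cloud import bigquery

client = bigquery.Client()
rows = [{"id": 1, "name": "alice", "zap": "extra"}]  # "zap" is not in the table's schema

# Default: the API reports the unknown field as an insert error.
errors = client.insert_rows_json("my-project.my_dataset.my_table", rows)

# Explicit opt-in: unknown fields are dropped, but only because we asked for it.
errors = client.insert_rows_json(
    "my-project.my_dataset.my_table", rows, ignore_unknown_values=True
)
```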
P.S. By extension this also applies to the `client.insert_rows_from_dataframe()` function, which uses `client.insert_rows()`.
P.P.S. We initially ran into this when using `insert_rows_from_dataframe()`, and it took some searching to find where things went wrong because the chain `insert_rows_from_dataframe` -> `insert_rows` -> `insert_rows_json` is somewhat indirect. Why was this longer route chosen instead of simply using `insert_rows_json(table, df.to_dict(orient="records"))`? It seems a lot simpler and will probably be the workaround we implement for now.
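For reference, a sketch of that workaround under the same assumptions (hypothetical table and column names); depending on the dataframe's dtypes, the records may first need converting to plain JSON-serializable Python types:

```python
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
df = pd.DataFrame([{"id": 1, "name": "alice", "zap": "unexpected"}])

# Send the dataframe records straight to the streaming endpoint so that
# unknown fields surface as insert errors instead of being dropped.
errors = client.insert_rows_json(
    "my-project.my_dataset.my_table",
    df.to_dict(orient="records"),
)
if errors:
    raise RuntimeError(f"insert failed: {errors}")
```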