Uploading RECORD with ARRAY type using load_table_from_dataframe yields intermediate list.item structure #439

@jonyvp

Description

I'm currently attempting to upload a pandas DataFrame containing several columns that hold either a scalar value or a list of JSON objects, which should result in a nested RECORD field on the BigQuery side.

e.g. a column version might contain:
[{"type": "specific_api_name_1", "value": ["0.0.1"]}, {"type": "specific_api_name_2", "value": ["0.0.1"]}]
Uploading this results in the desired data: a nested RECORD in BigQuery for a given row.
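For reference, a minimal sketch of the setup described above (the `upload` helper and the table ID are placeholders for illustration; actually running the load call requires GCP credentials):

```python
import pandas as pd

# One row whose "version" column holds a list of JSON-like objects,
# matching the example above.
df = pd.DataFrame(
    {
        "version": [
            [
                {"type": "specific_api_name_1", "value": ["0.0.1"]},
                {"type": "specific_api_name_2", "value": ["0.0.1"]},
            ]
        ]
    }
)


def upload(df, table_id):
    # Imported inside the function so the sketch stays self-contained;
    # needs google-cloud-bigquery installed and credentials configured.
    from google.cloud import bigquery

    client = bigquery.Client()
    client.load_table_from_dataframe(df, table_id).result()
```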

Alas, this also yields the schema:

version                              | RECORD   | NULLABLE |   
version.list                         | RECORD   | REPEATED |   
version.list.item                    | RECORD   | NULLABLE |   
version.list.item.type               | STRING   | NULLABLE |   
version.list.item.value              | RECORD   | NULLABLE |   
version.list.item.value.list         | RECORD   | REPEATED |  
version.list.item.value.list.item    | STRING   | NULLABLE |

I would expect (or want) this to generate the schema:

version                              | RECORD   | NULLABLE |  
version.type                         | STRING   | NULLABLE |  
version.value                        | STRING   | REPEATED |  

How could this be achieved?
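One avenue worth trying, sketched below: the intermediate `list.item` levels come from the Parquet encoding of nested lists that `load_table_from_dataframe` produces, and BigQuery can be told to collapse that wrapper via list inference. This assumes a google-cloud-bigquery version that exposes `ParquetOptions` with `enable_list_inference`; the function name and table ID are placeholders, and the load call itself requires GCP credentials:

```python
def load_with_list_inference(df, table_id):
    # Imported inside the function so the sketch stays self-contained.
    from google.cloud import bigquery

    # Ask BigQuery to interpret the Parquet three-level list encoding
    # (list.item) as a plain REPEATED field instead of nested RECORDs.
    parquet_options = bigquery.ParquetOptions()
    parquet_options.enable_list_inference = True

    job_config = bigquery.LoadJobConfig()
    job_config.parquet_options = parquet_options

    client = bigquery.Client()
    client.load_table_from_dataframe(df, table_id, job_config=job_config).result()
```

Whether this yields exactly the expected schema above may depend on the library and pyarrow versions in use.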

Metadata

Labels

api: bigquery (Issues related to the googleapis/python-bigquery API.)
type: question (Request for information or clarification. Not an issue.)
