BigQuery: to_dataframe does not respect date fields #11

@QuinRiva

Environment details

  1. google-cloud-bigquery: version 1.22
  2. OS type and version: CentOS 7
  3. Python version: 3.6.9

Steps to reproduce

  1. Download a table from BigQuery to a pandas DataFrame

Code example

df = bq_client.list_rows(table).to_dataframe()
df.dtypes

contractId                       int64
contractTypeId                   int64
affiliateId                      int64
invoicePeriodId                  int64
startDate                       **object**
endDate                         **object**
contractName                    object
details                         object
updated            datetime64[ns, UTC]
created            datetime64[ns, UTC]
dtype: object

Timestamps seem to work fine, but the DATE columns come back with dtype object, as if they were just being treated as strings.

Table schema is defined as:

'contract' : [
            bigquery.SchemaField("contractId", "INTEGER", mode="REQUIRED"),
            bigquery.SchemaField("contractTypeId", "INTEGER", mode="REQUIRED"),
            bigquery.SchemaField("affiliateId", "INTEGER", mode="REQUIRED"),
            bigquery.SchemaField("invoicePeriodId", "INTEGER", mode="REQUIRED"),
            bigquery.SchemaField("startDate", "DATE", mode="REQUIRED"),
            bigquery.SchemaField("endDate", "DATE", mode="NULLABLE"),
            bigquery.SchemaField("contractName", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("details", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("created", "TIMESTAMP", mode="REQUIRED"),
            bigquery.SchemaField("updated", "TIMESTAMP", mode="REQUIRED"),
        ]

This shouldn't have anything to do with pagination, because the startDate field is required; also, the table only has 20 records.
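
Until this is resolved, a possible workaround is to convert the DATE columns by hand after the download. The sketch below is only an illustration: `coerce_date_columns` is a hypothetical helper (not part of google-cloud-bigquery), and the small hand-built DataFrame stands in for the result of `to_dataframe()`, assuming the DATE values arrive as Python `datetime.date` objects or `None`.

```python
import datetime

import pandas as pd


def coerce_date_columns(df, date_columns):
    """Return a copy of df with the named columns converted to datetime64.

    Hypothetical helper for illustration; not part of google-cloud-bigquery.
    """
    out = df.copy()
    for name in date_columns:
        # errors="coerce" turns missing or unparseable values into NaT.
        out[name] = pd.to_datetime(out[name], errors="coerce")
    return out


# Stand-in for the DataFrame returned by to_dataframe() (assumed shape).
df = pd.DataFrame(
    {
        "contractId": [1, 2],
        "startDate": [datetime.date(2019, 1, 1), datetime.date(2019, 6, 1)],
        "endDate": [datetime.date(2019, 12, 31), None],
    }
)

fixed = coerce_date_columns(df, ["startDate", "endDate"])
print(fixed.dtypes)
```

With a real table, the column list could probably be derived from the schema rather than hard-coded, for example `[f.name for f in table.schema if f.field_type == "DATE"]`.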
