Commit 7aeacca

DOC: Document BigQuery to dtype translation for read_gbq
Adds a table documenting the current behavior, including that pandas 0.24.0+ stores TIMESTAMP columns as a time zone-aware dtype while earlier versions store them as a naive dtype. I could not figure out how to make 0.24.0+ store a naive dtype, nor could I figure out how to make earlier versions use a time zone-aware one.
1 parent 7edfc3e commit 7aeacca
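
For reference, a minimal sketch of the behavior the new table documents (the query and project ID below are placeholders for illustration, not part of this commit):

    import pandas_gbq

    # Hypothetical query returning a single TIMESTAMP column;
    # replace the placeholder project ID with your own.
    df = pandas_gbq.read_gbq(
        "SELECT CURRENT_TIMESTAMP() AS ts",
        project_id="my-project",  # placeholder
    )

    # pandas 0.24.0+ prints: datetime64[ns, UTC] (time zone aware)
    # earlier versions print: datetime64[ns]      (naive)
    print(df["ts"].dtype)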

3 files changed: +64 −19 lines changed


docs/source/changelog.rst

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@ Changelog
 
 - This fixes a bug where pandas-gbq could not upload an empty database. (:issue:`237`)
 
+Documentation
+~~~~~~~~~~~~~
+
+- Document :ref:`BigQuery data type to pandas dtype conversion
+  <reading-dtypes>` for ``read_gbq``. (:issue:`TBD`)
+
 Dependency updates
 ~~~~~~~~~~~~~~~~~~
 
docs/source/reading.rst

Lines changed: 55 additions & 19 deletions
@@ -9,21 +9,32 @@ Suppose you want to load all data from an existing BigQuery table
 
 .. code-block:: python
 
-   # Insert your BigQuery Project ID Here
-   # Can be found in the Google web console
+   import pandas_gbq
+
+   # TODO: Set your BigQuery Project ID.
    projectid = "xxxxxxxx"
 
-   data_frame = read_gbq('SELECT * FROM test_dataset.test_table', projectid)
+   data_frame = pandas_gbq.read_gbq(
+       'SELECT * FROM `test_dataset.test_table`',
+       project_id=projectid)
+
+.. note::
 
+   A project ID is sometimes optional if it can be inferred during
+   authentication, but it is required when authenticating with user
+   credentials. You can find your project ID in the `Google Cloud console
+   <https://console.cloud.google.com>`__.
 
 You can define which column from BigQuery to use as an index in the
 destination DataFrame as well as a preferred column order as follows:
 
 .. code-block:: python
 
-   data_frame = read_gbq('SELECT * FROM test_dataset.test_table',
-                         index_col='index_column_name',
-                         col_order=['col1', 'col2', 'col3'], projectid)
+   data_frame = pandas_gbq.read_gbq(
+       'SELECT * FROM `test_dataset.test_table`',
+       project_id=projectid,
+       index_col='index_column_name',
+       col_order=['col1', 'col2', 'col3'])
 
 
@@ -37,20 +48,45 @@ your job. For more information about query configuration parameters see `here
          "useQueryCache": False
      }
   }
-   data_frame = read_gbq('SELECT * FROM test_dataset.test_table',
-                         configuration=configuration, projectid)
-
+   data_frame = pandas_gbq.read_gbq(
+       'SELECT * FROM `test_dataset.test_table`',
+       project_id=projectid,
+       configuration=configuration)
 
-.. note::
-
-   You can find your project id in the `Google developers console
-   <https://console.developers.google.com>`__.
 
+The ``dialect`` argument can be used to indicate whether to use
+BigQuery's ``'legacy'`` SQL or BigQuery's ``'standard'`` SQL (beta). The
+default value is ``'standard'``. For more information on BigQuery's standard
+SQL, see `BigQuery SQL Reference
+<https://cloud.google.com/bigquery/docs/reference/standard-sql/>`__
 
-.. note::
+.. code-block:: python
 
-   The ``dialect`` argument can be used to indicate whether to use BigQuery's ``'legacy'`` SQL
-   or BigQuery's ``'standard'`` SQL (beta). The default value is ``'legacy'``, though this will change
-   in a subsequent release to ``'standard'``. For more information
-   on BigQuery's standard SQL, see `BigQuery SQL Reference
-   <https://cloud.google.com/bigquery/sql-reference/>`__
+   data_frame = pandas_gbq.read_gbq(
+       'SELECT * FROM [test_dataset.test_table]',
+       project_id=projectid,
+       dialect='legacy')
+
+
+.. _reading-dtypes:
+
+Inferring the DataFrame's dtypes
+--------------------------------
+
+The :func:`~pandas_gbq.read_gbq` method infers the pandas dtype for each column, based on the BigQuery table schema.
+
+================== =========================
+BigQuery Data Type dtype
+================== =========================
+FLOAT              float
+------------------ -------------------------
+TIMESTAMP          **pandas versions 0.24.0+**
+                   :class:`~pandas.DatetimeTZDtype` with
+                   ``unit='ns'`` and ``tz='UTC'``
+
+                   **Earlier versions**
+                   datetime64[ns]
+------------------ -------------------------
+DATETIME           datetime64[ns]
+TIME               datetime64[ns]
+DATE               datetime64[ns]
+================== =========================
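
As a hedged illustration of the new table (hypothetical column aliases and a placeholder project ID, not part of the commit), selecting one column per BigQuery type and printing the result's dtypes shows the inferred mapping:

    import pandas_gbq

    # One column per BigQuery type from the table, using standard SQL.
    df = pandas_gbq.read_gbq(
        """
        SELECT
          1.5 AS float_col,
          CURRENT_TIMESTAMP() AS timestamp_col,
          CURRENT_DATETIME() AS datetime_col,
          CURRENT_DATE() AS date_col,
          CURRENT_TIME() AS time_col
        """,
        project_id="my-project",  # placeholder
    )

    # Expected on pandas 0.24.0+:
    #   float_col                  float64
    #   timestamp_col  datetime64[ns, UTC]
    #   datetime_col        datetime64[ns]
    #   date_col            datetime64[ns]
    #   time_col            datetime64[ns]
    print(df.dtypes)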

pandas_gbq/gbq.py

Lines changed: 3 additions & 0 deletions
@@ -650,6 +650,9 @@ def _bqschema_to_nullsafe_dtypes(schema_fields):
     # See:
     # http://pandas.pydata.org/pandas-docs/dev/missing_data.html
     # #missing-data-casting-rules-and-indexing
+    #
+    # If you update this mapping, also update the table at
+    # `docs/source/reading.rst`.
     dtype_map = {
         "FLOAT": np.dtype(float),
         # Even though TIMESTAMPs are timezone-aware in BigQuery, pandas doesn't
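
The rest of ``_bqschema_to_nullsafe_dtypes`` is truncated in this diff. As a rough sketch only, not the actual pandas-gbq implementation, a helper like this plausibly builds a column-name-to-dtype mapping from the schema, skipping types with no null-safe NumPy dtype:

    import numpy as np

    def bqschema_to_nullsafe_dtypes_sketch(schema_fields):
        # Sketch: map BigQuery types to dtypes that can represent NULLs.
        # Types left out (e.g. INTEGER) fall back to pandas' inference,
        # since NumPy integer dtypes cannot hold NaN.
        dtype_map = {
            "FLOAT": np.dtype(float),
            "DATETIME": "datetime64[ns]",
            "TIME": "datetime64[ns]",
            "DATE": "datetime64[ns]",
        }
        dtypes = {}
        for field in schema_fields:  # assumes dicts with 'name' and 'type'
            dtype = dtype_map.get(field["type"].upper())
            if dtype is not None:
                dtypes[field["name"]] = dtype
        return dtypes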
