Fix index for datetime64 conversion. Fixes #13937 #14446

Closed
44 changes: 1 addition & 43 deletions doc/source/whatsnew/v0.20.0.txt
@@ -590,99 +590,57 @@ Bug Fixes
- Bug in ``DataFrame.loc`` with indexing a ``MultiIndex`` with a ``Series`` indexer (:issue:`14730`, :issue:`15424`)
- Bug in ``DataFrame.loc`` with indexing a ``MultiIndex`` with a numpy array (:issue:`15434`)
- Bug in ``Rolling.quantile`` function that caused a segmentation fault when called with a quantile value outside of the range [0, 1] (:issue:`15463`)


- Bug in the display of ``.info()`` where a qualifier (+) would always be displayed with a ``MultiIndex`` that contains only non-strings (:issue:`15245`)
- Bug in ``pd.read_msgpack()`` in which ``Series`` categoricals were being improperly processed (:issue:`14901`)
- Bug in ``Series.ffill()`` with mixed dtypes containing tz-aware datetimes. (:issue:`14956`)



- Bug in ``Series.where()`` and ``DataFrame.where()`` where array-like conditionals were being rejected (:issue:`15414`)
- Bug in ``Series`` construction with a datetimetz (:issue:`14928`)
- Bug in output formatting of a ``MultiIndex`` when names are integers (:issue:`12223`, :issue:`15262`)

- Bug in compat for passing long integers to ``Timestamp.replace`` (:issue:`15030`)
- Bug in ``.loc`` that would not return the correct dtype for scalar access for a DataFrame (:issue:`11617`)
- Bug in ``GroupBy.get_group()`` failing with a categorical grouper (:issue:`15155`)
- Bug in ``pandas.tools.utils.cartesian_product()`` with large input can cause overflow on windows (:issue:`15265`)



- Bug in ``.groupby(...).rolling(...)`` when ``on`` is specified and using a ``DatetimeIndex`` (:issue:`15130`)


- Bug in ``to_sql`` when writing a DataFrame with numeric index names (:issue:`15404`).
- Bug in ``Series.iloc`` where a ``Categorical`` object was returned for list-like indexer input when a ``Series`` was expected (:issue:`14580`)



- Bug in groupby operations with timedelta64 when passing ``numeric_only=False`` (:issue:`5724`)


- Bug in ``DataFrame.to_html`` with ``index=False`` and ``max_rows`` raising an ``IndexError`` (:issue:`14998`)

- Bug in ``Categorical.searchsorted()`` where alphabetical instead of the provided categorical order was used (:issue:`14522`)



- Bug in ``resample``, where a non-string ``loffset`` argument would not be applied when resampling a timeseries (:issue:`13218`)



- Bug in ``.rank()`` which incorrectly ranks ordered categories (:issue:`15420`)
- Bug in ``.corr()`` and ``.cov()`` where the column and index were the same object (:issue:`14617`)


- Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`)
- Bug in ``pd.pivot_table()`` where no error was raised when values argument was not in the columns (:issue:`14938`)

- Bug in ``.to_json()`` where ``lines=True`` and contents (keys or values) contain escaped characters (:issue:`15096`)
- Bug in ``.to_json()`` causing single byte ascii characters to be expanded to four byte unicode (:issue:`15344`)
- Bug in ``.read_json()`` for Python 2 where ``lines=True`` and contents contain non-ascii unicode characters (:issue:`15132`)
- Bug in ``.rolling/expanding()`` functions where ``count()`` was not counting ``np.Inf``, nor handling ``object`` dtypes (:issue:`12541`)
- Bug in ``.rolling()`` where ``pd.Timedelta`` or ``datetime.timedelta`` was not accepted as a ``window`` argument (:issue:`15440`)
- Bug in ``DataFrame.resample().median()`` if duplicate column names are present (:issue:`14233`)

- Bug in ``DataFrame.groupby().describe()`` when grouping on ``Index`` containing tuples (:issue:`14848`)
- Bug in creating a ``MultiIndex`` with tuples and not passing a list of names; this will now raise ``ValueError`` (:issue:`15110`)
- Bug in ``groupby().nunique()`` with a datetimelike-grouper where bins counts were incorrect (:issue:`13453`)

- Bug in catching an overflow in ``Timestamp`` + ``Timedelta/Offset`` operations (:issue:`15126`)
- Bug in the HTML display with a ``MultiIndex`` and truncation (:issue:`14882`)


- Bug in ``pd.merge_asof()`` where ``left_index``/``right_index`` together caused a failure when ``tolerance`` was specified (:issue:`15135`)





- Bug in ``Series`` constructor when both ``copy=True`` and ``dtype`` arguments are provided (:issue:`15125`)
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
- Incorrect dtyped ``Series`` was returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty ``DataFrame`` (:issue:`15077`)
- Bug in ``Series.dt.round`` with inconsistent behaviour on ``NaT`` values with different arguments (:issue:`14940`)
- Bug in ``DataFrame.fillna()`` where the argument ``downcast`` was ignored when fillna value was of type ``dict`` (:issue:`15277`)
- Bug in ``.reset_index()`` when an all ``NaN`` level of a ``MultiIndex`` would fail (:issue:`6322`)

- Bug in ``pd.read_msgpack()`` when deserializing a ``CategoricalIndex`` (:issue:`15487`)
- Bug in ``pd.DataFrame.to_records()`` which failed with unicode characters in column names (:issue:`11879`)


- Bug in ``pd.read_csv()`` with ``float_precision='round_trip'`` which caused a segfault when a text entry is parsed (:issue:`15140`)

- Bug in ``DataFrame.to_stata()`` and ``StataWriter`` which caused incorrectly formatted files to be produced for some locales (:issue:`13856`)
- Bug in ``pd.concat()`` in which concatting with an empty dataframe with ``join='inner'`` was being improperly handled (:issue:`15328`)
- Bug in ``groupby.agg()`` incorrectly localizing timezone on ``datetime`` (:issue:`15426`, :issue:`10668`, :issue:`13046`)



- Bug in ``.read_csv()`` with ``parse_dates`` when multiline headers are specified (:issue:`15376`)
- Bug in ``groupby.transform()`` that would coerce the resultant dtypes back to the original (:issue:`10972`, :issue:`11444`)

- Bug in ``DataFrame.hist`` where ``plt.tight_layout`` caused an ``AttributeError`` (use ``matplotlib >= 0.2.0``) (:issue:`9351`)
- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
- Bug in ``pd.read_msgpack`` which did not allow loading a dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
- Bug in ``DataFrame.to_records()`` when converting a ``datetime64`` index with a timezone (:issue:`13937`)
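
The scenario behind the last entry can be sketched as follows. This is a minimal illustration, not code from the PR: the column name `value` and the two timestamps are made up, and the element type stored in the `index` field depends on the pandas version, so only the record layout is checked here.

```python
import pandas as pd

# A frame whose index is a timezone-aware DatetimeIndex (illustrative data).
index = pd.DatetimeIndex(["2016-01-01 00:00:00", "2016-01-01 00:00:01"], tz="UTC")
df = pd.DataFrame({"value": [1, 2]}, index=index)

# Convert to a NumPy record array; an unnamed index becomes the "index" field.
records = df.to_records()

print(records.dtype.names)  # ('index', 'value')
print(len(records))         # 2
```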
4 changes: 2 additions & 2 deletions pandas/core/frame.py
@@ -36,7 +36,7 @@
                                  is_object_dtype,
                                  is_extension_type,
                                  is_datetimetz,
-                                 is_datetime64_dtype,
+                                 is_datetime64_any_dtype,
                                  is_datetime64tz_dtype,
                                  is_bool_dtype,
                                  is_integer_dtype,
@@ -1086,7 +1086,7 @@ def to_records(self, index=True, convert_datetime64=True):
         y : recarray
         """
         if index:
-            if is_datetime64_dtype(self.index) and convert_datetime64:
+            if is_datetime64_any_dtype(self.index) and convert_datetime64:
                 ix_vals = [self.index.to_pydatetime()]
             else:
                 if isinstance(self.index, MultiIndex):
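
Why the predicate swap matters can be seen in a small sketch. It assumes the public `pandas.api.types` module rather than the internal import path used in `frame.py`, and the index values are illustrative only. `is_datetime64_dtype` matches only timezone-naive `datetime64` data, so a tz-aware index previously fell through to the `else` branch.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_datetime64_dtype

# Illustrative indexes: one timezone-naive, one localized to UTC.
naive_index = pd.DatetimeIndex(["2016-01-01 00:00:00", "2016-01-01 00:00:01"])
aware_index = naive_index.tz_localize("UTC")

# Naive data: both predicates agree.
naive_checks = (is_datetime64_dtype(naive_index),
                is_datetime64_any_dtype(naive_index))

# Tz-aware data: only the "any" predicate matches.
aware_checks = (is_datetime64_dtype(aware_index),
                is_datetime64_any_dtype(aware_index))

print(naive_checks)  # (True, True)
print(aware_checks)  # (False, True)
```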
48 changes: 42 additions & 6 deletions pandas/tests/frame/test_convert_to.py
@@ -2,15 +2,11 @@

 from __future__ import print_function

-from numpy import nan
 import numpy as np
-
-from pandas import compat
-from pandas import (DataFrame, Series, MultiIndex, Timestamp,
-                    date_range)
+from numpy import nan

 import pandas.util.testing as tm
-
+from pandas import DataFrame, MultiIndex, Series, Timestamp, compat, date_range
 from pandas.tests.frame.common import TestData


@@ -192,3 +188,43 @@ def test_to_records_with_unicode_column_names(self):
                    "formats": ['<i8', '<f8']}
         )
         tm.assert_almost_equal(result, expected)
+
+    def test_to_records_with_tz(self):
+        # GH13937
+        date_range_tz_utc = date_range('2016-01-01', periods=10,
+                                       freq='S', tz='UTC')
+        date_range_tz_gmt = date_range('2016-01-01', periods=10,
+                                       freq='S', tz='GMT')
+
+        df_utc = DataFrame({'datetime': date_range_tz_utc},
+                           index=date_range_tz_utc)
+        df_gmt = DataFrame({'datetime': date_range_tz_gmt},
+                           index=date_range_tz_gmt)
+
+        df_utc_expected = df_utc.to_records()
+        df_gmt_result = df_utc.tz_convert("GMT").to_records()
+
+        df_gmt_expected = df_gmt.to_records()
+        df_utc_result = df_gmt.tz_convert("UTC").to_records()
+
+        # Check that the tzinfo object differs after the conversion.
+        tm.assertIsNot(df_utc_expected.index[0].tzinfo,
+                       df_gmt_result.index[0].tzinfo)
+
+        # Check that the records from df.to_records() are generated properly.
+        tm.assert_numpy_array_equal(df_utc_expected,
+                                    df_gmt_result)
+
+        # Test DataFrame.to_records() with timezone conversion to UTC
+
+        expected = df_utc_expected['datetime']
+
+        result = df_gmt_result['datetime']
+
+        tm.assert_numpy_array_equal(expected, result)
+
+        tm.assertIsNot(df_gmt_expected.index[0].tzinfo,
+                       df_utc_result.index[0].tzinfo)
+        # Compare the GMT records with the records converted back to UTC.
+        tm.assert_numpy_array_equal(df_gmt_expected,
+                                    df_utc_result)
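
The property this test leans on can be sketched in isolation: `tz_convert` changes the time zone attached to each value but not the instant it represents, and `to_pydatetime()` (the call used in the patched branch of `to_records`) keeps the `tzinfo`. The sketch below is an illustration under assumptions, not part of the PR: it uses `Europe/Berlin` instead of the test's `GMT` so that the offset difference is visible, and the timestamps are made up.

```python
import pandas as pd

# Illustrative tz-aware index and the same instants viewed in another zone.
idx_utc = pd.DatetimeIndex(["2016-01-01 00:00:00", "2016-01-01 00:00:01"], tz="UTC")
idx_berlin = idx_utc.tz_convert("Europe/Berlin")

# to_pydatetime() returns an array of tz-aware datetime.datetime objects.
py_utc = idx_utc.to_pydatetime()
py_berlin = idx_berlin.to_pydatetime()

# Aware datetimes compare by instant, so the converted values are equal...
same_instants = all(a == b for a, b in zip(py_utc, py_berlin))

# ...even though the attached UTC offsets differ.
offsets_differ = py_utc[0].utcoffset() != py_berlin[0].utcoffset()

print(same_instants, offsets_differ)  # True True
```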