Commit ca1ab42

BUG: More followups on to_datetime exceptions, xref #13033
Author: Jeff Reback <[email protected]>

Closes #13059 from jreback/to_datetime3 and squashes the following commits:

6cd8e0f [Jeff Reback] BUG: More followups on to_datetime exceptions, xref #13033
1 parent c6110e2 commit ca1ab42

File tree

4 files changed

+138
-64
lines changed

doc/source/whatsnew/v0.18.1.txt

Lines changed: 73 additions & 47 deletions
@@ -11,9 +11,9 @@ Highlights include:
1111

1212
- ``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see :ref:`here <whatsnew_0181.deferred_ops>`
1313
- ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here <whatsnew_0181.enhancements.assembling>`
14+
- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.
1415
- Custom business hour offset, see :ref:`here <whatsnew_0181.enhancements.custombusinesshour>`.
1516
- Many bug fixes in the handling of ``sparse``, see :ref:`here <whatsnew_0181.sparse>`
16-
- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.
1717
- Expanded the :ref:`Tutorials section <tutorial-modern>` with a feature on modern pandas, courtesy of `@TomAugspurger <https://twitter.com/TomAugspurger>`__. (:issue:`13045`).
1818

1919

@@ -40,12 +40,19 @@ see :ref:`Custom Business Hour <timeseries.custombusinesshour>` (:issue:`11514`)
4040
from pandas.tseries.offsets import CustomBusinessHour
4141
from pandas.tseries.holiday import USFederalHolidayCalendar
4242
bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())
43-
# Friday before MLK Day
43+
44+
Friday before MLK Day
45+
46+
.. ipython:: python
47+
4448
dt = datetime(2014, 1, 17, 15)
4549

4650
dt + bhour_us
4751

48-
# Tuesday after MLK Day (Monday is skipped because it's a holiday)
52+
Tuesday after MLK Day (Monday is skipped because it's a holiday)
53+
54+
.. ipython:: python
55+
4956
dt + bhour_us * 2
5057

5158
.. _whatsnew_0181.deferred_ops:
@@ -102,8 +109,8 @@ Now you can do:
102109
Method chaining improvements
103110
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
104111

105-
The following methods / indexers now accept ``callable``. It is intended to make
106-
these more useful in method chains, see :ref:`Selection By Callable <indexing.callable>`.
112+
The following methods / indexers now accept a ``callable``. It is intended to make
113+
these more useful in method chains, see the :ref:`documentation <indexing.callable>`.
107114
(:issue:`11485`, :issue:`12533`)
108115

109116
- ``.where()`` and ``.mask()``
@@ -113,7 +120,7 @@ these more useful in method chains, see :ref:`Selection By Callable <indexing.ca
113120
``.where()`` and ``.mask()``
114121
""""""""""""""""""""""""""""
115122

116-
These can accept a callable as condition and ``other``
123+
These can accept a callable for the condition and ``other``
117124
arguments.
118125

119126
.. ipython:: python
@@ -126,8 +133,8 @@ arguments.
126133
``.loc[]``, ``.iloc[]``, ``.ix[]``
127134
""""""""""""""""""""""""""""""""""
128135

129-
These can accept a callable, and tuple of callable as a slicer. The callable
130-
can return valid ``bool`` indexer or anything which is valid for these indexer's input.
136+
These can accept a callable, and a tuple of callable as a slicer. The callable
137+
can return a valid boolean indexer or anything which is valid for these indexers' input.
131138

132139
.. ipython:: python
133140

@@ -141,7 +148,7 @@ can return valid ``bool`` indexer or anything which is valid for these indexer's
141148
"""""""""""""""
142149

143150
Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
144-
The callable must return valid input for ``[]`` indexing depending on its
151+
The callable must return a valid input for ``[]`` indexing depending on its
145152
class and index type.
146153

147154
.. ipython:: python
@@ -154,8 +161,10 @@ without using temporary variable.
154161
.. ipython:: python
155162

156163
bb = pd.read_csv('data/baseball.csv', index_col='id')
157-
(bb.groupby(['year', 'team']).sum()
158-
.loc[lambda df: df.r > 100])
164+
(bb.groupby(['year', 'team'])
165+
.sum()
166+
.loc[lambda df: df.r > 100]
167+
)
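
The chained callable pattern above can be tried without the ``data/baseball.csv`` file. The following is a minimal sketch: the small frame and its values are hypothetical stand-ins for the baseball data, and only the ``groupby`` / ``sum`` / ``.loc[callable]`` chain comes from the text.

```python
import pandas as pd

# Hypothetical stand-in for data/baseball.csv; the values are made up.
bb = pd.DataFrame(
    {
        "year": [2006, 2006, 2006, 2007],
        "team": ["BOS", "BOS", "NYY", "NYY"],
        "r": [60, 50, 70, 120],
    }
)

# The callable passed to .loc receives the intermediate (grouped and summed)
# frame, so no temporary variable is needed to build the boolean mask.
result = (
    bb.groupby(["year", "team"])
    .sum()
    .loc[lambda df: df.r > 100]
)
print(result)
```

The lambda is evaluated against the result of ``.sum()``, not against ``bb``, which is what makes it usable in the middle of a chain.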
159168

160169
.. _whatsnew_0181.partial_string_indexing:
161170

@@ -174,8 +183,14 @@ Partial string indexing now matches on ``DateTimeIndex`` when part of a ``MultiI
174183
['a', 'b']]))
175184
dft2
176185
dft2.loc['2013-01-05']
186+
187+
On other levels
188+
189+
.. ipython:: python
190+
177191
idx = pd.IndexSlice
178192
dft2 = dft2.swaplevel(0, 1).sort_index()
193+
dft2
179194
dft2.loc[idx[:, '2013-01-05'], :]
180195

181196
.. _whatsnew_0181.enhancements.assembling:
@@ -225,7 +240,9 @@ Other Enhancements
225240
.. ipython:: python
226241

227242
idx = pd.Index([1., 2., 3., 4.], dtype='float')
228-
idx.take([2, -1]) # default, allow_fill=True, fill_value=None
243+
244+
# default, allow_fill=True, fill_value=None
245+
idx.take([2, -1])
229246
idx.take([2, -1], fill_value=True)
230247

231248
- ``Index`` now supports ``.str.get_dummies()`` which returns ``MultiIndex``, see :ref:`Creating Indicator Variables <text.indicator>` (:issue:`10008`, :issue:`10103`)
@@ -362,7 +379,7 @@ New Behavior:
362379
numpy function compatibility
363380
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
364381

365-
Compatibility between pandas array-like methods (e.g. ```sum`` and ``take``) and their ``numpy``
382+
Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``
366383
counterparts has been greatly increased by augmenting the signatures of the ``pandas`` methods so
367384
as to accept arguments that can be passed in from ``numpy``, even if they are not necessarily
368385
used in the ``pandas`` implementation (:issue:`12644`, :issue:`12638`, :issue:`12687`)
@@ -436,12 +453,12 @@ New Behavior:
436453
Changes in ``read_csv`` exceptions
437454
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
438455

439-
In order to standardize the ``read_csv`` API for both the C and Python engines, both will now raise an
456+
In order to standardize the ``read_csv`` API for both the ``c`` and ``python`` engines, both will now raise an
440457
``EmptyDataError``, a subclass of ``ValueError``, in response to empty columns or header (:issue:`12493`, :issue:`12506`)
441458

442459
Previous behaviour:
443460

444-
.. code-block:: python
461+
.. code-block:: ipython
445462

446463
In [1]: df = pd.read_csv(StringIO(''), engine='c')
447464
...
@@ -453,7 +470,7 @@ Previous behaviour:
453470

454471
New behaviour:
455472

456-
.. code-block:: python
473+
.. code-block:: ipython
457474

458475
In [1]: df = pd.read_csv(StringIO(''), engine='c')
459476
...
@@ -465,10 +482,10 @@ New behaviour:
465482

466483
In addition to this error change, several others have been made as well:
467484

468-
- ``CParserError`` is now a ``ValueError`` instead of just an ``Exception`` (:issue:`12551`)
469-
- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the C engine cannot parse a column (:issue:`12506`)
470-
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the C engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
471-
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the C engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
485+
- ``CParserError`` now sub-classes ``ValueError`` instead of just an ``Exception`` (:issue:`12551`)
486+
- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine cannot parse a column (:issue:`12506`)
487+
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
488+
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the ``c`` engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
472489
- ``pandas.parser.OverflowError`` exception has been removed and has been replaced with Python's built-in ``OverflowError`` exception (:issue:`12506`)
473490
- ``pd.read_csv()`` no longer allows a combination of strings and integers for the ``usecols`` parameter (:issue:`12678`)
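
Because ``EmptyDataError`` is a subclass of ``ValueError``, callers can catch it specifically or fall back on an existing ``except ValueError`` clause. The sketch below assumes a recent pandas where the exception is importable from ``pandas.errors`` (the import path at the time of this commit may have differed); ``read_or_empty`` is a hypothetical helper, not part of pandas.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError  # assumed import path (recent pandas)


def read_or_empty(text):
    """Hypothetical helper: parse CSV text, returning an empty frame for empty input."""
    try:
        return pd.read_csv(StringIO(text))
    except EmptyDataError:
        # Raised when there are no columns to parse from the input.
        return pd.DataFrame()


empty = read_or_empty("")
parsed = read_or_empty("a,b\n1,2\n")
print(empty.empty, parsed.shape)
```

Code written against the older behaviour that catches ``ValueError`` keeps working, since the new exception inherits from it.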
474491

@@ -478,24 +495,33 @@ In addition to this error change, several others have been made as well:
478495
``to_datetime`` error changes
479496
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
480497

481-
Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'`` or non-convertible with ``errors='ignore'`` (:issue:`11758`, :issue:`13052`)
498+
Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'`` or non-convertible with ``errors='ignore'``. Furthermore, an ``OutOfBoundsDatetime`` exception will be raised when an out-of-range value is encountered for that unit when ``errors='raise'``. (:issue:`11758`, :issue:`13052`, :issue:`13059`)
482499

483500
Previous behaviour:
484501

485-
.. code-block:: python
502+
.. code-block:: ipython
486503

487504
In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
488505
Out[27]: NaT
489506

490507
In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
491508
OverflowError: Python int too large to convert to C long
492509

510+
In [29]: pd.to_datetime(11111111, unit='D', errors='raise')
511+
OverflowError: Python int too large to convert to C long
512+
493513
New behaviour:
494514

495-
.. ipython:: python
515+
.. code-block:: ipython
516+
517+
In [2]: pd.to_datetime(1420043460, unit='s', errors='coerce')
518+
Out[2]: Timestamp('2014-12-31 16:31:00')
519+
520+
In [3]: pd.to_datetime(11111111, unit='D', errors='ignore')
521+
Out[3]: 11111111
496522

497-
pd.to_datetime(1420043460, unit='s', errors='coerce')
498-
pd.to_datetime(11111111, unit='D', errors='ignore')
523+
In [4]: pd.to_datetime(11111111, unit='D', errors='raise')
524+
OutOfBoundsDatetime: cannot convert input with unit 'D'
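
A caller that wants to handle the out-of-range case explicitly might wrap the conversion as sketched below. This assumes a recent pandas where ``OutOfBoundsDatetime`` is importable from ``pandas.errors``; ``convert_epoch`` is a hypothetical helper, and ``OverflowError`` is caught only defensively because it is what the previous behaviour raised. Whether a particular value counts as out of range can depend on the pandas version.

```python
import pandas as pd
from pandas.errors import OutOfBoundsDatetime  # assumed import path (recent pandas)


def convert_epoch(value, unit):
    """Hypothetical helper: return a Timestamp, or None if conversion fails as out of range."""
    try:
        return pd.to_datetime(value, unit=unit, errors="raise")
    except (OutOfBoundsDatetime, OverflowError):
        # OverflowError is kept for versions that predate this change.
        return None


ts = convert_epoch(1420043460, "s")
print(ts)

# 11111111 days is the out-of-range example used in this section.
print(convert_epoch(11111111, "D"))
```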
499525

500526
.. _whatsnew_0181.api.other:
501527

@@ -505,14 +531,14 @@ Other API changes
505531
- ``.swaplevel()`` for ``Series``, ``DataFrame``, ``Panel``, and ``MultiIndex`` now features defaults for its first two parameters ``i`` and ``j`` that swap the two innermost levels of the index. (:issue:`12934`)
506532
- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
507533
- ``Period`` and ``PeriodIndex`` now raises ``IncompatibleFrequency`` error which inherits ``ValueError`` rather than raw ``ValueError`` (:issue:`12615`)
508-
- ``Series.apply`` for category dtype now applies the passed function to each ``.categories`` (not ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
509-
- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is neither a boolean, list, or dictionary (:issue:`5636`)
510-
- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fallback to the ``python`` engine. This mimics the pre-0.18.1 behavior if ``numexpr`` is installed (and which Previously, if numexpr was not installed, ``.query()/.eval()`` would raise). (:issue:`12749`)
534+
- ``Series.apply`` for category dtype now applies the passed function to each of the ``.categories`` (and not the ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
535+
- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is neither a boolean, list, or dictionary (matches the doc-string) (:issue:`5636`)
536+
- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fall back to the ``python`` engine. This mimics the pre-0.18.1 behavior if ``numexpr`` is installed (previously, if ``numexpr`` was not installed, ``.query()/.eval()`` would raise). (:issue:`12749`)
511537
- ``pd.show_versions()`` now includes ``pandas_datareader`` version (:issue:`12740`)
512538
- Provide a proper ``__name__`` and ``__qualname__`` attributes for generic functions (:issue:`12021`)
513539
- ``pd.concat(ignore_index=True)`` now uses ``RangeIndex`` as default (:issue:`12695`)
514540
- ``pd.merge()`` and ``DataFrame.join()`` will show a ``UserWarning`` when merging/joining a single- with a multi-leveled dataframe (:issue:`9455`, :issue:`12219`)
515-
- Compat with SciPy > 0.17 for deprecated ``piecewise_polynomial`` interpolation method (:issue:`12887`)
541+
- Compat with ``scipy`` > 0.17 for deprecated ``piecewise_polynomial`` interpolation method; support for the replacement ``from_derivatives`` method (:issue:`12887`)
516542

517543
.. _whatsnew_0181.deprecations:
518544

@@ -578,7 +604,8 @@ Bug Fixes
578604
- Bug in ``pd.crosstab()`` where would silently ignore ``aggfunc`` if ``values=None`` (:issue:`12569`).
579605
- Potential segfault in ``DataFrame.to_json`` when serialising ``datetime.time`` (:issue:`11473`).
580606
- Potential segfault in ``DataFrame.to_json`` when attempting to serialise 0d array (:issue:`11299`).
581-
- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values (:issue:`10778`).
607+
- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values; now supports serialization of ``category``, ``sparse``, and ``datetime64[ns, tz]`` dtypes (:issue:`10778`).
608+
- Bug in ``DataFrame.to_json`` with unsupported dtype not passed to default handler (:issue:`12554`).
582609
- Bug in ``.align`` not returning the sub-class (:issue:`12983`)
583610
- Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)
584611
- Bug in ``ABCPanel`` in which ``Panel4D`` was not being considered as a valid instance of this generic type (:issue:`12810`)
@@ -587,33 +614,32 @@ Bug Fixes
587614
- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)
588615

589616
- Bug in ``Timestamp.__repr__`` that caused ``pprint`` to fail in nested structures (:issue:`12622`)
590-
- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by Pandas. See :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
617+
- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by pandas. See the :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
591618
- Bug in ``.quantile()`` with interpolation may coerce to ``float`` unexpectedly (:issue:`12772`)
592-
- Bug in ``.quantile()`` with empty Series may return scalar rather than empty Series (:issue:`12772`)
619+
- Bug in ``.quantile()`` with empty ``Series`` may return scalar rather than empty ``Series`` (:issue:`12772`)
593620

594621

595622
- Bug in ``.loc`` with out-of-bounds in a large indexer would raise ``IndexError`` rather than ``KeyError`` (:issue:`12527`)
596623
- Bug in resampling when using a ``TimedeltaIndex`` and ``.asfreq()``, would previously not include the final fencepost (:issue:`12926`)
597-
- Bug in ``DataFrame.to_json`` with unsupported `dtype` not passed to default handler (:issue:`12554`).
598624

599625
- Bug in equality testing with a ``Categorical`` in a ``DataFrame`` (:issue:`12564`)
600626
- Bug in ``GroupBy.first()``, ``.last()`` returns incorrect row when ``TimeGrouper`` is used (:issue:`7453`)
601627

602628

603629

604-
- Bug in ``read_csv`` with the C engine when specifying ``skiprows`` with newlines in quoted items (:issue:`10911`, :issue:`12775`)
630+
- Bug in ``pd.read_csv()`` with the ``c`` engine when specifying ``skiprows`` with newlines in quoted items (:issue:`10911`, :issue:`12775`)
605631
- Bug in ``DataFrame`` timezone lost when assigning tz-aware datetime ``Series`` with alignment (:issue:`12981`)
606632

607633

608634

609635

610-
- Bug in ``value_counts`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
611-
- Bug in ``Series.value_counts()`` loses name if its dtype is category (:issue:`12835`)
636+
- Bug in ``.value_counts()`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
637+
- Bug in ``Series.value_counts()`` loses name if its dtype is ``category`` (:issue:`12835`)
612638
- Bug in ``Series.value_counts()`` loses timezone info (:issue:`12835`)
613639
- Bug in ``Series.value_counts(normalize=True)`` with ``Categorical`` raises ``UnboundLocalError`` (:issue:`12835`)
614640
- Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)
615-
- Bug in ``read_csv`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the C engine (:issue:`9755`)
616-
- Bug in ``read_csv`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the C engine (:issue:`12912`)
641+
- Bug in ``pd.read_csv()`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the ``c`` engine (:issue:`9755`)
642+
- Bug in ``pd.read_csv()`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the ``c`` engine (:issue:`12912`)
617643
- Bug in ``Series.rename``, ``DataFrame.rename`` and ``DataFrame.rename_axis`` not treating ``Series`` as mappings to relabel (:issue:`12623`).
618644
- Clean in ``.rolling.min`` and ``.rolling.max`` to enhance dtype handling (:issue:`12373`)
619645
- Bug in ``groupby`` where complex types are coerced to float (:issue:`12902`)
@@ -635,25 +661,25 @@ Bug Fixes
635661

636662

637663

638-
- Bug in ``concat`` raises ``AttributeError`` when input data contains tz-aware datetime and timedelta (:issue:`12620`)
639-
- Bug in ``concat`` did not handle empty ``Series`` properly (:issue:`11082`)
664+
- Bug in ``pd.concat`` raises ``AttributeError`` when input data contains tz-aware datetime and timedelta (:issue:`12620`)
665+
- Bug in ``pd.concat`` did not handle empty ``Series`` properly (:issue:`11082`)
640666

641667
- Bug in ``.plot.bar`` alignment when ``width`` is specified with ``int`` (:issue:`12979`)
642668

643669

644670
- Bug in ``fill_value`` is ignored if the argument to a binary operator is a constant (:issue:`12723`)
645671

646-
- Bug in ``pd.read_html`` when using bs4 flavor and parsing table with a header and only one column (:issue:`9178`)
672+
- Bug in ``pd.read_html()`` when using bs4 flavor and parsing table with a header and only one column (:issue:`9178`)
647673

648-
- Bug in ``pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
649-
- Bug in ``pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
650-
- Bug in ``crosstab`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)
674+
- Bug in ``.pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
675+
- Bug in ``.pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
676+
- Bug in ``pd.crosstab()`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)
651677

652678
- Bug in ``Series.name`` when ``name`` attribute can be a hashable type (:issue:`12610`)
653679

654680
- Bug in ``.describe()`` resets categorical columns information (:issue:`11558`)
655681
- Bug where ``loffset`` argument was not applied when calling ``resample().count()`` on a timeseries (:issue:`12725`)
656682
- ``pd.read_excel()`` now accepts column names associated with keyword argument ``names`` (:issue:`12870`)
657-
- Bug in ``to_numeric`` with ``Index`` returns ``np.ndarray``, rather than ``Index`` (:issue:`12777`)
658-
- Bug in ``to_numeric`` with datetime-like may raise ``TypeError`` (:issue:`12777`)
659-
- Bug in ``to_numeric`` with scalar raises ``ValueError`` (:issue:`12777`)
683+
- Bug in ``pd.to_numeric()`` with ``Index`` returns ``np.ndarray``, rather than ``Index`` (:issue:`12777`)
684+
- Bug in ``pd.to_numeric()`` with datetime-like may raise ``TypeError`` (:issue:`12777`)
685+
- Bug in ``pd.to_numeric()`` with scalar raises ``ValueError`` (:issue:`12777`)
