BUG: More followups on to_datetime exceptions, xref #13033
Author: Jeff Reback <[email protected]>
Closes #13059 from jreback/to_datetime3 and squashes the following commits:
6cd8e0f [Jeff Reback] BUG: More followups on to_datetime exceptions, xref #13033
doc/source/whatsnew/v0.18.1.txt
73 additions & 47 deletions
@@ -11,9 +11,9 @@ Highlights include:

 - ``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see :ref:`here <whatsnew_0181.deferred_ops>`
 - ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here <whatsnew_0181.enhancements.assembling>`
+- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.
 - Custom business hour offset, see :ref:`here <whatsnew_0181.enhancements.custombusinesshour>`.
 - Many bug fixes in the handling of ``sparse``, see :ref:`here <whatsnew_0181.sparse>`
-- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.
 - Expanded the :ref:`Tutorials section <tutorial-modern>` with a feature on modern pandas, courtesy of `@TomAugsburger <https://twitter.com/TomAugspurger>`__. (:issue:`13045`).

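The highlight above about assembling dates from a ``DataFrame`` can be sketched as follows. This is a minimal illustration, not taken from the commit: the frame ``parts`` and its values are made up, and it assumes the columns are named ``year``, ``month`` and ``day``.

```python
import pandas as pd

# Hypothetical input: one column per date component.
parts = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})

# Each row is assembled into a single timestamp.
assembled = pd.to_datetime(parts)
print(assembled)
```
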
@@ -40,12 +40,19 @@ see :ref:`Custom Business Hour <timeseries.custombusinesshour>` (:issue:`11514`)
 from pandas.tseries.offsets import CustomBusinessHour
 from pandas.tseries.holiday import USFederalHolidayCalendar
- ``Index`` now supports ``.str.get_dummies()`` which returns ``MultiIndex``, see :ref:`Creating Indicator Variables <text.indicator>` (:issue:`10008`, :issue:`10103`)
@@ -362,7 +379,7 @@ New Behavior:
 numpy function compatibility
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Compatibility between pandas array-like methods (e.g. ```sum`` and ``take``) and their ``numpy``
+Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``
 counterparts has been greatly increased by augmenting the signatures of the ``pandas`` methods so
 as to accept arguments that can be passed in from ``numpy``, even if they are not necessarily
 used in the ``pandas`` implementation (:issue:`12644`, :issue:`12638`, :issue:`12687`)
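As a rough illustration of the compatibility described in this hunk, calling a ``numpy`` reduction on a ``Series`` is expected to work even though ``numpy`` may forward keyword arguments that pandas does not use. The variable names below are made up for the sketch.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 3])

# np.sum hands the work to the Series' own sum method; any numpy-only
# keywords it passes along are accepted rather than rejected.
total = np.sum(values)
print(total)
```
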
@@ -436,12 +453,12 @@ New Behavior:
 Changes in ``read_csv`` exceptions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-In order to standardize the ``read_csv`` API for both the C and Python engines, both will now raise an
+In order to standardize the ``read_csv`` API for both the ``c`` and ``python`` engines, both will now raise an
 ``EmptyDataError``, a subclass of ``ValueError``, in response to empty columns or header (:issue:`12493`, :issue:`12506`)

 Previous behaviour:

-.. code-block:: python
+.. code-block:: ipython

 In [1]: df = pd.read_csv(StringIO(''), engine='c')
 ...
@@ -453,7 +470,7 @@ Previous behaviour:

 New behaviour:

-.. code-block:: python
+.. code-block:: ipython

 In [1]: df = pd.read_csv(StringIO(''), engine='c')
 ...
@@ -465,10 +482,10 @@ New behaviour:

 In addition to this error change, several others have been made as well:

-- ``CParserError`` is now a ``ValueError`` instead of just an ``Exception`` (:issue:`12551`)
-- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the C engine cannot parse a column (:issue:`12506`)
-- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the C engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
-- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the C engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
+- ``CParserError`` now sub-classes ``ValueError`` instead of just a ``Exception`` (:issue:`12551`)
+- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine cannot parse a column (:issue:`12506`)
+- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
+- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the ``c`` engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
 - ``pandas.parser.OverflowError`` exception has been removed and has been replaced with Python's built-in ``OverflowError`` exception (:issue:`12506`)
 - ``pd.read_csv()`` no longer allows a combination of strings and integers for the ``usecols`` parameter (:issue:`12678`)

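Because the ``read_csv`` hunks above document ``EmptyDataError`` as a subclass of ``ValueError``, calling code can catch ``ValueError`` without importing the specific exception class. A minimal sketch, assuming a hypothetical helper named ``parse_or_none`` that is not part of pandas:

```python
from io import StringIO

import pandas as pd


def parse_or_none(text):
    """Return a DataFrame, or None if read_csv rejects the text with a ValueError."""
    try:
        return pd.read_csv(StringIO(text))
    except ValueError as exc:
        # Empty input is reported through a ValueError subclass.
        print(type(exc).__name__, exc)
        return None


empty = parse_or_none("")
frame = parse_or_none("a,b\n1,2\n")
```

Catching the base class keeps such a handler independent of where the specific exception class lives in a given pandas version.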
@@ -478,24 +495,33 @@ In addition to this error change, several others have been made as well:
 ``to_datetime`` error changes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'`` or non-convertible with ``errors='ignore'``(:issue:`11758`, :issue:`13052`)
+Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'`` or non-convertible with ``errors='ignore'``. Furthermore, an ``OutOfBoundsDateime`` exception will be raised when an out-of-range value is encountered for that unit when ``errors='raise'``. (:issue:`11758`, :issue:`13052`, :issue:`13059`)

 Previous behaviour:

-.. code-block:: python
+.. code-block:: ipython

 In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
 Out[27]: NaT

 In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
 OverflowError: Python int too large to convert to C long

+In [29]: pd.to_datetime(11111111, unit='D', errors='raise')
+OverflowError: Python int too large to convert to C long
+
 New behaviour:

-.. ipython:: python
+.. code-block:: ipython
+
+In [2]: pd.to_datetime(1420043460, unit='s', errors='coerce')
+Out[2]: Timestamp('2014-12-31 16:31:00')
+
+In [3]: pd.to_datetime(11111111, unit='D', errors='ignore')
+In [4]: pd.to_datetime(11111111, unit='D', errors='raise')
+OutOfBoundsDatetime: cannot convert input with unit'D'

 .. _whatsnew_0181.api.other:

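The ``to_datetime`` hunk above can be exercised with a short script. This is a sketch under assumptions: it relies on ``OutOfBoundsDatetime`` being catchable as ``ValueError``, and it also catches ``OverflowError`` in case an older release still raises that, as the "Previous behaviour" block shows.

```python
import pandas as pd

# A convertible epoch value is converted even with errors='coerce'.
ts = pd.to_datetime(1420043460, unit="s", errors="coerce")
print(ts)

# An out-of-range value with errors='raise' is documented above as raising
# OutOfBoundsDatetime; record the exception name instead of crashing.
try:
    outcome = pd.to_datetime(11111111, unit="D", errors="raise")
except (ValueError, OverflowError) as exc:
    outcome = type(exc).__name__
print(outcome)
```
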
@@ -505,14 +531,14 @@ Other API changes
 - ``.swaplevel()`` for ``Series``, ``DataFrame``, ``Panel``, and ``MultiIndex`` now features defaults for its first two parameters ``i`` and ``j`` that swap the two innermost levels of the index. (:issue:`12934`)
 - ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
 - ``Period`` and ``PeriodIndex`` now raises ``IncompatibleFrequency`` error which inherits ``ValueError`` rather than raw ``ValueError`` (:issue:`12615`)
-- ``Series.apply`` for category dtype now applies the passed function to each ``.categories`` (not ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
-- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is neither a boolean, list, or dictionary (:issue:`5636`)
-- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fallback to the ``python`` engine. This mimics the pre-0.18.1 behavior if ``numexpr`` is installed (and which Previously, if numexpr was not installed, ``.query()/.eval()`` would raise). (:issue:`12749`)
+- ``Series.apply`` for category dtype now applies the passed function to each of the ``.categories`` (and not the ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
+- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is neither a boolean, list, or dictionary (matches the doc-string) (:issue:`5636`)
+- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fallback to the ``python`` engine. This mimics the pre-0.18.1 behavior if ``numexpr`` is installed (and which, previously, if numexpr was not installed, ``.query()/.eval()`` would raise). (:issue:`12749`)
 - ``pd.show_versions()`` now includes ``pandas_datareader`` version (:issue:`12740`)
 - Provide a proper ``__name__`` and ``__qualname__`` attributes for generic functions (:issue:`12021`)
 - ``pd.concat(ignore_index=True)`` now uses ``RangeIndex`` as default (:issue:`12695`)
 - ``pd.merge()`` and ``DataFrame.join()`` will show a ``UserWarning`` when merging/joining a single- with a multi-leveled dataframe (:issue:`9455`, :issue:`12219`)
-- Compat with SciPy > 0.17 for deprecated ``piecewise_polynomial`` interpolation method (:issue:`12887`)
+- Compat with ``scipy`` > 0.17 for deprecated ``piecewise_polynomial`` interpolation method; support for the replacement ``from_derivatives`` method (:issue:`12887`)

 .. _whatsnew_0181.deprecations:

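Two of the API changes listed in this hunk, the ``RangeIndex`` default for ``pd.concat(ignore_index=True)`` and the new ``.swaplevel()`` defaults, can be checked with a small sketch. The level names and data are made up for illustration.

```python
import pandas as pd

# ignore_index=True discards the input labels and uses a RangeIndex.
combined = pd.concat([pd.Series([1, 2]), pd.Series([3, 4])], ignore_index=True)
print(type(combined.index).__name__)

# Called without arguments, swaplevel() swaps the two innermost levels.
index = pd.MultiIndex.from_tuples(
    [("a", 1, "x"), ("b", 2, "y")], names=["outer", "middle", "inner"]
)
swapped = index.swaplevel()
print(list(swapped.names))
```
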
@@ -578,7 +604,8 @@ Bug Fixes
 - Bug in ``pd.crosstab()`` where would silently ignore ``aggfunc`` if ``values=None`` (:issue:`12569`).
 - Potential segfault in ``DataFrame.to_json`` when serialising ``datetime.time`` (:issue:`11473`).
 - Potential segfault in ``DataFrame.to_json`` when attempting to serialise 0d array (:issue:`11299`).
-- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values (:issue:`10778`).
+- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values; now supports serialization of ``category``, ``sparse``, and ``datetime64[ns, tz]`` dtypes (:issue:`10778`).
+- Bug in ``DataFrame.to_json`` with unsupported dtype not passed to default handler (:issue:`12554`).
 - Bug in ``.align`` not returning the sub-class (:issue:`12983`)
 - Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)
 - Bug in ``ABCPanel`` in which ``Panel4D`` was not being considered as a valid instance of this generic type (:issue:`12810`)
@@ -587,33 +614,32 @@ Bug Fixes
 - Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)

 - Bug in ``Timestamp.__repr__`` that caused ``pprint`` to fail in nested structures (:issue:`12622`)
-- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by Pandas. See :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
+- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by pandas. See the :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
 - Bug in ``.quantile()`` with interpolation may coerce to ``float`` unexpectedly (:issue:`12772`)
-- Bug in ``.quantile()`` with empty Series may return scalar rather than empty Series (:issue:`12772`)
+- Bug in ``.quantile()`` with empty ``Series`` may return scalar rather than empty ``Series`` (:issue:`12772`)


 - Bug in ``.loc`` with out-of-bounds in a large indexer would raise ``IndexError`` rather than ``KeyError`` (:issue:`12527`)
 - Bug in resampling when using a ``TimedeltaIndex`` and ``.asfreq()``, would previously not include the final fencepost (:issue:`12926`)
-- Bug in ``DataFrame.to_json`` with unsupported `dtype` not passed to default handler (:issue:`12554`).

 - Bug in equality testing with a ``Categorical`` in a ``DataFrame`` (:issue:`12564`)
 - Bug in ``GroupBy.first()``, ``.last()`` returns incorrect row when ``TimeGrouper`` is used (:issue:`7453`)



-- Bug in ``read_csv`` with the C engine when specifying ``skiprows`` with newlines in quoted items (:issue:`10911`, :issue:`12775`)
+- Bug in ``pd.read_csv()`` with the ``c`` engine when specifying ``skiprows`` with newlines in quoted items (:issue:`10911`, :issue:`12775`)
 - Bug in ``DataFrame`` timezone lost when assigning tz-aware datetime ``Series`` with alignment (:issue:`12981`)




-- Bug in ``value_counts`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
-- Bug in ``Series.value_counts()`` loses name if its dtype is category (:issue:`12835`)
+- Bug in ``.value_counts()`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
+- Bug in ``Series.value_counts()`` loses name if its dtype is ``category`` (:issue:`12835`)
 - Bug in ``Series.value_counts()`` loses timezone info (:issue:`12835`)
 - Bug in ``Series.value_counts(normalize=True)`` with ``Categorical`` raises ``UnboundLocalError`` (:issue:`12835`)
 - Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)
-- Bug in ``read_csv`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the C engine (:issue:`9755`)
-- Bug in ``read_csv`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the C engine (:issue:`12912`)
+- Bug in ``pd.read_csv()`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the ``c`` engine (:issue:`9755`)
+- Bug in ``pd.read_csv()`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the ``c`` engine (:issue:`12912`)
 - Bug in ``Series.rename``, ``DataFrame.rename`` and ``DataFrame.rename_axis`` not treating ``Series`` as mappings to relabel (:issue:`12623`).
 - Clean in ``.rolling.min`` and ``.rolling.max`` to enhance dtype handling (:issue:`12373`)
 - Bug in ``groupby`` where complex types are coerced to float (:issue:`12902`)
@@ -635,25 +661,25 @@ Bug Fixes



-- Bug in ``concat`` raises ``AttributeError`` when input data contains tz-aware datetime and timedelta (:issue:`12620`)
-- Bug in ``concat`` did not handle empty ``Series`` properly (:issue:`11082`)
+- Bug in ``pd.concat`` raises ``AttributeError`` when input data contains tz-aware datetime and timedelta (:issue:`12620`)
+- Bug in ``pd.concat`` did not handle empty ``Series`` properly (:issue:`11082`)

 - Bug in ``.plot.bar`` alginment when ``width`` is specified with ``int`` (:issue:`12979`)


 - Bug in ``fill_value`` is ignored if the argument to a binary operator is a constant (:issue:`12723`)

-- Bug in ``pd.read_html`` when using bs4 flavor and parsing table with a header and only one column (:issue:`9178`)
+- Bug in ``pd.read_html()`` when using bs4 flavor and parsing table with a header and only one column (:issue:`9178`)

-- Bug in ``pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
-- Bug in ``pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
-- Bug in ``crosstab`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)
+- Bug in ``.pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
+- Bug in ``.pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
+- Bug in ``pd.crosstab()`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)

 - Bug in ``Series.name`` when ``name`` attribute can be a hashable type (:issue:`12610`)

 - Bug in ``.describe()`` resets categorical columns information (:issue:`11558`)
 - Bug where ``loffset`` argument was not applied when calling ``resample().count()`` on a timeseries (:issue:`12725`)
 - ``pd.read_excel()`` now accepts column names associated with keyword argument ``names`` (:issue:`12870`)
-- Bug in ``to_numeric`` with ``Index`` returns ``np.ndarray``, rather than ``Index`` (:issue:`12777`)
-- Bug in ``to_numeric`` with datetime-like may raise ``TypeError`` (:issue:`12777`)
-- Bug in ``to_numeric`` with scalar raises ``ValueError`` (:issue:`12777`)
+- Bug in ``pd.to_numeric()`` with ``Index`` returns ``np.ndarray``, rather than ``Index`` (:issue:`12777`)
+- Bug in ``pd.to_numeric()`` with datetime-like may raise ``TypeError`` (:issue:`12777`)
+- Bug in ``pd.to_numeric()`` with scalar raises ``ValueError`` (:issue:`12777`)
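The ``pd.to_numeric()`` fixes at the end of this hunk imply that an ``Index`` input should come back as an ``Index`` and that a scalar input should be accepted. A minimal check, with illustrative values that are not taken from the commit:

```python
import pandas as pd

# An Index goes in and, per the fix, an Index comes back (not a bare ndarray).
converted = pd.to_numeric(pd.Index(["1", "2", "3"]))
print(type(converted).__name__)

# A scalar string is converted instead of raising ValueError.
scalar = pd.to_numeric("3")
print(scalar)
```
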