diff --git a/doc/source/io.rst b/doc/source/io.rst
index 2ec61f7f00bd8..d5bbddfeb7d37 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -3403,7 +3403,7 @@ writes ``data`` to the database in batches of 1000 rows at a time:
     data.to_sql('data_chunked', engine, chunksize=1000)
 
 SQL data types
-""""""""""""""
+++++++++++++++
 
 :func:`~pandas.DataFrame.to_sql` will try to map your data to an appropriate
 SQL data type based on the dtype of the data. When you have columns of dtype
@@ -3801,7 +3801,7 @@ is lost when exporting.
 Labeled data can similarly be imported from *Stata* data files as ``Categorical``
 variables using the keyword argument ``convert_categoricals`` (``True`` by default).
 The keyword argument ``order_categoricals`` (``True`` by default) determines
- whether imported ``Categorical`` variables are ordered.
+whether imported ``Categorical`` variables are ordered.
 
 .. note::
 
diff --git a/doc/source/release.rst b/doc/source/release.rst
index 321947111574b..6d952344576e6 100644
--- a/doc/source/release.rst
+++ b/doc/source/release.rst
@@ -50,7 +50,9 @@ pandas 0.15.2
 
 **Release date:** (December 12, 2014)
 
-This is a minor release from 0.15.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
+This is a minor release from 0.15.1 and includes a large number of bug fixes
+along with several new features, enhancements, and performance improvements.
+A small number of API changes were necessary to fix existing bugs.
 
 See the :ref:`v0.15.2 Whatsnew ` overview for an extensive list
 of all API changes, enhancements and bugs that have been fixed in 0.15.2.
diff --git a/doc/source/whatsnew/v0.15.2.txt b/doc/source/whatsnew/v0.15.2.txt
index 3a0fdbae5e297..02de919e3f83e 100644
--- a/doc/source/whatsnew/v0.15.2.txt
+++ b/doc/source/whatsnew/v0.15.2.txt
@@ -3,9 +3,10 @@
 v0.15.2 (December 12, 2014)
 ---------------------------
 
-This is a minor release from 0.15.1 and includes a small number of API changes, several new features,
-enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
-users upgrade to this version.
+This is a minor release from 0.15.1 and includes a large number of bug fixes
+along with several new features, enhancements, and performance improvements.
+A small number of API changes were necessary to fix existing bugs.
+We recommend that all users upgrade to this version.
 
 - :ref:`Enhancements `
 - :ref:`API Changes `
@@ -16,6 +17,7 @@ users upgrade to this version.
 API changes
 ~~~~~~~~~~~
 
+
 - Indexing in ``MultiIndex`` beyond lex-sort depth is now supported, though
   a lexically sorted index will have a better performance. (:issue:`2646`)
 
@@ -38,24 +40,30 @@ API changes
      df2.index.lexsort_depth
      df2.loc[(1,'z')]
 
-- Bug in concat of Series with ``category`` dtype which were coercing to ``object``. (:issue:`8641`)
-
 - Bug in unique of Series with ``category`` dtype, which returned all categories regardless
   whether they were "used" or not (see :issue:`8559` for the discussion).
+  Previous behaviour was to return all categories:
 
-- ``Series.all`` and ``Series.any`` now support the ``level`` and ``skipna`` parameters. ``Series.all``, ``Series.any``, ``Index.all``, and ``Index.any`` no longer support the ``out`` and ``keepdims`` parameters, which existed for compatibility with ndarray. Various index types no longer support the ``all`` and ``any`` aggregation functions and will now raise ``TypeError``. (:issue:`8302`):
+  .. code-block:: python
 
-  .. ipython:: python
+     In [3]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])
 
-     s = pd.Series([False, True, False], index=[0, 0, 1])
-     s.any(level=0)
+     In [4]: cat
+     Out[4]:
+     [a, b, a]
+     Categories (3, object): [a < b < c]
 
-- ``Panel`` now supports the ``all`` and ``any`` aggregation functions. (:issue:`8302`):
+     In [5]: cat.unique()
+     Out[5]: array(['a', 'b', 'c'], dtype=object)
+
+  Now, only the categories that do effectively occur in the array are returned:
 
   .. ipython:: python
 
-     p = pd.Panel(np.random.rand(2, 5, 4) > 0.1)
-     p.all()
+     cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])
+     cat.unique()
+
+- ``Series.all`` and ``Series.any`` now support the ``level`` and ``skipna`` parameters. ``Series.all``, ``Series.any``, ``Index.all``, and ``Index.any`` no longer support the ``out`` and ``keepdims`` parameters, which existed for compatibility with ndarray. Various index types no longer support the ``all`` and ``any`` aggregation functions and will now raise ``TypeError``. (:issue:`8302`).
 
 - Allow equality comparisons of Series with a categorical dtype and object dtype; previously these would raise ``TypeError`` (:issue:`8938`)
 
@@ -90,25 +98,70 @@ API changes
 
 - ``Timestamp('now')`` is now equivalent to ``Timestamp.now()`` in that it returns the local time rather than UTC. Also, ``Timestamp('today')`` is now equivalent to ``Timestamp.today()`` and both have ``tz`` as a possible argument. (:issue:`9000`)
 
+- Fix negative step support for label-based slices (:issue:`8753`)
+
+  Old behavior:
+
+  .. code-block:: python
+
+     In [1]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])
+     Out[1]:
+     a    0
+     b    1
+     c    2
+     dtype: int64
+
+     In [2]: s.loc['c':'a':-1]
+     Out[2]:
+     c    2
+     dtype: int64
+
+  New behavior:
+
+  .. ipython:: python
+
+     s = pd.Series(np.arange(3), ['a', 'b', 'c'])
+     s.loc['c':'a':-1]
+
+
 .. _whatsnew_0152.enhancements:
 
 Enhancements
 ~~~~~~~~~~~~
 
+``Categorical`` enhancements:
+
+- Added ability to export Categorical data to Stata (:issue:`8633`). See :ref:`here ` for limitations of categorical variables exported to Stata data files.
+- Added flag ``order_categoricals`` to ``StataReader`` and ``read_stata`` to select whether to order imported categorical data (:issue:`8836`). See :ref:`here ` for more information on importing categorical variables from Stata data files.
+- Added ability to export Categorical data to to/from HDF5 (:issue:`7621`). Queries work the same as if it was an object array. However, the ``category`` dtyped data is stored in a more efficient manner. See :ref:`here ` for an example and caveats w.r.t. prior versions of pandas.
+- Added support for ``searchsorted()`` on `Categorical` class (:issue:`8420`).
+
+Other enhancements:
+
 - Added the ability to specify the SQL type of columns when writing a DataFrame
   to a database (:issue:`8778`).
   For example, specifying to use the sqlalchemy ``String`` type instead of the
   default ``Text`` type for string columns:
 
-  .. code-block::
+  .. code-block:: python
 
      from sqlalchemy.types import String
      data.to_sql('data_dtype', engine, dtype={'Col_1': String})
 
-- Added ability to export Categorical data to Stata (:issue:`8633`). See :ref:`here ` for limitations of categorical variables exported to Stata data files.
-- Added flag ``order_categoricals`` to ``StataReader`` and ``read_stata`` to select whether to order imported categorical data (:issue:`8836`). See :ref:`here ` for more information on importing categorical variables from Stata data files.
-- Added ability to export Categorical data to to/from HDF5 (:issue:`7621`). Queries work the same as if it was an object array. However, the ``category`` dtyped data is stored in a more efficient manner. See :ref:`here ` for an example and caveats w.r.t. prior versions of pandas.
-- Added support for ``searchsorted()`` on `Categorical` class (:issue:`8420`).
+- ``Series.all`` and ``Series.any`` now support the ``level`` and ``skipna`` parameters (:issue:`8302`):
+
+  .. ipython:: python
+
+     s = pd.Series([False, True, False], index=[0, 0, 1])
+     s.any(level=0)
+
+- ``Panel`` now supports the ``all`` and ``any`` aggregation functions. (:issue:`8302`):
+
+  .. ipython:: python
+
+     p = pd.Panel(np.random.rand(2, 5, 4) > 0.1)
+     p.all()
+
 - Added support for ``utcfromtimestamp()``, ``fromtimestamp()``, and ``combine()`` on `Timestamp` class (:issue:`5351`).
 - Added Google Analytics (`pandas.io.ga`) basic documentation (:issue:`8835`). See :ref:`here`.
 - ``Timedelta`` arithmetic returns ``NotImplemented`` in unknown cases, allowing extensions by custom classes (:issue:`8813`).
@@ -122,19 +175,22 @@ Enhancements
 - Added ability to read table footers to read_html (:issue:`8552`)
 - ``to_sql`` now infers datatypes of non-NA values for columns that contain NA values and have dtype ``object`` (:issue:`8778`).
 
+
 .. _whatsnew_0152.performance:
 
 Performance
 ~~~~~~~~~~~
 
-- Reduce memory usage when skiprows is an integer in read_csv (:issue:`8681`)
+- Reduce memory usage when skiprows is an integer in read_csv (:issue:`8681`)
 - Performance boost for ``to_datetime`` conversions with a passed ``format=``, and the ``exact=False`` (:issue:`8904`)
 
+
 .. _whatsnew_0152.bug_fixes:
 
 Bug Fixes
 ~~~~~~~~~
 
+- Bug in concat of Series with ``category`` dtype which were coercing to ``object``. (:issue:`8641`)
 - Bug in Timestamp-Timestamp not returning a Timedelta type and datelike-datelike ops with timezones (:issue:`8865`)
 - Made consistent a timezone mismatch exception (either tz operated with None or incompatible timezone), will now return ``TypeError`` rather than ``ValueError`` (a couple of edge cases only), (:issue:`8865`)
 - Bug in using a ``pd.Grouper(key=...)`` with no level/axis or level only (:issue:`8795`, :issue:`8866`)
@@ -154,95 +210,32 @@ Bug Fixes
 - Bug in ``merge`` where ``how='left'`` and ``sort=False`` would not preserve left frame order (:issue:`7331`)
 - Bug in ``MultiIndex.reindex`` where reindexing at level would not reorder labels (:issue:`4088`)
 - Bug in certain operations with dateutil timezones, manifesting with dateutil 2.3 (:issue:`8639`)
-
-- Fix negative step support for label-based slices (:issue:`8753`)
-
-  Old behavior:
-
-  .. code-block:: python
-
-     In [1]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])
-     Out[1]:
-     a    0
-     b    1
-     c    2
-     dtype: int64
-
-     In [2]: s.loc['c':'a':-1]
-     Out[2]:
-     c    2
-     dtype: int64
-
-  New behavior:
-
-  .. ipython:: python
-
-     s = pd.Series(np.arange(3), ['a', 'b', 'c'])
-     s.loc['c':'a':-1]
-
 - Regression in DatetimeIndex iteration with a Fixed/Local offset timezone (:issue:`8890`)
 - Bug in ``to_datetime`` when parsing a nanoseconds using the ``%f`` format (:issue:`8989`)
 - ``io.data.Options`` now raises ``RemoteDataError`` when no expiry dates are available from Yahoo and when it receives no data from Yahoo (:issue:`8761`), (:issue:`8783`).
 - Fix: The font size was only set on x axis if vertical or the y axis if horizontal. (:issue:`8765`)
 - Fixed division by 0 when reading big csv files in python 3 (:issue:`8621`)
 - Bug in outputing a Multindex with ``to_html,index=False`` which would add an extra column (:issue:`8452`)
-
-
-
-
-
-
-
 - Imported categorical variables from Stata files retain the ordinal information in the underlying data (:issue:`8836`).
-
-
-
 - Defined ``.size`` attribute across ``NDFrame`` objects to provide compat with numpy >= 1.9.1; buggy with ``np.array_split`` (:issue:`8846`)
-
-
 - Skip testing of histogram plots for matplotlib <= 1.2 (:issue:`8648`).
-
-
-
-
-
-
 - Bug where ``get_data_google`` returned object dtypes (:issue:`3995`)
-
 - Bug in ``DataFrame.stack(..., dropna=False)`` when the DataFrame's ``columns`` is a ``MultiIndex``
   whose ``labels`` do not reference all its ``levels``. (:issue:`8844`)
-
-
 - Bug in that Option context applied on ``__enter__`` (:issue:`8514`)
-
-
 - Bug in resample that causes a ValueError when resampling across multiple days and the last offset is not calculated from the start of the range (:issue:`8683`)
-
-
-
 - Bug where ``DataFrame.plot(kind='scatter')`` fails when checking if an np.array is in the DataFrame (:issue:`8852`)
-
-
-
 - Bug in ``pd.infer_freq/DataFrame.inferred_freq`` that prevented proper sub-daily frequency inference
   when the index contained DST days (:issue:`8772`).
 - Bug where index name was still used when plotting a series with ``use_index=False`` (:issue:`8558`).
 - Bugs when trying to stack multiple columns, when some (or all) of the level names are numbers (:issue:`8584`).
 - Bug in ``MultiIndex`` where ``__contains__`` returns wrong result if index is not lexically sorted or unique (:issue:`7724`)
 - BUG CSV: fix problem with trailing whitespace in skipped rows, (:issue:`8679`), (:issue:`8661`), (:issue:`8983`)
 - Regression in ``Timestamp`` does not parse 'Z' zone designator for UTC (:issue:`8771`)
-
-
-
-
-
-
 - Bug in `StataWriter` the produces writes strings with 244 characters irrespective of actual size (:issue:`8969`)
-
-
 - Fixed ValueError raised by cummin/cummax when datetime64 Series contains NaT. (:issue:`8965`)
 - Bug in Datareader returns object dtype if there are missing values (:issue:`8980`)
 - Bug in plotting if sharex was enabled and index was a timeseries, would show labels on multiple axes (:issue:`3964`).
-
 - Bug where passing a unit to the TimedeltaIndex constructor applied the to nano-second conversion twice. (:issue:`9011`).
 - Bug in plotting of a period-like array (:issue:`9012`)
+