Commit f14a9e5

Merging master (pandas-dev#35498)
2 parents e083962 + 9a8152c commit f14a9e5

207 files changed, +2687 -1255 lines changed

.github/workflows/ci.yml

Lines changed: 3 additions & 1 deletion
@@ -4,7 +4,9 @@ on:
   push:
     branches: master
   pull_request:
-    branches: master
+    branches:
+      - master
+      - 1.1.x

 env:
   ENV_FILE: environment.yml

asv_bench/benchmarks/frame_ctor.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 from .pandas_vb_common import tm

 try:
-    from pandas.tseries.offsets import Nano, Hour
+    from pandas.tseries.offsets import Hour, Nano
 except ImportError:
     # For compatibility with older versions
     from pandas.core.datetools import * # noqa

asv_bench/benchmarks/gil.py

Lines changed: 4 additions & 4 deletions
@@ -7,14 +7,14 @@

 try:
     from pandas import (
-        rolling_median,
+        rolling_kurt,
+        rolling_max,
         rolling_mean,
+        rolling_median,
         rolling_min,
-        rolling_max,
-        rolling_var,
         rolling_skew,
-        rolling_kurt,
         rolling_std,
+        rolling_var,
     )

     have_rolling_methods = True

asv_bench/benchmarks/io/parsers.py

Lines changed: 1 addition & 1 deletion
@@ -2,8 +2,8 @@

 try:
     from pandas._libs.tslibs.parsing import (
-        concat_date_cols,
         _does_string_look_like_datetime,
+        concat_date_cols,
     )
 except ImportError:
     # Avoid whole benchmark suite import failure on asv (currently 0.4)

asv_bench/benchmarks/tslibs/normalize.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 try:
-    from pandas._libs.tslibs import normalize_i8_timestamps, is_date_array_normalized
+    from pandas._libs.tslibs import is_date_array_normalized, normalize_i8_timestamps
 except ImportError:
     from pandas._libs.tslibs.conversion import (
         normalize_i8_timestamps,

azure-pipelines.yml

Lines changed: 2 additions & 0 deletions
@@ -1,9 +1,11 @@
 # Adapted from https://github.com/numba/numba/blob/master/azure-pipelines.yml
 trigger:
 - master
+- 1.1.x

 pr:
 - master
+- 1.1.x

 variables:
   PYTEST_WORKERS: auto

ci/code_checks.sh

Lines changed: 6 additions & 1 deletion
@@ -121,7 +121,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then

     # Imports - Check formatting using isort see setup.cfg for settings
     MSG='Check import format using isort' ; echo $MSG
-    ISORT_CMD="isort --quiet --recursive --check-only pandas asv_bench scripts"
+    ISORT_CMD="isort --quiet --check-only pandas asv_bench scripts"
     if [[ "$GITHUB_ACTIONS" == "true" ]]; then
         eval $ISORT_CMD | awk '{print "##[error]" $0}'; RET=$(($RET + ${PIPESTATUS[0]}))
     else
@@ -230,6 +230,11 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include="*.py" -P '# type: (?!ignore)' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"

+    # https://github.com/python/mypy/issues/7384
+    # MSG='Check for missing error codes with # type: ignore' ; echo $MSG
+    # invgrep -R --include="*.py" -P '# type: ignore(?!\[)' pandas
+    # RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     MSG='Check for use of foo.__class__ instead of type(foo)' ; echo $MSG
     invgrep -R --include=*.{py,pyx} '\.__class__' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
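
Note on the two `# type:` patterns in the `patterns` hunk of ci/code_checks.sh: both rely on a negative lookahead. The sketch below reproduces them with Python's standard `re` module to show which comments each one would flag. It assumes the Perl-style syntax passed via `-P` behaves the same way for these two patterns, and the sample lines are invented for illustration rather than taken from the pandas code base.

```python
import re

# Active check: a "# type:" comment that is NOT "# type: ignore".
TYPE_COMMENT = re.compile(r"# type: (?!ignore)")
# Commented-out check: "# type: ignore" that is not followed by "[error-code]".
BARE_IGNORE = re.compile(r"# type: ignore(?!\[)")

# Hypothetical sample lines (assumption: not from the pandas sources).
samples = [
    "x = []  # type: List[int]",
    "y = f()  # type: ignore",
    "z = g()  # type: ignore[arg-type]",
]

for line in samples:
    flags = []
    if TYPE_COMMENT.search(line):
        flags.append("type comment")
    if BARE_IGNORE.search(line):
        flags.append("ignore without error code")
    print(f"{line!r}: {', '.join(flags) or 'ok'}")
```

Only the first sample trips the active check. The second would trip the new check if it were enabled, and the third passes both.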

ci/deps/azure-36-locale.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ dependencies:

   # tools
   - cython>=0.29.16
-  - pytest>=5.0.1
+  - pytest>=5.0.1,<6.0.0 # https://github.com/pandas-dev/pandas/issues/35620
   - pytest-xdist>=1.21
   - pytest-asyncio
   - hypothesis>=3.58.0

doc/source/development/contributing.rst

Lines changed: 29 additions & 7 deletions
@@ -153,14 +153,38 @@ to build the documentation locally before pushing your changes.
 Using a Docker container
 ~~~~~~~~~~~~~~~~~~~~~~~~

-Instead of manually setting up a development environment, you can use Docker to
-automatically create the environment with just several commands. Pandas provides a `DockerFile`
-in the root directory to build a Docker image with a full pandas development environment.
+Instead of manually setting up a development environment, you can use `Docker
+<https://docs.docker.com/get-docker/>`_ to automatically create the environment with just several
+commands. Pandas provides a `DockerFile` in the root directory to build a Docker image
+with a full pandas development environment.

-Even easier, you can use the DockerFile to launch a remote session with Visual Studio Code,
+**Docker Commands**
+
+Pass your GitHub username in the `DockerFile` to use your own fork::
+
+    # Build the image pandas-yourname-env
+    docker build --tag pandas-yourname-env .
+    # Run a container and bind your local forked repo, pandas-yourname, to the container
+    docker run -it --rm -v path-to-pandas-yourname:/home/pandas-yourname pandas-yourname-env
+
+Even easier, you can integrate Docker with the following IDEs:
+
+**Visual Studio Code**
+
+You can use the DockerFile to launch a remote session with Visual Studio Code,
 a popular free IDE, using the `.devcontainer.json` file.
 See https://code.visualstudio.com/docs/remote/containers for details.

+**PyCharm (Professional)**
+
+Enable Docker support and use the Services tool window to build and manage images as well as
+run and interact with containers.
+See https://www.jetbrains.com/help/pycharm/docker.html for details.
+
+Note that you might need to rebuild the C extensions if/when you merge with upstream/master using::
+
+    python setup.py build_ext --inplace -j 4
+
 .. _contributing.dev_c:

 Installing a C compiler
@@ -751,7 +775,7 @@ Imports are alphabetically sorted within these sections.

 As part of :ref:`Continuous Integration <contributing.ci>` checks we run::

-    isort --recursive --check-only pandas
+    isort --check-only pandas

 to check that imports are correctly formatted as per the `setup.cfg`.

@@ -770,8 +794,6 @@ You should run::

 to automatically format imports correctly. This will modify your local copy of the files.

-The `--recursive` flag can be passed to sort all files in a directory.
-
 Alternatively, you can run a command similar to what was suggested for ``black`` and ``flake8`` :ref:`right above <contributing.code-formatting>`::

     git diff upstream/master --name-only -- "*.py" | xargs -r isort

doc/source/ecosystem.rst

Lines changed: 7 additions & 0 deletions
@@ -80,6 +80,11 @@ ML pipeline.

 Featuretools is a Python library for automated feature engineering built on top of pandas. It excels at transforming temporal and relational datasets into feature matrices for machine learning using reusable feature engineering "primitives". Users can contribute their own primitives in Python and share them with the rest of the community.

+`Compose <https://github.com/FeatureLabs/compose>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Compose is a machine learning tool for labeling data and prediction engineering. It allows you to structure the labeling process by parameterizing prediction problems and transforming time-driven relational data into target values with cutoff times that can be used for supervised learning.
+
 .. _ecosystem.visualization:

 Visualization
@@ -445,6 +450,7 @@ Library Accessor Classes Description
 `pdvega`_       ``vgplot`` ``Series``, ``DataFrame`` Provides plotting functions from the Altair_ library.
 `pandas_path`_  ``path``   ``Index``, ``Series``     Provides `pathlib.Path`_ functions for Series.
 `pint-pandas`_  ``pint``   ``Series``, ``DataFrame`` Provides units support for numeric Series and DataFrames.
+`composeml`_    ``slice``  ``DataFrame``             Provides a generator for enhanced data slicing.
 =============== ========== ========================= ===============================================================

 .. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest
@@ -453,3 +459,4 @@ Library Accessor Classes Description
 .. _pandas_path: https://github.com/drivendataorg/pandas-path/
 .. _pathlib.Path: https://docs.python.org/3/library/pathlib.html
 .. _pint-pandas: https://github.com/hgrecco/pint-pandas
+.. _composeml: https://github.com/FeatureLabs/compose

doc/source/user_guide/dsintro.rst

Lines changed: 26 additions & 0 deletions
@@ -397,6 +397,32 @@ The result will be a DataFrame with the same index as the input Series, and
 with one column whose name is the original name of the Series (only if no other
 column name provided).

+
+.. _basics.dataframe.from_list_namedtuples:
+
+From a list of namedtuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The field names of the first ``namedtuple`` in the list determine the columns
+of the ``DataFrame``. The remaining namedtuples (or tuples) are simply unpacked
+and their values are fed into the rows of the ``DataFrame``. If any of those
+tuples is shorter than the first ``namedtuple`` then the later columns in the
+corresponding row are marked as missing values. If any are longer than the
+first ``namedtuple``, a ``ValueError`` is raised.
+
+.. ipython:: python
+
+    from collections import namedtuple
+
+    Point = namedtuple('Point', 'x y')
+
+    pd.DataFrame([Point(0, 0), Point(0, 3), (2, 3)])
+
+    Point3D = namedtuple('Point3D', 'x y z')
+
+    pd.DataFrame([Point3D(0, 0, 0), Point3D(0, 3, 5), Point(2, 3)])
+
+
 .. _basics.dataframe.from_list_dataclasses:

 From a list of dataclasses
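
Note on the namedtuple section added to dsintro.rst above: the new text states a three-part rule. The first namedtuple fixes the columns, shorter tuples leave trailing values missing, and longer tuples raise ``ValueError``. The pure-Python sketch below restates that rule so each branch is visible. ``rows_from_tuples`` is a hypothetical helper written for this note, not pandas code, and it uses ``None`` where the documentation speaks of missing values.

```python
from collections import namedtuple


def rows_from_tuples(records):
    """Hypothetical helper mirroring the documented rule; not pandas code."""
    columns = list(records[0]._fields)  # the first namedtuple fixes the columns
    rows = []
    for record in records:
        values = list(record)
        if len(values) > len(columns):
            raise ValueError(
                f"{len(columns)} columns expected, got a row with {len(values)} values"
            )
        # Shorter tuples are padded; None stands in for a missing value here.
        values += [None] * (len(columns) - len(values))
        rows.append(dict(zip(columns, values)))
    return columns, rows


Point = namedtuple("Point", "x y")
Point3D = namedtuple("Point3D", "x y z")

columns, rows = rows_from_tuples([Point3D(0, 0, 0), Point3D(0, 3, 5), Point(2, 3)])
print(columns)
print(rows[-1])
```

Passing ``[Point(0, 0), (1, 2, 3)]`` to the same helper takes the ``ValueError`` branch, which is the case the documented example does not show.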

doc/source/user_guide/indexing.rst

Lines changed: 2 additions & 6 deletions
@@ -1532,12 +1532,8 @@ Setting metadata
 ~~~~~~~~~~~~~~~~

 Indexes are "mostly immutable", but it is possible to set and change their
-metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and
-``codes``).
-
-You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_codes``
-to set these attributes directly. They default to returning a copy; however,
-you can specify ``inplace=True`` to have the data change in place.
+``name`` attribute. You can use the ``rename``, ``set_names`` to set these attributes
+directly, and they default to returning a copy.

 See :ref:`Advanced Indexing <advanced>` for usage of MultiIndexes.

doc/source/user_guide/io.rst

Lines changed: 22 additions & 4 deletions
@@ -1064,6 +1064,23 @@ DD/MM/YYYY instead. For convenience, a ``dayfirst`` keyword is provided:
    pd.read_csv('tmp.csv', parse_dates=[0])
    pd.read_csv('tmp.csv', dayfirst=True, parse_dates=[0])

+Writing CSVs to binary file objects
++++++++++++++++++++++++++++++++++++
+
+.. versionadded:: 1.2.0
+
+``df.to_csv(..., mode="w+b")`` allows writing a CSV to a file object
+opened binary mode. For this to work, it is necessary that ``mode``
+contains a "b":
+
+.. ipython:: python
+
+   import io
+
+   data = pd.DataFrame([0, 1, 2])
+   buffer = io.BytesIO()
+   data.to_csv(buffer, mode="w+b", encoding="utf-8", compression="gzip")
+
 .. _io.float_precision:

 Specifying method for floating-point conversion
@@ -3441,10 +3458,11 @@ for some advanced strategies

 .. warning::

-   pandas requires ``PyTables`` >= 3.0.0.
-   There is a indexing bug in ``PyTables`` < 3.2 which may appear when querying stores using an index.
-   If you see a subset of results being returned, upgrade to ``PyTables`` >= 3.2.
-   Stores created previously will need to be rewritten using the updated version.
+   Pandas uses PyTables for reading and writing HDF5 files, which allows
+   serializing object-dtype data with pickle. Loading pickled data received from
+   untrusted sources can be unsafe.
+
+   See: https://docs.python.org/3/library/pickle.html for more.

 .. ipython:: python
    :suppress:
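
Note on the "Writing CSVs to binary file objects" section added to io.rst above: a binary buffer accepts only bytes, so CSV text has to be encoded (and here also compressed) before it reaches the buffer. The standard-library sketch below shows that layering with `csv`, `io.TextIOWrapper` and `gzip.GzipFile`. It is an analogue chosen for illustration and makes no claim about how pandas implements `to_csv` internally; the `rows` literal is a hand-written stand-in for the small frame in the documented example.

```python
import csv
import gzip
import io

# Hand-written stand-in for a three-row, one-column table with an index column.
rows = [["", "0"], ["0", "0"], ["1", "1"], ["2", "2"]]

buffer = io.BytesIO()
# GzipFile compresses bytes into the caller's buffer;
# TextIOWrapper encodes the CSV text to UTF-8 bytes on the way in.
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    with io.TextIOWrapper(gz, encoding="utf-8", newline="") as text:
        csv.writer(text, lineterminator="\n").writerows(rows)

# The caller's buffer stays open and now holds gzip-compressed CSV bytes.
decoded = gzip.decompress(buffer.getvalue()).decode("utf-8")
print(decoded)
```

Because the caller supplied the buffer, closing the two wrappers flushes the compressed stream but leaves `buffer` itself open for further use.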

doc/source/whatsnew/v0.22.0.rst

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 .. _whatsnew_0220:

-v0.22.0 (December 29, 2017)
----------------------------
+Version 0.22.0 (December 29, 2017)
+----------------------------------

 {{ header }}

@@ -96,7 +96,7 @@ returning ``1`` instead.
 These changes affect :meth:`DataFrame.sum` and :meth:`DataFrame.prod` as well.
 Finally, a few less obvious places in pandas are affected by this change.

-Grouping by a categorical
+Grouping by a Categorical
 ^^^^^^^^^^^^^^^^^^^^^^^^^

 Grouping by a ``Categorical`` and summing now returns ``0`` instead of

doc/source/whatsnew/v0.23.0.rst

Lines changed: 11 additions & 11 deletions
@@ -86,8 +86,8 @@ Please note that the string `index` is not supported with the round trip format,
 .. _whatsnew_0230.enhancements.assign_dependent:


-``.assign()`` accepts dependent arguments
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Method ``.assign()`` accepts dependent arguments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The :func:`DataFrame.assign` now accepts dependent keyword arguments for python version later than 3.6 (see also `PEP 468
 <https://www.python.org/dev/peps/pep-0468/>`_). Later keyword arguments may now refer to earlier ones if the argument is a callable. See the
@@ -244,7 +244,7 @@ documentation. If you build an extension array, publicize it on our

 .. _whatsnew_0230.enhancements.categorical_grouping:

-New ``observed`` keyword for excluding unobserved categories in ``groupby``
+New ``observed`` keyword for excluding unobserved categories in ``GroupBy``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Grouping by a categorical includes the unobserved categories in the output.
@@ -360,8 +360,8 @@ Fill all consecutive outside values in both directions

 .. _whatsnew_0210.enhancements.get_dummies_dtype:

-``get_dummies`` now supports ``dtype`` argument
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Function ``get_dummies`` now supports ``dtype`` argument
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtype for the new columns. The default remains uint8. (:issue:`18330`)

@@ -388,8 +388,8 @@ See the :ref:`documentation here <timedeltas.mod_divmod>`. (:issue:`19365`)

 .. _whatsnew_0230.enhancements.ran_inf:

-``.rank()`` handles ``inf`` values when ``NaN`` are present
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Method ``.rank()`` handles ``inf`` values when ``NaN`` are present
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 In previous versions, ``.rank()`` would assign ``inf`` elements ``NaN`` as their ranks. Now ranks are calculated properly. (:issue:`6945`)

@@ -587,7 +587,7 @@ If installed, we now require:

 .. _whatsnew_0230.api_breaking.dict_insertion_order:

-Instantiation from dicts preserves dict insertion order for python 3.6+
+Instantiation from dicts preserves dict insertion order for Python 3.6+
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Until Python 3.6, dicts in Python had no formally defined ordering. For Python
@@ -1365,8 +1365,8 @@ MultiIndex
 - Bug in indexing where nested indexers having only numpy arrays are handled incorrectly (:issue:`19686`)


-I/O
-^^^
+IO
+^^

 - :func:`read_html` now rewinds seekable IO objects after parse failure, before attempting to parse with a new parser. If a parser errors and the object is non-seekable, an informative error is raised suggesting the use of a different parser (:issue:`17975`)
 - :meth:`DataFrame.to_html` now has an option to add an id to the leading `<table>` tag (:issue:`8496`)
@@ -1403,7 +1403,7 @@ Plotting
 - :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`)


-Groupby/resample/rolling
+GroupBy/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^

 - Bug when grouping by a single column and aggregating with a class like ``list`` or ``tuple`` (:issue:`18079`)
