Commit ae527c7

Merge branch 'master' into dict_keys_as_csv_names
2 parents 89c9d51 + 5782dc0 commit ae527c7

47 files changed: 878 additions & 568 deletions

doc/source/development/contributing.rst

Lines changed: 7 additions & 7 deletions
@@ -1365,16 +1365,16 @@ environments. If you want to use virtualenv instead, write::
 The ``-E virtualenv`` option should be added to all ``asv`` commands
 that run benchmarks. The default value is defined in ``asv.conf.json``.
 
-Running the full test suite can take up to one hour and use up to 3GB of RAM.
-Usually it is sufficient to paste only a subset of the results into the pull
-request to show that the committed changes do not cause unexpected performance
-regressions. You can run specific benchmarks using the ``-b`` flag, which
-takes a regular expression. For example, this will only run tests from a
-``pandas/asv_bench/benchmarks/groupby.py`` file::
+Running the full benchmark suite can be an all-day process, depending on your
+hardware and its resource utilization. However, usually it is sufficient to paste
+only a subset of the results into the pull request to show that the committed changes
+do not cause unexpected performance regressions. You can run specific benchmarks
+using the ``-b`` flag, which takes a regular expression. For example, this will
+only run benchmarks from a ``pandas/asv_bench/benchmarks/groupby.py`` file::
 
     asv continuous -f 1.1 upstream/master HEAD -b ^groupby
 
-If you want to only run a specific group of tests from a file, you can do it
+If you want to only run a specific group of benchmarks from a file, you can do it
 using ``.`` as a separator. For example::
 
     asv continuous -f 1.1 upstream/master HEAD -b groupby.GroupByMethods
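
Review note: the two ``-b`` patterns in this hunk select different sets of benchmarks. The sketch below filters a list of made-up benchmark names with Python's `re.search` to show the difference. The names are hypothetical, and the use of `re.search` is an assumption about how asv applies the pattern; the asv documentation is the authority on the exact matching rules.

```python
import re

# Hypothetical benchmark names in "module.Class.method" form (illustration only).
BENCHMARKS = [
    "groupby.GroupByMethods.time_dtype_as_group",
    "groupby.Apply.time_scalar_function_single_col",
    "frame_methods.Isnull.time_isnull",
    "join_merge.Merge.time_merge_2intkey",
]


def select(pattern, names=BENCHMARKS):
    """Return the names the regular expression matches anywhere (assumed semantics)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# "^groupby" keeps everything that starts with the groupby module name.
by_file = select(r"^groupby")
# "groupby.GroupByMethods" narrows the selection to one benchmark class.
by_class = select(r"groupby.GroupByMethods")

print(by_file)
print(by_class)
```

Note that the ``.`` in the second pattern is a regular-expression wildcard, so it matches the literal dot separator as well as any other single character.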

doc/source/user_guide/indexing.rst

Lines changed: 18 additions & 0 deletions
@@ -933,6 +933,24 @@ and :ref:`Advanced Indexing <advanced>` you may select along more than one axis
 
    df2.loc[criterion & (df2['b'] == 'x'), 'b':'c']
 
+.. warning::
+
+   ``iloc`` supports two kinds of boolean indexing. If the indexer is a boolean ``Series``,
+   an error will be raised. For instance, in the following example, ``df.iloc[s.values, 1]`` is ok.
+   The boolean indexer is an array. But ``df.iloc[s, 1]`` would raise ``ValueError``.
+
+   .. ipython:: python
+
+      df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
+                        index=list('abc'),
+                        columns=['A', 'B'])
+      s = (df['A'] > 2)
+      s
+
+      df.loc[s, 'B']
+
+      df.iloc[s.values, 1]
+
 .. _indexing.basics.indexing_isin:
 
 Indexing with isin
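
Review note: the added warning names the failing call, ``df.iloc[s, 1]``, but the ipython block only runs the calls that succeed. A minimal sketch of both cases follows, assuming a pandas version that behaves as the warning describes; the variable names `selected` and `raised` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=list("abc"), columns=["A", "B"])
s = df["A"] > 2  # boolean Series labelled 'a', 'b', 'c'

# A plain boolean array is accepted by iloc.
selected = df.iloc[s.values, 1]

# A boolean Series is rejected; the warning states that ValueError is raised.
try:
    df.iloc[s, 1]
    raised = False
except ValueError:
    raised = True

print(selected.tolist(), raised)
```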

doc/source/user_guide/merging.rst

Lines changed: 8 additions & 0 deletions
@@ -154,6 +154,14 @@ functionality below.
    frames = [ process_your_file(f) for f in files ]
    result = pd.concat(frames)
 
+.. note::
+
+   When concatenating DataFrames with named axes, pandas will attempt to preserve
+   these index/column names whenever possible. In the case where all inputs share a
+   common name, this name will be assigned to the result. When the input names do
+   not all agree, the result will be unnamed. The same is true for :class:`MultiIndex`,
+   but the logic is applied separately on a level-by-level basis.
+
 
 Set logic on the other axes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
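
Review note: the naming rule in the added note can be stated in a few lines of plain Python. This is only a sketch of the rule as the note words it, not the pandas implementation; `common_name` and `common_level_names` are hypothetical helpers.

```python
def common_name(names):
    """Return the shared name if every input agrees, otherwise None (unnamed)."""
    names = list(names)
    if not names:
        return None
    first = names[0]
    return first if all(name == first for name in names[1:]) else None


def common_level_names(per_input_level_names):
    """Apply the same rule independently to each MultiIndex level."""
    return [common_name(level) for level in zip(*per_input_level_names)]


same = common_name(["abc", "abc"])
mixed = common_name(["abc", "xyz"])
levels = common_level_names([("year", "city"), ("year", "town")])

print(same)    # abc
print(mixed)   # None
print(levels)  # ['year', None]
```

In the last call the first level keeps its name because both inputs agree on it, while the second level becomes unnamed.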

doc/source/whatsnew/v1.2.0.rst

Lines changed: 23 additions & 0 deletions
@@ -157,6 +157,26 @@ Alternatively, you can also use the dtype object:
    behaviour or API may still change without warning. Expecially the behaviour
    regarding NaN (distinct from NA missing values) is subject to change.
 
+.. _whatsnew_120.index_name_preservation:
+
+Index/column name preservation when aggregating
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When aggregating using :meth:`concat` or the :class:`DataFrame` constructor, Pandas
+will attempt to preserve index (and column) names whenever possible (:issue:`35847`).
+In the case where all inputs share a common name, this name will be assigned to the
+result. When the input names do not all agree, the result will be unnamed. Here is an
+example where the index name is preserved:
+
+.. ipython:: python
+
+    idx = pd.Index(range(5), name='abc')
+    ser = pd.Series(range(5, 10), index=idx)
+    pd.concat({'x': ser[1:], 'y': ser[:-1]}, axis=1)
+
+The same is true for :class:`MultiIndex`, but the logic is applied separately on a
+level-by-level basis.
+
 .. _whatsnew_120.enhancements.other:
 
 Other enhancements
@@ -269,6 +289,7 @@ Deprecations
 - Deprecated :meth:`Index.is_all_dates` (:issue:`27744`)
 - Deprecated automatic alignment on comparison operations between :class:`DataFrame` and :class:`Series`, do ``frame, ser = frame.align(ser, axis=1, copy=False)`` before e.g. ``frame == ser`` (:issue:`28759`)
 - :meth:`Rolling.count` with ``min_periods=None`` will default to the size of the window in a future version (:issue:`31302`)
+- Deprecated slice-indexing on timezone-aware :class:`DatetimeIndex` with naive ``datetime`` objects, to match scalar indexing behavior (:issue:`36148`)
 - :meth:`Index.ravel` returning a ``np.ndarray`` is deprecated, in the future this will return a view on the same index (:issue:`19956`)
 
 .. ---------------------------------------------------------------------------
@@ -337,6 +358,7 @@ Numeric
 - Bug in :class:`Series` where two :class:`Series` each have a :class:`DatetimeIndex` with different timezones having those indexes incorrectly changed when performing arithmetic operations (:issue:`33671`)
 - Bug in :meth:`pd._testing.assert_almost_equal` was incorrect for complex numeric types (:issue:`28235`)
 - Bug in :meth:`DataFrame.__rmatmul__` error handling reporting transposed shapes (:issue:`21581`)
+- Bug in :class:`Series` flex arithmetic methods where the result when operating with a ``list``, ``tuple`` or ``np.ndarray`` would have an incorrect name (:issue:`36760`)
 - Bug in :class:`IntegerArray` multiplication with ``timedelta`` and ``np.timedelta64`` objects (:issue:`36870`)
 
 Conversion
@@ -394,6 +416,7 @@ I/O
 - Bug in :meth:`read_csv` with ``engine='python'`` truncating data if multiple items present in first row and first element started with BOM (:issue:`36343`)
 - Removed ``private_key`` and ``verbose`` from :func:`read_gbq` as they are no longer supported in ``pandas-gbq`` (:issue:`34654`, :issue:`30200`)
 - Bumped minimum pytables version to 3.5.1 to avoid a ``ValueError`` in :meth:`read_hdf` (:issue:`24839`)
+- Bug in :func:`read_table` and :func:`read_csv` when ``delim_whitespace=True`` and ``sep=default`` (:issue:`36583`)
 - Bug in :meth:`read_parquet` with fixed offset timezones. String representation of timezones was not recognized (:issue:`35997`, :issue:`36004`)
 
 Plotting

pandas/conftest.py

Lines changed: 37 additions & 0 deletions
@@ -677,6 +677,43 @@ def all_arithmetic_operators(request):
     return request.param
 
 
+@pytest.fixture(
+    params=[
+        operator.add,
+        ops.radd,
+        operator.sub,
+        ops.rsub,
+        operator.mul,
+        ops.rmul,
+        operator.truediv,
+        ops.rtruediv,
+        operator.floordiv,
+        ops.rfloordiv,
+        operator.mod,
+        ops.rmod,
+        operator.pow,
+        ops.rpow,
+        operator.eq,
+        operator.ne,
+        operator.lt,
+        operator.le,
+        operator.gt,
+        operator.ge,
+        operator.and_,
+        ops.rand_,
+        operator.xor,
+        ops.rxor,
+        operator.or_,
+        ops.ror_,
+    ]
+)
+def all_binary_operators(request):
+    """
+    Fixture for operator and roperator arithmetic, comparison, and logical ops.
+    """
+    return request.param
+
+
 @pytest.fixture(
     params=[
         operator.add,
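
Review note: the new fixture mixes functions from the standard `operator` module with reflected helpers such as ``ops.radd``. The sketch below shows what "reflected" means using only the standard library. `reversed_op`, `rsub` and `rtruediv` are local stand-ins written for this note, not the pandas helpers, and the loop only imitates what a parametrized test would receive one entry at a time.

```python
import operator


def reversed_op(op):
    """Build a reflected binary operator: rop(left, right) == op(right, left)."""

    def rop(left, right):
        return op(right, left)

    rop.__name__ = "r" + op.__name__
    return rop


# Local stand-ins for two of the reflected helpers listed in the fixture.
rsub = reversed_op(operator.sub)
rtruediv = reversed_op(operator.truediv)

BINARY_OPS = [operator.add, operator.sub, rsub, operator.truediv, rtruediv, operator.eq]

# Apply every operator to the same operands and collect the results by name.
results = {op.__name__: op(6, 3) for op in BINARY_OPS}
print(results)
```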

pandas/core/arraylike.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+"""
+Methods that can be shared by many array-like classes or subclasses:
+    Series
+    Index
+    ExtensionArray
+"""
+import operator
+
+from pandas.errors import AbstractMethodError
+
+from pandas.core.ops.common import unpack_zerodim_and_defer
+
+
+class OpsMixin:
+    # -------------------------------------------------------------
+    # Comparisons
+
+    def _cmp_method(self, other, op):
+        raise AbstractMethodError(self)
+
+    @unpack_zerodim_and_defer("__eq__")
+    def __eq__(self, other):
+        return self._cmp_method(other, operator.eq)
+
+    @unpack_zerodim_and_defer("__ne__")
+    def __ne__(self, other):
+        return self._cmp_method(other, operator.ne)
+
+    @unpack_zerodim_and_defer("__lt__")
+    def __lt__(self, other):
+        return self._cmp_method(other, operator.lt)
+
+    @unpack_zerodim_and_defer("__le__")
+    def __le__(self, other):
+        return self._cmp_method(other, operator.le)
+
+    @unpack_zerodim_and_defer("__gt__")
+    def __gt__(self, other):
+        return self._cmp_method(other, operator.gt)
+
+    @unpack_zerodim_and_defer("__ge__")
+    def __ge__(self, other):
+        return self._cmp_method(other, operator.ge)
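
Review note: the new file routes every rich-comparison dunder through a single ``_cmp_method`` hook that subclasses override. A stripped-down analogue of that pattern is sketched below. `ComparisonMixin` and `Celsius` are hypothetical names, and the ``unpack_zerodim_and_defer`` decorator is deliberately left out, so this shows only the dispatch idea, not the pandas behaviour.

```python
import operator


class ComparisonMixin:
    """Toy analogue of the mixin: each dunder delegates to one hook."""

    def _cmp_method(self, other, op):
        raise NotImplementedError

    def __eq__(self, other):
        return self._cmp_method(other, operator.eq)

    def __lt__(self, other):
        return self._cmp_method(other, operator.lt)

    def __ge__(self, other):
        return self._cmp_method(other, operator.ge)


class Celsius(ComparisonMixin):
    """Hypothetical subclass that implements only the single hook."""

    def __init__(self, degrees):
        self.degrees = degrees

    def _cmp_method(self, other, op):
        if not isinstance(other, Celsius):
            # Let Python try the reflected operation on the other operand.
            return NotImplemented
        return op(self.degrees, other.degrees)


print(Celsius(20) < Celsius(25))
print(Celsius(20) == Celsius(20))
print(Celsius(20) >= Celsius(25))
```

The subclass gets all three comparisons by writing one method, which is the same saving the diff achieves for ``BooleanArray``.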

pandas/core/arrays/boolean.py

Lines changed: 33 additions & 41 deletions
@@ -23,6 +23,7 @@
 from pandas.core.dtypes.missing import isna
 
 from pandas.core import ops
+from pandas.core.arraylike import OpsMixin
 
 from .masked import BaseMaskedArray, BaseMaskedDtype
 
@@ -202,7 +203,7 @@ def coerce_to_array(
     return values, mask
 
 
-class BooleanArray(BaseMaskedArray):
+class BooleanArray(OpsMixin, BaseMaskedArray):
     """
     Array of boolean (True/False) data with missing values.
 
@@ -603,52 +604,44 @@ def logical_method(self, other):
         name = f"__{op.__name__}__"
         return set_function_name(logical_method, name, cls)
 
-    @classmethod
-    def _create_comparison_method(cls, op):
-        @ops.unpack_zerodim_and_defer(op.__name__)
-        def cmp_method(self, other):
-            from pandas.arrays import FloatingArray, IntegerArray
+    def _cmp_method(self, other, op):
+        from pandas.arrays import FloatingArray, IntegerArray
 
-            if isinstance(other, (IntegerArray, FloatingArray)):
-                return NotImplemented
+        if isinstance(other, (IntegerArray, FloatingArray)):
+            return NotImplemented
 
-            mask = None
+        mask = None
 
-            if isinstance(other, BooleanArray):
-                other, mask = other._data, other._mask
+        if isinstance(other, BooleanArray):
+            other, mask = other._data, other._mask
 
-            elif is_list_like(other):
-                other = np.asarray(other)
-                if other.ndim > 1:
-                    raise NotImplementedError(
-                        "can only perform ops with 1-d structures"
-                    )
-                if len(self) != len(other):
-                    raise ValueError("Lengths must match to compare")
+        elif is_list_like(other):
+            other = np.asarray(other)
+            if other.ndim > 1:
+                raise NotImplementedError("can only perform ops with 1-d structures")
+            if len(self) != len(other):
+                raise ValueError("Lengths must match to compare")
 
-            if other is libmissing.NA:
-                # numpy does not handle pd.NA well as "other" scalar (it returns
-                # a scalar False instead of an array)
-                result = np.zeros_like(self._data)
-                mask = np.ones_like(self._data)
-            else:
-                # numpy will show a DeprecationWarning on invalid elementwise
-                # comparisons, this will raise in the future
-                with warnings.catch_warnings():
-                    warnings.filterwarnings("ignore", "elementwise", FutureWarning)
-                    with np.errstate(all="ignore"):
-                        result = op(self._data, other)
-
-            # nans propagate
-            if mask is None:
-                mask = self._mask.copy()
-            else:
-                mask = self._mask | mask
+        if other is libmissing.NA:
+            # numpy does not handle pd.NA well as "other" scalar (it returns
+            # a scalar False instead of an array)
+            result = np.zeros_like(self._data)
+            mask = np.ones_like(self._data)
+        else:
+            # numpy will show a DeprecationWarning on invalid elementwise
+            # comparisons, this will raise in the future
+            with warnings.catch_warnings():
+                warnings.filterwarnings("ignore", "elementwise", FutureWarning)
+                with np.errstate(all="ignore"):
+                    result = op(self._data, other)
 
-            return BooleanArray(result, mask, copy=False)
+        # nans propagate
+        if mask is None:
+            mask = self._mask.copy()
+        else:
+            mask = self._mask | mask
 
-        name = f"__{op.__name__}"
-        return set_function_name(cmp_method, name, cls)
+        return BooleanArray(result, mask, copy=False)
 
     def _reduce(self, name: str, skipna: bool = True, **kwargs):
 

@@ -741,5 +734,4 @@ def boolean_arithmetic_method(self, other):
 
 
 BooleanArray._add_logical_ops()
-BooleanArray._add_comparison_ops()
 BooleanArray._add_arithmetic_ops()
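
Review note: the refactored ``_cmp_method`` keeps the same masking rules as before: comparing with NA marks every position as missing, and otherwise a position is missing if it is missing in either operand. The sketch below restates those rules with plain Python lists instead of NumPy arrays. `NA` is a local sentinel standing in for ``pd.NA``, and `masked_compare` is a hypothetical helper, not a pandas function.

```python
import operator

NA = object()  # local stand-in for pd.NA in this sketch


def masked_compare(data, mask, other, op):
    """Compare masked values with a scalar or another (data, mask) pair.

    Returns (result, result_mask); True in result_mask marks a missing value.
    """
    if other is NA:
        # Comparing with NA yields a missing value at every position.
        return [False] * len(data), [True] * len(data)

    if isinstance(other, tuple):
        other_data, other_mask = other
        if len(other_data) != len(data):
            raise ValueError("Lengths must match to compare")
    else:
        # Broadcast a scalar, which is never missing.
        other_data, other_mask = [other] * len(data), [False] * len(data)

    result = [op(a, b) for a, b in zip(data, other_data)]
    # Missingness propagates from either operand.
    result_mask = [m or om for m, om in zip(mask, other_mask)]
    return result, result_mask


data = [True, False, True]
mask = [False, False, True]

vs_scalar = masked_compare(data, mask, True, operator.eq)
vs_na = masked_compare(data, mask, NA, operator.eq)
vs_array = masked_compare(data, mask, ([True, True, True], [True, False, False]), operator.eq)

print(vs_scalar)
print(vs_na)
print(vs_array)
```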
