
Commit 00d4189

PERF: faster _coerce_to_data_and_mask() for astype("Float64") (#60121)
* add fast path in _coerce_to_data_and_mask
* update whatsnew
* pre-commit
1 parent f9ae4cf commit 00d4189


2 files changed (+7, -0 lines)


doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -592,6 +592,7 @@ Performance improvements
 - Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57445`, :issue:`57752`)
 - Performance improvement in :func:`merge` if hash-join can be used (:issue:`57970`)
 - Performance improvement in :meth:`CategoricalDtype.update_dtype` when ``dtype`` is a :class:`CategoricalDtype` with non ``None`` categories and ordered (:issue:`59647`)
+- Performance improvement in :meth:`DataFrame.astype` when converting to extension floating dtypes, e.g. "Float64" (:issue:`60066`)
 - Performance improvement in :meth:`to_hdf` avoid unnecessary reopenings of the HDF5 file to speedup data addition to files with a very large number of groups . (:issue:`58248`)
 - Performance improvement in ``DataFrameGroupBy.__len__`` and ``SeriesGroupBy.__len__`` (:issue:`57595`)
 - Performance improvement in indexing operations for string dtypes (:issue:`56997`)
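The whatsnew entry describes the user-facing call that benefits. A minimal usage sketch, assuming pandas and NumPy are installed: a float64 column containing NaN is converted to the nullable "Float64" dtype, and the NaN becomes `<NA>`. That missing-value mask is what `_coerce_to_data_and_mask` builds (assuming, per the commit title, that `astype("Float64")` reaches it). The result should be identical before and after this commit; only the cost of computing the mask changes.

```python
import numpy as np
import pandas as pd

# A plain float64 column where the missing value is stored as NaN.
df = pd.DataFrame({"x": [1.5, np.nan, 3.0]})

# Convert to the nullable extension floating dtype named in the whatsnew entry.
converted = df.astype("Float64")

print(converted.dtypes)
# The NaN is now reported as missing (<NA>) through the array's mask.
print(converted["x"].isna().tolist())  # [False, True, False]
```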

pandas/core/arrays/numeric.py

Lines changed: 6 additions & 0 deletions
@@ -174,6 +174,8 @@ def _coerce_to_data_and_mask(
             raise TypeError(f"{values.dtype} cannot be converted to {name}")
 
     elif values.dtype.kind == "b" and checker(dtype):
+        # fastpath
+        mask = np.zeros(len(values), dtype=np.bool_)
         if not copy:
             values = np.asarray(values, dtype=default_dtype)
         else:
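The first hunk adds a mask to the boolean branch. A NumPy bool array has no way to represent a missing value, so the mask is known to be all False without scanning the data, and setting it here means the later `mask is None` check is skipped for boolean input. The sketch below mirrors that branch in isolation, using NumPy only; `bool_branch` is a hypothetical name, not a pandas function, and `default_dtype` is assumed to be float64 as it would be for a "Float64" target.

```python
import numpy as np


def bool_branch(values: np.ndarray, default_dtype: np.dtype, copy: bool):
    # Hypothetical stand-in for the boolean branch shown in the hunk above.
    # Booleans cannot be missing, so the mask is all False by construction.
    mask = np.zeros(len(values), dtype=np.bool_)
    if not copy:
        # May reuse memory when no conversion is needed.
        data = np.asarray(values, dtype=default_dtype)
    else:
        # Always produce a fresh array.
        data = np.array(values, dtype=default_dtype, copy=True)
    return data, mask


flags = np.array([True, False, True])
data, mask = bool_branch(flags, np.dtype("float64"), copy=False)
print(data)  # [1. 0. 1.]
print(mask)  # [False False False]
```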
@@ -190,6 +192,10 @@ def _coerce_to_data_and_mask(
         if values.dtype.kind in "iu":
             # fastpath
             mask = np.zeros(len(values), dtype=np.bool_)
+        elif values.dtype.kind == "f":
+            # np.isnan is faster than is_numeric_na() for floats
+            # github issue: #60066
+            mask = np.isnan(values)
         else:
             mask = libmissing.is_numeric_na(values)
     else:
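The second hunk is safe because a float ndarray can only mark a value as missing with NaN, so one vectorized `np.isnan` call yields the same mask as an element-by-element scan. Below is a NumPy-only sketch of the mask dispatch after this commit, keyed on `values.dtype.kind`. `build_mask` and `slow_numeric_na` are hypothetical names; `slow_numeric_na` is a pure-Python per-element check standing in for `libmissing.is_numeric_na`, which is compiled Cython in pandas, so the timing gap printed here will likely be larger than the real one.

```python
import timeit

import numpy as np


def slow_numeric_na(values: np.ndarray) -> np.ndarray:
    # Hypothetical per-element stand-in, NOT pandas' libmissing.is_numeric_na.
    # It treats None and NaN (detected via x != x) as missing.
    return np.fromiter(
        (x is None or x != x for x in values), dtype=np.bool_, count=len(values)
    )


def build_mask(values: np.ndarray) -> np.ndarray:
    # Sketch of the dispatch in the hunk above, keyed on dtype.kind.
    if values.dtype.kind in "iu":
        # Integer arrays cannot hold NaN, so nothing is missing.
        return np.zeros(len(values), dtype=np.bool_)
    elif values.dtype.kind == "f":
        # New fast path: a single vectorized NaN check over the whole array.
        return np.isnan(values)
    else:
        # Generic per-element fallback (e.g. object arrays holding None).
        return slow_numeric_na(values)


rng = np.random.default_rng(0)
big = rng.random(50_000)
big[::10] = np.nan  # mark every 10th value as missing

# Both paths must agree on float input; only the cost differs.
assert np.array_equal(build_mask(big), slow_numeric_na(big))

t_fast = timeit.timeit(lambda: build_mask(big), number=5)
t_slow = timeit.timeit(lambda: slow_numeric_na(big), number=5)
print(f"np.isnan: {t_fast:.4f}s  per-element loop: {t_slow:.4f}s")
```

The integer and boolean cases never need to inspect the data at all; floats need one pass, but a vectorized one, which is why only the object fallback keeps the per-element check.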
