Commit 60dc3f5

BUG: to_numpy not respecting na_value before converting to array (#50506)
* BUG: to_numpy not respecting na_value before converting to array
* Adjust whatsnew
1 parent a82f905 commit 60dc3f5

4 files changed: +51 -6 lines changed


asv_bench/benchmarks/series_methods.py

Lines changed: 19 additions & 0 deletions
@@ -382,4 +382,23 @@ def time_iter(self, dtype):
             pass
 
 
+class ToNumpy:
+    def setup(self):
+        N = 1_000_000
+        self.ser = Series(
+            np.random.randn(
+                N,
+            )
+        )
+
+    def time_to_numpy(self):
+        self.ser.to_numpy()
+
+    def time_to_numpy_double_copy(self):
+        self.ser.to_numpy(dtype="float64", copy=True)
+
+    def time_to_numpy_copy(self):
+        self.ser.to_numpy(copy=True)
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip
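
The benchmark class above follows the asv convention of a `setup` method plus `time_*` methods. As a rough illustration of how such a class could be exercised outside asv, here is a minimal sketch. The runner `run_benchmarks` and the class `ToNumpyStandIn` are hypothetical names invented for this example; the stand-in uses a plain NumPy array rather than a `Series`, and the runner is not how asv itself measures time.

```python
import timeit

import numpy as np


class ToNumpyStandIn:
    """Hypothetical stand-in for the ToNumpy benchmark, backed by an ndarray."""

    def setup(self):
        # Much smaller than the 1_000_000 elements in the real benchmark,
        # so the sketch runs quickly.
        self.values = np.random.randn(10_000)

    def time_copy(self):
        # One copy of the data.
        self.values.copy()

    def time_double_copy(self):
        # Two copies, mimicking the redundant work the commit removes.
        self.values.copy().copy()


def run_benchmarks(bench_cls, repeat=3, number=10):
    """Call setup() and then time every time_* method on bench_cls."""
    bench = bench_cls()
    timings = {}
    for name in sorted(dir(bench)):
        if not name.startswith("time_"):
            continue
        bench.setup()  # fresh state before each timed method
        method = getattr(bench, name)
        best = min(timeit.repeat(method, repeat=repeat, number=number))
        timings[name] = best / number  # seconds per call, best of `repeat`
    return timings


timings = run_benchmarks(ToNumpyStandIn)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.2e} s per call")
```

Absolute numbers from a runner like this depend on the machine; only the relative difference between the single-copy and double-copy paths is of interest.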

doc/source/whatsnew/v2.0.0.rst

Lines changed: 2 additions & 0 deletions
@@ -780,6 +780,7 @@ Performance improvements
 - Performance improvement when iterating over pyarrow and nullable dtypes (:issue:`49825`, :issue:`49851`)
 - Performance improvements to :func:`read_sas` (:issue:`47403`, :issue:`47405`, :issue:`47656`, :issue:`48502`)
 - Memory improvement in :meth:`RangeIndex.sort_values` (:issue:`48801`)
+- Performance improvement in :meth:`Series.to_numpy` if ``copy=True`` by avoiding copying twice (:issue:`24345`)
 - Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``sort=False`` (:issue:`48976`)
 - Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``observed=False`` (:issue:`49596`)
 - Performance improvement in :func:`read_stata` with parameter ``index_col`` set to ``None`` (the default). Now the index will be a :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`49745`)
@@ -855,6 +856,7 @@ Conversion
 - Bug where any :class:`ExtensionDtype` subclass with ``kind="M"`` would be interpreted as a timezone type (:issue:`34986`)
 - Bug in :class:`.arrays.ArrowExtensionArray` that would raise ``NotImplementedError`` when passed a sequence of strings or binary (:issue:`49172`)
 - Bug in :meth:`Series.astype` raising ``pyarrow.ArrowInvalid`` when converting from a non-pyarrow string dtype to a pyarrow numeric type (:issue:`50430`)
+- Bug in :meth:`Series.to_numpy` converting to NumPy array before applying ``na_value`` (:issue:`48951`)
 - Bug in :func:`to_datetime` was not respecting ``exact`` argument when ``format`` was an ISO8601 format (:issue:`12649`)
 - Bug in :meth:`TimedeltaArray.astype` raising ``TypeError`` when converting to a pyarrow duration type (:issue:`49795`)
 -
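
The "avoiding copying twice" entry can be illustrated with plain NumPy. `np.asarray` returns its input unchanged when no dtype conversion is needed, but allocates a new array when the dtype changes, so an unconditional `.copy()` afterwards duplicates the data a second time. The sketch below is a simplified illustration; `to_array` is a hypothetical helper, not the pandas implementation.

```python
import numpy as np


def to_array(values, dtype=None, copy=False):
    """Hypothetical helper: copy only if the result still aliases the input."""
    result = np.asarray(values, dtype=dtype)
    # Checking two-element slices keeps the memory-overlap test cheap.
    if copy and np.shares_memory(values[:2], result[:2]):
        result = result.copy()
    return result


source = np.arange(5, dtype="int64")

# Same dtype: asarray returns the input, so an explicit copy is required.
same_dtype = to_array(source, copy=True)

# Different dtype: asarray already allocated new memory, so no second copy.
converted = to_array(source, dtype="float64", copy=True)

# copy=False with the same dtype: the result aliases the input.
view = to_array(source)

print(np.shares_memory(source, same_dtype))
print(np.shares_memory(source, converted))
print(np.shares_memory(source, view))
```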

pandas/core/base.py

Lines changed: 13 additions & 6 deletions
@@ -531,12 +531,19 @@ def to_numpy(
                 f"to_numpy() got an unexpected keyword argument '{bad_keys}'"
             )
 
-        result = np.asarray(self._values, dtype=dtype)
-        # TODO(GH-24345): Avoid potential double copy
-        if copy or na_value is not lib.no_default:
-            result = result.copy()
-            if na_value is not lib.no_default:
-                result[np.asanyarray(self.isna())] = na_value
+        if na_value is not lib.no_default:
+            values = self._values.copy()
+            values[np.asanyarray(self.isna())] = na_value
+        else:
+            values = self._values
+
+        result = np.asarray(values, dtype=dtype)
+
+        if copy and na_value is lib.no_default:
+            if np.shares_memory(self._values[:2], result[:2]):
+                # Take slices to improve performance of check
+                result = result.copy()
+
         return result
 
     @final
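
The reordering in this hunk matters because the dtype conversion can fail while missing values are still present. The following sketch reproduces the idea with a plain NumPy object array, using `None` as a stand-in for a missing value. The helper names `convert_then_fill` and `fill_then_convert` are hypothetical and only mirror the old and new ordering; they are not pandas functions.

```python
import numpy as np


def convert_then_fill(values, mask, na_value, dtype):
    """Old ordering: cast first, then overwrite the masked positions."""
    result = np.asarray(values, dtype=dtype).copy()
    result[mask] = na_value
    return result


def fill_then_convert(values, mask, na_value, dtype):
    """New ordering: fill the masked positions on a copy, then cast."""
    filled = values.copy()
    filled[mask] = na_value
    return np.asarray(filled, dtype=dtype)


# None stands in for a missing value in this object-dtype array.
values = np.array([1, 2, None, 4], dtype=object)
mask = np.array([False, False, True, False])

try:
    convert_then_fill(values, mask, 0, "int64")
    old_order_failed = False
except (TypeError, ValueError):
    # The cast hits the missing value before it can be replaced.
    old_order_failed = True

new_result = fill_then_convert(values, mask, 0, "int64")
print(old_order_failed, new_result)
```

Filling on a copy also leaves the original `values` untouched, which is why the patched code only needs the separate `copy` branch when no `na_value` is given.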
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+import numpy as np
+import pytest
+
+from pandas import (
+    NA,
+    Series,
+)
+import pandas._testing as tm
+
+
+@pytest.mark.parametrize("dtype", ["int64", "float64"])
+def test_to_numpy_na_value(dtype):
+    # GH#48951
+    ser = Series([1, 2, NA, 4])
+    result = ser.to_numpy(dtype=dtype, na_value=0)
+    expected = np.array([1, 2, 0, 4], dtype=dtype)
+    tm.assert_numpy_array_equal(result, expected)
