Commit ab76540

Remove use_nullable_dtypes and add dtype_backend keyword (#51853)
1 parent 7888cf4 commit ab76540

43 files changed (+513, -899 lines)

doc/source/user_guide/io.rst

Lines changed: 10 additions & 7 deletions
@@ -170,12 +170,15 @@ dtype : Type name or dict of column -> type, default ``None``
     the default determines the dtype of the columns which are not explicitly
     listed.
-use_nullable_dtypes : bool = False
-    Whether or not to use nullable dtypes as default when reading data. If
-    set to True, nullable dtypes are used for all dtypes that have a nullable
-    implementation, even if no nulls are present.
+dtype_backend : {"numpy_nullable", "pyarrow"}, defaults to NumPy backed DataFrames
+    Which dtype_backend to use: with "numpy_nullable", nullable dtypes are
+    used for all dtypes that have a nullable implementation; with "pyarrow",
+    pyarrow is used for all dtypes.
 
-    .. versionadded:: 2.0
+    The dtype_backends are still experimental.
+
+    .. versionadded:: 2.0
 
 engine : {``'c'``, ``'python'``, ``'pyarrow'``}
     Parser engine to use. The C and pyarrow engines are faster, while the python engine

@@ -475,7 +478,7 @@ worth trying.
     os.remove("foo.csv")
 
-Setting ``use_nullable_dtypes=True`` will result in nullable dtypes for every column.
+Setting ``dtype_backend="numpy_nullable"`` will result in nullable dtypes for every column.
 
 .. ipython:: python

@@ -484,7 +487,7 @@ Setting ``use_nullable_dtypes=True`` will result in nullable dtypes for every co
     3,4.5,False,b,6,7.5,True,a,12-31-2019,
     """
 
-    df = pd.read_csv(StringIO(data), use_nullable_dtypes=True, parse_dates=["i"])
+    df = pd.read_csv(StringIO(data), dtype_backend="numpy_nullable", parse_dates=["i"])
     df
     df.dtypes
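The renamed keyword in action — a minimal sketch, assuming pandas 2.0 or newer is installed (the data here is made up for illustration):

```python
# Sketch of the renamed keyword, assuming pandas >= 2.0.
import io

import pandas as pd

data = io.StringIO("a,b\n1,x\n,y\n")

# dtype_backend="numpy_nullable" replaces use_nullable_dtypes=True:
# every column gets a nullable extension dtype, even without missing values.
df = pd.read_csv(data, dtype_backend="numpy_nullable")
print(df.dtypes)  # column "a" becomes Int64, not float64
```

Note that the missing value in column ``a`` no longer forces an upcast to float: the Int64 masked dtype can hold the NA directly.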

doc/source/user_guide/pyarrow.rst

Lines changed: 4 additions & 14 deletions
@@ -145,8 +145,8 @@ functions provide an ``engine`` keyword that can dispatch to PyArrow to accelera
     df
 
 By default, these functions and all other IO reader functions return NumPy-backed data. These readers can return
-PyArrow-backed data by specifying the parameter ``use_nullable_dtypes=True`` **and** the global configuration option ``"mode.dtype_backend"``
-set to ``"pyarrow"``. A reader does not need to set ``engine="pyarrow"`` to necessarily return PyArrow-backed data.
+PyArrow-backed data by specifying the parameter ``dtype_backend="pyarrow"``. A reader does not need to set
+``engine="pyarrow"`` to necessarily return PyArrow-backed data.
 
 .. ipython:: python

@@ -155,20 +155,10 @@ set to ``"pyarrow"``. A reader does not need to set ``engine="pyarrow"`` to nece
     1,2.5,True,a,,,,,
     3,4.5,False,b,6,7.5,True,a,
     """)
-    with pd.option_context("mode.dtype_backend", "pyarrow"):
-        df_pyarrow = pd.read_csv(data, use_nullable_dtypes=True)
+    df_pyarrow = pd.read_csv(data, dtype_backend="pyarrow")
     df_pyarrow.dtypes
 
-To simplify specifying ``use_nullable_dtypes=True`` in several functions, you can set a global option ``nullable_dtypes``
-to ``True``. You will still need to set the global configuration option ``"mode.dtype_backend"`` to ``pyarrow``.
-
-.. code-block:: ipython
-
-    In [1]: pd.set_option("mode.dtype_backend", "pyarrow")
-
-    In [2]: pd.options.mode.nullable_dtypes = True
-
-Several non-IO reader functions can also use the ``"mode.dtype_backend"`` option to return PyArrow-backed data including:
+Several non-IO reader functions can also use the ``dtype_backend`` argument to return PyArrow-backed data including:
 
 * :func:`to_numeric`
 * :meth:`DataFrame.convert_dtypes`
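The same keyword on a non-IO function — a minimal sketch, again assuming pandas 2.0 or newer (the ``"pyarrow"`` backend would additionally require pyarrow to be installed, so the numpy-backed variant is shown):

```python
# Sketch: dtype_backend on a non-IO function, assuming pandas >= 2.0.
import pandas as pd

s = pd.Series([1, 2, None], dtype="object")

# "numpy_nullable" converts to the numpy-backed masked extension dtype;
# passing "pyarrow" here would instead return ArrowDtype-backed data.
converted = s.convert_dtypes(dtype_backend="numpy_nullable")
print(converted.dtype)  # Int64
```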

doc/source/whatsnew/v2.0.0.rst

Lines changed: 12 additions & 26 deletions
@@ -103,12 +103,12 @@ Below is a possibly non-exhaustive list of changes:
     pd.Index([1, 2, 3], dtype=np.float16)
 
-.. _whatsnew_200.enhancements.io_use_nullable_dtypes_and_dtype_backend:
+.. _whatsnew_200.enhancements.io_dtype_backend:
 
-Configuration option, ``mode.dtype_backend``, to return pyarrow-backed dtypes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Argument ``dtype_backend``, to return pyarrow-backed or numpy-backed nullable dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-The ``use_nullable_dtypes`` keyword argument has been expanded to the following functions to enable automatic conversion to nullable dtypes (:issue:`36712`)
+The following functions gained a new keyword ``dtype_backend`` (:issue:`36712`)
 
 * :func:`read_csv`
 * :func:`read_clipboard`

@@ -124,19 +124,13 @@ The ``use_nullable_dtypes`` keyword argument has been expanded to the following
 * :func:`read_feather`
 * :func:`read_spss`
 * :func:`to_numeric`
+* :meth:`DataFrame.convert_dtypes`
+* :meth:`Series.convert_dtypes`
 
-To simplify opting-in to nullable dtypes for these functions, a new option ``nullable_dtypes`` was added that allows setting
-the keyword argument globally to ``True`` if not specified directly. The option can be enabled
-through:
-
-.. ipython:: python
-
-    pd.options.mode.nullable_dtypes = True
-
-The option will only work for functions with the keyword ``use_nullable_dtypes``.
+When this keyword is set to ``"numpy_nullable"`` it will return a :class:`DataFrame` that is
+backed by nullable dtypes.
 
-Additionally a new global configuration, ``mode.dtype_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
-to select the nullable dtypes implementation.
+When this keyword is set to ``"pyarrow"``, then these functions will return pyarrow-backed nullable :class:`ArrowDtype` DataFrames (:issue:`48957`, :issue:`49997`):
 
 * :func:`read_csv`
 * :func:`read_clipboard`

@@ -153,30 +147,21 @@ to select the nullable dtypes implementation.
 * :func:`read_feather`
 * :func:`read_spss`
 * :func:`to_numeric`
-
-
-And the following methods will also utilize the ``mode.dtype_backend`` option.
-
 * :meth:`DataFrame.convert_dtypes`
 * :meth:`Series.convert_dtypes`
 
-By default, ``mode.dtype_backend`` is set to ``"pandas"`` to return existing, numpy-backed nullable dtypes, but it can also
-be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (:issue:`48957`, :issue:`49997`).
-
 .. ipython:: python
 
     import io
     data = io.StringIO("""a,b,c,d,e,f,g,h,i
     1,2.5,True,a,,,,,
     3,4.5,False,b,6,7.5,True,a,
     """)
-    with pd.option_context("mode.dtype_backend", "pandas"):
-        df = pd.read_csv(data, use_nullable_dtypes=True)
+    df = pd.read_csv(data, dtype_backend="pyarrow")
     df.dtypes
 
     data.seek(0)
-    with pd.option_context("mode.dtype_backend", "pyarrow"):
-        df_pyarrow = pd.read_csv(data, use_nullable_dtypes=True, engine="pyarrow")
+    df_pyarrow = pd.read_csv(data, dtype_backend="pyarrow", engine="pyarrow")
     df_pyarrow.dtypes
 
 Copy-on-Write improvements

@@ -810,6 +795,7 @@ Deprecations
 - Deprecated :meth:`Grouper.obj`, use :meth:`Groupby.obj` instead (:issue:`51206`)
 - Deprecated :meth:`Grouper.indexer`, use :meth:`Resampler.indexer` instead (:issue:`51206`)
 - Deprecated :meth:`Grouper.ax`, use :meth:`Resampler.ax` instead (:issue:`51206`)
+- Deprecated keyword ``use_nullable_dtypes`` in :func:`read_parquet`, use ``dtype_backend`` instead (:issue:`51853`)
 - Deprecated :meth:`Series.pad` in favor of :meth:`Series.ffill` (:issue:`33396`)
 - Deprecated :meth:`Series.backfill` in favor of :meth:`Series.bfill` (:issue:`33396`)
 - Deprecated :meth:`DataFrame.pad` in favor of :meth:`DataFrame.ffill` (:issue:`33396`)
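The ``read_parquet`` deprecation above follows the usual keyword-migration pattern: accept the old boolean for one release, warn, and translate it to the new value. A hypothetical pure-Python sketch of that pattern (the sentinel and function body are illustrative, not pandas' actual implementation):

```python
# Hypothetical sketch of the keyword-deprecation pattern: the old boolean
# keyword still works, emits a FutureWarning, and is mapped onto the new
# dtype_backend value. Not pandas' real read_parquet code.
import warnings

_NO_DEFAULT = object()  # sentinel to tell "not passed" apart from "passed False"

def read_parquet_sketch(use_nullable_dtypes=_NO_DEFAULT, dtype_backend=_NO_DEFAULT):
    if use_nullable_dtypes is not _NO_DEFAULT:
        warnings.warn(
            "use_nullable_dtypes is deprecated, use dtype_backend instead",
            FutureWarning,
            stacklevel=2,
        )
        # Old True maps onto the new numpy-backed nullable backend.
        if dtype_backend is _NO_DEFAULT and use_nullable_dtypes:
            dtype_backend = "numpy_nullable"
    if dtype_backend is _NO_DEFAULT:
        dtype_backend = "numpy"  # plain NumPy-backed result, as before
    return dtype_backend  # a real reader would now dispatch on this

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    backend = read_parquet_sketch(use_nullable_dtypes=True)
print(backend)  # "numpy_nullable", after one FutureWarning
```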

pandas/_libs/parsers.pyi

Lines changed: 1 addition & 1 deletion
@@ -72,5 +72,5 @@ class TextReader:
     na_values: dict
 
 def _maybe_upcast(
-    arr, use_nullable_dtypes: bool = ..., dtype_backend: str = ...
+    arr, use_dtype_backend: bool = ..., dtype_backend: str = ...
 ) -> np.ndarray: ...

pandas/_libs/parsers.pyx

Lines changed: 12 additions & 16 deletions
@@ -372,7 +372,6 @@ cdef class TextReader:
     object index_col
     object skiprows
     object dtype
-    bint use_nullable_dtypes
     object usecols
     set unnamed_cols  # set[str]
     str dtype_backend

@@ -412,8 +411,7 @@ cdef class TextReader:
                   float_precision=None,
                   bint skip_blank_lines=True,
                   encoding_errors=b"strict",
-                  use_nullable_dtypes=False,
-                  dtype_backend="pandas"):
+                  dtype_backend="numpy"):
 
         # set encoding for native Python and C library
         if isinstance(encoding_errors, str):

@@ -534,7 +532,6 @@ cdef class TextReader:
         # - DtypeObj
         # - dict[Any, DtypeObj]
         self.dtype = dtype
-        self.use_nullable_dtypes = use_nullable_dtypes
         self.dtype_backend = dtype_backend
 
         self.noconvert = set()

@@ -961,7 +958,6 @@ cdef class TextReader:
             bint na_filter = 0
             int64_t num_cols
             dict results
-            bint use_nullable_dtypes
 
         start = self.parser_start

@@ -1082,12 +1078,12 @@ cdef class TextReader:
                 # don't try to upcast EAs
                 if (
                     na_count > 0 and not is_extension_array_dtype(col_dtype)
-                    or self.use_nullable_dtypes
+                    or self.dtype_backend != "numpy"
                 ):
-                    use_nullable_dtypes = self.use_nullable_dtypes and col_dtype is None
+                    use_dtype_backend = self.dtype_backend != "numpy" and col_dtype is None
                     col_res = _maybe_upcast(
                         col_res,
-                        use_nullable_dtypes=use_nullable_dtypes,
+                        use_dtype_backend=use_dtype_backend,
                         dtype_backend=self.dtype_backend,
                     )

@@ -1422,11 +1418,11 @@ _NA_VALUES = _ensure_encoded(list(STR_NA_VALUES))
 
 
 def _maybe_upcast(
-    arr, use_nullable_dtypes: bool = False, dtype_backend: str = "pandas"
+    arr, use_dtype_backend: bool = False, dtype_backend: str = "numpy"
 ):
     """Sets nullable dtypes or upcasts if nans are present.
 
-    Upcast, if use_nullable_dtypes is false and nans are present so that the
+    Upcast, if use_dtype_backend is false and nans are present so that the
     current dtype can not hold the na value. We use nullable dtypes if the
     flag is true for every array.

@@ -1435,7 +1431,7 @@ def _maybe_upcast(
     arr: ndarray
         Numpy array that is potentially being upcast.
 
-    use_nullable_dtypes: bool, default False
+    use_dtype_backend: bool, default False
         If true, we cast to the associated nullable dtypes.
 
     Returns

@@ -1452,7 +1448,7 @@ def _maybe_upcast(
     if issubclass(arr.dtype.type, np.integer):
         mask = arr == na_value
 
-        if use_nullable_dtypes:
+        if use_dtype_backend:
             arr = IntegerArray(arr, mask)
         else:
             arr = arr.astype(float)

@@ -1461,22 +1457,22 @@ def _maybe_upcast(
     elif arr.dtype == np.bool_:
         mask = arr.view(np.uint8) == na_value
 
-        if use_nullable_dtypes:
+        if use_dtype_backend:
             arr = BooleanArray(arr, mask)
         else:
             arr = arr.astype(object)
             np.putmask(arr, mask, np.nan)
 
     elif issubclass(arr.dtype.type, float) or arr.dtype.type == np.float32:
-        if use_nullable_dtypes:
+        if use_dtype_backend:
             mask = np.isnan(arr)
             arr = FloatingArray(arr, mask)
 
     elif arr.dtype == np.object_:
-        if use_nullable_dtypes:
+        if use_dtype_backend:
             arr = StringDtype().construct_array_type()._from_sequence(arr)
 
-    if use_nullable_dtypes and dtype_backend == "pyarrow":
+    if use_dtype_backend and dtype_backend == "pyarrow":
         import pyarrow as pa
         if isinstance(arr, IntegerArray) and arr.isna().all():
             # use null instead of int64 in pyarrow
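The ``use_nullable_dtypes`` to ``use_dtype_backend`` rename does not change the branch logic of ``_maybe_upcast``. A simplified pure-Python model of that decision, where plain lists stand in for ndarrays and the value/mask pair stands in for a masked extension array (illustrative only):

```python
# Simplified, illustrative model of the _maybe_upcast decision: when the
# flag is off, an integer column containing NAs is upcast to float so that
# NaN can represent the missing value; when it is on, the raw values plus a
# validity mask (the shape of a nullable array) are kept instead.
def maybe_upcast_sketch(values, na_value, use_dtype_backend=False):
    mask = [v == na_value for v in values]
    if not any(mask):
        return values  # nothing missing, no conversion needed
    if use_dtype_backend:
        # nullable path: values and mask together, like IntegerArray(arr, mask)
        return (values, mask)
    # legacy path: upcast to float so NaN can hold the NA
    return [float("nan") if m else float(v) for v, m in zip(values, mask)]

print(maybe_upcast_sketch([1, -1, 3], na_value=-1))
print(maybe_upcast_sketch([1, -1, 3], na_value=-1, use_dtype_backend=True))
```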

pandas/_typing.py

Lines changed: 1 addition & 0 deletions
@@ -380,3 +380,4 @@ def closed(self) -> bool:
     Literal["pearson", "kendall", "spearman"], Callable[[np.ndarray, np.ndarray], float]
 ]
 AlignJoin = Literal["outer", "inner", "left", "right"]
+DtypeBackend = Literal["pyarrow", "numpy_nullable"]
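The new ``DtypeBackend`` alias lends itself to a runtime check as well as static typing. A sketch with a hypothetical ``check_dtype_backend`` helper (the helper name is illustrative, not pandas' public API):

```python
# Sketch: validating a dtype_backend keyword against the Literal alias.
# check_dtype_backend is a hypothetical helper name for illustration.
from typing import Literal, get_args

DtypeBackend = Literal["pyarrow", "numpy_nullable"]

def check_dtype_backend(dtype_backend: str) -> None:
    valid = get_args(DtypeBackend)  # ("pyarrow", "numpy_nullable")
    if dtype_backend not in valid:
        raise ValueError(
            f"dtype_backend {dtype_backend!r} is invalid, only {valid} are allowed"
        )

check_dtype_backend("numpy_nullable")  # ok
check_dtype_backend("pyarrow")         # ok
```

Deriving the runtime tuple via ``get_args`` keeps the validator and the type alias from drifting apart.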

pandas/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -1274,7 +1274,7 @@ def string_storage(request):
 
 @pytest.fixture(
     params=[
-        "pandas",
+        "numpy_nullable",
         pytest.param("pyarrow", marks=td.skip_if_no("pyarrow")),
     ]
 )

pandas/core/arrays/numeric.py

Lines changed: 1 addition & 1 deletion
@@ -286,7 +286,7 @@ def _from_sequence_of_strings(
 ) -> T:
     from pandas.core.tools.numeric import to_numeric
 
-    scalars = to_numeric(strings, errors="raise", use_nullable_dtypes=True)
+    scalars = to_numeric(strings, errors="raise", dtype_backend="numpy_nullable")
     return cls._from_sequence(scalars, dtype=dtype, copy=copy)
 
 _HANDLED_TYPES = (np.ndarray, numbers.Number)

pandas/core/config_init.py

Lines changed: 0 additions & 28 deletions
@@ -487,41 +487,13 @@ def use_inf_as_na_cb(key) -> None:
     The default storage for StringDtype.
 """
 
-dtype_backend_doc = """
-: string
-    The nullable dtype implementation to return. Only applicable to certain
-    operations where documented. Available options: 'pandas', 'pyarrow',
-    the default is 'pandas'.
-"""
-
 with cf.config_prefix("mode"):
     cf.register_option(
         "string_storage",
         "python",
         string_storage_doc,
         validator=is_one_of_factory(["python", "pyarrow"]),
     )
-    cf.register_option(
-        "dtype_backend",
-        "pandas",
-        dtype_backend_doc,
-        validator=is_one_of_factory(["pandas", "pyarrow"]),
-    )
-
-
-nullable_dtypes_doc = """
-: bool
-    If nullable dtypes should be returned. This is only applicable to functions
-    where the ``use_nullable_dtypes`` keyword is implemented.
-"""
-
-with cf.config_prefix("mode"):
-    cf.register_option(
-        "nullable_dtypes",
-        False,
-        nullable_dtypes_doc,
-        validator=is_bool,
-    )
 
 
 # Set up the io.excel specific reader configuration.

pandas/core/dtypes/cast.py

Lines changed: 3 additions & 3 deletions
@@ -1007,7 +1007,7 @@ def convert_dtypes(
     convert_boolean: bool = True,
     convert_floating: bool = True,
     infer_objects: bool = False,
-    dtype_backend: Literal["pandas", "pyarrow"] = "pandas",
+    dtype_backend: Literal["numpy_nullable", "pyarrow"] = "numpy_nullable",
 ) -> DtypeObj:
     """
     Convert objects to best possible type, and optionally,

@@ -1029,10 +1029,10 @@ def convert_dtypes(
     infer_objects : bool, defaults False
         Whether to also infer objects to float/int if possible. Is only hit if the
         object array contains pd.NA.
-    dtype_backend : str, default "pandas"
+    dtype_backend : str, default "numpy_nullable"
         Nullable dtype implementation to use.
 
-        * "pandas" returns numpy-backed nullable types
+        * "numpy_nullable" returns numpy-backed nullable types
         * "pyarrow" returns pyarrow-backed nullable types using ``ArrowDtype``
 
     Returns
