2.3.0 #46


Closed
wants to merge 49 commits into from
49 commits
3223070
Revert "CI: Pin blosc to fix pytables" (#58218)
lithomas1 Apr 11, 2024
d842753
Remove deprecated plot_date calls (#58484)
QuLogic Apr 30, 2024
acb9e97
ENH: Fix Python 3.13 test failures & enable CI (#59065)
lysnikolaou Jun 25, 2024
e480752
remove ops div class to solve #21374 (#59144)
WillAyd Aug 27, 2024
e496893
PDEP-14: Dedicated string data type for pandas 3.0 (#58551)
jorisvandenbossche Jul 24, 2024
b9614e7
TST / string dtype: add env variable to enable future_string and add …
jorisvandenbossche Jul 26, 2024
9ca7483
REF (string dtype): rename using_pyarrow_string_dtype to using_string…
jorisvandenbossche Jul 26, 2024
ab9d1db
TST (string dtype): clean-up xpasssing tests with future string dtype…
jorisvandenbossche Jul 27, 2024
1bf735b
String dtype: rename the storage options and add `na_value` keyword i…
jorisvandenbossche Jul 29, 2024
aba9fef
TST (string dtype): xfail all currently failing tests with future.inf…
WillAyd Aug 14, 2024
474241a
TST (string dtype): follow-up on GH-59329 fixing new xfails (#59352)
jorisvandenbossche Jul 30, 2024
2268c2b
TST (string dtype): change any_string_dtype fixture to use actual dty…
jorisvandenbossche Jul 31, 2024
8fbab63
TST (string dtype): remove usage of arrow_string_storage fixture (#59…
jorisvandenbossche Jul 31, 2024
6838faf
TST (string dtype): replace string_storage fixture with explicit stor…
jorisvandenbossche Jul 31, 2024
2e3f225
String dtype: restrict options.mode.string_storage to python|pyarrow …
jorisvandenbossche Aug 1, 2024
337ef04
API/TST: expand tests for string any/all reduction + fix pyarrow-base…
jorisvandenbossche Aug 6, 2024
c760c00
String dtype: implement object-dtype based StringArray variant with N…
WillAyd Aug 14, 2024
adbc4ed
REF (string dtype): de-duplicate _str_map methods (#59443)
WillAyd Aug 14, 2024
2d1174d
String dtype: use 'str' string alias and representation for NaN-varia…
WillAyd Aug 27, 2024
b7928e2
String dtype: fix alignment sorting in case of python storage (#59448)
jorisvandenbossche Aug 8, 2024
f1879d8
TST (string dtype): add test build with future strings enabled withou…
WillAyd Aug 14, 2024
2bb5ce1
REF (string dtype): de-duplicate _str_map (2) (#59451)
jbrockmendel Aug 9, 2024
54afab2
REF (string): de-duplicate str_map_nan_semantics (#59464)
jbrockmendel Aug 9, 2024
c424458
BUG (string dtype): convert dictionary input to materialized string a…
jorisvandenbossche Aug 12, 2024
9ade95d
String dtype: fix convert_dtypes() to convert NaN-string to NA-string…
jorisvandenbossche Aug 12, 2024
4eba41b
String dtype: honor mode.string_storage option (and change default to…
jorisvandenbossche Aug 12, 2024
837b132
BUG (string): ArrowEA comparisons with mismatched types (#59505)
jbrockmendel Aug 13, 2024
ee701c2
TST (string dtype): clean up construction of expected string arrays (…
jorisvandenbossche Aug 14, 2024
07dc9a2
TST (string dtype): clean up construction of expected string arrays (…
WillAyd Aug 22, 2024
0b98307
TST (string dtype): fix IO dtype_backend tests for storage of str dty…
WillAyd Aug 22, 2024
182842d
REF (string): Move StringArrayNumpySemantics methods to base class (#…
jbrockmendel Aug 14, 2024
e5dfcfa
REF (string): remove _str_na_value (#59515)
jbrockmendel Aug 15, 2024
8cdac15
REF (string): move ArrowStringArrayNumpySemantics methods to base cla…
jbrockmendel Aug 15, 2024
80499c9
API (string): return str dtype for .dt methods, DatetimeIndex methods…
jbrockmendel Aug 16, 2024
2c9aa39
String dtype: still return nullable NA-variant in object inference (`…
jorisvandenbossche Aug 21, 2024
057f64d
Backport fixes
WillAyd Aug 15, 2024
3a03337
Pick required fix from 2542674ee9 #56709
WillAyd Aug 27, 2024
db23861
Pick required fix from f4232e7 #58006
WillAyd Aug 22, 2024
1870e57
Pick required fix from #55901 and #59581
WillAyd Aug 22, 2024
ef661e6
Remove .pre-commit check for pytest ref #56671
WillAyd Aug 22, 2024
070b5c0
Skip niche issue
WillAyd Aug 22, 2024
74112b5
Add required skip from #58467
WillAyd Aug 27, 2024
490fd90
Remove tests that will fail without backport of #58437
WillAyd Aug 27, 2024
3c4bc03
un-xfail and adjust more tests
lithomas1 Sep 9, 2024
01b5d26
silence infer_objects warning for object backed string type
lithomas1 Sep 9, 2024
e8390b9
adjust replace tests for infer_string
lithomas1 Sep 9, 2024
bc9be34
silence value_counts dtype inference warning for non pyarrow
lithomas1 Sep 9, 2024
277afe3
pick out stringarray keepdims changes from #59234
lithomas1 Sep 9, 2024
7623eed
remove more xpasses
lithomas1 Sep 9, 2024
6 changes: 6 additions & 0 deletions .github/actions/setup-conda/action.yml
@@ -14,3 +14,9 @@ runs:
condarc-file: ci/.condarc
cache-environment: true
cache-downloads: true

- name: Uninstall pyarrow
if: ${{ env.REMOVE_PYARROW == '1' }}
run: |
micromamba remove -y pyarrow
shell: bash -el {0}
21 changes: 15 additions & 6 deletions .github/workflows/unit-tests.yml
@@ -4,11 +4,11 @@ on:
push:
branches:
- main
- 2.2.x
- 2.3.x
pull_request:
branches:
- main
- 2.2.x
- 2.3.x
paths-ignore:
- "doc/**"
- "web/**"
@@ -29,6 +29,7 @@ jobs:
env_file: [actions-39.yaml, actions-310.yaml, actions-311.yaml, actions-312.yaml]
# Prevent the include jobs from overriding other jobs
pattern: [""]
pandas_future_infer_string: ["0"]
include:
- name: "Downstream Compat"
env_file: actions-311-downstream_compat.yaml
@@ -85,6 +86,12 @@ jobs:
env_file: actions-39.yaml
pattern: "not slow and not network and not single_cpu"
pandas_copy_on_write: "warn"
- name: "Future infer strings"
env_file: actions-312.yaml
pandas_future_infer_string: "1"
- name: "Future infer strings (without pyarrow)"
env_file: actions-311.yaml
pandas_future_infer_string: "1"
- name: "Pypy"
env_file: actions-pypy-39.yaml
pattern: "not slow and not network and not single_cpu"
@@ -103,16 +110,18 @@
LANG: ${{ matrix.lang || 'C.UTF-8' }}
LC_ALL: ${{ matrix.lc_all || '' }}
PANDAS_COPY_ON_WRITE: ${{ matrix.pandas_copy_on_write || '0' }}
PANDAS_CI: ${{ matrix.pandas_ci || '1' }}
PANDAS_CI: '1'
PANDAS_FUTURE_INFER_STRING: ${{ matrix.pandas_future_infer_string || '0' }}
TEST_ARGS: ${{ matrix.test_args || '' }}
PYTEST_WORKERS: ${{ matrix.pytest_workers || 'auto' }}
PYTEST_TARGET: ${{ matrix.pytest_target || 'pandas' }}
NPY_PROMOTION_STATE: ${{ matrix.env_file == 'actions-311-numpydev.yaml' && 'weak' || 'legacy' }}
# Clipboard tests
QT_QPA_PLATFORM: offscreen
REMOVE_PYARROW: ${{ matrix.name == 'Future infer strings (without pyarrow)' && '1' || '0' }}
concurrency:
# https://github.community/t/concurrecy-not-work-for-push/183068/7
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{ matrix.env_file }}-${{ matrix.pattern }}-${{ matrix.extra_apt || '' }}-${{ matrix.pandas_copy_on_write || '' }}
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{ matrix.env_file }}-${{ matrix.pattern }}-${{ matrix.extra_apt || '' }}-${{ matrix.pandas_copy_on_write || '' }}-${{ matrix.pandas_future_infer_string }}
cancel-in-progress: true

services:
@@ -329,7 +338,7 @@ jobs:
# To freeze this file, uncomment out the ``if: false`` condition, and migrate the jobs
# to the corresponding posix/windows-macos/sdist etc. workflows.
# Feel free to modify this comment as necessary.
if: false # Uncomment this to freeze the workflow, comment it to unfreeze
# if: false # Uncomment this to freeze the workflow, comment it to unfreeze
defaults:
run:
shell: bash -eou pipefail {0}
@@ -361,7 +370,7 @@ jobs:
- name: Set up Python Dev Version
uses: actions/setup-python@v5
with:
python-version: '3.12-dev'
python-version: '3.13-dev'

- name: Build Environment
run: |
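The matrix above wires a new `PANDAS_FUTURE_INFER_STRING` environment variable into the test jobs. As a minimal sketch of how such a toggle can feed a library default — the helper name here is illustrative, not pandas' actual code:

```python
import os

# Hypothetical helper mirroring the PANDAS_FUTURE_INFER_STRING toggle set by
# this workflow: the env var (defaulting to "0") decides a boolean default.
def future_infer_string_default() -> bool:
    return os.environ.get("PANDAS_FUTURE_INFER_STRING", "0") == "1"

os.environ.pop("PANDAS_FUTURE_INFER_STRING", None)
print(future_infer_string_default())  # False
os.environ["PANDAS_FUTURE_INFER_STRING"] = "1"
print(future_infer_string_default())  # True
```

Appending the variable to the concurrency `group` key (as the diff does) keeps the "0" and "1" runs from cancelling each other.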
7 changes: 0 additions & 7 deletions .pre-commit-config.yaml
@@ -274,13 +274,6 @@ repos:
language: python
types: [rst]
files: ^doc/source/(development|reference)/
- id: unwanted-patterns-bare-pytest-raises
name: Check for use of bare pytest raises
language: python
entry: python scripts/validate_unwanted_patterns.py --validation-type="bare_pytest_raises"
types: [python]
files: ^pandas/tests/
exclude: ^pandas/tests/extension/
- id: unwanted-patterns-private-function-across-module
name: Check for use of private functions across modules
language: python
2 changes: 0 additions & 2 deletions ci/deps/actions-310.yaml
@@ -24,8 +24,6 @@ dependencies:

# optional dependencies
- beautifulsoup4>=4.11.2
# https://github.com/conda-forge/pytables-feedstock/issues/97
- c-blosc2=2.13.2
- blosc>=1.21.3
- bottleneck>=1.3.6
- fastparquet>=2022.12.0
2 changes: 0 additions & 2 deletions ci/deps/actions-311-downstream_compat.yaml
@@ -26,8 +26,6 @@ dependencies:

# optional dependencies
- beautifulsoup4>=4.11.2
# https://github.com/conda-forge/pytables-feedstock/issues/97
- c-blosc2=2.13.2
- blosc>=1.21.3
- bottleneck>=1.3.6
- fastparquet>=2022.12.0
2 changes: 0 additions & 2 deletions ci/deps/actions-311.yaml
@@ -24,8 +24,6 @@ dependencies:

# optional dependencies
- beautifulsoup4>=4.11.2
# https://github.com/conda-forge/pytables-feedstock/issues/97
- c-blosc2=2.13.2
- blosc>=1.21.3
- bottleneck>=1.3.6
- fastparquet>=2022.12.0
2 changes: 0 additions & 2 deletions ci/deps/actions-312.yaml
@@ -24,8 +24,6 @@ dependencies:

# optional dependencies
- beautifulsoup4>=4.11.2
# https://github.com/conda-forge/pytables-feedstock/issues/97
- c-blosc2=2.13.2
- blosc>=1.21.3
- bottleneck>=1.3.6
- fastparquet>=2022.12.0
2 changes: 0 additions & 2 deletions ci/deps/actions-39-minimum_versions.yaml
@@ -27,8 +27,6 @@ dependencies:

# optional dependencies
- beautifulsoup4=4.11.2
# https://github.com/conda-forge/pytables-feedstock/issues/97
- c-blosc2=2.13.2
- blosc=1.21.3
- bottleneck=1.3.6
- fastparquet=2022.12.0
2 changes: 0 additions & 2 deletions ci/deps/actions-39.yaml
@@ -24,8 +24,6 @@ dependencies:

# optional dependencies
- beautifulsoup4>=4.11.2
# https://github.com/conda-forge/pytables-feedstock/issues/97
- c-blosc2=2.13.2
- blosc>=1.21.3
- bottleneck>=1.3.6
- fastparquet>=2022.12.0
2 changes: 0 additions & 2 deletions ci/deps/circle-310-arm64.yaml
@@ -25,8 +25,6 @@ dependencies:

# optional dependencies
- beautifulsoup4>=4.11.2
# https://github.com/conda-forge/pytables-feedstock/issues/97
- c-blosc2=2.13.2
- blosc>=1.21.3
- bottleneck>=1.3.6
- fastparquet>=2022.12.0
2 changes: 0 additions & 2 deletions environment.yml
@@ -27,8 +27,6 @@ dependencies:

# optional dependencies
- beautifulsoup4>=4.11.2
# https://github.com/conda-forge/pytables-feedstock/issues/97
- c-blosc2=2.13.2
- blosc
- bottleneck>=1.3.6
- fastparquet>=2022.12.0
2 changes: 1 addition & 1 deletion pandas/_config/__init__.py
@@ -52,6 +52,6 @@ def using_nullable_dtypes() -> bool:
return _mode_options["nullable_dtypes"]


def using_pyarrow_string_dtype() -> bool:
def using_string_dtype() -> bool:
_mode_options = _global_config["future"]
return _mode_options["infer_string"]
12 changes: 9 additions & 3 deletions pandas/_libs/lib.pyx
@@ -37,7 +37,7 @@ from cython cimport (
floating,
)

from pandas._config import using_pyarrow_string_dtype
from pandas._config import using_string_dtype

from pandas._libs.missing import check_na_tuples_nonequal

@@ -2725,10 +2725,16 @@ def maybe_convert_objects(ndarray[object] objects,
seen.object_ = True

elif seen.str_:
if using_pyarrow_string_dtype() and is_string_array(objects, skipna=True):
if convert_to_nullable_dtype and is_string_array(objects, skipna=True):
from pandas.core.arrays.string_ import StringDtype

dtype = StringDtype(storage="pyarrow_numpy")
dtype = StringDtype()
return dtype.construct_array_type()._from_sequence(objects, dtype=dtype)

elif using_string_dtype() and is_string_array(objects, skipna=True):
from pandas.core.arrays.string_ import StringDtype

dtype = StringDtype(na_value=np.nan)
return dtype.construct_array_type()._from_sequence(objects, dtype=dtype)

seen.object_ = True
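The hunk above splits string inference into two branches: `convert_to_nullable_dtype` yields the NA-backed `StringDtype()`, while the `future.infer_string` option yields the NaN-backed variant. A pure-Python sketch of that decision (names and return strings are illustrative, not the real Cython):

```python
import math

# Illustrative stand-in for the two inference branches in
# maybe_convert_objects; real pandas does this in Cython over an ndarray.
def inferred_string_dtype(objects, convert_to_nullable_dtype, infer_string):
    is_string_array = all(
        isinstance(x, str) or (isinstance(x, float) and math.isnan(x))
        for x in objects
    )
    if convert_to_nullable_dtype and is_string_array:
        return "StringDtype()"              # NA-backed variant
    elif infer_string and is_string_array:
        return "StringDtype(na_value=nan)"  # NaN-backed variant
    return "object"

print(inferred_string_dtype(["a", float("nan")], False, True))
```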
12 changes: 6 additions & 6 deletions pandas/_libs/src/vendored/ujson/python/objToJSON.c
@@ -410,8 +410,8 @@ static void NpyArr_iterBegin(JSOBJ _obj, JSONTypeContext *tc) {
npyarr->type_num = PyArray_DESCR(obj)->type_num;

if (GET_TC(tc)->transpose) {
npyarr->dim = PyArray_DIM(obj, npyarr->ndim);
npyarr->stride = PyArray_STRIDE(obj, npyarr->ndim);
npyarr->dim = PyArray_DIM(obj, (int)npyarr->ndim);
npyarr->stride = PyArray_STRIDE(obj, (int)npyarr->ndim);
npyarr->stridedim = npyarr->ndim;
npyarr->index[npyarr->ndim] = 0;
npyarr->inc = -1;
@@ -452,8 +452,8 @@ static void NpyArrPassThru_iterEnd(JSOBJ obj, JSONTypeContext *tc) {
return;
}
const PyArrayObject *arrayobj = (const PyArrayObject *)npyarr->array;
npyarr->dim = PyArray_DIM(arrayobj, npyarr->stridedim);
npyarr->stride = PyArray_STRIDE(arrayobj, npyarr->stridedim);
npyarr->dim = PyArray_DIM(arrayobj, (int)npyarr->stridedim);
npyarr->stride = PyArray_STRIDE(arrayobj, (int)npyarr->stridedim);
npyarr->dataptr += npyarr->stride;

NpyArr_freeItemValue(obj, tc);
@@ -524,8 +524,8 @@ static int NpyArr_iterNext(JSOBJ _obj, JSONTypeContext *tc) {
}
const PyArrayObject *arrayobj = (const PyArrayObject *)npyarr->array;

npyarr->dim = PyArray_DIM(arrayobj, npyarr->stridedim);
npyarr->stride = PyArray_STRIDE(arrayobj, npyarr->stridedim);
npyarr->dim = PyArray_DIM(arrayobj, (int)npyarr->stridedim);
npyarr->stride = PyArray_STRIDE(arrayobj, (int)npyarr->stridedim);
npyarr->index[npyarr->stridedim] = 0;

((PyObjectEncoder *)tc->encoder)->npyCtxtPassthru = npyarr;
7 changes: 6 additions & 1 deletion pandas/_libs/tslibs/offsets.pyx
@@ -4960,7 +4960,12 @@ cpdef to_offset(freq, bint is_period=False):
if result is None:
raise ValueError(INVALID_FREQ_ERR_MSG.format(freq))

if is_period and not hasattr(result, "_period_dtype_code"):
try:
has_period_dtype_code = hasattr(result, "_period_dtype_code")
except ValueError:
has_period_dtype_code = False

if is_period and not has_period_dtype_code:
if isinstance(freq, str):
raise ValueError(f"{result.name} is not supported as period frequency")
else:
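The `to_offset` change wraps the `hasattr` check in `try`/`except ValueError` because `hasattr` only swallows `AttributeError`: a property that raises `ValueError` propagates out of it. A small demonstration of that pitfall (the class here is a hypothetical stand-in for an offset object):

```python
# hasattr() only catches AttributeError, so a property raising ValueError
# escapes it -- which is why to_offset now guards the check explicitly.
class OffsetLike:
    @property
    def _period_dtype_code(self):
        raise ValueError("not supported as period frequency")

try:
    has_period_dtype_code = hasattr(OffsetLike(), "_period_dtype_code")
except ValueError:
    has_period_dtype_code = False

print(has_period_dtype_code)  # False
```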
11 changes: 8 additions & 3 deletions pandas/_testing/__init__.py
@@ -14,6 +14,7 @@

import numpy as np

from pandas._config import using_string_dtype
from pandas._config.localization import (
can_set_locale,
get_locales,
@@ -110,7 +111,11 @@
ALL_FLOAT_DTYPES: list[Dtype] = [*FLOAT_NUMPY_DTYPES, *FLOAT_EA_DTYPES]

COMPLEX_DTYPES: list[Dtype] = [complex, "complex64", "complex128"]
STRING_DTYPES: list[Dtype] = [str, "str", "U"]
if using_string_dtype():
STRING_DTYPES: list[Dtype] = [str, "U"]
else:
STRING_DTYPES: list[Dtype] = [str, "str", "U"] # type: ignore[no-redef]
COMPLEX_FLOAT_DTYPES: list[Dtype] = [*COMPLEX_DTYPES, *FLOAT_NUMPY_DTYPES]

DATETIME64_DTYPES: list[Dtype] = ["datetime64[ns]", "M8[ns]"]
TIMEDELTA64_DTYPES: list[Dtype] = ["timedelta64[ns]", "m8[ns]"]
@@ -526,14 +531,14 @@ def shares_memory(left, right) -> bool:
if (
isinstance(left, ExtensionArray)
and is_string_dtype(left.dtype)
and left.dtype.storage in ("pyarrow", "pyarrow_numpy") # type: ignore[attr-defined]
and left.dtype.storage == "pyarrow" # type: ignore[attr-defined]
):
# https://github.com/pandas-dev/pandas/pull/43930#discussion_r736862669
left = cast("ArrowExtensionArray", left)
if (
isinstance(right, ExtensionArray)
and is_string_dtype(right.dtype)
and right.dtype.storage in ("pyarrow", "pyarrow_numpy") # type: ignore[attr-defined]
and right.dtype.storage == "pyarrow" # type: ignore[attr-defined]
):
right = cast("ArrowExtensionArray", right)
left_pa_data = left._pa_array
28 changes: 26 additions & 2 deletions pandas/_testing/asserters.py
@@ -593,13 +593,19 @@ def raise_assert_detail(

if isinstance(left, np.ndarray):
left = pprint_thing(left)
elif isinstance(left, (CategoricalDtype, NumpyEADtype, StringDtype)):
elif isinstance(left, (CategoricalDtype, NumpyEADtype)):
left = repr(left)
elif isinstance(left, StringDtype):
# TODO(infer_string) this special case could be avoided if we have
# a more informative repr https://github.com/pandas-dev/pandas/issues/59342
left = f"StringDtype(storage={left.storage}, na_value={left.na_value})"

if isinstance(right, np.ndarray):
right = pprint_thing(right)
elif isinstance(right, (CategoricalDtype, NumpyEADtype, StringDtype)):
elif isinstance(right, (CategoricalDtype, NumpyEADtype)):
right = repr(right)
elif isinstance(right, StringDtype):
right = f"StringDtype(storage={right.storage}, na_value={right.na_value})"

msg += f"""
[left]: {left}
@@ -805,6 +811,24 @@ def assert_extension_array_equal(
left_na, right_na, obj=f"{obj} NA mask", index_values=index_values
)

# Specifically for StringArrayNumpySemantics, validate here we have a valid array
if (
isinstance(left.dtype, StringDtype)
and left.dtype.storage == "python"
and left.dtype.na_value is np.nan
):
assert np.all(
[np.isnan(val) for val in left._ndarray[left_na]] # type: ignore[attr-defined]
), "wrong missing value sentinels"
if (
isinstance(right.dtype, StringDtype)
and right.dtype.storage == "python"
and right.dtype.na_value is np.nan
):
assert np.all(
[np.isnan(val) for val in right._ndarray[right_na]] # type: ignore[attr-defined]
), "wrong missing value sentinels"

left_valid = left[~left_na].to_numpy(dtype=object)
right_valid = right[~right_na].to_numpy(dtype=object)
if check_exact:
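The sentinel check added to `assert_extension_array_equal` verifies that, for python-storage NaN-variant string arrays, every masked slot holds a real `float('nan')`. A stdlib-only sketch of that validation (plain lists stand in for `left._ndarray` and `left_na`):

```python
import math

# Stand-ins for the array's backing ndarray and its NA mask.
values = ["a", float("nan"), "b"]
na_mask = [False, True, False]

# Every position flagged missing must contain an actual float NaN sentinel.
masked = [v for v, m in zip(values, na_mask) if m]
assert all(math.isnan(v) for v in masked), "wrong missing value sentinels"
print("sentinels ok")
```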
2 changes: 2 additions & 0 deletions pandas/compat/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
import pandas.compat.compressors
from pandas.compat.numpy import is_numpy_dev
from pandas.compat.pyarrow import (
HAS_PYARROW,
pa_version_under10p1,
pa_version_under11p0,
pa_version_under13p0,
@@ -190,6 +191,7 @@ def get_bz2_file() -> type[pandas.compat.compressors.BZ2File]:
"pa_version_under14p1",
"pa_version_under16p0",
"pa_version_under17p0",
"HAS_PYARROW",
"IS64",
"ISMUSL",
"PY310",
2 changes: 2 additions & 0 deletions pandas/compat/pyarrow.py
@@ -17,6 +17,7 @@
pa_version_under15p0 = _palv < Version("15.0.0")
pa_version_under16p0 = _palv < Version("16.0.0")
pa_version_under17p0 = _palv < Version("17.0.0")
HAS_PYARROW = True
except ImportError:
pa_version_under10p1 = True
pa_version_under11p0 = True
@@ -27,3 +28,4 @@
pa_version_under15p0 = True
pa_version_under16p0 = True
pa_version_under17p0 = True
HAS_PYARROW = False
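`HAS_PYARROW` follows the common guarded-import pattern: the flag is set once at module load so callers test a boolean instead of re-attempting the import. A sketch using a stdlib module in place of pyarrow so it always runs:

```python
# Guarded-import pattern behind HAS_PYARROW; json is a stand-in for pyarrow
# so this sketch works whether or not pyarrow is installed.
try:
    import json  # noqa: F401  (stand-in for: import pyarrow)
    HAS_PYARROW = True
except ImportError:
    HAS_PYARROW = False

print(HAS_PYARROW)  # True
```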