PERF: Series.str.split(expand=True) for pyarrow-backed strings #53585

lukemanley · 2023-06-10T10:05:04Z

Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v2.1.0.rst file if fixing a bug or adding a new feature.

Performance improvement for Series.str.split with expand=True on ArrowDtype(pa.string()) data:

import pandas as pd
import pyarrow as pa

N = 10_000
data = ["foo|bar|baz"] * N
ser = pd.Series(data, dtype=pd.ArrowDtype(pa.string()))

%timeit ser.str.split("|", expand=True)

# 93.8 ms ± 2.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  -> main
# 5.43 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  -> PR
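
For readers unfamiliar with expand=True, the sketch below models the shape of the result being benchmarked: each string is split on the separator, and the pieces are spread across columns, with shorter rows padded with missing values. This is a pure-Python illustration only. The helper name split_expand is hypothetical, and the code is not the pandas or pyarrow implementation touched by this PR.

```python
from typing import List, Optional


def split_expand(values: List[Optional[str]], sep: str) -> List[List[Optional[str]]]:
    """Model of str.split(sep, expand=True): one output row per input value.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Split every non-missing value; keep None as a marker for missing rows.
    parts = [None if v is None else v.split(sep) for v in values]
    # The number of output columns is the widest split result.
    width = max((len(p) for p in parts if p is not None), default=0)
    # Pad short rows with None; a missing input becomes an all-None row.
    return [
        [None] * width if p is None else p + [None] * (width - len(p))
        for p in parts
    ]


rows = split_expand(["foo|bar|baz", "a|b", None], "|")
# rows == [["foo", "bar", "baz"], ["a", "b", None], [None, None, None]]
```

In the benchmark above every row is "foo|bar|baz", so the expanded result has N rows and 3 columns with no padding.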

mroeschke · 2023-06-12T17:45:12Z

Nice! Thanks @lukemanley

PERF: Series.str.split(expand=True) for pyarrow-backed

ab991a6

lukemanley added the Performance (memory or execution speed performance), Strings (string extension data type and string data), and Arrow (pyarrow functionality) labels Jun 10, 2023

lukemanley added this to the 2.1 milestone Jun 10, 2023

gh ref

730cf54

mroeschke approved these changes Jun 12, 2023

mroeschke merged commit 94a8af5 into pandas-dev:main Jun 12, 2023

lukemanley deleted the str-split-expand-arrow branch June 13, 2023 23:27

Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023

PERF: Series.str.split(expand=True) for pyarrow-backed strings (panda…

69c0701

…s-dev#53585) * PERF: Series.str.split(expand=True) for pyarrow-backed * gh ref
