lukemanley
Member

@lukemanley lukemanley commented Mar 21, 2023

cc @jbrockmendel - this may partly close #48212; however, I suspect the OP was referring to non-EAs (dtypes that are not extension arrays), given the old version of pandas.

The performance improvement is mostly for EAs (extension array dtypes), where the .kind call can be a bottleneck.

import pyarrow as pa
import pandas as pd
from pandas.core.internals.blocks import get_block_type

%timeit get_block_type(pd.ArrowDtype(pa.float64()))
# 3.51 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)    <- main
# 740 ns ± 5.19 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)  <- PR

%timeit get_block_type(pd.Float64Dtype())
# 1.3 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)  <- main
# 289 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)   <- PR
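
To illustrate why the order of checks matters when .kind is expensive, here is a minimal sketch. It is not the pandas implementation: CheapKindDtype, SlowKindDtype, and both dispatch functions are hypothetical names, and the block names are returned as placeholder strings. The assumption is that an extension dtype can be recognized by an isinstance check alone, so its .kind never needs to be read.

```python
class CheapKindDtype:
    """Hypothetical stand-in for a NumPy-like dtype: `kind` is a plain attribute."""

    def __init__(self, kind):
        self.kind = kind


class SlowKindDtype:
    """Hypothetical stand-in for an extension dtype whose `kind` is computed."""

    kind_calls = 0  # counts how often the property is evaluated

    @property
    def kind(self):
        type(self).kind_calls += 1
        return "f"


def _block_for_kind(kind):
    # Placeholder strings instead of real block classes.
    if kind in "Mm":
        return "DatetimeLikeBlock"
    if kind in "fciub":
        return "NumericBlock"
    return "ObjectBlock"


def dispatch_kind_first(dtype):
    # Reads `kind` up front, even when the type check alone decides the result.
    kind = dtype.kind
    if isinstance(dtype, SlowKindDtype):
        return "ExtensionBlock"
    return _block_for_kind(kind)


def dispatch_type_first(dtype):
    # Checks the type first, so `kind` is only read for non-extension dtypes.
    if isinstance(dtype, SlowKindDtype):
        return "ExtensionBlock"
    return _block_for_kind(dtype.kind)
```

Both functions return the same result for every input; the only difference is that dispatch_type_first never evaluates the costly property for the extension-style dtype.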

@lukemanley added the Performance and Internals labels Mar 21, 2023
Member

@jbrockmendel jbrockmendel left a comment

LGTM

kind = dtype.kind
if kind in ["M", "m"]:
    return DatetimeLikeBlock
elif kind in ["f", "c", "i", "u", "b"]:
Member

we can improve a little bit here by checking kind in "fciub" instead of the list

Member Author

updated
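
The suggestion above (string membership instead of a list) can be checked with a small standalone sketch. The helper names in_list and in_str are hypothetical, and the timing uses the standard-library timeit module rather than the %timeit magic. Results depend on the interpreter and machine, so no numbers are claimed here. One caveat worth keeping in mind: `in` on a string is a substring test, so the two forms only agree for single-character codes, which is what a dtype's kind is.

```python
import timeit


def in_list(kind):
    # Membership test against a list of single-character kind codes.
    return kind in ["f", "c", "i", "u", "b"]


def in_str(kind):
    # Substring test against a string holding the same codes.
    return kind in "fciub"


# For single-character codes the two checks agree.
for code in "fciubMmOSUV":
    assert in_list(code) == in_str(code)

# They differ for multi-character or empty inputs, because `in` on a
# string matches substrings.
assert in_str("fc") and not in_list("fc")
assert in_str("") and not in_list("")

# "b" is the last entry, so the list version has to scan every element.
list_time = timeit.timeit(lambda: in_list("b"), number=200_000)
str_time = timeit.timeit(lambda: in_str("b"), number=200_000)
print(f"list: {list_time:.4f}s  str: {str_time:.4f}s")
```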


cls: type[Block]

if isinstance(dtype, SparseDtype):
Member

i think the SparseDtype check may no longer be needed

Member Author

removed

Member Author

your suggested updates give a bit of an improvement to non-EAs as well:

import numpy as np
from pandas.core.internals.blocks import get_block_type

%timeit get_block_type(np.dtype('float64'))

# 724 ns ± 59.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)  -> main
# 590 ns ± 30 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)    -> PR
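
For anyone reproducing these timings outside IPython, here is a rough sketch using only the standard library. It mirrors the "mean ± std. dev. of 7 runs" shape of the %timeit output but is not how %timeit is implemented; bench and speedup are hypothetical helper names. Wrapping the call in a lambda adds call overhead, so absolute numbers will be higher than the ones above.

```python
import statistics
import timeit


def bench(func, repeat=7, number=100_000):
    """Return (mean, std dev) of the per-call time in nanoseconds."""
    # Each entry in `totals` is the total time for `number` calls.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call_ns = [total / number * 1e9 for total in totals]
    return statistics.mean(per_call_ns), statistics.stdev(per_call_ns)


def speedup(before_ns, after_ns):
    """Ratio of the old per-call time to the new one."""
    return before_ns / after_ns


if __name__ == "__main__":
    # Trivial placeholder workload; swap in the call being measured.
    mean_ns, std_ns = bench(lambda: "f" in "fciub")
    print(f"{mean_ns:.0f} ns ± {std_ns:.1f} ns per loop")
    # Ratio for the float64 timings quoted in this comment.
    print(f"speedup: {speedup(724, 590):.2f}x")
```

To measure the function discussed here, pass something like `lambda: get_block_type(np.dtype("float64"))` to bench in an environment with pandas and NumPy installed. Applying speedup to the figures quoted in this thread gives roughly 4.7x for ArrowDtype, 4.5x for Float64Dtype, and 1.2x for the NumPy float64 case.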

@jbrockmendel
Member

ping on green

@lukemanley
Member Author

> ping on green

green - thanks

@jbrockmendel jbrockmendel merged commit 5c15588 into pandas-dev:main Mar 22, 2023
@jbrockmendel
Member

thanks @lukemanley

@lukemanley lukemanley added this to the 2.1 milestone Mar 22, 2023
@lukemanley lukemanley deleted the perf-get-block-type branch April 18, 2023 11:03
Development

Successfully merging this pull request may close these issues.

PERF: get_block_type heavy use could benefit performance improvements