DEPR: GroupBy.cumsum etc with axis=1 #51046

Closed
@jbrockmendel

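import numpy as np
import pandas as pd
import pandas._testing as tm

# 10M rows, 3 columns, 5 groups cycling through the rows
df = pd.DataFrame(np.random.randn(10_000_000, 3))
grps = ["A", "B", "C", "D", "E"] * 2_000_000
gb = df.groupby(grps)

res = gb.cumsum(axis=1)  # row-wise cumsum via GroupBy
alt = df.cumsum(axis=1)  # row-wise cumsum directly on the frame

tm.assert_frame_equal(res, alt)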

AFAICT GroupBy.cumsum with axis=1 is equivalent to DataFrame.cumsum(axis=1), just slower because it operates group-by-group and then reconstructs the result (timeit: 1.34s for the DataFrame version vs 3.68s for the GroupBy version).

Are there scenarios where these are not equivalent? If not, we should either deprecate the GroupBy version or at least have it dispatch to the more performant version.

Same goes for cumprod, cummin, cummax, and probably pct_change, shift, and rank.
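
A rough way to sanity-check the equivalence for the cumulative ops without relying on the GroupBy `axis=1` argument is to emulate the group-by-group path by hand. This is only a sketch under assumptions: `groupwise_rowwise` is a hypothetical helper written for illustration, not how pandas implements it, and the small frame, seed, and unique index are arbitrary choices. It does not cover pct_change, shift, or rank.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)))
grps = ["A", "B", "C", "D", "E"] * 2


def groupwise_rowwise(frame, keys, op):
    """Hypothetical helper: apply a row-wise (axis=1) op to each group's
    sub-frame, then reassemble the pieces in the original row order."""
    keys = pd.Series(keys, index=frame.index)
    pieces = [getattr(frame.loc[keys == k], op)(axis=1) for k in keys.unique()]
    # Assumes a unique index so .loc restores the original ordering.
    return pd.concat(pieces).loc[frame.index]


results = {}
for op in ["cumsum", "cumprod", "cummin", "cummax"]:
    by_group = groupwise_rowwise(df, grps, op)
    direct = getattr(df, op)(axis=1)
    results[op] = bool(
        by_group.index.equals(direct.index)
        and np.allclose(by_group.to_numpy(), direct.to_numpy())
    )

print(results)
```

The intuition is that with axis=1 each row is accumulated independently, and every row belongs to exactly one group, so splitting by group only changes the order in which rows are processed.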

Labels: Deprecate (Functionality to remove in pandas), Groupby
