DataFrame replace slow on DataFrames containing strings when using use_inf_as_null option #18176

spillz · 2017-11-08T17:27:28Z

Probably an edge case, but it is something that bit me:

import pandas, numpy as np
df = pandas.DataFrame({'a':[0]*5000+[1]*5000, 'b':[2]*5000+[1]*5000 , 'c': ['a']*5000 + ['b']*5000})
%timeit df.replace(1,3, inplace=True)

df = pandas.DataFrame({'a':[0]*5000+[1]*5000, 'b':[2]*5000+[1]*5000 , 'c': ['a']*5000 + ['b']*5000})
def rep(df):
    for c in df.columns:
        df.loc[df[c]==1,c] = 3
    return df
%timeit rep(df)

pandas.set_option('use_inf_as_null', True)
df = pandas.DataFrame({'a':[0]*5000+[1]*5000, 'b':[2]*5000+[1]*5000, 'c': ['a']*5000 + ['b']*5000})
%timeit df.replace(1,3, inplace=True)

df = pandas.DataFrame({'a':[0]*5000+[1]*5000, 'b':[2]*5000+[1]*5000 , 'c': ['a']*5000 + ['b']*5000})
def rep(df):
    for c in df.columns:
        df.loc[df[c]==1,c] = 3
    return df
%timeit rep(df)

One of these things is not like the other!

1000 loops, best of 3: 1.77 ms per loop
100 loops, best of 3: 5.89 ms per loop
1 loop, best of 3: 2.24 s per loop
100 loops, best of 3: 5.92 ms per loop
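The timings above come from IPython's `%timeit` magic. As a rough sketch, the same comparison can be run as a plain script with the standard-library `timeit` module. The helper names (`make_frame`, `replace_builtin`, `replace_numeric_columns`) are hypothetical, and the column-wise variant is restricted to numeric columns here, which is an assumption and not what the original snippet does. The `use_inf_as_null` option is left out because its availability depends on the pandas version.

```python
import timeit

import pandas as pd


def make_frame(n=5000):
    # Same shape as the frame in the report: two int columns, one string column.
    return pd.DataFrame({
        "a": [0] * n + [1] * n,
        "b": [2] * n + [1] * n,
        "c": ["a"] * n + ["b"] * n,
    })


def replace_builtin(df):
    # DataFrame.replace returns a new frame when inplace is not set.
    return df.replace(1, 3)


def replace_numeric_columns(df):
    # Column-wise variant; only numeric columns are touched (assumption).
    out = df.copy()
    for col in out.select_dtypes("number").columns:
        out.loc[out[col] == 1, col] = 3
    return out


df = make_frame()
for fn in (replace_builtin, replace_numeric_columns):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(df), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

Because neither function mutates its input, every timed call sees a frame that still contains the value being replaced.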

Pandas 0.20.1


jreback · 2017-11-08T21:17:20Z

In 0.21.0 this looks OK:

494 us +- 5.83 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
8.1 ms +- 553 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
523 us +- 33.2 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
7.54 ms +- 212 us per loop (mean +- std. dev. of 7 runs, 100 loops each)

Can you give it a try? We would also take a PR adding asv benchmarks for this.
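A minimal sketch of what such an asv benchmark might look like, assuming the usual asv convention of a class with a `setup` method and `time_`-prefixed methods. The class and method names are hypothetical and not taken from the pandas benchmark suite, and the option from the report is not toggled here.

```python
import pandas as pd


class ReplaceMixedDtypes:
    """asv-style benchmark: scalar replace on a frame with int and string columns."""

    def setup(self):
        # asv calls setup() before timing; build the frame from the report.
        n = 5000
        self.df = pd.DataFrame({
            "a": [0] * n + [1] * n,
            "b": [2] * n + [1] * n,
            "c": ["a"] * n + ["b"] * n,
        })

    def time_replace_scalar(self):
        # Not in place, so each timed call works on the same untouched data.
        self.df.replace(1, 3)
```

Keeping the call out of place avoids the situation where the first timed iteration removes every `1` and later iterations measure a no-op.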

jreback · 2017-11-08T21:18:03Z

Note that the option was renamed in 0.21.0 (same behavior, just null -> na):

pandas.set_option('use_inf_as_na', True)

mroeschke · 2024-02-14T21:10:27Z

Given that this option was deprecated in 2.1 (#53494), going to close this as a won't-fix.

jreback added the Dtype Conversions, Missing-data, and Performance labels Nov 8, 2017

jbrockmendel added the replace label Sep 21, 2020

mroeschke added the Benchmark label and removed the Performance and Dtype Conversions labels Jun 12, 2021

mroeschke closed this as completed Feb 14, 2024
