be80898 breaks display of large dataframes in IPython notebooks #5588

Closed
michaelaye opened this issue Nov 26, 2013 · 15 comments
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string Performance Memory or execution speed performance

@michaelaye
Contributor

In 044ee06, I could just do, with no apparent delay:

df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10718114 entries, 0 to 10718113
Columns: 20 entries, classification_id to spread
dtypes: datetime64[ns](2), float64(11), object(7)

In be80898, the notebook hangs, the CPU is pegged, and nothing is returned after 5 minutes.

related #5550

@jreback
Contributor

jreback commented Nov 26, 2013

cc @takluyver

@michaelaye

Can you post the output of

%prun str(df)

Python 2.7?

@michaelaye
Contributor Author

Yes, 2.7.3 (Enthought Canopy, all updates)

Here you go:

183499 function calls (183497 primitive calls) in 2.060 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       33    1.951    0.059    1.951    0.059 {pandas.lib.infer_dtype}
    22075    0.026    0.000    0.035    0.000 StringIO.py:208(write)
     2460    0.013    0.000    0.022    0.000 format.py:1792(just)
    27510    0.005    0.000    0.006    0.000 {isinstance}
     6160    0.005    0.000    0.005    0.000 {_codecs.utf_8_decode}
        1    0.005    0.005    0.040    0.040 StringIO.py:241(writelines)
     6160    0.004    0.000    0.016    0.000 format.py:226(_strlen)
     4095    0.003    0.000    0.008    0.000 {method 'decode' of 'str' objects}
      134    0.003    0.000    0.003    0.000 {method 'reduce' of 'numpy.ufunc' objects}
    22076    0.003    0.000    0.003    0.000 StringIO.py:38(_complain_ifclosed)
     6160    0.002    0.000    0.008    0.000 utf_8.py:15(decode)
31379/31377    0.002    0.000    0.002    0.000 {len}
       41    0.002    0.000    0.033    0.001 format.py:1772(_make_fixed_width)
    22977    0.002    0.000    0.002    0.000 {method 'append' of 'list' objects}
     2065    0.001    0.000    0.004    0.000 {method 'decode' of 'unicode' objects}
      429    0.001    0.000    0.002    0.000 common.py:2509(as_escaped_unicode)
       45    0.001    0.000    0.002    0.000 format.py:1815(_cond)
     2420    0.001    0.000    0.002    0.000 format.py:1790(<lambda>)
      429    0.001    0.000    0.004    0.000 common.py:2483(pprint_thing)
        7    0.001    0.000    0.007    0.001 format.py:1617(_format_strings)
        1    0.001    0.001    2.017    2.017 format.py:300(_to_str_columns)
     1756    0.001    0.000    0.001    0.000 numeric.py:1810(isscalar)
      120    0.001    0.000    0.001    0.000 format.py:1744(_format_datetime64)
      420    0.001    0.000    0.005    0.000 format.py:1629(_format)
        6    0.001    0.000    0.002    0.000 common.py:1723(adjoin)
      660    0.001    0.000    0.003    0.000 format.py:1670(_val)
      378    0.001    0.000    0.001    0.000 {method 'join' of 'str' objects}
       61    0.001    0.000    1.953    0.032 internals.py:1851(make_block)
     1602    0.001    0.000    0.001    0.000 {method 'rjust' of 'str' objects}
       11    0.001    0.000    0.002    0.000 format.py:1809(_trim_zeros)
      361    0.001    0.000    0.001    0.000 config.py:542(_get_deprecated_option)
       61    0.001    0.000    1.955    0.032 series.py:124(__init__)
      707    0.001    0.000    0.003    0.000 common.py:256(notnull)
      868    0.001    0.000    0.002    0.000 common.py:126(_isnull_new)
       11    0.001    0.000    0.006    0.001 format.py:1669(_format_with)
     2915    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
        7    0.000    0.000    0.001    0.000 {pandas.lib.map_infer}
       61    0.000    0.000    1.954    0.032 internals.py:3399(__init__)
      430    0.000    0.000    0.001    0.000 common.py:2046(_is_sequence)
      180    0.000    0.000    0.002    0.000 index.py:615(__getitem__)
     1287    0.000    0.000    0.000    0.000 {method 'replace' of 'unicode' objects}
       11    0.000    0.000    0.015    0.001 format.py:1688(get_result)
      182    0.000    0.000    0.002    0.000 config.py:77(_get_single_key)
      868    0.000    0.000    0.001    0.000 {pandas.lib.isscalar}
      181    0.000    0.000    0.001    0.000 index.py:186(view)
       40    0.000    0.000    1.957    0.049 frame.py:1538(_ixs)
      437    0.000    0.000    0.000    0.000 {hasattr}
      838    0.000    0.000    0.000    0.000 {method 'rjust' of 'unicode' objects}
       20    0.000    0.000    0.004    0.000 series.py:475(_slice)
       22    0.000    0.000    0.001    0.000 index.py:98(__new__)
        1    0.000    0.000    0.002    0.002 format.py:380(_join_multiline)
      108    0.000    0.000    0.001    0.000 {max}
      409    0.000    0.000    0.004    0.000 format.py:1627(<lambda>)
      180    0.000    0.000    0.000    0.000 config.py:527(_get_root)
      244    0.000    0.000    0.000    0.000 {function view at 0x117f07c08}
       20    0.000    0.000    1.016    0.051 format.py:477(_format_col)
      180    0.000    0.000    0.002    0.000 config.py:95(_get_option)
      170    0.000    0.000    0.000    0.000 index.py:604(__contains__)
       61    0.000    0.000    0.002    0.000 fromnumeric.py:2048(amax)
     1541    0.000    0.000    0.000    0.000 {method 'ljust' of 'str' objects}
      868    0.000    0.000    0.002    0.000 common.py:109(isnull)
       62    0.000    0.000    0.000    0.000 series.py:252(_set_axis)
       21    0.000    0.000    0.034    0.002 format.py:1571(format_array)
     1280    0.000    0.000    0.000    0.000 {pandas.lib.checknull}
       82    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
       22    0.000    0.000    0.000    0.000 internals.py:1293(__init__)
      272    0.000    0.000    0.000    0.000 {getattr}
      179    0.000    0.000    0.001    0.000 config.py:583(_warn_if_deprecated)
       41    0.000    0.000    0.001    0.000 format.py:237(_get_formatter)
      122    0.000    0.000    0.000    0.000 abc.py:128(__instancecheck__)
       40    0.000    0.000    0.002    0.000 internals.py:2771(iget)
       22    0.000    0.000    0.001    0.000 fromnumeric.py:2249(prod)
       61    0.000    0.000    0.000    0.000 generic.py:82(__init__)
      420    0.000    0.000    0.000    0.000 common.py:1941(is_float)
       61    0.000    0.000    0.000    0.000 internals.py:54(__init__)
       40    0.000    0.000    0.001    0.000 internals.py:3051(_find_block)
      180    0.000    0.000    0.002    0.000 config.py:241(__call__)
       40    0.000    0.000    0.000    0.000 internals.py:271(get)
      182    0.000    0.000    0.000    0.000 config.py:570(_translate_key)
      181    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
      671    0.000    0.000    0.000    0.000 format.py:1696(<genexpr>)
      401    0.000    0.000    0.000    0.000 {issubclass}
        2    0.000    0.000    0.003    0.001 format.py:1734(get_result)
        1    0.000    0.000    0.979    0.979 frame.py:3415(_apply_standard)
      182    0.000    0.000    0.000    0.000 config.py:509(_select_options)
       40    0.000    0.000    0.000    0.000 {method 'get_loc' of 'pandas.index.IndexEngine' objects}
      122    0.000    0.000    0.001    0.000 common.py:1937(is_integer)
      126    0.000    0.000    0.000    0.000 _weakrefset.py:68(__contains__)
       21    0.000    0.000    0.000    0.000 index.py:1659(__new__)
       40    0.000    0.000    1.954    0.049 series.py:227(from_array)
        8    0.000    0.000    0.000    0.000 common.py:197(_isnull_ndarraylike)
      376    0.000    0.000    0.000    0.000 common.py:1746(_join_unicode)
       40    0.000    0.000    0.001    0.000 internals.py:2734(get)
       82    0.000    0.000    0.002    0.000 _methods.py:15(_amax)
       20    0.000    0.000    0.001    0.000 index.py:472(_convert_slice_indexer)
       21    0.000    0.000    0.001    0.000 series.py:2476(_sanitize_array)
       21    0.000    0.000    0.000    0.000 common.py:1542(_possibly_castable)
      429    0.000    0.000    0.000    0.000 {method 'ljust' of 'unicode' objects}
       42    0.000    0.000    0.001    0.000 format.py:220(_strlen_func)
       90    0.000    0.000    0.000    0.000 internals.py:157(__contains__)
        1    0.000    0.000    0.980    0.980 format.py:500(_get_formatted_column_labels)
      433    0.000    0.000    0.000    0.000 {iter}
        8    0.000    0.000    0.000    0.000 {pandas.lib.isnullobj}
       40    0.000    0.000    0.000    0.000 index.py:1009(get_loc)
       76    0.000    0.000    0.000    0.000 common.py:56(_check)
        1    0.000    0.000    0.000    0.000 {pandas.lib.fast_multiget}
       10    0.000    0.000    0.000    0.000 common.py:1559(_possibly_cast_to_datetime)
       21    0.000    0.000    0.000    0.000 series.py:2489(_try_cast)
       40    0.000    0.000    1.957    0.049 frame.py:1535(icol)
       30    0.000    0.000    0.000    0.000 _methods.py:31(_any)
       62    0.000    0.000    0.000    0.000 series.py:274(_set_subtyp)
       20    0.000    0.000    0.000    0.000 generic.py:1552(__finalize__)
       40    0.000    0.000    0.000    0.000 internals.py:3057(_check_have)
       61    0.000    0.000    0.000    0.000 internals.py:119(set_ref_locs)
       60    0.000    0.000    0.000    0.000 format.py:1725(<lambda>)
      161    0.000    0.000    0.000    0.000 internals.py:2140(_get_items)
        2    0.000    0.000    0.002    0.001 index.py:705(_format_with_header)
        1    0.000    0.000    0.000    0.000 {pandas.lib.maybe_convert_objects}
        1    0.000    0.000    0.001    0.001 format.py:1721(get_result)
        1    0.000    0.000    0.000    0.000 format.py:2049(_binify)
       22    0.000    0.000    0.001    0.000 _methods.py:27(_prod)
        7    0.000    0.000    0.014    0.002 format.py:1613(get_result)
        1    0.000    0.000    0.000    0.000 StringIO.py:258(getvalue)
       30    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
       41    0.000    0.000    0.000    0.000 internals.py:3520(values)
       20    0.000    0.000    0.000    0.000 index.py:449(_validate_slicer)
       40    0.000    0.000    0.000    0.000 index.py:413(is_integer)
        1    0.000    0.000    2.059    2.059 frame.py:1252(to_string)
       25    0.000    0.000    0.000    0.000 index.py:220(__array_finalize__)
        2    0.000    0.000    0.000    0.000 abc.py:148(__subclasscheck__)
       21    0.000    0.000    0.000    0.000 format.py:1603(__init__)
       60    0.000    0.000    0.000    0.000 index.py:476(validate)
       20    0.000    0.000    0.004    0.000 series.py:692(__getslice__)
        1    0.000    0.000    0.000    0.000 index.py:3504(_trim_front)
       60    0.000    0.000    0.000    0.000 format.py:1788(<lambda>)
       25    0.000    0.000    0.000    0.000 index.py:182(_reset_identity)
       30    0.000    0.000    0.001    0.000 {method 'any' of 'numpy.ndarray' objects}
       21    0.000    0.000    0.000    0.000 common.py:2012(is_float_dtype)
       40    0.000    0.000    0.000    0.000 common.py:1492(_values_from_object)
       20    0.000    0.000    0.000    0.000 frame.py:1435(<lambda>)
       21    0.000    0.000    0.000    0.000 series.py:1010(values)
        1    0.000    0.000    0.001    0.001 format.py:547(_get_formatted_index)
       21    0.000    0.000    0.000    0.000 series.py:300(dtype)
       20    0.000    0.000    0.000    0.000 common.py:1636(_is_bool_indexer)
        1    0.000    0.000    2.059    2.059 format.py:353(to_string)
       21    0.000    0.000    0.000    0.000 internals.py:3512(dtype)
       21    0.000    0.000    0.978    0.047 frame.py:3437(<genexpr>)
       11    0.000    0.000    0.000    0.000 format.py:1663(__init__)
       40    0.000    0.000    0.000    0.000 format.py:326(<genexpr>)
        1    0.000    0.000    0.000    0.000 {pandas.lib.list_to_object_array}
       40    0.000    0.000    0.000    0.000 index.py:491(is_int)
      170    0.000    0.000    0.000    0.000 {hash}
        1    0.000    0.000    2.060    2.060 base.py:37(__bytes__)
        6    0.000    0.000    0.000    0.000 internals.py:1479(__init__)
       48    0.000    0.000    0.000    0.000 {all}
        1    0.000    0.000    2.060    2.060 frame.py:438(__unicode__)
       20    0.000    0.000    0.000    0.000 series.py:1021(get_values)
       21    0.000    0.000    0.000    0.000 {method 'max' of 'numpy.ndarray' objects}
        6    0.000    0.000    0.000    0.000 _weakrefset.py:58(__iter__)
       61    0.000    0.000    0.000    0.000 index.py:1700(is_all_dates)
        1    0.000    0.000    0.979    0.979 frame.py:3322(apply)
        2    0.000    0.000    0.000    0.000 common.py:1880(_asarray_tuplesafe)
        1    0.000    0.000    0.000    0.000 format.py:1894(get_console_size)
        1    0.000    0.000    0.000    0.000 {method 'encode' of 'unicode' objects}
        1    0.000    0.000    0.000    0.000 index.py:682(take)
       21    0.000    0.000    0.000    0.000 index.py:205(_coerce_to_ndarray)
       22    0.000    0.000    0.000    0.000 index.py:3410(_ensure_index)
        8    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
        8    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
        9    0.000    0.000    0.000    0.000 common.py:1977(is_datetime64_dtype)
       34    0.000    0.000    0.000    0.000 {any}
       20    0.000    0.000    0.000    0.000 index.py:465(_convert_slice_indexer_getitem)
       10    0.000    0.000    0.000    0.000 common.py:1958(is_integer_dtype)
       21    0.000    0.000    0.000    0.000 internals.py:187(dtype)
        8    0.000    0.000    0.000    0.000 common.py:2041(is_list_like)
       20    0.000    0.000    0.000    0.000 format.py:503(is_numeric_dtype)
       40    0.000    0.000    0.000    0.000 index.py:1691(inferred_type)
        1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 format.py:539(has_index_names)
        8    0.000    0.000    0.000    0.000 index.py:392(values)
        7    0.000    0.000    0.000    0.000 common.py:1997(is_timedelta64_dtype)
        1    0.000    0.000    0.000    0.000 format.py:263(__init__)
       44    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        9    0.000    0.000    0.000    0.000 numeric.py:392(asarray)
        1    0.000    0.000    0.000    0.000 series.py:973(__iter__)
        2    0.000    0.000    0.002    0.001 index.py:690(format)
        1    0.000    0.000    0.000    0.000 generic.py:1579(__setattr__)
        3    0.000    0.000    0.000    0.000 index.py:561(<lambda>)
        1    0.000    0.000    0.000    0.000 common.py:2392(in_ipython_frontend)
        2    0.000    0.000    0.000    0.000 config.py:185(get_default_val)
       20    0.000    0.000    0.000    0.000 series.py:237(_constructor)
        1    0.000    0.000    0.000    0.000 internals.py:90(ref_locs)
        1    0.000    0.000    0.000    0.000 {method 'take' of 'numpy.ndarray' objects}
        1    0.000    0.000    2.060    2.060 <string>:1(<module>)
        3    0.000    0.000    0.000    0.000 format.py:1848(_has_names)
        6    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
        2    0.000    0.000    0.000    0.000 _weakrefset.py:81(add)
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.arange}
        1    0.000    0.000    0.979    0.979 frame.py:1433(dtypes)
        1    0.000    0.000    0.000    0.000 internals.py:128(set_ref_items)
        2    0.000    0.000    0.000    0.000 _weakrefset.py:20(__enter__)
        1    0.000    0.000    0.000    0.000 {pandas.algos.ensure_platform_int}
        1    0.000    0.000    0.000    0.000 {sorted}
        2    0.000    0.000    0.000    0.000 _weakrefset.py:16(__init__)
        2    0.000    0.000    0.000    0.000 _weakrefset.py:26(__exit__)
        3    0.000    0.000    0.000    0.000 index.py:558(_engine)
        1    0.000    0.000    0.000    0.000 internals.py:3493(set_axis)
        1    0.000    0.000    0.000    0.000 internals.py:1972(_set_axis)
        2    0.000    0.000    0.000    0.000 frame.py:545(__len__)
        1    0.000    0.000    2.060    2.060 base.py:25(__str__)
        1    0.000    0.000    0.000    0.000 generic.py:278(_get_axis_number)
        1    0.000    0.000    0.000    0.000 _weakrefset.py:36(__init__)
        2    0.000    0.000    0.000    0.000 _weakrefset.py:52(_commit_removals)
        2    0.000    0.000    0.000    0.000 index.py:582(__iter__)
        1    0.000    0.000    0.000    0.000 numerictypes.py:735(issubdtype)
        1    0.000    0.000    0.000    0.000 index.py:314(_set_names)
        1    0.000    0.000    0.000    0.000 StringIO.py:54(__init__)
        2    0.000    0.000    0.000    0.000 config.py:559(_get_registered_option)
        1    0.000    0.000    0.000    0.000 index.py:571(inferred_type)
        1    0.000    0.000    0.000    0.000 frame.py:432(_info_repr)
       31    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
        1    0.000    0.000    0.000    0.000 frame.py:368(shape)
        1    0.000    0.000    0.000    0.000 index.py:305(nlevels)
        1    0.000    0.000    0.000    0.000 base.py:62(_constructor)
        1    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 common.py:2342(in_interactive_session)
        1    0.000    0.000    0.000    0.000 numerictypes.py:667(issubclass_)
        2    0.000    0.000    0.000    0.000 {method 'remove' of 'set' objects}
        1    0.000    0.000    0.000    0.000 __init__.py:194(u)
        1    0.000    0.000    0.000    0.000 index.py:578(is_all_dates)
        1    0.000    0.000    0.000    0.000 format.py:543(has_column_names)
        4    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
        2    0.000    0.000    0.000    0.000 {method '__subclasshook__' of 'object' objects}
        1    0.000    0.000    0.000    0.000 internals.py:82(_is_single_block)
        1    0.000    0.000    0.000    0.000 interactiveshell.py:506(get_ipython)
        1    0.000    0.000    0.000    0.000 {method '__subclasses__' of 'type' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
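
For anyone wanting to collect a comparable profile outside IPython, here is a minimal sketch using the standard library's cProfile and pstats modules. The frame below is a small hypothetical stand-in, not the reporter's data, and sorting by "tottime" is chosen to mirror the "Ordered by: internal time" header in the listing above.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

# Hypothetical stand-in frame; the reported frame had about 10.7M rows.
df = pd.DataFrame(np.random.randn(100_000, 5))

# Profile only the string conversion, as `%prun str(df)` does.
profiler = cProfile.Profile()
profiler.enable()
text = str(df)
profiler.disable()

# Sort by internal time and keep the ten most expensive entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("tottime").print_stats(10)
print(report.getvalue())
```

The entries at the top of such a report are the ones to compare against the listing; in the profile above that is pandas.lib.infer_dtype, which accounts for 1.951 of the 2.060 seconds.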

@michaelaye
Contributor Author

Interestingly, the call to df.info() works fine; it's only the automatic switching from a display of the df object to the df.info() call that doesn't seem to work.
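
The two code paths mentioned here can be compared side by side. The sketch below assumes a pandas version that provides the display.large_repr option; the frame size and the max_rows value are arbitrary choices for illustration, not values taken from this report.

```python
import io

import numpy as np
import pandas as pd

# Small hypothetical frame, large enough to exceed the max_rows used below.
df = pd.DataFrame(np.random.randn(1000, 4), columns=list("abcd"))

# Explicit summary: df.info() writes to a buffer instead of stdout.
buf = io.StringIO()
df.info(buf=buf)
explicit_summary = buf.getvalue()

# Automatic summary: with large_repr set to "info", repr(df) switches to
# the info view once the frame has more rows than display.max_rows.
with pd.option_context("display.max_rows", 10, "display.large_repr", "info"):
    automatic_summary = repr(df)

print(automatic_summary)
```

If the explicit call is fast but the automatic one is not, the time is being spent somewhere before the switch to the info view happens.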

@jreback
Contributor

jreback commented Nov 26, 2013

@takluyver

this is exceedingly slow

from numpy.random import randn
from pandas import DataFrame

df = DataFrame(randn(10000, 10))
%prun x = df.to_html()

takes 5s, with only 10k rows

@jreback
Contributor

jreback commented Nov 26, 2013

@takluyver also let's add a vbench in vb_suite/frame_methods.py (put near one there for to_string), but with say 10000 x 10 or so (using mixed dtypes would be even better)
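
As a rough illustration of what such a benchmark would measure, here is a sketch that uses the standard library's timeit rather than vbench itself (the vb_suite API is not shown in this thread, so it is not reproduced here). The column names and the dtype mix are assumptions chosen to give a 10000 x 10 frame.

```python
import timeit

import numpy as np
import pandas as pd

n_rows = 10_000

# Ten columns of mixed dtypes: floats, integers, strings and datetimes.
data = {"float_%d" % i: np.random.randn(n_rows) for i in range(7)}
data["ints"] = np.arange(n_rows)
data["strings"] = ["row %d" % i for i in range(n_rows)]
data["dates"] = pd.date_range("2013-01-01", periods=n_rows, freq="D")
df = pd.DataFrame(data)

# Best of three single runs for each formatter.
timings = {
    name: min(timeit.repeat(getattr(df, name), number=1, repeat=3))
    for name in ("to_string", "to_html")
}

for name, seconds in timings.items():
    print("%s: %.3f s" % (name, seconds))
```

Taking the minimum of several runs reduces noise from other processes; absolute numbers will differ by machine and pandas version.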

@ghost

ghost commented Nov 26, 2013

I swear, every time someone touches the display code....

#5589

I don't think the HTML code itself has had a perf regression; it's been slow for a long time, mostly due to pprint_thing, which is required to work around the unicode/byte-string mess that is Python 2.

OTOH, with pprint_thing, we hardly get any unicode bugs anymore so...

thanks @michaelaye for reporting this so quickly.

@ghost

ghost commented Nov 26, 2013

merged #5589

@ghost ghost closed this as completed Nov 26, 2013
@jreback
Contributor

jreback commented Nov 26, 2013

yep...thanks @y-p !

@michaelaye
Contributor Author

Hm, now I get an overly large table. It renders quickly, but it no longer seems to adapt to the current window width as it did before?
My dataframe is 10718114 rows × 20 columns, but my window is only wide enough for 5 or so (depending on content). I believe the display previously adapted to the current notebook width somehow, didn't it?

@jreback
Contributor

jreback commented Nov 26, 2013

Do you not see scrollbars (on both axes)? Can you post a link/pic?

@jreback jreback reopened this Nov 26, 2013
@takluyver
Contributor

From what I saw of the code while refactoring it, no, there was never any attempt to detect the width of the browser window, and I don't think there's any good way to do so.

Previously, your DataFrame would have switched to info view because of the number of rows. Now it's displaying a truncated view, but the default for max_columns remains 20. I wonder if the default for that should be lower (10?). I suspect that for any real-world example, 20 columns won't fit on screen. On the other hand, people probably often want to see all columns, even if it doesn't fit on screen neatly.

In the notebook, clicking on the "Out [n]" prompt should put the output into a smaller div with scrollbars, which can make it easier to work with.
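
To see the effect of lowering max_columns as described above, the following sketch renders the same frame with and without a reduced limit. It calls _repr_html_(), the hook a notebook uses to display a frame; the frame size and the limits of 10 rows and 5 columns are illustrative assumptions, not proposed defaults.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 20 columns, matching the column count in this report.
df = pd.DataFrame(np.random.randn(100, 20))

# Untruncated HTML for comparison.
full_html = df.to_html()

# Notebook-style HTML with tighter display limits applied temporarily.
with pd.option_context("display.max_rows", 10, "display.max_columns", 5):
    truncated_html = df._repr_html_()

print(len(full_html), len(truncated_html))
```

Using option_context keeps the change local to the block; pd.set_option with the same option names would apply it for the rest of the session.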

@michaelaye
Contributor Author

I see. I was just irritated by the different behavior and actually kinda like the solution. And I can influence it with max_columns, so all fine here. Value of default max_columns is maybe debatable. Good to close for me.

@jreback
Contributor

jreback commented Nov 26, 2013

gr8...thanks for the report @michaelaye

Maybe @takluyver (or @michaelaye), if you want, could update the new doc section on this change to emphasize that a user may want to change max_columns as well.

@jreback jreback closed this as completed Nov 26, 2013
@ghost ghost reopened this Dec 5, 2013
@ghost

ghost commented Dec 5, 2013

reopening: https://github.com/TomAugspurger
Missed the MultiIndex case.

@ghost

ghost commented Dec 5, 2013

merged #5649

@ghost ghost closed this as completed Dec 5, 2013
This issue was closed.

3 participants