BUG: Key Error: range exception when printing DataFrame #3869

Closed
dmlockhart opened this issue Jun 12, 2013 · 9 comments · Fixed by #4751
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string
Comments

@dmlockhart

Here's a reproducible example:

df = pd.DataFrame({ 'A' : ['foo',"~:{range}:0"], 'B' : ['bar','bah'] })
df
             A    B
0          foo  bar
1  ~:{range}:0  bah

df.set_index(['A']).info()
*** KeyError: 'range'

In core/index.py, summary() needs to check that head/tail is not already a str before calling format() on it:

    def summary(self, name=None):
        if len(self) > 0:
            head = self[0]
            if hasattr(head,'format'):
                head = head.format()
            tail = self[-1]
            if hasattr(tail,'format'):
                tail = tail.format()
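
The KeyError can be seen without pandas at all. The following is a minimal sketch in plain Python (written for Python 3, whereas the report uses Python 2.7); the variable names `label` and `missing_field` are illustrative and do not come from the pandas source. Because a str has its own format method, the hasattr check passes and the index label gets treated as a format template.

```python
# An index label that happens to contain a brace-delimited token.
label = "~:{range}:0"

# Every str has a .format method, so this check does not filter strings out.
assert hasattr(label, "format")

missing_field = None
try:
    # With no arguments, str.format tries to look up a keyword argument
    # named 'range' for the {range} replacement field and fails.
    label.format()
except KeyError as exc:
    missing_field = exc.args[0]

print("KeyError for field:", missing_field)
```

Any string index value containing a named replacement field would presumably trigger the same failure when it lands in the first or last position of the index.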

Printing a DataFrame created from two Series objects (previously columns in other DataFrames) results in a "Key Error: 'range'" exception being raised. The DataFrame creation seems to work fine. Printing other DataFrames with the same "problematic" index also works okay.

Test code:

import pstats
import pandas as pd

# Import some cProfile data
run1 = pstats.Stats('run1.prof')
run2 = pstats.Stats('run2.prof')

# Utility function to convert pstats dict into a list of lists
def pstats_to_list( stats ):
  plist = []
  for key, value in stats.strip_dirs().stats.items():
    filename, lineno, func_name = key
    ccalls, ncalls, total_time, cum_time, callers = value
    name = "{}:{}:{}".format( filename, func_name, lineno )
    plist.append( [name, ncalls, total_time, cum_time] )
  return plist

jit_list   = pstats_to_list( run1 )
nojit_list = pstats_to_list( run2 )

# Create DataFrames for the profile run data
columns=['name','ncalls','ttime', 'ctime']
jdf = pd.DataFrame( jit_list,   columns = columns )
ndf = pd.DataFrame( nojit_list, columns = columns )

# Set the 'name' column to be the index (for plotting)
jdf = jdf.set_index( 'name' )
ndf = ndf.set_index( 'name' )

# These DataFrames print fine
print jdf
print ndf

# Extract out the 'ttime' columns
x = jdf['ttime']
y = ndf['ttime']

# Create a new DataFrame using the 'ttime' Series from jdf and ndf
z = pd.DataFrame( {'jit': x, 'nojit': y } )

# Print some data.... this works
print z[0:10]

# Print some data.... this raises "KeyError: 'range'"
print z
@jreback
Contributor

jreback commented Jul 10, 2013

can you post/link to these prof files? this is impossible to reproduce otherwise

@dmlockhart
Author

@jreback
Contributor

jreback commented Jul 12, 2013

great

pandas version, numpy version, and platform?

@jreback
Contributor

jreback commented Jul 12, 2013

python version as well

@jreback
Contributor

jreback commented Jul 12, 2013

@dmlockhart if you look at the top part of the question, I put a reproducible example there.
The last element in the index 'looks' like it needs formatting (but not quite).

As a workaround for now, just call reset_index on z (so your index is a numeric index) rather than keeping this odd string index

thanks for the report
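
A rough sketch of that workaround, assuming pandas is installed. The frame below is a hypothetical stand-in for z: the numeric values are made up for illustration, and only the index labels mirror the reproducible example at the top of the issue.

```python
import pandas as pd

# Hypothetical stand-in for z: the last index label contains "{range}".
# The timing values are invented purely for illustration.
z = pd.DataFrame(
    {"jit": [0.1, 0.2], "nojit": [0.3, 0.4]},
    index=pd.Index(["foo", "~:{range}:0"], name="name"),
)

# reset_index moves the string labels into an ordinary 'name' column and
# replaces the index with default integers 0..n-1.
z_flat = z.reset_index()

print(z_flat)
```

The string labels are still available in the 'name' column, so nothing is lost; they just no longer sit in the index, which is where the summary code calls format().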

@jtratner
Contributor

Neat bug actually... Probably just need to change the pprint thing slightly and/or make sure that we don't build up format strings dynamically unless we're sure that the string is escaped.

It might be worth adding something like escape_format, something simple like:

def escape_format(strlike):
    return strlike.replace('{', '{{').replace('}', '}}')
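
To illustrate the escaping idea, here is a small self-contained sketch. It assumes the helper simply doubles each brace, which is how str.format expects literal braces to be written; `label`, `escaped`, and `round_tripped` are illustrative names, and this is not code from pandas.

```python
def escape_format(strlike):
    # Double each brace so that str.format emits it literally
    # instead of treating it as a replacement field.
    return strlike.replace("{", "{{").replace("}", "}}")


label = "~:{range}:0"
escaped = escape_format(label)

# Formatting the escaped string no longer looks up a 'range' argument;
# it just collapses the doubled braces back to single ones.
round_tripped = escaped.format()

print(escaped)
print(round_tripped)
```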

@jreback
Contributor

jreback commented Jul 12, 2013

no...just a simple change....something like

    def summary(self, name=None):
        if len(self) > 0:
            head = self[0]
            if hasattr(head,'format') and not isinstance(head, basestring):
                head = head.format()
            tail = self[-1]
            if hasattr(tail,'format') and not isinstance(tail, basestring):
                tail = tail.format()
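
The effect of that guard can be sketched outside pandas. This is an assumption-laden illustration, not the actual patch: `FakeTimestamp` and `format_endpoint` are hypothetical names, and the sketch uses str because basestring only exists in Python 2.

```python
class FakeTimestamp(object):
    """Hypothetical index element that provides its own format() method."""

    def __init__(self, text):
        self.text = text

    def format(self):
        return "formatted:" + self.text


def format_endpoint(value):
    # Hypothetical helper mirroring the proposed guard: call format()
    # only on non-string objects, and pass strings through untouched.
    if hasattr(value, "format") and not isinstance(value, str):
        return value.format()
    return value


head = format_endpoint(FakeTimestamp("2013-06-12"))
tail = format_endpoint("~:{range}:0")

print(head)
print(tail)
```

With the isinstance check in place, the string label is returned as-is, while objects that define their own format() are still formatted.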

@dmlockhart
Author

@jreback here is my version information:

Python: Python 2.7.3 (default, Mar 26 2013, 21:14:37)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

Pandas: pandas - 0.11.0
Numpy: numpy - 1.7.1

Platform: OSX 10.6.8

@jtratner
Contributor

jtratner commented Sep 5, 2013

@dmlockhart this should work now.
