BUG: KeyError: 0 when running .info() on a DataFrame with integer column names that do not form a consecutive series starting at 0 #37408

Closed
@gsganden

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
pd.DataFrame({1: [0]}).info()

Problem description

Stack trace:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.pyenv/versions/3.7.7/envs/ga/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-21-b154f24c816b> in <module>
----> 1 pd.DataFrame({1: [0]}).info()

~/.pyenv/versions/3.7.7/envs/ga/lib/python3.7/site-packages/pandas/core/frame.py in info(self, verbose, buf, max_cols, memory_usage, null_counts)
   2588     ) -> None:
   2589         return DataFrameInfo(
-> 2590             self, verbose, buf, max_cols, memory_usage, null_counts
   2591         ).info()
   2592

~/.pyenv/versions/3.7.7/envs/ga/lib/python3.7/site-packages/pandas/io/formats/info.py in info(self)
    248                 self._non_verbose_repr(lines, ids)
    249             else:
--> 250                 self._verbose_repr(lines, ids, dtypes, show_counts)
    251
    252         # groupby dtype.name to collect e.g. Categorical columns

~/.pyenv/versions/3.7.7/envs/ga/lib/python3.7/site-packages/pandas/io/formats/info.py in _verbose_repr(self, lines, ids, dtypes, show_counts)
    333
    334         for i, col in enumerate(ids):
--> 335             dtype = dtypes[i]
    336             col = pprint_thing(col)
    337

~/.pyenv/versions/3.7.7/envs/ga/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
    880
    881         elif key_is_scalar:
--> 882             return self._get_value(key)
    883
    884         if is_hashable(key):

~/.pyenv/versions/3.7.7/envs/ga/lib/python3.7/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
    987
    988         # Similar to Index.get_value, but we do not fall back to positional
--> 989         loc = self.index.get_loc(label)
    990         return self.index._get_values_for_loc(self, loc, label)
    991

~/.pyenv/versions/3.7.7/envs/ga/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898
   2899         if tolerance is not None:

KeyError: 0

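This error appears to arise when calling .info() on any DataFrame that has at least one integer column name, unless the sorted column names form a sequence of consecutive integers starting at 0. The traceback points at dtypes[i] in _verbose_repr: i is the positional counter from enumerate(ids), but dtypes is a Series indexed by the column names, so with integer column names the lookup is done by label and fails when no column is named i.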
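The lookup that fails in the traceback can be reproduced without calling .info() at all. The sketch below is only an illustration of the mechanism, not the pandas source: the variable names are made up, and the positional .iloc access is one possible way to avoid the label lookup, not necessarily the fix pandas adopted.

```python
import pandas as pd

df = pd.DataFrame({1: [0]})
dtypes = df.dtypes  # a Series indexed by the column labels, here the integer 1

# Label-based lookup: no column is labelled 0, so this raises KeyError: 0,
# which is the same error shown in the stack trace above.
try:
    dtypes[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional lookup does not depend on what the column labels are.
first_dtype = dtypes.iloc[0]

print(label_lookup_failed)
print(first_dtype)
```

Because .iloc always indexes by position, it gives the dtype of the first column regardless of how the columns are named.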

Expected Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
1    1 non-null int64
dtypes: int64(1)
memory usage: 136.0 bytes
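On an affected version, one possible workaround (an assumption on my part, not something confirmed in this report) is to convert the column labels to strings before calling .info(), so that the frame being summarized no longer has integer column names. The summary then shows the labels as strings, which may or may not be acceptable.

```python
import io

import pandas as pd

df = pd.DataFrame({1: [0]})

# Hypothetical workaround: rename every column label with str() first,
# so the summary is built from a frame that has no integer labels.
buf = io.StringIO()
df.rename(columns=str).info(buf=buf)
summary = buf.getvalue()

print(summary)
```

The original DataFrame is left unchanged, since rename returns a new object by default.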

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : db08276bc116c438d3fdee492026f8223584c477
python           : 3.7.7.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : en_US.utf-8
LANG             : en_US.utf-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.3
numpy            : 1.18.2
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.1
setuptools       : 41.2.0
Cython           : None
pytest           : 5.4.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.2.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : 0.48.0
