Skip to content

BUG: pandas.read_excel() - AttributeError: 'ElementTree' object has no attribute 'getiterator' #37831

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 of 3 tasks
timkofu opened this issue Nov 14, 2020 · 2 comments
Closed
2 of 3 tasks
Labels
Bug Duplicate Report Duplicate issue or pull request Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@timkofu
Copy link

timkofu commented Nov 14, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

pandas.read_excel("data.xlsx")

Problem description

Running the above code raises the exception below:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-524fa4fa6612> in <module>
      1 path_prefix: str = "data/raw/"
----> 2 df = pandas.read_excel(f"{path_prefix}data.xlsx")

..lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    294                 )
    295                 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296             return func(*args, **kwargs)
    297 
    298         return wrapper

..lib/python3.9/site-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

..lib/python3.9/site-packages/pandas/io/excel/_base.py in __init__(self, path_or_buffer, engine)
    865         self._io = stringify_path(path_or_buffer)
    866 
--> 867         self._reader = self._engines[engine](self._io)
    868 
    869     def __fspath__(self):

..lib/python3.9/site-packages/pandas/io/excel/_xlrd.py in __init__(self, filepath_or_buffer)
     20         err_msg = "Install xlrd >= 1.0.0 for Excel support"
     21         import_optional_dependency("xlrd", extra=err_msg)
---> 22         super().__init__(filepath_or_buffer)
     23 
     24     @property

..lib/python3.9/site-packages/pandas/io/excel/_base.py in __init__(self, filepath_or_buffer)
    351             self.book = self.load_workbook(filepath_or_buffer)
    352         elif isinstance(filepath_or_buffer, str):
--> 353             self.book = self.load_workbook(filepath_or_buffer)
    354         elif isinstance(filepath_or_buffer, bytes):
    355             self.book = self.load_workbook(BytesIO(filepath_or_buffer))

..lib/python3.9/site-packages/pandas/io/excel/_xlrd.py in load_workbook(self, filepath_or_buffer)
     35             return open_workbook(file_contents=data)
     36         else:
---> 37             return open_workbook(filepath_or_buffer)
     38 
     39     @property

..lib/python3.9/site-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    128         if 'xl/workbook.xml' in component_names:
    129             from . import xlsx
--> 130             bk = xlsx.open_workbook_2007_xml(
    131                 zf,
    132                 component_names,

..lib/python3.9/site-packages/xlrd/xlsx.py in open_workbook_2007_xml(zf, component_names, logfile, verbosity, use_mmap, formatting_info, on_demand, ragged_rows)
    810     del zflo
    811     zflo = zf.open(component_names['xl/workbook.xml'])
--> 812     x12book.process_stream(zflo, 'Workbook')
    813     del zflo
    814     props_name = 'docprops/core.xml'

..lib/python3.9/site-packages/xlrd/xlsx.py in process_stream(self, stream, heading)
    264         self.tree = ET.parse(stream)
    265         getmethod = self.tag2meth.get
--> 266         for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
    267             if self.verbosity >= 3:
    268                 self.dump_elem(elem)

AttributeError: 'ElementTree' object has no attribute 'getiterator'

Expected Output

pandas.DataFrame

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d42
python : 3.9.0.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.2.1
Cython : None
pytest : 6.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

@timkofu timkofu added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 14, 2020
@timkofu
Copy link
Author

timkofu commented Nov 14, 2020

I made the function below using the openpyxl library as a workaround. It's working well in my use-case, but I'm not sure it generalizes:

from openpyxl import load_workbook

def excel2df(file_path: str) -> pandas.DataFrame:
    
    # Load worksheet
    work_sheet = load_workbook(file_path).active  # First worksheet
    dataframe = pandas.DataFrame(work_sheet.values)
    
    # Set first row as headers
    headers = dataframe.iloc[0]
    dataframe = dataframe[1:]
    dataframe.columns = headers
    
    # Remove empty columns
    dataframe = dataframe.dropna(axis=1)
    
    # Reset the index
    dataframe = dataframe.reset_index(drop=True)
    
    return dataframe

@jreback
Copy link
Contributor

jreback commented Nov 14, 2020

duplicate of #37795

@jreback jreback added the Duplicate Report Duplicate issue or pull request label Nov 14, 2020
@jreback jreback added this to the No action milestone Nov 14, 2020
@jreback jreback closed this as completed Nov 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Duplicate Report Duplicate issue or pull request Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

No branches or pull requests

2 participants