Better Message for xlrd Dependencies #28546

Closed
WillAyd opened this issue Sep 20, 2019 · 25 comments
Labels: Enhancement; Error Reporting (Incorrect or improved errors from pandas); good first issue; IO Excel (read_excel, to_excel)

@WillAyd
Member

WillAyd commented Sep 20, 2019

Right now, if you don't have xlrd installed and call read_excel without specifying the engine keyword, you get the following message:

>>> pd.read_excel("test.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 310, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 819, in __init__
    self._reader = self._engines[engine](self._io)
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 20, in __init__
    import_optional_dependency("xlrd", extra=err_msg)
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/compat/_optional.py", line 93, in import_optional_dependency
    raise ImportError(message.format(name=name, extra=extra)) from None
ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.

This happens even though the user may have openpyxl installed and could call pd.read_excel(..., engine="openpyxl") to get the same code to work.

Two issues need to be addressed here:

  • The default read_excel call with no engine argument should fall back to openpyxl, if installed
  • The default error message should direct the user to install openpyxl first and foremost, as xlrd is unmaintained
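The fallback described in the first bullet could look roughly like the sketch below. It is only an illustration, not pandas' implementation: `pick_excel_engine` and its `is_installed` parameter are hypothetical names introduced here, and the preference order (xlrd first, then openpyxl) is an assumption drawn from the issue text.

```python
import importlib.util


def _is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_excel_engine(preferred=("xlrd", "openpyxl"), is_installed=_is_installed):
    """Return the first engine in `preferred` that is installed.

    Hypothetical helper: keeps xlrd as the default but falls back to
    openpyxl instead of failing outright.
    """
    for name in preferred:
        if is_installed(name):
            return name
    # Nothing usable: point the user at openpyxl first, as the issue proposes.
    raise ImportError(
        "No Excel engine found. Install openpyxl (recommended) or xlrd "
        "using pip or conda."
    )
```

Passing `is_installed` as a parameter is a design choice made for this sketch so the selection logic can be exercised without changing the environment.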
WillAyd added the Error Reporting, good first issue, and IO Excel labels on Sep 20, 2019.
WillAyd added this to the Contributions Welcome milestone on Sep 20, 2019.
@codinggosu

Can I work on this?

@WillAyd
Member Author

WillAyd commented Sep 20, 2019

Sure!

@punndcoder28
Contributor

Can I work on this? I am new to contributing to issues on GitHub and this seems like a good start.

@punndcoder28
Contributor

I removed xlrd from VERSIONS and changed the default message so that it contains only the install hint, as follows.

VERSIONS = {
    "bs4": "4.6.0",
    "bottleneck": "1.2.1",
    "fastparquet": "0.2.1",
    "gcsfs": "0.2.2",
    "lxml.etree": "3.8.0",
    "matplotlib": "2.2.2",
    "numexpr": "2.6.2",
    "odfpy": "1.3.0",
    "openpyxl": "2.4.8",
    "pandas_gbq": "0.8.0",
    "pyarrow": "0.9.0",
    "pytables": "3.4.2",
    "s3fs": "0.0.8",
    "scipy": "0.19.0",
    "sqlalchemy": "1.1.4",
    "tables": "3.4.2",
    "xarray": "0.8.2",
    "xlwt": "1.2.0",
    "xlsxwriter": "0.9.8",
}
message = (
    "Use pip or conda to install {name}."
)

Is what I am doing correct?
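For context, the traceback at the top of the issue shows the error being built from a `name` and an `extra` hint. A stripped-down sketch of that pattern is below. `import_optional` is a hypothetical stand-in written for illustration, not the actual pandas function, and it skips the version checks the real helper performs.

```python
import importlib


def import_optional(name: str, extra: str = ""):
    """Import `name`, or raise ImportError with an actionable message.

    Hypothetical, simplified helper: no minimum-version check is done here.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        parts = [
            f"Missing optional dependency '{name}'.",
            extra,
            f"Use pip or conda to install {name}.",
        ]
        # Drop an empty `extra` so the message has no doubled spaces.
        message = " ".join(part for part in parts if part)
        # `from None` hides the original traceback, as in the output above.
        raise ImportError(message) from None
```

With this shape, an engine-specific hint (for example, one recommending openpyxl) would travel through `extra` rather than through the shared default message.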

@codinggosu

I changed the default engine to openpyxl in pandas/io/excel/base so that the "default read_excel call with no engine argument" falls back to openpyxl.

I also added an error message to pandas/io/excel/_openpyxl to direct "the user to install openpyxl first and foremost."

Am I heading in the right direction? Is this OK?

@WillAyd
Member Author

WillAyd commented Sep 30, 2019

> I changed the default engine to be openpyxl in pandas/io/excel/base to be openpyxl for the purpose of making the "default read_excel call with no engine argument" fall back to openpyxl

I don't want to make that change just yet. For now, simply warn the user that the default will change in the future. They can pass engine="openpyxl" explicitly to suppress the warning, or wait for the swap to be made in a future release.
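A minimal sketch of that deprecation path, using only the standard `warnings` module, might look like this. `resolve_engine` is a hypothetical function and the warning text is an assumption; the wording pandas ships may differ.

```python
import warnings


def resolve_engine(engine=None):
    """Return the engine to use, warning when the caller relied on the default.

    Hypothetical helper: the default stays "xlrd" for now, but callers who
    did not choose an engine are told that it will change.
    """
    if engine is None:
        warnings.warn(
            "The default engine for read_excel will change from xlrd to "
            "openpyxl in a future release. Pass engine='openpyxl' to opt in "
            "now and silence this warning.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return "xlrd"
    # An explicit choice is respected and produces no warning.
    return engine
```

Calling `resolve_engine("openpyxl")` returns the explicit choice silently, which is the opt-out described above.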

@codinggosu

On it! Thanks for the feedback.

@tab1tha
Contributor

tab1tha commented Dec 1, 2019

take

@SuvigyaJain1

I'd like to work on this, but this is my first time contributing to pandas as well as my first time working with such a huge codebase. What general direction should I work in?

@namedtoaster

namedtoaster commented Feb 25, 2020

@tab1tha / @SuvigyaJain1 are you still working on this? @WillAyd is this just a matter of modifying the 'read_excel' function and changing the default engine to openpyxl? Are there any checks that need to be done to see if the module is already installed?

@SuvigyaJain1

No, I'm no longer working on this. Feel free to take the issue.

@WillAyd
Member Author

WillAyd commented Feb 26, 2020

I think this can be handled by #29375, which @cruzzoe was working on but which I think has stalled. If that's the case, I would certainly welcome you taking over.

@namedtoaster

@WillAyd I took a look at the PR. I thought this would be a relatively easy fix: just add a FutureWarning to ExcelFile when using xlrd and/or default to openpyxl. Looking at the reviews, though, it seems to be more complicated than I thought.

In my own fork, I just added that warning, which works as expected when the engine is not specified. This would be my first contribution, so I could use some guidance.

@WillAyd
Member Author

WillAyd commented Feb 26, 2020

I don’t think the existing PR is too far off; it just needs to be pushed over the finish line. If you pull that branch locally, fix the merge conflicts, and add a filter at the module level of test_readers.py and test_xlrd.py for the FutureWarning that is now being raised, I think that should get you most of the way there:

https://docs.pytest.org/en/latest/warnings.html#pytest-mark-filterwarnings
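A module-level filter of the kind described in the pytest documentation linked above can be sketched as follows. The message prefix in the filter string is an assumed placeholder, and `test_placeholder` is a hypothetical test; a real filter would match the text the code under test actually emits.

```python
import pytest

# Applies to every test in this module: ignore FutureWarnings whose message
# starts with the given text. The prefix "The default engine" is an assumed
# placeholder, not the wording used by pandas.
pytestmark = pytest.mark.filterwarnings(
    "ignore:The default engine:FutureWarning"
)


def test_placeholder():
    # Hypothetical test body; real tests would call read_excel here.
    assert True
```

The filter string uses the same `action:message:category` layout as Python's `-W` option, with the message part matched against the start of the warning text.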

@3nrique0

3nrique0 commented Apr 1, 2020

So... I do have this error, and I have xlrd installed. I tried updating xlrd and force-reinstalling it with pip, and it doesn't work. I get ImportError: Install xlrd >= 1.0.0 for Excel support.
I have a virtualenv with Python 3.7.5, pandas==1.0.3 and xlrd==1.2.0.
Any ideas where the error might come from?

@Rynndalyn

Having the exact same problem. It wasn't happening on a different version of Jupyter yesterday, but now it is, and I can't seem to use pip install.

@3nrique0

3nrique0 commented Apr 6, 2020

I think that, for some reason, the version of Python called from that command is not the version in use. My system has Python 2.7 as standard, and the virtual environment I'm using is 3.7. The errors were coming from python2.7. So maybe someone hardcoded something like #! /bin/python somewhere instead of #! /usr/bin/env python?
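One way to confirm this kind of interpreter mismatch is to ask the running interpreter where it lives and where it would load xlrd from. The sketch below uses only the standard library; `describe_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name: str) -> dict:
    """Report which interpreter is running and where `name` would be imported from."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "python_version": sys.version.split()[0],
        "module": name,
        "found": spec is not None,
        # `origin` is the file the import would load; None when not found.
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_module("xlrd").items():
        print(f"{key}: {value}")
```

Run it with the same interpreter (or notebook kernel) that raises the ImportError. If `found` is False there, installing with that interpreter's own pip, for example `python -m pip install xlrd`, targets the right environment.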

@Rynndalyn

Yep. That’s what it was. Using pip3 to install and also installing openpyxl worked for me.

@ikeuwanuakwa

I had the same problem. I removed xlrd, deleted all xlrd folders in C:\Users\anaconda3\Lib\site-packages,
and reinstalled with conda install -c anaconda xlrd.
It worked for me.

@pramirpro12

> Yep. That’s what it was. Using pip3 to install and also installing openpyxl worked for me.

I am new to Python. What did you install using pip3? Could you please explain more thoroughly? Thanks.

@3nrique0

3nrique0 commented Apr 16, 2020

When you are in a Python 3 virtual environment, pip is the same as pip3.
The error is probably in how xlrd is being handled during installation. In the listing below, the commented lines are the output of each command:

source  path/to/your/env/bin/activate

command -v pip
#  path/to/your/env/bin/pip
command -v pip3
#  path/to/your/env/bin/pip3

ls -l path/to/your/env/bin/pip*
# -rwxrwxr-x 1 user group 242 Feb  7 08:50 /home/user/path/to/your/env/bin/pip*
# -rwxrwxr-x 1 user group 242 Feb  7 08:50 /home/user/path/to/your/env/bin/pip3*
# -rwxrwxr-x 1 user group 242 Feb  7 08:50 /home/user/path/to/your/env/bin/pip3.6*

md5sum /home/user/path/to/your/env/bin/pip*
# e85ad2c43787183884634c694a4f9c15  /home/user/path/to/your/env/bin/pip
# e85ad2c43787183884634c694a4f9c15  /home/user/path/to/your/env/bin/pip3
# e85ad2c43787183884634c694a4f9c15  /home/user/path/to/your/env/bin/pip3.6

@justanotherdataperson

I'm still getting the same issue as above. Running an anaconda environment with pandas=1.3.0 and xlrd=1.2.0

Using pd.read_excel returns "ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd."

@dsalaj

dsalaj commented Sep 1, 2020

If you are using conda and you installed the requirements but are still getting the same error message:

ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd

It might be the case that you are getting this error only in jupyter/jupyter-lab. If that is the case, the following solved the issue for me:

conda install -c anaconda ipykernel
python -m ipykernel install --user --name=[NAME_OF_YOUR_ENV]

@ahawryluk
Contributor

I believe this bug is obsolete now that openpyxl has become the only engine for xlsx files.

@Pritesh9988

ImportError: Pandas requires version '1.2.0' or newer of 'xlrd' (version '1.1.0' currently installed).
I am getting this error while reading an .xls file, even after installing version 1.2.0.
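An error like this suggests the interpreter running pandas still sees the older xlrd. A rough way to check what that interpreter has recorded is sketched below. The helper names are hypothetical, and the comparison is a simplified numeric one, not the check pandas itself performs.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '1.2.0' into (1, 2, 0); a non-numeric suffix such as 'rc1' is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(name: str):
    """Return the version recorded for distribution `name`, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(name: str, minimum: str, lookup=installed_version) -> bool:
    """Return True if the version found for `name` is at least `minimum`."""
    found = lookup(name)
    if found is None:
        return False
    have, need = parse_version(found), parse_version(minimum)
    # Pad with zeros so that "1.2" and "1.2.0" compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

Calling `meets_minimum("xlrd", "1.2.0")` from the environment that raises the error shows whether the upgrade reached that environment at all.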
