Skip to content

Conversation

onsimon
Copy link

@onsimon onsimon commented Oct 7, 2019

Discovered that 0.25.x reintroduces #20437. Proposing the same fix here.

@simonjayhawkins simonjayhawkins added IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version labels Oct 7, 2019
@simonjayhawkins simonjayhawkins added this to the 0.25.2 milestone Oct 7, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK thanks. I think this was intentionally removed here https://github.com/pandas-dev/pandas/pull/26233/files#r279220473 so @tdamsma might have thoughts as well.

In any case is there a way to add a test for this?

@tdamsma
Copy link
Contributor

tdamsma commented Oct 7, 2019

I just tested with 25.1 (on osx), can't reproduce. I took this from #20437:

import pandas

url = "https://raw.github.com/pandas-dev/pandas/master/pandas/tests/io/data/test1.xls"
df = pandas.read_excel(url)
print(df)

I believe there are tests for downloading from a url included in the test suite.

@WillAyd
Copy link
Member

WillAyd commented Oct 7, 2019

Thanks @tdamsma ! I actually was not able to reproduce this on master either. @hellbe can you try there?

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Oh, slight difference in use actually. The reason is that I downloaded first using requests. The following code reproduces:

import pandas
import requests
url = "https://raw.github.com/pandas-dev/pandas/master/pandas/tests/io/data/test1.xls"
r = requests.get(url, stream=True)
df = pandas.read_excel(r.raw)
print(df)

Throws UnsupportedOperation: seek

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

So in my case I can easily fix my code by passing the url directly instead. However, there might be other situations where extra functionality from requests is needed, e.g. for authentication.

https://requests.kennethreitz.org/en/master/user/quickstart/#raw-response-content

So excel_read() is fed an urllib3.response.HTTPResponse object which has a read() but not a a seek() method.

So I think the patch is still valid, but maybe there's a nicer way to mitigate?

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Updated the issue with the above information.

@onsimon onsimon changed the title BUG: Patch for skipping seek() when loading Excel files from url BUG: Patch for skipping seek() when loading Excel files from urllib3.response.HTTPResponse object Oct 8, 2019
@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Updated the PR commit to match above and refer to new issue

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Rebased

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Surely possible to create a test case specifically for reading urllib3's HTTPResponse objects but I'm not sure if it's meaningful tbh

@WillAyd
Copy link
Member

WillAyd commented Oct 8, 2019

@ocefpaf would this fall under the umbrella of what you were trying to accomplish in #21504? Wonder if it would be better to address comprehensively in a common func rather than one-off for excel files

@ocefpaf
Copy link

ocefpaf commented Oct 8, 2019

@ocefpaf would this fall under the umbrella of what you were trying to accomplish in #21504? Wonder if it would be better to address comprehensively in a common func rather than one-off for excel files

Probably. Let me check. I still have the old PR here waiting for me. (I'm at a code sprint at the moment, Might tackle it again this week.)

@ocefpaf
Copy link

ocefpaf commented Oct 9, 2019

@WillAyd I can confirm that #21504 (or the new incarnation that I'll push in a few hours) will "fix this" b/c pandas will use requests and wrap it in a BytesIO object, with seek, if requests is installed.

@ocefpaf ocefpaf mentioned this pull request Oct 9, 2019
4 tasks
@WillAyd
Copy link
Member

WillAyd commented Oct 11, 2019

Thanks for the PR @hellbe but I think this is getting more comprehensively addressed in #28874 by @ocefpaf so let's track this there

@WillAyd WillAyd closed this Oct 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UnsupportedOperation 'seek' when loading excel files from urllib3.response.HTTPResponse object
5 participants