Fix file-like objects without seek support not working #81
I wrote a blog article detailing how to read from a ZIP file without actually unzipping it, which depends on this pull request: Any feedback is welcome.
This is a great initiative. In a way, this PR seems to internalize Joel's recipe for reading directly from zipfiles. However, the use-case is also a relatively rare one, so the cost of adding a new argument (confusion and complexity for the user) seems to outweigh the benefits. So my two cents for now is to not merge this. I would however be interested in a PR that automatically supports this functionality only when necessary, e.g. loading with BytesIO if the path is inside a zipfile. PS: The link to your article seems to be broken.
Thanks for your feedback. The link didn't work due to a server misconfiguration on my side; it has been fixed now. Maybe I can first tell you about my usecase in order to convince you that it isn't that esoteric ;-) That is why I decided to develop MapzMaker, which allows anyone to create vector or rasterized contour maps with customizable projections, colors and formats. MapzMaker uses NaturalEarth. Although, as I said, I'm not often dealing with shapefiles, I assume that distributing files the way NE does (as a ZIP) is very common. That being said, I'd like to offer you a compromise. The reason why I added the extra option is to avoid a problem I often encounter with large datasets: some libraries copy data to memory without asking, but I might not have enough RAM to deal with that, so my computer starts to thrash. The option I added in the PR tries to avoid that by forcing the user to actively allow the implementation to copy the data ( But as your complaint seems to be only about the option (which is not required to implement the fix for ZIP files and directly-from-the-internet features), I'll be happy to remove it in order to avoid confusing the users. Just to make sure its purpose is clear to you: The
Copying the data is required in some cases to avoid issues with a file-like object being invalidated (this is the case e.g. for ZIP files). In all other cases, I hope this compromise and information will allow you to reconsider the decision not to merge this ;-) . If you'd like the option to be removed, please feel free to tell me. I'll implement it as soon as possible.
Thanks for your detailed followup. Shapefiles do indeed frequently come inside zipfiles, so it would be useful to have out-of-the-box support for reading them, especially reading from the internet via urllib. Curious to hear what @GeospatialPython thinks, but I think it's worth it to drop the option and just default to loading the entire file into memory when there is no seek method (e.g. zip and urllib file objects). This memory aspect can be mentioned in the readme; I think there's already a section on reading from zipfiles, so it would be great if you updated that too. So if you make a new PR without the option, I think this would be a nice convenience to the user, and I would be happy to merge it.
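For reference, the fallback described here (copy into memory only when seeking is not possible) could look roughly like the sketch below. `ensure_seekable` and `ForwardOnlyStream` are hypothetical names made up for illustration; this is not the code that was merged into the library, just one way the idea might be expressed.

```python
import io


def ensure_seekable(fileobj):
    """Return fileobj if it can seek, else an in-memory copy of its content.

    Hypothetical helper for illustration only; not part of pyshp.
    """
    try:
        # Probe with a real call: some objects expose seek() but raise when
        # it is used. io.UnsupportedOperation is a subclass of OSError.
        fileobj.seek(fileobj.tell())
        return fileobj
    except (AttributeError, OSError):
        # Fallback: read everything into RAM (can be costly for large files).
        return io.BytesIO(fileobj.read())


class ForwardOnlyStream(io.BufferedIOBase):
    """Stand-in for a zip member or urllib response that cannot seek."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size=-1):
        return self._inner.read(size)


stream = ensure_seekable(ForwardOnlyStream(b"shapefile bytes"))
stream.seek(0)  # works because the content now lives in a BytesIO
header = stream.read(9)
```

Objects that already support seeking are returned unchanged, so the memory cost is only paid in the cases where it is unavoidable.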
Updated and merged in #96.
This is the revised version of #80.

Currently the `Reader` constructor checks if `hasattr(shp, "seek")`. If not, then `seek(0)` is not called. However, the other functions in the `Reader` class will call `seek()` in any case, so it doesn't make sense to avoid calling `seek()` here.

This PR fixes this by introducing a new argument to `Reader.__init__` (`allow_copy`). If this is set to `True`, the code will initialize an `io.BytesIO` with the file-like object's content.

Docs for `allow_copy`:

One usecase where this is useful is to support reading directly from a ZIP file without unzipping, because the file-likes which can be opened using the standard Python `zipfile` library derive from `io.BufferedIOBase`, which raises `io.UnsupportedOperation` when calling `seek()`.
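As a rough illustration of the ZIP use case, the sketch below copies an archive member into an `io.BytesIO` so that it can be seeked independently of the archive. The archive is built in memory, and the member name `example.shp` and its content are placeholders chosen for this example, not real shapefile data.

```python
import io
import zipfile

# Build a small ZIP archive in memory so the example is self-contained.
# The member content is a placeholder, not a valid shapefile.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("example.shp", b"\x00\x00\x27\x0a placeholder content")

# Copy the member into a BytesIO while the archive is still open.
with zipfile.ZipFile(archive) as zf:
    with zf.open("example.shp") as member:
        shp = io.BytesIO(member.read())

# The copy no longer depends on the archive and supports seek().
shp.seek(0)
magic = shp.read(4)
```

A buffer like `shp` could then be handed to the reader through its `shp` keyword argument, at the cost of holding the whole member in memory.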