Fix file-like objects without seek support not working #81

ulikoehler · 2017-02-22T20:59:50Z

This is the revised version of #80

Currently, the Reader constructor checks hasattr(shp, "seek") and skips seek(0) if the attribute is missing. However, the other methods of the Reader class call seek() in any case, so it doesn't make sense to avoid calling seek() here.

This PR fixes this by introducing a new argument to Reader.__init__ (allow_copy). If this is set to True, the code will initialize an io.BytesIO with the file-like object's content.

Docs for allow_copy:

    If initializing the reader with a file-like object which
    does not support seek(), you must set allow_copy=True
    to allow the Reader to copy the entire file in memory.
    This is set to False by default in order to avoid
    large files being copied into memory unintentionally.

One use case for this is reading directly from a ZIP file without unzipping it: the file-like objects opened by the standard Python zipfile library derive from io.BufferedIOBase, which raises io.UnsupportedOperation when seek() is called.
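The behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the helper name `make_seekable` and the `NoSeekStream` stand-in class are hypothetical, and only the `allow_copy` name is taken from the description.

```python
import io


def make_seekable(fileobj, allow_copy=False):
    """Return a file-like object that supports seek().

    Hypothetical helper: if `fileobj` cannot seek, its content is copied
    into an io.BytesIO, but only when the caller opts in via allow_copy.
    """
    try:
        fileobj.seek(0)
        return fileobj
    except (AttributeError, io.UnsupportedOperation):
        pass
    if not allow_copy:
        raise ValueError(
            "file-like object does not support seek(); "
            "pass allow_copy=True to copy it into memory"
        )
    # Copies the entire remaining content into RAM.
    return io.BytesIO(fileobj.read())


class NoSeekStream:
    """Stand-in for a stream without seek(), e.g. a network response."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


stream = make_seekable(NoSeekStream(b"abc"), allow_copy=True)
stream.seek(1)
print(stream.read())  # b'bc'
```

Without `allow_copy=True`, the sketch raises instead of copying silently, which is the trade-off the option is meant to expose.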

ulikoehler · 2017-02-22T21:34:35Z

I wrote a blog article on how to read from a ZIP file without actually unzipping it. The approach depends on this pull request:
https://techoverflow.net/2017/02/22/reading-a-shapefile-directly-from-a-zip-using-pyshp/

Any feedback is welcome.

karimbahgat · 2017-04-14T14:17:54Z

This is a great initiative. In a way, this PR seems to internalize Joel's recipe for reading directly from zipfiles.

However, the use case is also a relatively rare one, so the cost of adding a new argument (confusion and complexity for the user) seems to outweigh the benefits.

So my two cents for now is to not merge this. I would however be interested in a PR that automatically supports this functionality only when necessary, e.g. loading with BytesIO if the path is inside a zipfile.

PS: The link to your article seems to be broken.

ulikoehler · 2017-04-14T21:08:30Z

Thanks for your feedback. The link didn't work due to a server misconfiguration on my side; it has been fixed now.

Maybe I can first tell you about my use case in order to convince you that it isn't that esoteric ;-)
I don't really deal with GIS data on a day-to-day basis, but I noticed there is no openly available, customizable source for country contours (including vector graphics).

That is why I decided to develop MapzMaker which allows anyone to create vector or rasterized contour maps with customizable projections, colors and formats.

MapzMaker uses NaturalEarth. Although, as I said, I don't often deal with shapefiles, I assume it is very common for files to be distributed the way NE does it (as a ZIP).

That being said I'd like to offer you a compromise.

The reason why I added the extra option is to avoid a problem I often encounter with large datasets: Some libraries copy data to memory without asking - but I might not have enough RAM to deal with that. Therefore, my computer starts to thrash. The option I added in the PR tries to avoid that by forcing the user to actively allow the implementation to copy the data (allow_copy=True).

But as your complaint seems to be only the option (which is not required to implement the fix for ZIP files and directly-from-the-internet features), I'll be happy to remove it in order to avoid confusing the users.

Just to make sure its purpose is clear to you: The allow_copy option is only relevant if the following conditions are met simultaneously:

1. The user reads a dataset which is too large to be copied into RAM, and
2. The file-like object does not support the seek operation.

The 2nd condition is true for all types of in-memory files and other wrapper structures. AFAIK there is no technical reason why io.BytesIO etc. can't implement seek. It just has not been done yet.

Copying the data is required in some cases to avoid issues with a file-like object being invalidated (this is the case e.g. for ZIP files).

In all other cases, allow_copy won't have any effect at all.
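To illustrate the ZIP case mentioned above, here is a small sketch that copies one archive member into an `io.BytesIO` so that it stays readable and seekable after the archive is closed. The helper name `open_member_seekable` and the member name `example.shp` are made up for illustration, and the member holds placeholder bytes rather than a real shapefile.

```python
import io
import zipfile


def open_member_seekable(zf, name):
    """Return a seekable in-memory copy of one archive member (hypothetical helper)."""
    with zf.open(name) as member:
        # Reads the whole member into RAM.
        return io.BytesIO(member.read())


# Build a tiny archive in memory so the sketch runs without any files on disk.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("example.shp", b"placeholder bytes, not a real shapefile")

with zipfile.ZipFile(archive) as zf:
    shp = open_member_seekable(zf, "example.shp")

# The copy is independent of the archive, so it can still seek after the close.
shp.seek(12)
print(shp.read(5))  # b'bytes'
```

The cost is that the whole member is held in memory, which is the concern the allow_copy option is meant to address.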

I hope this compromise and information will allow you to reconsider the decision not to merge this ;-) . If you'd like the option to be removed, please feel free to tell me. I'll implement it as soon as possible.

karimbahgat · 2017-04-27T00:46:13Z

Thanks for your detailed follow-up. Shapefiles do indeed frequently come inside zipfiles, so out-of-the-box support for reading them would be useful, especially when reading from the internet via urllib.

Curious to hear what @GeospatialPython thinks, but I think it's worth it to drop the option and just default to loading the entire file into memory when there is no seek method (e.g. zip and urllib file objects). This memory aspect can be mentioned in the readme. I think there's already a section on reading from zipfiles, so it would be great if you updated that too.

So if you make a new PR without the option, I think this would be a nice convenience to the user, and I would be happy to merge it.

karimbahgat · 2017-05-09T23:08:45Z

Updated and merged in #96.

ulikoehler added 2 commits February 22, 2017 21:51

Fix file like objects without seek support not being readable

72771fb

Added doc for new allow_copy option

0a38d3c

karimbahgat closed this Apr 14, 2017

karimbahgat reopened this Apr 27, 2017

karimbahgat added the enhancement label Apr 29, 2017

ulikoehler mentioned this pull request May 7, 2017

Fix file like objects without seek support not being readable (without allow_copy) #96

Merged

karimbahgat closed this May 9, 2017
