write large dataset without iterating? #311

Open
ear9mrn opened this issue Mar 6, 2025 · 1 comment

ear9mrn commented Mar 6, 2025

What's your question?

Hi,
I have my data (and records) in a list/array. Is there a way to create a new shapefile from these without having to iterate over all the points/records (it's a bit slow), i.e. to pass the list/array directly?

Thanks.

JamesParrott (Collaborator) commented Mar 10, 2025

Hi there,
Unfortunately, I'm fairly sure this is a no.

Can PyShp skip any unnecessary internal iterations and offer the user a shortcut?
No. There aren't any to skip. After the fields and shape type are defined, sequentially defining shapes and records, e.g. via iteration, is fundamentally how PyShp must be used; the usual pattern is sketched below. Under the hood, PyShp writes to a file-like or stream-like object, which is naturally traversed by iteration (although it may also have tell and seek methods).
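For reference, the usual pattern looks something like this (a minimal sketch; the field name and point data are placeholders):

```python
import shapefile  # PyShp

# Standard PyShp usage: define the fields, then add each shape and
# its matching record sequentially. The points list is made up.
points = [(-120.0, 35.0, "a"), (-74.0, 40.7, "b")]

with shapefile.Writer("out", shapeType=shapefile.POINT) as w:
    w.field("name", "C")   # one character field
    for x, y, name in points:
        w.point(x, y)      # write the shape...
        w.record(name)     # ...and its matching record
```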

Can any shapefile library do this?
Creating an async/non-sequential, parallelisable way of writing a shapefile is challenging for any library that supports interspersing records with null shapes. It wouldn't be unreasonable to drop null shapes (or at least to offer the option to do so). That would allow the record offsets in the .shp file to be calculated a priori, at least for fixed-size shape types such as points, and written to directly with f.seek, as the sketch below illustrates.
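To show why fixed-size records matter, here is a rough sketch (not PyShp's API, just raw .shp records per the ESRI spec) of seeking to a precomputed offset for Point shapes, where every record is a fixed 28 bytes:

```python
import struct

# Illustration only: with null shapes dropped, every Point record in
# a .shp file is the same size, so each record's offset is known in
# advance and can be written to directly.
HEADER_SIZE = 100   # .shp files start with a 100-byte header
RECORD_SIZE = 28    # 8-byte record header + 20-byte point content

def write_point_at(f, index, x, y):
    """Write point number `index` (0-based) directly at its offset."""
    f.seek(HEADER_SIZE + index * RECORD_SIZE)
    # Record header (big-endian): record number (1-based) and
    # content length in 16-bit words (20 bytes -> 10 words).
    f.write(struct.pack(">2i", index + 1, 10))
    # Record content (little-endian): shape type 1 (Point), then x, y.
    f.write(struct.pack("<i2d", 1, x, y))
```

(The 100-byte file header and the companion .shx and .dbf files would still have to be written as usual.)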

For there to be any speed-up in practice, however, some way to allow multiple processes to write to the same file would be needed. I don't know about SSDs, but I'm fairly sure that won't work on a traditional spinning HDD with a single magnetic read/write head (nor on tape), and I'm not sure how it interacts with disk buses and operating-system file permissions, let alone how to manage it from Python. It's also an IO-bound task, not a CPU-bound one: ultimately, the speed the user sees when writing a shapefile (with any tool) is limited by the maximum write speed of their file system. Even on storage systems and OSs that do allow multiple concurrent writes to the same file, in GIS applications it often makes sense instead to split the shapes and records over multiple shapefiles, according to some geographical or other criterion the user chooses (sketched below).
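As a rough sketch of that last approach (the file names, the partitioning criterion, and the data are all made up), each worker process can own its own shapefile, so no file handle is ever shared:

```python
import shapefile  # PyShp
from concurrent.futures import ProcessPoolExecutor

def write_partition(args):
    # Each worker writes a separate shapefile; the same sequential
    # Writer loop as above, just one per process.
    path, points = args
    with shapefile.Writer(path, shapeType=shapefile.POINT) as w:
        w.field("name", "C")
        for x, y, name in points:
            w.point(x, y)
            w.record(name)

if __name__ == "__main__":
    # Partitioned here by longitude; any criterion would do.
    partitions = [
        ("west", [(-120.0, 35.0, "a"), (-118.0, 34.0, "b")]),
        ("east", [(-74.0, 40.7, "c"), (-71.0, 42.3, "d")]),
    ]
    with ProcessPoolExecutor() as pool:
        list(pool.map(write_partition, partitions))
```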

I recommend looking into alternative data formats to shapefiles, e.g. geospatial databases, which are designed as backend storage and better support multiple concurrent write operations.
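For example (a sketch assuming geopandas and shapely, which are separate libraries, not PyShp), a GeoPackage is a single-file spatial database backed by SQLite:

```python
import geopandas as gpd
from shapely.geometry import Point

# Write the same placeholder points to a GeoPackage instead of a
# shapefile; many GIS tools and databases can read and write .gpkg.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(-120.0, 35.0), Point(-74.0, 40.7)],
    crs="EPSG:4326",
)
gdf.to_file("points.gpkg", driver="GPKG")
```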
