Is your feature request related to a problem?
I am trying to load some zipped data from my collaborator into pandas, but the zip was created under a version of OS X that adds an extraneous __MACOSX folder to the zipfile, which causes pandas to raise an error when loading the file.
The zipfile itself can easily be fixed with e.g. zip -d filename.zip __MACOSX/\*, but that may be a headache for a less experienced user. There is a similar problem involving the hidden file .DS_Store on macOS (see Wikipedia for more on this), which stores metadata about the user's icon preferences for the contents of the zip.
Describe the solution you'd like
I have made a one-line change locally to pandas/pandas/io/common.py (line 567 at commit 9cb3723).
Unfortunately this approach will not work for read_csv(...) as the compression is handled in the C code.
I can make a PR following the submission of this issue.
API breaking implications
None that I can foresee. The only code path this affects changes from raising a ValueError to correctly loading the zip file.
Describe alternatives you've considered
As mentioned, this could be fixed directly in code before invoking pandas, but I think my suggestion is more convenient and does not require any temporary files or extra disk space.
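For reference, the pre-processing alternative can also be done in memory, without temporary files. The following is only a rough sketch using the standard-library zipfile module, not the change proposed for pandas; the helper names real_members and open_single_member are hypothetical, and the filter assumes the only unwanted entries are directory entries, anything under __MACOSX/, and files named .DS_Store (compared case-insensitively).

```python
import io
import zipfile


def real_members(zf):
    """Return member names of an open ZipFile, skipping macOS metadata."""
    names = []
    for name in zf.namelist():
        if name.endswith("/"):
            continue  # directory entry, not a file
        if name.startswith("__MACOSX/"):
            continue  # resource-fork folder added by the macOS archiver
        if name.rsplit("/", 1)[-1].lower() == ".ds_store":
            continue  # Finder metadata file
        names.append(name)
    return names


def open_single_member(path_or_file):
    """Return the single real file in the archive as an in-memory buffer.

    Hypothetical helper: raises ValueError unless exactly one real
    member remains after filtering.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        names = real_members(zf)
        if len(names) != 1:
            raise ValueError(f"Expected exactly one data file, found: {names}")
        return io.BytesIO(zf.read(names[0]))
```

The returned buffer can then be handed to a reader that accepts file-like objects, e.g. pd.read_csv(open_single_member("filename.zip")). Recent pandas versions should accept a buffer in read_pickle as well, but check that against the version you have installed.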
Additional context
A lot of scientific data on e.g. figshare suffers from this issue. There may be other hidden files added by other operating systems that could be treated in a similar way.
ml-evs changed the title on Oct 13, 2020
from: ENH: Support for loading OS X zip files containing extraneous "__MACOSX" file
to: ENH: Support for loading pickles from OS X/macOS zip files containing extraneous "__MACOSX" folder and ".DS_STORE" file
Thanks for the suggestion. While this problem does sound annoying, pandas should not make assumptions about the contents of incoming data based on platform (this would be specific to macOS). Additionally, from #37101 there doesn't seem to be much appetite from the reviewers for this feature, and it is probably best suited as a pre-processing step. Closing.