ENH: Support for loading pickles from OS X/macOS zip files containing extraneous "__MACOSX" folder and ".DS_STORE" file #37098

Closed
ml-evs opened this issue Oct 13, 2020 · 1 comment
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

ml-evs commented Oct 13, 2020

Is your feature request related to a problem?

I am trying to load some zipped data from my collaborator into pandas, but the zip was created under a version of OS X that adds an extraneous __MACOSX folder in the created zipfile, which causes pandas to error when loading the file.

The zipfile itself can obviously be easily fixed with e.g. zip -d filename.zip __MACOSX/\*, but it may cause a headache for a less experienced user. There is a similar problem involving the hidden file .DS_STORE on macOS (see Wikipedia for more on this), which stores metadata on the user's icon preferences for the contents of the zip...

Describe the solution you'd like

I have made the following one line change locally:

zip_names = zf.namelist()

becomes

zip_names = [name for name in zf.namelist() if not (name.startswith("__MACOSX/") or name.startswith(".DS_STORE"))]

Unfortunately this approach will not work for read_csv(...) as the compression is handled in the C code.
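The filtering idea above can be sketched as a small standalone function. This is only an illustration of the proposed change, not pandas code: `filter_macos_metadata` is a hypothetical helper name. One assumption beyond the one-liner: macOS writes the hidden file as `.DS_Store` and may place it inside subfolders, so the sketch compares the base name case-insensitively instead of using a prefix check.

```python
import posixpath


def filter_macos_metadata(names):
    """Return the zip member names with macOS metadata entries removed.

    Hypothetical helper: drops everything under the "__MACOSX/" folder
    and any ".DS_Store" file (matched case-insensitively, at any depth).
    """
    kept = []
    for name in names:
        # Resource-fork entries live under the top-level __MACOSX/ folder.
        if name.startswith("__MACOSX/"):
            continue
        # Zip member names always use "/" as the separator.
        if posixpath.basename(name).lower() == ".ds_store":
            continue
        kept.append(name)
    return kept
```

Applied to the output of `zf.namelist()`, this would leave only the real data file in the common case of a single zipped pickle.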

I can make a PR following the submission of this issue.

API breaking implications

Not that I can foresee; the only code path this affects changes from raising a ValueError to correctly loading the zip file.

Describe alternatives you've considered

As mentioned, this could be fixed directly in code before invoking pandas, but I think my suggestion is more convenient and does not require any temporary files or extra disk space.
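For comparison, the "fix it in code before invoking pandas" alternative does not strictly need temporary files either. The following is a minimal sketch using only the standard library; `is_macos_metadata` and `read_single_member` are hypothetical names, and the sketch assumes the archive holds exactly one real data file alongside the macOS metadata.

```python
import io
import posixpath
import zipfile


def is_macos_metadata(name):
    """True for entries under __MACOSX/ or for a .DS_Store file at any depth."""
    return (
        name.startswith("__MACOSX/")
        or posixpath.basename(name).lower() == ".ds_store"
    )


def read_single_member(zip_source):
    """Return the only non-metadata member as an in-memory buffer.

    zip_source may be a path or a binary file-like object.
    Raises ValueError if zero or several data files remain after filtering.
    """
    with zipfile.ZipFile(zip_source) as zf:
        names = [
            n for n in zf.namelist()
            if not n.endswith("/") and not is_macos_metadata(n)
        ]
        if len(names) != 1:
            raise ValueError(
                f"Expected exactly one data file, found {len(names)}: {names}"
            )
        # Decompress into memory; nothing is written to disk.
        return io.BytesIO(zf.read(names[0]))
```

The returned buffer can be handed to `pickle.load`, or to a pandas reader if the installed version accepts file-like objects for that format.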

Additional context

A lot of scientific data on e.g. figshare suffers from this issue. There may be other hidden files added by other operating systems that could be treated in a similar way.

@ml-evs ml-evs added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 13, 2020
@ml-evs ml-evs changed the title ENH: Support for loading OS X zip files containing extraneous "__MACOSX" file ENH: Support for loading pickles from OS X/macOS zip files containing extraneous "__MACOSX" folder and ".DS_STORE" file Oct 13, 2020
@mroeschke
Member

Thanks for the suggestion. While this problem does sound annoying, pandas should not make assumptions about incoming data contents that are tied to a platform (as this would be specific to macOS). Additionally, from #37101 there doesn't seem to be much appetite from the reviewers for this feature, and it is probably best suited as a pre-processing step. Closing.