
read_csv Pass File System (AWS, GCP) as Option. #30359


Closed

phdnzdeveloper opened this issue Dec 19, 2019 · 7 comments
Labels
Enhancement · IO Data · Needs Discussion

Comments

@phdnzdeveloper commented Dec 19, 2019

Code Sample

pd.read_csv('s3://other_owner_bucket_name/data', file_system=s3fs.S3FileSystem(key='abc', secret='123'))

Or

pd.read_csv('s3://other_owner_bucket_name/data', storage_options={'key':'abc', 'secret':'123'})

Problem description

Hi Team, I am currently using pandas to query a lot of CSV data from different file systems such as AWS S3 and GCP GCS. However, the current options only pick up the default key/secret credentials from the [default] section of the configuration file. So I would like to ask for an option to pass a pre-created file system, so that CSV files can be read from different clouds with different credentials.
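In the meantime, one workaround that already works is to open the file with an explicitly configured file system and hand the file object to read_csv, since it accepts any file-like object (a sketch; the bucket name and credentials are placeholders):

import pandas as pd
import s3fs

# Build a file system with explicit credentials instead of the [default] profile.
fs = s3fs.S3FileSystem(key='abc', secret='123')

# read_csv accepts an open file-like object, bypassing pandas' own S3 handling.
with fs.open('other_owner_bucket_name/data') as f:
    df = pd.read_csv(f)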

Best Regards
Giang

@TomAugspurger (Contributor)

Seems reasonable. Are you interested in working on this?

The first step would be to write a detailed docstring, so that we can understand how the new option would work and interact with the other parameters of read_csv.

Would this also apply to to_csv? What about other readers?
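For instance, the writer side might mirror the reader (a sketch, assuming to_csv grows the same hypothetical storage_options keyword; the bucket name and credentials are placeholders):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Hypothetical: the proposed keyword applied symmetrically on the write path.
df.to_csv('s3://bucket_name/out.csv', storage_options={'key': 'abc', 'secret': '123'})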

@TomAugspurger added the Enhancement, IO Data and Needs Discussion labels Dec 20, 2019
@TomAugspurger added this to the Contributions Welcome milestone Dec 20, 2019
@jreback (Contributor) commented Dec 20, 2019

Note there was a prior PR to pass credentials in that stalled (closed now); it might be a good starting point -

@suzutomato (Contributor) commented Jan 10, 2020

Let me try and work on this, unless @phdnzdeveloper or someone else is willing to.
I'll read the s3fs and gcsfs docs as well as the prior PR, then start with a docstring.

@phdnzdeveloper (Author) commented Jan 20, 2020

Hi Everyone,

Sorry! I have just come back from holiday. I have posted a docstring below for explanation.

Regards
Giang

storage_options : dict, optional
    A dictionary of parameters to pass on to the file system backing the
    filepath_or_buffer URL (s3://, gcs://, ...), such as s3fs or gcsfs.
    Example: pd.read_csv('s3://bucket_name/key_name.csv', storage_options={'key': 'abc', 'secret': '123'}).
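A GCS counterpart might look like this (a sketch, assuming the proposed keyword and gcsfs's token parameter, which accepts a path to a service-account JSON file; names are placeholders):

import pandas as pd

# Hypothetical: per-call GCS credentials via the proposed storage_options.
df = pd.read_csv('gcs://bucket_name/key_name.csv',
                 storage_options={'token': 'service-account.json'})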

@suzutomato (Contributor)

Thank you! Then I'll release this (by removing my "take" comment; hopefully this works as intended).

@martindurant mentioned this issue Jul 23, 2020
@JMBurley (Contributor)

Weirdly, the s3fs documentation acts like storage_options is already live in pandas, even though this is only due to go live in 1.2: https://s3fs.readthedocs.io/en/latest/

That isn't a pandas problem, but I want to mention it here so that anyone else who goes on a Google hunt finds out why that is the case somewhat faster than I did...

@TomAugspurger (Contributor)

cc @martindurant about the s3fs docs. Pandas 1.2 probably won't be released till November / December.

This issue was closed by #35381 though.
