
Update documentation to better reflect how PyArrow interoperability works #634


Merged 3 commits on May 14, 2021

Conversation

AndreiBarsan
Contributor

@AndreiBarsan AndreiBarsan commented May 12, 2021

In the context of #295, it seems it will be non-trivial to seamlessly support pyarrow interoperability moving forward.

It took me a bit of digging to realize why I couldn't easily interop between pyarrow and fsspec in my code base (I'm using pyarrow 4.0 but fsspec only supports pyarrow < 2.0). To this end, I added a small warning banner to the documentation to highlight this.

fsspec no longer adds pyarrow superclasses to its filesystem classes for pyarrow >= 2.0, since this is not necessary.

Made two other minor cosmetic fixes near my main change in features.rst.

Testing plan: I built the Sphinx docs on my machine and inspected my formatting changes visually, making sure that the new link works.

Let me know what you think!

@martindurant
Member

Could you please clarify what doesn't work? We'd rather fix that.

@AndreiBarsan
Contributor Author

Oh, I believe it's because pyarrow changed its filesystem API a lot and (I think) it's no longer pure Python.

Currently fsspec tries to detect pyarrow and make its filesystem class a base class for AbstractFileSystem, but disables this behavior on L85 of spec.py: https://github.com/intake/filesystem_spec/blob/e734622e2b837625d5c8f27477d6968d837f68b8/fsspec/spec.py#L85
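The conditional-base-class pattern described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in spec.py: `pick_base`, `LegacyArrowFS`, and `SketchFileSystem` are hypothetical names, and the version string is passed in by hand so the sketch runs without pyarrow installed.

```python
def pick_base(pyarrow_version, legacy_base):
    """Choose a base class from the detected pyarrow version.

    pyarrow_version: a version string such as "1.0.1", or None if
    pyarrow is not installed. legacy_base: the pre-2.0 pyarrow
    filesystem class (a stand-in is used below).
    """
    if pyarrow_version is None:
        # pyarrow is absent: nothing to inherit from.
        return object
    major = int(pyarrow_version.split(".")[0])
    # Only the legacy (pre-2.0) class is used as a base; for 2.0+
    # the filesystem class derives from plain object instead.
    return legacy_base if major < 2 else object


class LegacyArrowFS:
    """Hypothetical stand-in for the legacy pyarrow filesystem class."""


# With pyarrow 4.0 detected, the sketch falls back to `object`.
Base = pick_base("4.0.0", LegacyArrowFS)


class SketchFileSystem(Base):
    """Hypothetical stand-in for AbstractFileSystem."""
```

In this sketch the inheritance decision is made once, at import time, which is why a user on pyarrow 4.0 sees no pyarrow class in the filesystem's MRO.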

Based on #295, it seems this is because actually inheriting from pyarrow.fs.FileSystem (the new-school 2.0+ pyarrow FS class) does not work.

@martindurant
Member

As far as I understand, it is no longer necessary to inherit from pyarrow's filesystem class; we expect full functionality without it.

@AndreiBarsan
Contributor Author

Good point. Upon closer inspection of the pyarrow docs, it seems that interoperability is in fact still supported, just from the pyarrow side. So the user would just need to pass fsspec filesystems to, e.g., pyarrow.dataset.dataset and things should "just work".
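A minimal sketch of that usage, assuming both fsspec and a pyarrow 2.0+ release are installed. The helper name `roundtrip_via_fsspec`, the column name, and the file name are made up for illustration; a local temporary directory stands in for whatever storage the fsspec filesystem would normally point at.

```python
import os
import tempfile


def roundtrip_via_fsspec():
    # Imports are local so that defining the function does not
    # itself require the optional dependencies.
    import fsspec
    import pyarrow as pa
    import pyarrow.dataset as ds
    import pyarrow.parquet as pq

    table = pa.table({"x": [1, 2, 3]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "part-0.parquet")
        pq.write_table(table, path)

        # An fsspec filesystem object, handed directly to pyarrow.
        fs = fsspec.filesystem("file")
        dataset = ds.dataset(path, filesystem=fs, format="parquet")

        # Materialize before the temporary directory is removed.
        return dataset.to_table().to_pydict()
```

No subclassing of a pyarrow class is involved on the fsspec side here; the filesystem object is simply passed through the `filesystem` argument.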

I was originally confused because I was porting some custom code which used pyarrow's FileSystem to fsspec and was expecting the interfaces to be mostly compatible. They are in principle, except that pyarrow.fs.FileSystem just uses some different naming schemes.
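To make the naming difference concrete, here is a small, non-exhaustive lookup of pyarrow.fs.FileSystem method names and their rough fsspec counterparts. The pairing is an approximation for porting purposes only: signatures, arguments, and return types differ between the two libraries, and `fsspec_counterpart` is a hypothetical helper, not part of either API.

```python
# Rough counterparts only; check each library's docs for the exact
# signatures and return types before relying on any of these.
PYARROW_TO_FSSPEC = {
    "get_file_info": "info",
    "create_dir": "mkdir",
    "delete_file": "rm",
    "delete_dir": "rm",            # fsspec: rm(path, recursive=True)
    "move": "mv",
    "copy_file": "copy",
    "open_input_stream": "open",   # fsspec: open(path, "rb")
    "open_output_stream": "open",  # fsspec: open(path, "wb")
}


def fsspec_counterpart(pyarrow_method: str) -> str:
    """Look up the approximate fsspec name for a pyarrow.fs method."""
    try:
        return PYARROW_TO_FSSPEC[pyarrow_method]
    except KeyError:
        raise KeyError(
            f"no counterpart recorded for {pyarrow_method!r}"
        ) from None
```

For example, code that called `get_file_info` on a pyarrow filesystem would typically be ported to call `info` on the fsspec one, adjusting for the different return value.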

I will rephrase the docs in the PR to remove the claim that things are incompatible. Thank you!

@AndreiBarsan AndreiBarsan changed the title Add warning about PyArrow 2.0+ incompatibility to the documentation Update documentation to better reflect how PyArrow interoperability works May 13, 2021
@martindurant martindurant merged commit 3891ff1 into fsspec:master May 14, 2021