
FTPFileSystem incompatibility with pandas/pyarrow #705

Open
@matteosantama


I've run across an issue, and to be honest I'm not quite sure which library it belongs in.

It starts with this call:

```python
df.to_parquet(
    path=path,
    partition_cols=partition_cols,
    storage_options=fs.storage_options
)
```

which goes through pandas and eventually reaches this function in pyarrow:

https://github.com/apache/arrow/blob/e2238582e2a2bf20a68a967145fe1a7b2337a997/python/pyarrow/parquet.py#L1891

which internally calls the helper function _mkdir_if_not_exists():

https://github.com/apache/arrow/blob/e2238582e2a2bf20a68a967145fe1a7b2337a997/python/pyarrow/parquet.py#L1883

The problem is that, for FTPFileSystem, this helper never actually creates the directory, because of this line in fsspec:

https://github.com/intake/filesystem_spec/blob/85bb2f3fef2aa12f7ec8497ea116c78c644b49ec/fsspec/spec.py#L1219
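To make the failure mode concrete, here is a simplified, self-contained paraphrase of the guard in pyarrow's _mkdir_if_not_exists() (not pyarrow's verbatim code). It assumes the helper only calls mkdir() when the filesystem's _isfilestore() returns True, and that the fsspec base class returns False by default. RecordingFS and mkdir_if_not_exists are hypothetical stand-ins that record calls instead of talking to a real FTP server.

```python
class RecordingFS:
    """Hypothetical stand-in for an fsspec filesystem; records mkdir calls."""

    def __init__(self):
        self.dirs = set()
        self.mkdir_calls = []

    def _isfilestore(self):
        # Mirrors the fsspec base-class default (fsspec spells it _isfilestore).
        return False

    def exists(self, path):
        return path in self.dirs

    def mkdir(self, path):
        self.mkdir_calls.append(path)
        self.dirs.add(path)


def mkdir_if_not_exists(fs, path):
    # Simplified paraphrase of pyarrow's helper: the directory is only
    # created when the filesystem claims to be a "filestore".
    if fs._isfilestore() and not fs.exists(path):
        fs.mkdir(path)


fs = RecordingFS()
mkdir_if_not_exists(fs, "/data/table")
print(fs.mkdir_calls)  # [] -> mkdir was never called
```

Because the check short-circuits on _isfilestore(), no directory is created, and the subsequent partitioned writes target a directory that does not exist.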

The simplest solution would be to override the _isfilestore() method in FTPFileSystem so that it returns True.
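A minimal sketch of the proposed override, written against hypothetical stand-ins so it runs without an FTP server: BaseFS plays the role of fsspec's base class (default False) and FtpLikeFS plays FTPFileSystem with the override applied. In real code the override would live on fsspec.implementations.ftp.FTPFileSystem itself; this sketch assumes the target server supports creating directories.

```python
class BaseFS:
    """Hypothetical stand-in for fsspec's AbstractFileSystem."""

    def __init__(self):
        self.dirs = set()

    def _isfilestore(self):
        # Base-class default: not a "filestore", so the mkdir guard is skipped.
        return False

    def exists(self, path):
        return path in self.dirs

    def mkdir(self, path):
        self.dirs.add(path)


class FtpLikeFS(BaseFS):
    """Hypothetical stand-in for FTPFileSystem with the proposed override."""

    def _isfilestore(self):
        # Assumes the FTP server exposes real directories that can be created,
        # so report True and let callers such as pyarrow call mkdir().
        return True


def mkdir_if_not_exists(fs, path):
    # Simplified paraphrase of the guard in pyarrow's helper.
    if fs._isfilestore() and not fs.exists(path):
        fs.mkdir(path)


for fs in (BaseFS(), FtpLikeFS()):
    mkdir_if_not_exists(fs, "/data/table")
    print(type(fs).__name__, fs.exists("/data/table"))
# BaseFS False
# FtpLikeFS True
```

The override is a one-line change, but it alters a contract other callers may rely on, so whether True is correct for every FTP server is a judgment call for the maintainers.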
