Description
Not sure this is even supported by gcsfs - I don't see encoding as an available option there: https://github.com/dask/gcsfs/blob/523eb65b3e7feb05f9c10ce84523d1058716fecf/gcsfs/core.py#L1133
Might need to start upstream if we wanted to make this possible.
Hi, @WillAyd
In the current gcsfs master, GCSFileSystem.open() has been removed and fsspec.AbstractFileSystem.open() is used instead, where the passed encoding is now applied for text reading/writing:
    if "b" not in mode:
        mode = mode.replace("t", "") + "b"
        text_kwargs = {
            k: kwargs.pop(k)
            for k in ["encoding", "errors", "newline"]
            if k in kwargs
        }
        return io.TextIOWrapper(
            self.open(path, mode, block_size, **kwargs), **text_kwargs
        )
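To illustrate what that branch does, here is a minimal standalone sketch using only the standard library. The in-memory store and the names open_binary and open_any are hypothetical stand-ins invented for this example, not part of fsspec or gcsfs; the point is only that encoding takes effect when the mode has no "b", and has no effect when the file is opened as 'rb'.

```python
import io

# Hypothetical in-memory "filesystem" (an assumption for this sketch):
# path -> raw bytes, here a small CSV encoded as cp1252.
_STORE = {"data.csv": "name\nJosé\n".encode("cp1252")}


def open_binary(path):
    """Stand-in for a backend that can only open files in binary mode."""
    return io.BytesIO(_STORE[path])


def open_any(path, mode="rb", **kwargs):
    """Mimic the text-mode branch quoted above (simplified, read-only)."""
    if "b" not in mode:
        # Pull out the text-related options and hand them to TextIOWrapper.
        text_kwargs = {
            k: kwargs.pop(k)
            for k in ["encoding", "errors", "newline"]
            if k in kwargs
        }
        return io.TextIOWrapper(open_binary(path), **text_kwargs)
    # Binary mode: in this sketch any encoding passed in is simply unused.
    return open_binary(path)


# Text mode: the encoding is applied and we get a decoded str back.
with open_any("data.csv", "r", encoding="cp1252") as f:
    text = f.read()

# Binary mode: the same encoding argument changes nothing, we get raw bytes.
with open_any("data.csv", "rb", encoding="cp1252") as f:
    raw = f.read()
```

In the binary case the caller receives undecoded bytes, so any decoding happens later with whatever encoding the consumer chooses.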
Note also this issue from gcsfs
So it looks like pandas ignores the encoding parameter because pandas.io.gcs.get_filepath_or_buffer passes mode = 'rb' in the call to GCSFileSystem.open(filepath_or_buffer, mode).
Tracing back to the point where the mode parameter is first set, we stop at this line in pandas/io/common.py:
    def get_filepath_or_buffer(
        filepath_or_buffer, encoding=None, compression=None, mode=None
    ):
This happens because the call to get_filepath_or_buffer() made from here (Line 430 in 29d6b02) does not pass a value for mode, so the default mode=None applies.
My suggestion is that for read_csv() we need to pass mode='r' and for to_csv() we need to pass mode='w' in the call to get_filepath_or_buffer(). But I'm not sure where it's best to implement this change.
Originally posted by @EgorBEremeev in #26124 (comment)