BUG: read_csv with memory_map=True on BytesIO object fails #45630

Open
@RehanSD

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from io import BytesIO
df = pd.DataFrame([[1, 2]])
bio = BytesIO()
df.to_csv(bio)
bio.seek(0)
pd.read_csv(bio, memory_map=True)

Issue Description

The read_csv call fails with this error:

UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-7-c255f3bb77b8> in <module>
----> 1 pd.read_csv(bio, memory_map=True)

~/.miniconda3/envs/modin/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312
    313         return wrapper

~/.miniconda3/envs/modin/lib/python3.8/site-packages/pandas/io/parsers/readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    678     kwds.update(kwds_defaults)
    679
--> 680     return _read(filepath_or_buffer, kwds)
    681
    682

~/.miniconda3/envs/modin/lib/python3.8/site-packages/pandas/io/parsers/readers.py in _read(filepath_or_buffer, kwds)
    573
    574     # Create the parser.
--> 575     parser = TextFileReader(filepath_or_buffer, **kwds)
    576
    577     if chunksize or iterator:

~/.miniconda3/envs/modin/lib/python3.8/site-packages/pandas/io/parsers/readers.py in __init__(self, f, engine, **kwds)
    931
    932         self.handles: IOHandles | None = None
--> 933         self._engine = self._make_engine(f, self.engine)
    934
    935     def close(self):

~/.miniconda3/envs/modin/lib/python3.8/site-packages/pandas/io/parsers/readers.py in _make_engine(self, f, engine)
   1215             # "Union[str, PathLike[str], ReadCsvBuffer[bytes], ReadCsvBuffer[str]]"
   1216             # , "str", "bool", "Any", "Any", "Any", "Any", "Any"
-> 1217             self.handles = get_handle(  # type: ignore[call-overload]
   1218                 f,
   1219                 mode,

~/.miniconda3/envs/modin/lib/python3.8/site-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    680
    681     # memory mapping needs to be the first step
--> 682     handle, memory_map, handles = _maybe_memory_map(
    683         handle,
    684         memory_map,

~/.miniconda3/envs/modin/lib/python3.8/site-packages/pandas/io/common.py in _maybe_memory_map(handle, memory_map, encoding, mode, errors, decode)
   1085         wrapped = cast(
   1086             BaseBuffer,
-> 1087             _MMapWrapper(handle, encoding, errors, decode),  # type: ignore[arg-type]
   1088         )
   1089     finally:

~/.miniconda3/envs/modin/lib/python3.8/site-packages/pandas/io/common.py in __init__(self, f, encoding, errors, decode)
    959                 continue
    960             self.attributes[attribute] = getattr(f, attribute)()
--> 961         self.mmap = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    962
    963     def __getattr__(self, name: str):

UnsupportedOperation: fileno

The error appears to occur because fileno is called on a BytesIO object, which has no underlying file descriptor. This code works in pandas 1.3.4, so I compared the sources and noticed that in _maybe_memory_map, the except clause around instantiating the _MMapWrapper was removed from common.py. (Old common.py for reference.)
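The kind of fallback described above can be sketched with the standard library alone. This is only an illustration, not the pandas source: `maybe_memory_map` is a hypothetical helper name, and returning the original handle unchanged when mapping is impossible is an assumption about the desired behavior.

```python
import io
import mmap


def maybe_memory_map(handle):
    """Try to memory-map `handle`; fall back to the handle itself.

    Hypothetical helper for illustration only (not part of pandas).
    Returns a (buffer, was_mapped) tuple.
    """
    try:
        # In-memory buffers such as BytesIO raise io.UnsupportedOperation here.
        fd = handle.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return handle, False

    try:
        return mmap.mmap(fd, 0, access=mmap.ACCESS_READ), True
    except (ValueError, OSError):
        # For example, an empty file cannot be mapped.
        return handle, False


if __name__ == "__main__":
    bio = io.BytesIO(b",0,1\n0,1,2\n")
    buffer, was_mapped = maybe_memory_map(bio)
    print(type(buffer).__name__, was_mapped)
```

With a guard like this, a BytesIO input would be read directly instead of raising UnsupportedOperation, while file-backed handles would still be mapped.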

Expected Behavior

I would expect the read_csv to succeed and the DataFrame to be read.

Installed Versions

INSTALLED VERSIONS

commit : bb1f651
python : 3.8.12.final.0
python-bits : 64
OS : Darwin
OS-release : 21.1.0
Version : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:23 PDT 2021; root:xnu-8019.41.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.0
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.0.4
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : 4.3.1
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.6.4
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.30.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : 2021.11.1
gcsfs : None
matplotlib : 3.2.2
numba : None
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : 0.16.0
pyarrow : 3.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2021.11.1
scipy : 1.7.3
sqlalchemy : 1.4.27
tables : 3.6.1
tabulate : None
xarray : 0.20.1
xlrd : 2.0.1
xlwt : None
zstandard : None
