BUG: set src->buffer = NULL after garbage collecting it in buffer_rd_… #12135

Closed
wants to merge 4 commits into from
62 changes: 31 additions & 31 deletions doc/source/whatsnew/v0.18.0.txt
@@ -201,10 +201,6 @@ In addition, ``.round()``, ``.floor()`` and ``.ceil()`` will be available thru t
s
s.dt.round('D')

.. _whatsnew_0180.api:

- ``pandas.merge()`` and ``DataFrame.merge()`` will show a specific error message when trying to merge with an object that is not of type ``DataFrame`` or a subclass (:issue:`12081`)

.. _whatsnew_0180.api_breaking:

Backwards incompatible API changes
@@ -319,29 +315,6 @@ other anchored offsets like ``MonthBegin`` and ``YearBegin``.
d = pd.Timestamp('2014-02-15')
d + pd.offsets.QuarterBegin(n=0, startingMonth=2)


Other API Changes
^^^^^^^^^^^^^^^^^

- ``DataFrame.between_time`` and ``Series.between_time`` now only parse a fixed set of time strings. Parsing of date strings is no longer supported and raises a ``ValueError``. (:issue:`11818`)

  .. ipython:: python

     s = pd.Series(range(10), pd.date_range('2015-01-01', freq='H', periods=10))
     s.between_time("7:00am", "9:00am")

  This will now raise.

  .. code-block:: python

     In [2]: s.between_time('20150101 07:00:00','20150101 09:00:00')
     ValueError: Cannot convert arg ['20150101 07:00:00'] to a time.

- ``.memory_usage`` now includes values in the index, as does memory_usage in ``.info`` (:issue:`11597`)

- ``DataFrame.to_latex()`` now supports non-ascii encodings (eg utf-8) in Python 2 with the parameter ``encoding`` (:issue:`7061`)


Changes to eval
^^^^^^^^^^^^^^^

@@ -397,6 +370,32 @@ assignments are valid for multi-line expressions.
g = f / 2.0""", inplace=True)
df


.. _whatsnew_0180.api:

Other API Changes
^^^^^^^^^^^^^^^^^

- ``DataFrame.between_time`` and ``Series.between_time`` now only parse a fixed set of time strings. Parsing of date strings is no longer supported and raises a ``ValueError``. (:issue:`11818`)

  .. ipython:: python

     s = pd.Series(range(10), pd.date_range('2015-01-01', freq='H', periods=10))
     s.between_time("7:00am", "9:00am")

  This will now raise.

  .. code-block:: python

     In [2]: s.between_time('20150101 07:00:00','20150101 09:00:00')
     ValueError: Cannot convert arg ['20150101 07:00:00'] to a time.

- ``.memory_usage`` now includes values in the index, as does memory_usage in ``.info`` (:issue:`11597`)

- ``DataFrame.to_latex()`` now supports non-ascii encodings (eg utf-8) in Python 2 with the parameter ``encoding`` (:issue:`7061`)

- ``pandas.merge()`` and ``DataFrame.merge()`` will show a specific error message when trying to merge with an object that is not of type ``DataFrame`` or a subclass (:issue:`12081`)

.. _whatsnew_0180.deprecations:

Deprecations
@@ -502,7 +501,7 @@ Bug Fixes
- Bug in ``pd.read_clipboard`` and ``pd.to_clipboard`` functions not supporting Unicode; upgrade included ``pyperclip`` to v1.5.15 (:issue:`9263`)
- Bug in ``DataFrame.query`` containing an assignment (:issue:`8664`)

- Bug in ``from_msgpack`` where ``__contains__()`` fails for columns of the unpacked ``DataFrame``, if the ``DataFrame`` has object columns. (:issue: `11880`)
- Bug in ``from_msgpack`` where ``__contains__()`` fails for columns of the unpacked ``DataFrame``, if the ``DataFrame`` has object columns. (:issue:`11880`)


- Bug in timezone info lost when broadcasting scalar datetime to ``DataFrame`` (:issue:`11682`)
@@ -521,16 +520,15 @@ Bug Fixes
- Bug in ``Index`` prevents copying name of passed ``Index``, when a new name is not provided (:issue:`11193`)
- Bug in ``read_excel`` failing to read any non-empty sheets when empty sheets exist and ``sheetname=None`` (:issue:`11711`)
- Bug in ``read_excel`` failing to raise ``NotImplemented`` error when keywords ``parse_dates`` and ``date_parser`` are provided (:issue:`11544`)
- Bug in ``read_sql`` with pymysql connections failing to return chunked data (:issue:`11522`)
- Bug in ``read_sql`` with ``pymysql`` connections failing to return chunked data (:issue:`11522`)
- Bug in ``.to_csv`` ignoring formatting parameters ``decimal``, ``na_rep``, ``float_format`` for float indexes (:issue:`11553`)
- Bug in ``Int64Index`` and ``Float64Index`` preventing the use of the modulo operator (:issue:`9244`)


- Bug in ``DataFrame`` when masking an empty ``DataFrame`` (:issue:`11859`)


- Bug in ``.plot`` potentially modifying the ``colors`` input when the number
of columns didn't match the number of series provided (:issue:`12039`).
- Bug in ``.plot`` potentially modifying the ``colors`` input when the number of columns didn't match the number of series provided (:issue:`12039`).


- Bug in ``.groupby`` where a ``KeyError`` was not raised for a wrong column if there was only one row in the dataframe (:issue:`11741`)
@@ -545,3 +543,5 @@ of columns didn't match the number of series provided (:issue:`12039`).
- Bug in ``.style`` indexes and multi-indexes not appearing (:issue:`11655`)

- Bug in ``.skew`` and ``.kurt`` due to roundoff error for highly similar values (:issue:`11974`)

- Bug in ``buffer_rd_bytes`` where ``src->buffer`` could be freed more than once if reading failed, causing a segfault (:issue:`12098`)
38 changes: 38 additions & 0 deletions pandas/io/tests/test_parsers.py
@@ -3667,6 +3667,25 @@ def test_buffer_overflow(self):
self.assertIn(
'Buffer overflow caught - possible malformed input file.', str(cperr))

    def test_buffer_rd_bytes(self):
        # GH 12098
        # src->buffer can be freed twice leading to a segfault if a corrupt
        # gzip file is read with read_csv and the buffer is filled more than
        # once before gzip throws an exception

        data = '\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x03\xED\xC3\x41\x09' \
               '\x00\x00\x08\x00\xB1\xB7\xB6\xBA\xFE\xA5\xCC\x21\x6C\xB0' \
               '\xA6\x4D' + '\x55' * 267 + \
               '\x7D\xF7\x00\x91\xE0\x47\x97\x14\x38\x04\x00' \
               '\x1f\x8b\x08\x00VT\x97V\x00\x03\xed]\xefO'
        for i in range(100):
            try:
                _ = self.read_csv(StringIO(data),
                                  compression='gzip',
                                  delim_whitespace=True)
            except Exception as e:
                pass

def test_single_char_leading_whitespace(self):
# GH 9710
data = """\
@@ -4208,6 +4227,25 @@ def test_buffer_overflow(self):
self.assertIn(
'Buffer overflow caught - possible malformed input file.', str(cperr))

    def test_buffer_rd_bytes(self):
        # GH 12098
        # src->buffer can be freed twice leading to a segfault if a corrupt
        # gzip file is read with read_csv and the buffer is filled more than
        # once before gzip throws an exception

        data = '\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x03\xED\xC3\x41\x09' \
               '\x00\x00\x08\x00\xB1\xB7\xB6\xBA\xFE\xA5\xCC\x21\x6C\xB0' \
               '\xA6\x4D' + '\x55' * 267 + \
               '\x7D\xF7\x00\x91\xE0\x47\x97\x14\x38\x04\x00' \
               '\x1f\x8b\x08\x00VT\x97V\x00\x03\xed]\xefO'
        for i in range(100):
            try:
                _ = self.read_csv(StringIO(data),
                                  compression='gzip',
                                  delim_whitespace=True)
            except Exception as e:
                pass

def test_single_char_leading_whitespace(self):
# GH 9710
data = """\
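
The test comments describe a corrupt gzip stream that yields some data before the decompressor raises. As a rough illustration of that situation only, the sketch below uses the standard-library ``gzip`` module directly rather than ``read_csv``, and Python 3 ``bytes`` rather than the ``str``/``StringIO`` input used in the test. The helper names ``make_truncated_gzip`` and ``read_in_chunks`` are made up for this example and are not part of pandas.

```python
import gzip
import io
import random
import zlib


def make_truncated_gzip(n_rows=20000, keep_fraction=0.5, seed=0):
    """Return (intact, truncated) gzip streams of whitespace-delimited rows.

    Hypothetical helper: truncation is just one simple way to corrupt a
    stream; the test in the diff uses a hand-crafted byte string instead.
    """
    rng = random.Random(seed)
    text = "\n".join(
        "%.6f %.6f" % (rng.random(), rng.random()) for _ in range(n_rows)
    )
    good = gzip.compress(text.encode("ascii"))
    return good, good[: int(len(good) * keep_fraction)]


def read_in_chunks(data, chunk_size=4096):
    """Read a gzip stream chunk by chunk.

    Returns (number of chunks read, exception or None), so the caller can
    see that several reads succeeded before the failure.
    """
    chunks = 0
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
            while fh.read(chunk_size):
                chunks += 1
    except (EOFError, OSError, zlib.error) as exc:
        return chunks, exc
    return chunks, None


if __name__ == "__main__":
    good, bad = make_truncated_gzip()
    print(read_in_chunks(good)[1])   # no error for the intact stream
    chunks, err = read_in_chunks(bad)
    print(chunks, type(err).__name__)
```

The point of interest is that the reader is refilled more than once before the error surfaces, which is the precondition the test comment gives for the double free.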
1 change: 1 addition & 0 deletions pandas/src/parser/io.c
@@ -121,6 +121,7 @@ void* buffer_rd_bytes(void *source, size_t nbytes,

    /* delete old object */
    Py_XDECREF(src->buffer);
    src->buffer = NULL;
    args = Py_BuildValue("(i)", nbytes);

    func = PyObject_GetAttrString(src->obj, "read");
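
To make the effect of the added line easier to see, here is a toy Python model of the ownership logic, not the C code itself. It assumes that a stale ``src->buffer`` is released again by a later cleanup step when the read fails. ``FakeSource``, ``flaky_reader`` and ``count_double_releases`` are hypothetical names used only for this sketch.

```python
class FakeSource:
    """Toy stand-in for the C struct that owns `buffer` (an assumption)."""

    def __init__(self, reader, reset_after_release):
        self.reader = reader
        self.buffer = None
        self.reset_after_release = reset_after_release
        self.release_log = []  # every object that was "freed", in order

    def _release(self, obj):
        # Models Py_XDECREF: a no-op for None (NULL), one release otherwise.
        if obj is not None:
            self.release_log.append(obj)

    def rd_bytes(self, nbytes):
        self._release(self.buffer)         # delete old object
        if self.reset_after_release:
            self.buffer = None             # models the one-line fix
        self.buffer = self.reader(nbytes)  # may raise; then buffer is unchanged
        return self.buffer

    def close(self):
        # Assumed cleanup path: releases whatever `buffer` still points at.
        self._release(self.buffer)
        self.buffer = None


def flaky_reader():
    """Return a read function that succeeds once, then always fails."""
    calls = {"n": 0}

    def read(nbytes):
        calls["n"] += 1
        if calls["n"] > 1:
            raise OSError("simulated corrupt stream")
        return bytearray(nbytes)  # a fresh, distinct object per call

    return read


def count_double_releases(reset_after_release):
    src = FakeSource(flaky_reader(), reset_after_release)
    src.rd_bytes(16)          # first fill succeeds
    try:
        src.rd_bytes(16)      # second fill fails after the old buffer was released
    except OSError:
        pass
    src.close()
    ids = [id(obj) for obj in src.release_log]
    return len(ids) - len(set(ids))


if __name__ == "__main__":
    print(count_double_releases(reset_after_release=False))
    print(count_double_releases(reset_after_release=True))
```

In this model, resetting the field right after the release means a failed read leaves nothing behind for the cleanup step to release a second time.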