Memory leak on to_json? #26347

jorgecarleitao · 2019-05-11T18:37:31Z

Code Sample, a copy-pastable example if possible

import resource

import numpy as np
import pandas as pd

# some random data, stored as one numpy array per row (an array column)
batches = 5000
df = pd.DataFrame({
    't': [np.array(range(0, 180 * 60, 5))] * batches,
})


# convert the data above to json (e.g. the backend of a restful API)
for i in range(100):
    # comment any of the 2 lines for the leak to vanish.
    print(df['t'].iloc[0].shape, df['t'].iloc[0].dtype)
    df['t'] = df['t'].apply(lambda x: np.array(list(x)))
    df['t'].to_json()
    # peak resident set size of this process so far
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

Problem description

The code above gives the following result (only works on Linux/macOS, since the `resource` module is Unix-only)

i.e. the peak memory (`ru_maxrss`) keeps increasing with every iteration, which points to a memory leak.
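For anyone trying to reproduce this, the measurement can be separated from the pandas code. The following is a minimal sketch using only the standard library; the helper names `peak_rss_per_iteration` and `to_mebibytes` are made up for illustration and are not part of pandas. Since `ru_maxrss` is a high-water mark, it never goes down, so the thing to look for is whether it keeps climbing after the first few iterations.

```python
import resource
import sys


def peak_rss_per_iteration(func, iterations=10):
    """Call func repeatedly and record the process peak RSS after each call."""
    readings = []
    for _ in range(iterations):
        func()
        readings.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return readings


def to_mebibytes(ru_maxrss):
    """Convert a raw ru_maxrss reading to MiB."""
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return ru_maxrss / divisor


if __name__ == "__main__":
    # Placeholder workload; swap in the body of the loop from the code sample.
    readings = peak_rss_per_iteration(lambda: sum(range(100_000)), iterations=5)
    print([round(to_mebibytes(r), 1) for r in readings])
```

With a workload that does not leak, the readings should flatten out after a few iterations; with the loop body from the code sample as `func`, they would be expected to keep growing.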

Expected Output

The values above should stay bounded instead of growing with every iteration.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1075-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.0.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: 0.13.0
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.2.1
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


jreback · 2019-05-11T18:44:51Z

you don’t need 2 issues

patrickeganfoley · 2021-03-11T02:16:52Z

I think the other issue was this one, which has been resolved.

jreback closed this as completed May 11, 2019

gfyoung added the Duplicate Report label May 12, 2019

gfyoung added this to the No action milestone May 12, 2019
