Pandas 1.0.1 - .rolling().min() and .rolling().max() create memory leak at <__array_function__ internals>:6 #32266
A workaround is to use NumPy with the following stride-based functions. `rolling().apply()` with a lambda can also be used on top of `rolling`, but it is very slow.

```python
import numpy as np

def rolling_window_nan_filled(a_org, window):
    # Left-pad with window - 1 NaNs so the output has the same length as the input
    a = np.concatenate((np.full(window - 1, np.nan), a_org))
    # View the padded 1-D array as overlapping windows of length `window` (no copy)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def numpy_rolling_min(values, periods):
    # np.min propagates NaN, so the first periods - 1 results are NaN
    return np.min(rolling_window_nan_filled(values, periods), axis=1)

def numpy_rolling_max(values, periods):
    return np.max(rolling_window_nan_filled(values, periods), axis=1)
```
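On NumPy 1.20 or newer, `numpy.lib.stride_tricks.sliding_window_view` builds the same windowed view without hand-computed strides, which avoids the out-of-bounds risk that `as_strided` carries when the shape or strides are wrong, and it returns a read-only view. A minimal sketch, assuming NumPy >= 1.20; the function name `sliding_rolling_min_max` is hypothetical, and the brute-force loop is only there to check the result. The NaN prefix mirrors pandas' default `min_periods` for a fixed window, which equals the window size.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_rolling_min_max(values, window):
    """Rolling min and max of a 1-D array (hypothetical helper, NumPy >= 1.20).

    The first window - 1 entries are NaN, like pandas' default
    min_periods for a fixed-size window.
    """
    values = np.asarray(values, dtype=float)
    # Left-pad with NaN so the output lines up with the input
    padded = np.concatenate((np.full(window - 1, np.nan), values))
    # Read-only view of shape (len(values), window); no data is copied
    windows = sliding_window_view(padded, window)
    # min/max propagate NaN, so any window touching the padding yields NaN
    return windows.min(axis=1), windows.max(axis=1)


values = np.random.default_rng(0).normal(size=1_000)
rmin, rmax = sliding_rolling_min_max(values, 20)

# Brute-force cross-check on the fully populated windows
expected_min = [values[i - 19 : i + 1].min() for i in range(19, len(values))]
expected_max = [values[i - 19 : i + 1].max() for i in range(19, len(values))]
assert np.all(np.isnan(rmin[:19])) and np.all(np.isnan(rmax[:19]))
assert np.array_equal(rmin[19:], expected_min)
assert np.array_equal(rmax[19:], expected_max)
```

For a pandas Series `s` without NaNs, `sliding_rolling_min_max(s.to_numpy(), 20)` is expected to agree with `s.rolling(20).min()` and `s.rolling(20).max()`.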
@regmeg can you check whether you see the same problem with pandas 0.25, or whether it is new in 1.0?
Hi @jorisvandenbossche, thanks for your reply. I've just rerun the script with 0.25 and the memory does not accumulate, so there is no leak there. The script I've submitted is copy-pastable, so it should be easy to reproduce with 1.0.1: you just need to download the dataset and run the script. The leak occurs both on my local Linux machine and in linux-python-based Docker images running on instances.
This is a pretty severe bug in my eyes, so I think it should get higher priority. It also happens with the latest version of pandas (1.0.3 at the time of writing) and in current master, if that helps. Doing some investigation: Running
The last good commit seems to be a46806c, while the one introducing the problem is 6e5d148. I fail to see why it would work for mean but not for min/max, but I hope this helps someone with more knowledge of the pandas code find the problem quickly.
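Classifying each commit as good or bad needs a repeatable leak check. One option, not necessarily how the commits above were identified, is to compare two `tracemalloc` snapshots taken around repeated calls of the workload. A minimal standard-library sketch, assuming the workload can be wrapped in a zero-argument callable; `traced_growth`, `leaky` and `clean` are hypothetical names, and the stand-in workloads only illustrate a leaking and a non-leaking case.

```python
import tracemalloc


def traced_growth(func, iterations=100, top=5):
    """Call func repeatedly; return (net traced bytes, top growing lines).

    Hypothetical helper: allocations made by tracemalloc itself are filtered out.
    """
    ignore_self = (tracemalloc.Filter(False, tracemalloc.__file__),)
    tracemalloc.start()
    try:
        func()  # warm-up, so one-off caches are not counted as growth
        before = tracemalloc.take_snapshot().filter_traces(ignore_self)
        for _ in range(iterations):
            func()
        after = tracemalloc.take_snapshot().filter_traces(ignore_self)
    finally:
        tracemalloc.stop()
    # Sorted by absolute size difference, largest first
    stats = after.compare_to(before, "lineno")
    net = sum(stat.size_diff for stat in stats)
    return net, stats[:top]


# Stand-in workloads: one retains memory on every call, one does not
_kept = []

def leaky():
    _kept.append(bytes(100_000))  # retained forever: simulates a leak

def clean():
    bytes(100_000)  # freed immediately

leak_net, leak_top = traced_growth(leaky, iterations=50)
clean_net, _ = traced_growth(clean, iterations=50)
print(f"leaky: {leak_net:,} bytes; clean: {clean_net:,} bytes")
for stat in leak_top[:1]:
    print(stat)  # source line responsible for the most growth
```

For this issue, `func` would be something like `lambda: (s.rolling(20).min(), s.rolling(20).max())` on a Series `s`. Wrapped in a script that exits non-zero when `net` exceeds a chosen threshold, such a check could drive `git bisect run`.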
Additional info: #33693 will fix this issue.
Fixed by #33693 in 1.0.4, I think.
Confirmed fixed in 1.0.4 |
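To confirm a fix like this on a particular install, one option is to sample `tracemalloc`'s current traced memory after every iteration and fit a line: a slope near zero means memory stays flat, while the figures in the report (about 650 MB over 1000 iterations) correspond to roughly 650 KB retained per iteration. A minimal sketch; `growth_per_call` is a hypothetical helper, and the two lambdas are stand-ins for a leaking and a non-leaking workload.

```python
import tracemalloc


def growth_per_call(func, iterations=50):
    """Least-squares slope of traced memory (bytes) against iteration number."""
    samples = []
    tracemalloc.start()
    try:
        func()  # warm-up call, not sampled
        for _ in range(iterations):
            func()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    # Ordinary least-squares slope with x = 0, 1, ..., n - 1
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


_retained = []
leaky_slope = growth_per_call(lambda: _retained.append(bytearray(50_000)))
flat_slope = growth_per_call(lambda: bytearray(50_000))
print(f"leaky: ~{leaky_slope:,.0f} B/call, flat: ~{flat_slope:,.0f} B/call")
```

For the reported workload, `func` would run the rolling min and max on the data; if the fix holds, the slope should stay near zero rather than in the hundreds of kilobytes per call.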
Code Sample, a copy-pastable example if possible
Problem description
Pandas `rolling().min()` and `rolling().max()` create memory leaks. I ran tracemalloc line-based memory profiling, and `<__array_function__ internals>:6` grows in size on every loop iteration of the script above whenever both of these functions are present. For 1000 iterations it consumes around 650 MB of RAM, whereas if `rolling().min()` and `rolling().max()` are changed to `rolling().mean()` and `rolling().median()` and run for 1000 iterations, RAM consumption stays constant at around 4 MB. Therefore `rolling().min()` and `rolling().max()` seem to be the problem.

The output of this script running for 100 iterations, with `<__array_function__ internals>:6` constantly increasing in size, can be found here: https://pastebin.com/nvGKgmPq

The CSV file `mem_debug_data.csv` used in the script can be found here: http://www.sharecsv.com/s/ad8485d8a0a24a5e12c62957de9b13bd/mem_debug_data.csv

Expected Output
Running `rolling().min()` and `rolling().max()` repeatedly over time should not increase RAM consumption.

Output of `pd.show_versions()`