
Better rolling reductions #4915


Merged: 5 commits merged into pydata:master on Feb 19, 2021

Conversation

@dcherian (Contributor) commented Feb 16, 2021

Implements most of #4325 (comment)

%load_ext memory_profiler

import numpy as np
import xarray as xr

temp = xr.DataArray(np.zeros((5000, 500)), dims=("x", "y"))

roll = temp.rolling(x=10, y=20)

%memit roll.sum()
%memit roll.reduce(np.sum)
%memit roll.reduce(np.nansum)  # master branch behaviour
peak memory: 245.18 MiB, increment: 81.92 MiB
peak memory: 226.09 MiB, increment: 62.69 MiB
peak memory: 4493.82 MiB, increment: 4330.43 MiB
  • xref Optimize ndrolling nanreduce #4325
  • asv benchmarks added
  • Passes pre-commit run --all-files
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
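
For context, a minimal sketch of the trick (as I read #4325; not the literal code in this diff): fill NaNs with the reduction's identity element, then take a plain rolling reduction instead of materializing the construct view and calling np.nansum on it. For sum the identity is 0:

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(200), dims="x")
da = da.where(da > 0.2)  # sprinkle in some NaNs

# NaN-skipping sum via the memory-hungry construct path
slow = da.rolling(x=5, min_periods=1).construct("window").sum("window")
# identity-element trick: fill NaN with 0, then take a plain rolling sum
fast = da.fillna(0).rolling(x=5, min_periods=1).sum()
np.testing.assert_allclose(slow.values, fast.values)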

@@ -494,6 +527,14 @@ def _numpy_or_bottleneck_reduce(
                bottleneck_move_func, keep_attrs=keep_attrs, **kwargs
            )
        else:
            if fillna is not None:
                if fillna is dtypes.INF:
                    fillna = dtypes.get_pos_infinity(self.obj.dtype, max_for_int=True)
@dcherian (Contributor Author) commented on this diff:
Useless, since we always pad with NaN, which ends up promoting to float. We should add fill_value support to rolling.
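
A quick illustration (my own example, not from the PR): construct() pads partial windows with NaN, so integer input is promoted to float before any reduction sees it, which is why the integer branch of get_pos_infinity cannot currently trigger.

import numpy as np
import xarray as xr

ints = xr.DataArray(np.arange(6), dims="x")
print(ints.dtype)                                   # int64
print(ints.rolling(x=3).construct("window").dtype)  # float64: NaN padding promoted it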

@mathause (Collaborator) left a comment:

Clever, looks good.

@mathause (Collaborator) left a comment:

I had another look and this looks ready (unless you'd like to also do mean)

@dcherian (Contributor Author) commented:

I got mean to work. var is a little involved, so I haven't done that yet.

@fujiisoup (Member) left a comment:

Very clever implementation @dcherian!

> I got mean to work.

Nice ;)

> var is a little involved, so I haven't done that yet.

I think we can just leave the difficult ones with a TODO comment.

@dcherian (Contributor Author) commented:

Thanks for the reviews @mathause and @fujiisoup.

@dcherian merged commit 9a4313b into pydata:master Feb 19, 2021
@dcherian deleted the rolling-reductions branch February 19, 2021 19:44
@tbloch1 commented Apr 12, 2023

Has there been any progress on this for var/std?

@dcherian (Contributor Author) commented:

We would welcome a PR. Looking at the implementation of mean should help:

def _mean(self, keep_attrs, **kwargs):

@tbloch1 commented Apr 13, 2023

I think I may have found a way to make it more memory efficient, but I don't know enough about writing the sort of code that would be needed for a PR.

I basically wrote out the calculation for variance using only the functions that have already been optimised. Derived from:

$$ var = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 $$

$$ var = \frac{1}{n} \left( (x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots \right) $$

$$ var = \frac{1}{n} \left( x_1^2 - 2x_1\mu + \mu^2 + x_2^2 - 2x_2\mu + \mu^2 + x_3^2 - 2x_3\mu + \mu^2 + \dots \right) $$

$$ var = \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2 \right)$$

I coded this up and it uses approximately 10% of the peak memory of the current .var() implementation:

%load_ext memory_profiler

import numpy as np
import xarray as xr

temp = xr.DataArray(np.random.randint(0, 10, (5000, 500)), dims=("x", "y"))

def new_var(da, x=10, y=20):
    # Defining the re-used parts
    roll = da.rolling(x=x, y=y)
    mean = roll.mean()
    count = roll.count()
    # First term: sum of squared values
    term1 = (da**2).rolling(x=x, y=y).sum()
    # Second term: cross-term sum
    term2 = -2 * mean * roll.sum()
    # Third term: 'sum' of squared means
    term3 = count * mean**2
    # Combining into the variance
    var = (term1 + term2 + term3) / count
    return var

def old_var(da, x=10, y=20):
    roll = da.rolling(x=x, y=y)
    var = roll.var()
    return var

%memit new_var(temp)
%memit old_var(temp)
peak memory: 429.77 MiB, increment: 134.92 MiB
peak memory: 5064.07 MiB, increment: 4768.45 MiB

I wanted to double check that the calculation was working correctly:

var_n = new_var(temp)
var_o = old_var(temp)
print((var_o.where(~np.isnan(var_o), 0) == var_n.where(~np.isnan(var_n), 0)).all().values)
print(np.allclose(var_o, var_n, equal_nan=True))
False
True

I think the difference here is just due to floating point errors, but maybe someone who knows how to check that in more detail could have a look.
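
One quick way to put a number on it (sum-of-squares variance formulas are known to lose precision to cancellation, so a small relative error is expected rather than alarming):

diff = np.abs(var_o - var_n)
print(float(diff.max()))                                      # worst absolute difference
print(float((diff / np.abs(var_o).where(var_o != 0)).max()))  # worst relative difference, ignoring exact zeros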

The standard deviation can be trivially implemented from this if the approach works.
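
A sketch of that, reusing new_var from above:

def new_std(da, x=10, y=20):
    # rolling standard deviation as the square root of the rolling variance
    return np.sqrt(new_var(da, x=x, y=y))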

@dcherian (Contributor Author) commented:

Can you copy your comment to #4325 please?
