dcherian commented Nov 19, 2021

I found that ravel_multi_index was taking a lot of time in the reductions I tend to run: nD array, 1D group_idx, axis=-1.

This is an alternate algorithm from https://stackoverflow.com/questions/46256279/bin-elements-per-row-vectorized-2d-bincount-for-numpy
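For readers unfamiliar with the trick, here is a minimal sketch of the offset idea from that answer, next to a ravel_multi_index version for comparison. It is not the code in this PR: the function names `grouped_sum_ravel` and `grouped_sum_offset` and the `size` parameter are made up for illustration, and it only handles a 2D array reduced along the last axis with `sum`.

```python
import numpy as np


def grouped_sum_ravel(a, group_idx, size):
    """Per-row grouped sum, flattening (row, group) pairs with ravel_multi_index."""
    nrows = a.shape[0]
    rows = np.broadcast_to(np.arange(nrows)[:, None], a.shape)
    groups = np.broadcast_to(group_idx, a.shape)
    flat = np.ravel_multi_index((rows.ravel(), groups.ravel()), (nrows, size))
    out = np.bincount(flat, weights=a.ravel(), minlength=nrows * size)
    return out.reshape(nrows, size)


def grouped_sum_offset(a, group_idx, size):
    """Same result, but shift each row's labels by row * size instead."""
    nrows = a.shape[0]
    # Row i gets labels in [i * size, (i + 1) * size), so rows never collide.
    offsets = np.arange(nrows)[:, None] * size
    flat = (group_idx + offsets).ravel()
    out = np.bincount(flat, weights=a.ravel(), minlength=nrows * size)
    return out.reshape(nrows, size)


a = np.arange(24).reshape(2, 12)
group_idx = np.repeat([1, 2, 3, 4], repeats=3)
size = group_idx.max() + 1  # labels 0..4; group 0 is empty here

by_ravel = grouped_sum_ravel(a, group_idx, size)
by_offset = grouped_sum_offset(a, group_idx, size)
print(np.array_equal(by_ravel, by_offset))
```

Both versions build the same flat index; the offset version just gets there with one broadcasted addition rather than a call to ravel_multi_index.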

I timed it with this script:

import timeit

import numpy as np
import numpy_groupies as npg


def time_call(method):
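    # Re-import locally so that `npg` is present in locals(), which is
    # passed to timeit as the namespace for the timed statement.
    import numpy_groupies as npg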

    group_idx = np.repeat([1, 2, 3, 4], repeats=3)
    times = []
    for exp in np.arange(6):
        a = np.ones(
            (
                10 ** exp,
                100,
                12,
            ),
            dtype=np.int32,
        )
        time = timeit.timeit(
            f"npg.utils_numpy.input_validation(group_idx, a, axis=-1, func='sum', method={method!r})",
            number=10,
            globals=locals(),
        )

        times.append(time)

    # Sanity check on the largest array: both methods must agree.
    np.testing.assert_array_equal(
        npg.utils_numpy.input_validation(group_idx, a, axis=-1, func="sum", method="ravel")[0],
        npg.utils_numpy.input_validation(group_idx, a, axis=-1, func="sum", method="offset")[0],
    )
    return times


ravel = time_call("ravel")
offset = time_call("offset")

import matplotlib.pyplot as plt

numel = 12 * 100 * 10**np.arange(len(ravel))
plt.plot(numel, ravel)
plt.plot(numel, offset)
plt.legend(["current npg", "proposed"])
plt.yscale("log")
plt.xscale("log")
plt.grid(True)

It's a ≈2x speedup for decent-sized arrays.
[Image: log-log plot of the timeit results against number of elements, "current npg" vs. "proposed"]
