CHANGE: Performance Optimizations for IteratorRandom::choose() and related methods #1266

@wainwrightmark

Summary

IteratorRandom::choose() and related methods call gen_range() much more frequently than strictly necessary when choosing from iterators without size hints.
By reducing the number of these calls, it is possible to speed up the relevant benchmarks by about 1.5-2x.

Specifically:

```
# Using Pcg32
test seq_iter_unhinted_choose_from_1000      ... bench:       4,049 ns/iter (+/- 377) # old
test seq_iter_unhinted_choose_from_1000      ... bench:       2,730 ns/iter (+/- 335) # new

# Using ChaChaRng
test seq_iter_unhinted_choose_from_1000      ... bench:       6,831 ns/iter (+/- 317) # old
test seq_iter_unhinted_choose_from_1000      ... bench:       3,447 ns/iter (+/- 627) # new
```

This would not require a breaking change to the API, nor would it violate any explicit promises. However, if this change is made, the choose() method will return different items for a given seed, and the Rng will be left in a different state afterwards, which may affect some users.

Details

The IteratorRandom::choose() and related methods would have to change.
The way they work now (for iterators without size hints) is that for every item, they generate a random number in the range 0..n, where n is the number of items seen so far. If the generated number is zero (a 1-in-n chance), the current result is replaced with the new item.
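
For reference, here is a minimal sketch of that existing strategy (reservoir sampling with a reservoir of one); the function and variable names are illustrative, not the actual rand internals:

```rust
use rand::Rng;

// Sketch of the current strategy: one gen_range call per item seen.
fn choose_unhinted<I, R>(mut iter: I, rng: &mut R) -> Option<I::Item>
where
    I: Iterator,
    R: Rng + ?Sized,
{
    let mut result = iter.next()?;
    let mut consumed: u64 = 1;
    for item in iter {
        consumed += 1;
        // A 1-in-`consumed` chance of replacing the current result keeps
        // every item equally likely to be the one returned.
        if rng.gen_range(0..consumed) == 0 {
            result = item;
        }
    }
    Some(result)
}
```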

This means that for the first twenty items, twenty random numbers are generated. My suggestion is to instead generate a single number in the range 0..20! (20 factorial is the largest factorial less than u64::MAX) and use it in place of those first twenty random numbers.

A number in that range has a 1-in-2 chance of being 0 mod 2; after dividing by 2, it has a 1-in-3 chance of being 0 mod 3, and so on up to 20.
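
To illustrate, here is a sketch of how one draw could be unpacked into those decisions; the constant and function names here are hypothetical, not proposed API:

```rust
use rand::Rng;

/// 20! is the largest factorial that fits in a u64.
const FACTORIAL_20: u64 = 2_432_902_008_176_640_000;

/// Decode one uniform draw from 0..20! into the 19 replacement decisions
/// for items 2..=20 (the first item is always taken; its modulus is 1).
fn decode_decisions<R: Rng + ?Sized>(rng: &mut R) -> [bool; 19] {
    let mut r = rng.gen_range(0..FACTORIAL_20);
    let mut replace = [false; 19];
    for n in 2..=20u64 {
        // `r % n == 0` happens with probability exactly 1/n, and the
        // quotient `r / n` stays uniform and independent of this decision,
        // because the current range is divisible by n.
        replace[(n - 2) as usize] = r % n == 0;
        r /= n;
    }
    replace
}
```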

After the first twenty items, the next draw has to come from the range 0..33!/20!, which only encodes 13 decisions, and the efficiency gradually decreases as n grows.
This seems like a lot of work, but it is still much cheaper than generating fresh random numbers, especially with cryptographically secure RNGs.
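
The batch sizes could be computed on the fly with something like the following (again a sketch under my own naming, not the proposed implementation):

```rust
/// Given that `consumed` items have been seen so far, return how many further
/// decisions a single u64 draw can encode, together with the range to draw
/// from (the product of the moduli consumed+1, consumed+2, ...).
fn next_batch(consumed: u64) -> (u64, u64) {
    let mut product: u64 = 1;
    let mut count = 0;
    let mut n = consumed + 1;
    // Grow the product until the next factor would overflow a u64.
    while let Some(p) = product.checked_mul(n) {
        product = p;
        count += 1;
        n += 1;
    }
    (count, product)
}

// next_batch(1)  == (19, 2_432_902_008_176_640_000) // 20!, covers items 2..=20
// next_batch(20) == (13, 3_569_119_343_741_952_000) // 33!/20!, covers items 21..=33
```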

There is no measurable performance change when using choose() with a fixed-size slice iterator (the code for that path is unchanged).
When using an iterator with a size hint but no fixed size, performance seems to have degraded very slightly; it would be possible to apply this optimization to that case as well.

I believe the results of these functions would be consistent between 32-bit and 64-bit architectures (I am generating u64 values, not usize). I don't have 32-bit hardware to test on, so I can't comment on the performance differences.

The following methods would also probably benefit from similar optimizations:

IteratorRandom::choose_stable()
IteratorRandom::choose_multiple_fill()
IteratorRandom::choose_multiple()
SliceRandom::shuffle()
SliceRandom::partial_shuffle()

Motivation

Performance.

Alternatives

The code could be left as it is, and performance-hungry users could use my crate, which has a fast version of the choose function (I made the crate just to have a random_max function but got carried away optimizing it).

If this is considered worth doing, I'm happy to submit a PR. I have written most of the code already for my own crate.

Have a lovely day, Mark
