Skip to content

Optimize sum of Durations by using custom function #51598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 27, 2018

Conversation

Pazzaz
Copy link
Contributor

@Pazzaz Pazzaz commented Jun 16, 2018

The current impl Sum for Duration uses fold to perform several adds (or really checked_adds) of durations. In doing so, it has to guarantee the number of nanoseconds is valid after every addition. If you squeese the current implementation into a single function it looks kind of like this:

fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
    let mut sum = Duration::new(0, 0);
    for rhs in iter {
        if let Some(mut secs) = sum.secs.checked_add(rhs.secs) {
            let mut nanos = sum.nanos + rhs.nanos;
            if nanos >= NANOS_PER_SEC {
                nanos -= NANOS_PER_SEC;
                if let Some(new_secs) = secs.checked_add(1) {
                    secs = new_secs;
                } else {
                    panic!("overflow when adding durations");
                }
            }
            sum = Duration { secs, nanos }
        } else {
            panic!("overflow when adding durations");
        }
    }
    sum
}

We only need to check if nanos is in the correct range when giving our final answer so we can have a more optimized version like so:

fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
    let mut total_secs: u64 = 0;
    let mut total_nanos: u64 = 0;

    for entry in iter {
        total_secs = total_secs
            .checked_add(entry.secs)
            .expect("overflow in iter::sum over durations");
        total_nanos = match total_nanos.checked_add(entry.nanos as u64) {
            Some(n) => n,
            None => {
                total_secs = total_secs
                    .checked_add(total_nanos / NANOS_PER_SEC as u64)
                    .expect("overflow in iter::sum over durations");
                (total_nanos % NANOS_PER_SEC as u64) + entry.nanos as u64
            }
        };
    }
    total_secs = total_secs
        .checked_add(total_nanos / NANOS_PER_SEC as u64)
        .expect("overflow in iter::sum over durations");
    total_nanos = total_nanos % NANOS_PER_SEC as u64;
    Duration {
        secs: total_secs,
        nanos: total_nanos as u32,
    }
}

We now only convert total_nanos to total_secs (1) if total_nanos overflows and (2) at the end of the function when we have to output a valid Duration. This gave a 5-22% performance improvement when I benchmarked it, depending on how big the nano value of the Durations in iter were.

@pietroalbini
Copy link
Member

r? @sfackler (someone from the libs team)

@pietroalbini pietroalbini added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 16, 2018
@pietroalbini
Copy link
Member

Ping from triage @sfackler! This PR needs your review.

@sfackler
Copy link
Member

@bors r+

Sorry for the delay!

@bors
Copy link
Collaborator

bors commented Jun 27, 2018

📌 Commit d22ad76 has been approved by sfackler

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 27, 2018
@bors
Copy link
Collaborator

bors commented Jun 27, 2018

⌛ Testing commit d22ad76 with merge 612c280...

bors added a commit that referenced this pull request Jun 27, 2018
Optimize sum of Durations by using custom function

The current `impl Sum for Duration` uses `fold` to perform several `add`s (or really `checked_add`s) of durations. In doing so, it has to guarantee the number of nanoseconds is valid after every addition. If you squeese the current implementation into a single function it looks kind of like this:
````rust
fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
    let mut sum = Duration::new(0, 0);
    for rhs in iter {
        if let Some(mut secs) = sum.secs.checked_add(rhs.secs) {
            let mut nanos = sum.nanos + rhs.nanos;
            if nanos >= NANOS_PER_SEC {
                nanos -= NANOS_PER_SEC;
                if let Some(new_secs) = secs.checked_add(1) {
                    secs = new_secs;
                } else {
                    panic!("overflow when adding durations");
                }
            }
            sum = Duration { secs, nanos }
        } else {
            panic!("overflow when adding durations");
        }
    }
    sum
}
````
We only need to check if `nanos` is in the correct range when giving our final answer so we can have a more optimized version like so:
````rust
fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
    let mut total_secs: u64 = 0;
    let mut total_nanos: u64 = 0;

    for entry in iter {
        total_secs = total_secs
            .checked_add(entry.secs)
            .expect("overflow in iter::sum over durations");
        total_nanos = match total_nanos.checked_add(entry.nanos as u64) {
            Some(n) => n,
            None => {
                total_secs = total_secs
                    .checked_add(total_nanos / NANOS_PER_SEC as u64)
                    .expect("overflow in iter::sum over durations");
                (total_nanos % NANOS_PER_SEC as u64) + entry.nanos as u64
            }
        };
    }
    total_secs = total_secs
        .checked_add(total_nanos / NANOS_PER_SEC as u64)
        .expect("overflow in iter::sum over durations");
    total_nanos = total_nanos % NANOS_PER_SEC as u64;
    Duration {
        secs: total_secs,
        nanos: total_nanos as u32,
    }
}
````
We now only convert `total_nanos` to `total_secs` (1) if `total_nanos` overflows and (2) at the end of the function when we have to output a valid `Duration`. This gave a 5-22% performance improvement when I benchmarked it, depending on how big the `nano` value of the `Duration`s in `iter` were.
@bors
Copy link
Collaborator

bors commented Jun 27, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing 612c280 to master...

@bors bors merged commit d22ad76 into rust-lang:master Jun 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants