Skip to content

rustc -O results in larger stack frames than no-opt #39791

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
yzarubin opened this issue Feb 13, 2017 · 13 comments
Closed

rustc -O results in larger stack frames than no-opt #39791

yzarubin opened this issue Feb 13, 2017 · 13 comments
Labels
E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@yzarubin
Copy link

yzarubin commented Feb 13, 2017

I am rewriting some C++ code in Rust, and ran into an issue where certain code compiled with optimizations, actually results in larger stack frames than code compiled without optimizations leading to poorer recursive performance.

The code in question:

use std::collections::*;
static N: usize = 1800;

fn go(i: usize, a: i64, b: i64, memo: &mut Vec<HashMap<i64, HashMap<i64, i64>>>) -> i64 {
  if i == N { return 0 }

  let ans = go(i + 1, a + 2, b - 1, memo);
  memo[i].entry(a).or_insert(HashMap::new()).insert(b, ans);

  return ans ^ a ^ b;
}

fn main() {
  let mut memo = vec![HashMap::<i64, HashMap<i64, i64>>::new(); N];
  let k = go(0, 0, 0, &mut memo);
  println!("{}", k);
}

On my machine, this code runs fine when compiled with rustc, but will result in SO when compiled with rustc -O. From my testing, optimizations result in a stack frame twice the size of no-opt. I suspect it has something to do with the hashmap usage, but I haven't dug deeper as I wanted to see if this is a known issue first.

Another comment I'd like to make, is that it seems to me, that both the -O and no-opt variants result in much poorer performance than equivalent C++ compiled with clang.

The equivalent C++11 program:

#include <vector>
#include <unordered_map>

using namespace std;

typedef long long ll;
ll N = 100000;

ll go(ll i, ll a, ll b, vector<unordered_map<ll, unordered_map<ll, ll>>> &memo) {
  if (i == N) return 0;

  auto ans = go(i + 1, a + 2, b - 1, memo);
  memo[i][a][b] = ans;

  return ans ^ a ^ b;
}

int main () {
  vector<unordered_map<ll, unordered_map<ll, ll>>> memo(N);
  auto k = go(0, 0, 0, memo);
  printf("%lld\n", k);
  return 0;
}

When compiled on my machine (Darwin 14.5.0) with g++ -std=c++11 -O3 , it works for N up to 100000, which is almost 100x better than Rust.

rustc --version --verbose
rustc 1.15.1 (021bd294c 2017-02-08)
binary: rustc
commit-hash: 021bd294c039bd54aa5c4aa85bcdffb0d24bc892
commit-date: 2017-02-08
host: x86_64-apple-darwin
release: 1.15.1
LLVM version: 3.9

g++ --version
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
@RReverser
Copy link
Contributor

RReverser commented Feb 13, 2017

@yzarubin The second one is just a difference in what builtin HashMap does on top of C++ unordered_map, which was discussed several times before. You might want to see some previous discussions here https://users.rust-lang.org/t/hashmap-performance/6476 and FAQ here https://www.rust-lang.org/en-US/faq.html#why-are-rusts-hashmaps-slow.

@yzarubin
Copy link
Author

yzarubin commented Feb 13, 2017

@RReverser Thanks for the comment. I was aware that Rust's HashMap was slower than unordered_map, but why is there such a difference in stack space? I'd expect that Rust code to work up to N=10000 on my machine similarly to C++, albeit slower. Please correct me if I'm wrong, but I did not see this issue being mentioned in the links you provided.

@RReverser
Copy link
Contributor

@yzarubin Sure, that's why I said "the second one" only :)

@sfackler
Copy link
Member

C++'s hash function for integers is trivial - I think it may be a no-op? SipHash is very much nontrivial and if it's inlined you'd see some amount of increased stack use. If unordered_map isn't header-only, HashMap internals may be more aggressively inlined in Rust as well.

@RReverser
Copy link
Contributor

@sfackler I'm not sure inlining should result in increased stack usage, rather decreased one as at the very least return addresses can be eliminated from stack, and ideally some local variables can be propagated to registers too. So it appears to me that this is a bug.

@yzarubin
Copy link
Author

In addition, changing HashMap to BTreeMap had little to no impact on performance.

@sfackler
Copy link
Member

@RReverser if the functions are not inlined, they are not consuming stack at the point the go function recurses.

@RReverser
Copy link
Contributor

@sfackler D'oh, I didn't actually notice it recurses. Fair point.

@arielb1
Copy link
Contributor

arielb1 commented Apr 26, 2017

Looks like some change in 1.17 made HashMap::resize no longer be inlined, which fixes the issue (reducing space to 200 again). Looks like we should mark it as #[inline(never)] #[cold] like RawVec::double to make sure this keeps being the case.

@Mark-Simulacrum Mark-Simulacrum added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jun 22, 2017
@Mark-Simulacrum
Copy link
Member

I'm marking as E-easy since the initial patch to file a PR should involve adding the two annotations to this method here: https://github.com/rust-lang/rust/blob/master/src/libstd/collections/hash/map.rs#L757.

@Mark-Simulacrum Mark-Simulacrum added the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Jun 22, 2017
@rthomas
Copy link
Contributor

rthomas commented Jul 6, 2017

I'll take this on, if that's alright.

rthomas added a commit to rthomas/rust that referenced this issue Jul 6, 2017
This adds the `inline(never)` and `cold` annotations to the
HashMap::resize function.
@rthomas
Copy link
Contributor

rthomas commented Jul 6, 2017

I've sent a PR for the changes.

bors added a commit that referenced this issue Jul 7, 2017
Add annotations to the resize fn #39791

This adds the `inline(never)` and `cold` annotations to the
HashMap::resize function.
@venkatagiri
Copy link
Contributor

@Mark-Simulacrum This should have been closed by #43093.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

7 participants