[DO NOT MERGE] perf run for rustc-hash candidate #125133
rustbot has assigned @Mark-Simulacrum. Use |
These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
r? @thomcc |
@bors try @rust-timer queue |
[DO NOT MERGE] perf run for rustc-hash candidate See rust-lang/rustc-hash#37.
☀️ Try build successful - checks-actions |
Finished benchmarking commit (9c2b2da): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)
Results (secondary -6.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
Results (primary 1.4%, secondary -2.9%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size
Results (secondary 0.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 678.372s -> 677.551s (-0.12%)
Looks green on cycles, walltime, task clock, cpu clock, ... Instructions are somewhat surprisingly a mixed bag (surprising as it does strictly less work, IIUC). |
Some changes occurred in exhaustiveness checking cc @Nadrieril rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead. cc @rust-lang/rust-analyzer rustdoc-json-types is a public (although nightly-only) API. If possible, consider changing |
@thomcc It is not strictly less work, it adds an extra bit rotation for the pure integer hash case where there wasn't anything before. These results are a bit more of a mixed bag than the results in my previous experiments, I believe because I gated some string optimizations behind a |
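To illustrate the "extra bit rotation" point, here is a minimal sketch of the two single-integer hashing paths. It assumes the old scheme is a rotate-xor-multiply step starting from a zero state and the new one is an add-multiply step followed by a rotation on finish. The constants `OLD_K`, `NEW_K` and the rotation amount `ROTATE` are illustrative assumptions, not values confirmed in this thread.

```rust
// Sketch only: the constants and the rotation amount are assumptions
// chosen for illustration, not values confirmed by this PR.
const OLD_K: u64 = 0x517c_c1b7_2722_0a95;
const NEW_K: u64 = 0xf135_7aea_2e62_a9c5;
const ROTATE: u32 = 20;

/// Rotate-xor-multiply step. With a zero initial state the rotate and the
/// xor are no-ops, so hashing one integer costs a single multiplication.
fn old_style_hash_u64(x: u64) -> u64 {
    let state: u64 = 0;
    (state.rotate_left(5) ^ x).wrapping_mul(OLD_K)
}

/// Add-multiply step followed by a rotation when the hash is finished.
/// The rotation is the extra work in the pure integer case.
fn new_style_hash_u64(x: u64) -> u64 {
    let state: u64 = 0;
    state.wrapping_add(x).wrapping_mul(NEW_K).rotate_left(ROTATE)
}

fn main() {
    let x = 42_u64;
    // The old-style path collapses to one multiplication.
    assert_eq!(old_style_hash_u64(x), x.wrapping_mul(OLD_K));
    println!("old-style: {:#018x}", old_style_hash_u64(x));
    println!("new-style: {:#018x}", new_style_hash_u64(x));
}
```

The rotation moves the well-mixed high bits of the product into the low bits, which is where a hash table usually takes its bucket index from.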
@bors try @rust-timer queue |
☀️ Try build successful - checks-actions |
Finished benchmarking commit (c358310): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)
Results (primary -1.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
Results (secondary -3.4%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size
Results (secondary 0.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 679.642s -> 678.195s (-0.21%)
It's mostly neutral, but there is a bit of green. That's awesome, congrats! It makes sense then to move ahead and replace it in rustc_hash.
I see a small but clear instructions win for |
It's interesting that cache-misses and branch-misses are so green.
@nnethercote If you'd like I could do a run where I remove the final
This PR isn't strictly a performance PR - it's also one that makes rustc-hash more robust against accidental heavy multicollisions if the only differences between keys happen to be in the high bits of the last hashed word. Also, for cycle counts there are 186 regressions but 353 improvements if you include all the results.
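A small sketch of why high-bit-only differences are a problem for a multiply-only hash: multiplication only carries information upward, so the low bits of the product depend only on the low bits of the input. The multiplier `K`, the table size `TABLE_BITS` and the rotation amount below are arbitrary choices for illustration, and the two hash functions are simplified stand-ins rather than the actual rustc-hash code.

```rust
use std::collections::HashSet;

// Arbitrary odd multiplier and table size, chosen only for illustration.
const K: u64 = 0x517c_c1b7_2722_0a95;
const TABLE_BITS: u32 = 17;

/// Bucket index as a simple table would compute it: the low bits of the hash.
fn bucket(hash: u64) -> u64 {
    hash & ((1 << TABLE_BITS) - 1)
}

/// Multiply-only mixing: low output bits depend only on low input bits.
fn multiply_only(x: u64) -> u64 {
    x.wrapping_mul(K)
}

/// Multiply, then rotate the well-mixed high bits down into the low bits.
/// The rotation amount of 20 is an assumption for this sketch.
fn multiply_then_rotate(x: u64) -> u64 {
    x.wrapping_mul(K).rotate_left(20)
}

/// Count distinct buckets for 64 keys that differ only in bits 32 and above.
fn distinct_buckets(hash: impl Fn(u64) -> u64) -> usize {
    (0..64_u64)
        .map(|hi| bucket(hash(hi << 32)))
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    println!("multiply only:        {}", distinct_buckets(multiply_only));
    println!("multiply then rotate: {}", distinct_buckets(multiply_then_rotate));
}
```

With multiply-only mixing all 64 keys land in the same bucket, because the low 32 bits of every product are zero. The rotated variant spreads them over more than one bucket.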
So it seems the argument is that the new hash gives slightly better performance and has better protection against bad cases. On the latter item, from the design I can believe that the new algorithm is less collision-prone, but it would be good to have some empirical evidence. A couple of ideas:
Good results on one or both of those would be enough to convince me to switch algorithms. What do you think? I don't want to be a roadblock, but I also want to be cautious when changing such a critical piece of code. |
@nnethercote Thanks for bringing #91660 to my attention, because this is precisely what this hash mitigates. I quote from the linked PR:
// The order here has direct impact on `FxHash` quality because we have far more `DefIndex` per
// crate than we have `Crate`s within one compilation. Or in other words, this arrangement puts
// more entropy in the low bits than the high bits. The reason this matters is that `FxHash`, which
// is used throughout rustc, has problems distributing the entropy from the high bits, so reversing
// the order would lead to a large number of collisions and thus far worse performance.
//
// On 64-bit big-endian systems, this compiles to a 64-bit rotation by 32 bits, which is still
// faster than another `FxHash` round.
#[cfg(target_pointer_width = "64")]
impl Hash for DefId {
    fn hash<H: Hasher>(&self, h: &mut H) {
        (((self.krate.as_u32() as u64) << 32) | (self.index.as_u32() as u64)).hash(h)
    }
}
If you don't mind, I'd like to do a simulated benchmark to demonstrate the effectiveness of the new hash in this scenario. Suppose we have 64 crates and 1024
use core::hash::{BuildHasher, BuildHasherDefault};
fn compute_max_collisions(name: &str, hasher: impl BuildHasher) {
    let table_mask = (1 << 17) - 1;
    let mut table_orig_order = vec![0u64; 1 << 17];
    let mut table_swap_order = vec![0u64; 1 << 17];
    for krate in 0..64_u32 {
        for def_index in 0..1024_u32 {
            let h_orig = hasher.hash_one(((def_index as u64) << 32) | krate as u64);
            let h_swap = hasher.hash_one(((krate as u64) << 32) | def_index as u64);
            table_orig_order[(h_orig & table_mask) as usize] += 1;
            table_swap_order[(h_swap & table_mask) as usize] += 1;
        }
    }
    let max_orig = table_orig_order.into_iter().max().unwrap();
    let max_swap = table_swap_order.into_iter().max().unwrap();
    println!("{name} max orig collisions: {max_orig}");
    println!("{name} max swap collisions: {max_swap}");
}
fn main() {
    compute_max_collisions("current", BuildHasherDefault::<current::FxHasher>::default());
    compute_max_collisions("orlp", BuildHasherDefault::<orlp::FxHasher>::default());
}
You can run it yourself here, I just copy/pasted the current and my implementation: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=7ca94ca09b9ab282874e6ae4c08d7c88. So what are the results?
Now, hashbrown uses a combination of larger hash table buckets plus a tag, instead of only a bucket index, which mitigates this, but the principle stays the same.
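As a rough sketch of the bucket-index-plus-tag idea: a SwissTable-style table takes the starting position from the low bits of the hash and stores a short tag taken from the high bits, so entries that share a position can often be told apart without comparing keys. The helper names `bucket_index` and `tag` and the 7-bit tag width are assumptions for this sketch; it is not hashbrown's actual code.

```rust
/// Starting position in the table, taken from the low bits of the hash.
/// (Hypothetical helper for illustration.)
fn bucket_index(hash: u64, table_bits: u32) -> usize {
    (hash & ((1u64 << table_bits) - 1)) as usize
}

/// Short tag taken from the top 7 bits of the hash.
/// (Hypothetical helper; the 7-bit width is an assumption.)
fn tag(hash: u64) -> u8 {
    (hash >> 57) as u8
}

fn main() {
    // Two hashes whose low bits agree but whose high bits differ.
    let a: u64 = 0x0200_0000_0000_0005;
    let b: u64 = 0x0400_0000_0000_0005;
    // Same starting position in a table with 2^17 slots...
    assert_eq!(bucket_index(a, 17), bucket_index(b, 17));
    // ...but the tags differ, so a probe can skip the mismatching entry.
    assert_ne!(tag(a), tag(b));
    println!("index = {}, tags = {} and {}", bucket_index(a, 17), tag(a), tag(b));
}
```

The tag only helps to reject mismatches cheaply. Keys whose hashes share the same low bits still probe the same region of the table, which is why the quality of the low bits continues to matter.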
Ok, that satisfies me, thanks! Perhaps the PR for this repo that incorporates the new FxHash should simplify the |
This was implemented in rust-lang/rustc-hash#37 by Orson Peters and benchmarked in rust-lang#125133.
…ogaloo, r=<try> Use new faster fxhash. This was implemented in rust-lang/rustc-hash#37 by Orson Peters and benchmarked in rust-lang#125133. r? `@ghost` `@bors` try `@rust-timer` queue
Update rustc-hash to version 2 This brings in the new algorithm. see rust-lang/rustc-hash#37 and rust-lang#125133
See rust-lang/rustc-hash#37.