diff --git a/src/profiling.md b/src/profiling.md index 8eaa8e94f..e8139dd25 100644 --- a/src/profiling.md +++ b/src/profiling.md @@ -12,5 +12,83 @@ Depending on what you're trying to measure, there are several different approach See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information. - If you want function level performance data or even just more details than the above approaches: - - Consider using a native code profiler such as [perf](profiling/with_perf.html). + - Consider using a native code profiler such as [perf](profiling/with_perf.html) + - or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision, + full-featured graphical interface. + +- If you want a nice visual representation of the compile times of your crate graph, + you can use [cargo's `-Ztimings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings), + eg. `cargo -Ztimings build`. + You can use this flag on the compiler itself with `CARGOFLAGS="-Ztimings" ./x.py build` + +## Optimizing rustc's bootstrap times with `cargo-llvm-lines` + +Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the +number of lines of LLVM IR across all instantiations of a generic function. +Since most of the time compiling rustc is spent in LLVM, the idea is that by +reducing the amount of code passed to LLVM, compiling rustc gets faster. + +Example usage: +``` +cargo install cargo-llvm-lines +# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P + +# Do a clean before every run, to not mix in the results from previous runs. +./x.py clean +RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc + +# Single crate, eg. rustc_middle +cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle-a539a639bdab6513.ll > llvm-lines-middle.txt +# Specify all crates of the compiler. (Relies on the glob support of your shell.) +cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt +``` + +Example output: +``` + Lines Copies Function name + ----- ------ ------------- + 11802479 (100%) 52848 (100%) (TOTAL) + 1663902 (14.1%) 400 (0.8%) rustc_query_system::query::plumbing::get_query_impl::{{closure}} + 683526 (5.8%) 10579 (20.0%) core::ptr::drop_in_place + 568523 (4.8%) 528 (1.0%) rustc_query_system::query::plumbing::get_query_impl + 472715 (4.0%) 1134 (2.1%) hashbrown::raw::RawTable::reserve_rehash + 306782 (2.6%) 1320 (2.5%) rustc_middle::ty::query::plumbing::::start_query::{{closure}}::{{closure}}::{{closure}} + 212800 (1.8%) 514 (1.0%) rustc_query_system::dep_graph::graph::DepGraph::with_task_impl + 194813 (1.7%) 124 (0.2%) rustc_query_system::query::plumbing::force_query_impl + 158488 (1.3%) 1 (0.0%) rustc_middle::ty::query::::alloc_self_profile_query_strings + 119768 (1.0%) 418 (0.8%) core::ops::function::FnOnce::call_once + 119644 (1.0%) 1 (0.0%) rustc_target::spec::load_specific + 104153 (0.9%) 7 (0.0%) rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults:: for rustc_middle::ty::context::TypeckResults>::decode::{{closure}} + 81173 (0.7%) 1 (0.0%) rustc_middle::ty::query::stats::query_stats + 80306 (0.7%) 2029 (3.8%) core::ops::function::FnOnce::call_once{{vtable.shim}} + 78019 (0.7%) 1611 (3.0%) stacker::grow::{{closure}} + 69720 (0.6%) 3286 (6.2%) <&T as core::fmt::Debug>::fmt + 56327 (0.5%) 186 (0.4%) rustc_query_system::query::plumbing::incremental_verify_ich + 49714 (0.4%) 14 (0.0%) rustc_mir::dataflow::framework::graphviz::BlockFormatter::write_node_label +``` + +Since this doesn't seem to work with incremental compilation or `x.py check`, +you will be compiling rustc _a lot_. +I recommend changing a few settings in `config.toml` to make it bearable: +``` +[rust] +# A debug build takes _a third_ as long on my machine, +# but compiling more than stage0 rustc becomes unbearably slow. +optimize = false + +# We can't use incremental anyway, so we disable it for a little speed boost. +incremental = false +# We won't be running it, so no point in compiling debug checks. +debug = false + +# Using a single codegen unit gives less output, but is slower to compile. +codegen-units = 0 # num_cpus +``` + +The llvm-lines output is affected by several options. +`optimize = false` increases it from 2.1GB to 3.5GB and `codegen-units = 0` to 4.1GB. + +MIR optimizations have little impact. Compared to the default `RUSTFLAGS="-Zmir-opt-level=1"`, +level 0 adds 0.3GB and level 2 removes 0.2GB. +Inlining currently only happens in LLVM, but this might change in the future.