diff --git a/posts/inside-rust/2020-02-25-intro-rustc-self-profile.md b/posts/inside-rust/2020-02-25-intro-rustc-self-profile.md new file mode 100644 index 000000000..83234a4cb --- /dev/null +++ b/posts/inside-rust/2020-02-25-intro-rustc-self-profile.md @@ -0,0 +1,251 @@ +--- +layout: post +title: "Intro to rustc's self profiler" +author: Wesley Wiser +description: "Learn how to use the -Zself-profile rustc flag" +team: the self-profile working group +--- + +Over the last year, the [Self-Profile Working Group] has been building tools to profile `rustc` because we often hear requests to know where compilation time is being spent. +This is useful when optimizing the compiler, one of the Compiler Team's ongoing efforts to improve compile times, but it's also useful to users who want to refactor their crate so that it will compile faster. +We've been working on a new feature that will help with that and this blog post gives a preview of how it works. +Be warned, though: it is still experimental and we expect the interface to change over time. +The Rust Unstable Book has [documentation for this feature] and we'll keep that up to date so you can always find the latest instructions there. + +In this post, we'll look at the tools currently available and use them to profile `rustc` while it compiles an example crate. + +## Setup + +First, we'll install the tools we're going to use from the `measureme` repository to analyze self-profile trace data. + +```sh +$ cargo install --git https://github.com/rust-lang/measureme crox flamegraph summarize +``` + +Now that we have our tools, let's download an example crate to profile its build. + +```sh +$ cd .. +$ git clone https://github.com/rust-lang/regex.git +$ cd regex +``` + +We'll need to use a recent nightly compiler to get access to unstable `-Z` flags. + +```sh +$ rustup override set nightly +``` + +If you haven't installed a nightly compiler before, this will download the nightly compiler for you. +If you have, then update it to make sure you're on a recent version. + +```sh +$ rustup update nightly +``` + +## Profiling the compiler + +Now we can build it and tell `rustc` to profile the build of the `regex` crate. +This will cause three new files to be created in the working directory which contain the profling data. + +```sh +$ cargo rustc -- -Zself-profile +$ ls +CHANGELOG.md LICENSE-APACHE UNICODE.md regex-17088.string_data regex-syntax target +Cargo.lock LICENSE-MIT bench regex-17088.string_index rustfmt.toml test +Cargo.toml PERFORMANCE.md examples regex-capi scripts tests +HACKING.md README.md regex-17088.events regex-debug src +``` + +The new files follow the format `{crate name}-{rustc process id}.{events,string_data,string_index}`. + +We'll use each of the three main tools to analyze the profiling data: + +### `summarize` + +As its name suggests, this tool summarizes the data found in the trace files. +Additionally, `summarize` can also show a "diff" between two trace files but we won't be using this mode. + +Let's run the tool, passing just the file name (but not the extension) for the trace: + +```sh +$ summarize summarize regex-17088 ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| Item | Self time | % of total time | Time | Item count | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| LLVM_module_codegen_emit_obj | 4.89s | 42.752 | 4.89s | 159 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| codegen_module | 1.25s | 10.967 | 1.37s | 159 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| LLVM_module_optimize_module_passes | 1.15s | 10.022 | 1.15s | 159 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| LLVM_module_codegen_make_bitcode | 786.56ms | 6.875 | 960.73ms | 159 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| typeck_tables_of | 565.18ms | 4.940 | 689.39ms | 848 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| LLVM_module_codegen | 408.01ms | 3.566 | 6.26s | 159 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| mir_borrowck | 224.03ms | 1.958 | 543.33ms | 848 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| LLVM_module_codegen_emit_compressed_bitcode | 174.17ms | 1.522 | 174.17ms | 159 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| optimized_mir | 157.91ms | 1.380 | 205.29ms | 1996 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| evaluate_obligation | 146.50ms | 1.281 | 184.17ms | 8304 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| codegen_crate | 139.48ms | 1.219 | 1.58s | 1 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| mir_built | 123.88ms | 1.083 | 168.01ms | 848 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| metadata_decode_entry | 88.36ms | 0.772 | 117.77ms | 55642 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| incr_comp_copy_cgu_workproducts | 64.21ms | 0.561 | 64.21ms | 1 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| monomorphization_collector_graph_walk | 54.11ms | 0.473 | 344.00ms | 1 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| link_rlib | 43.21ms | 0.378 | 43.21ms | 1 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| check_impl_item_well_formed | 41.36ms | 0.362 | 77.14ms | 736 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| codegen_fulfill_obligation | 40.36ms | 0.353 | 51.56ms | 1759 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| expand_crate | 37.24ms | 0.326 | 48.52ms | 1 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| symbol_name | 36.31ms | 0.317 | 39.06ms | 5513 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +| free_global_ctxt | 34.34ms | 0.300 | 34.34ms | 1 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ +... +Total cpu time: 11.440758871s +``` + +The output is sorted by the self time (time spent in the query or activity but not other queries or activities called by itself). +As you can see, most of the compilation time is spent in LLVM generating the binary code for the executable. + +### `flamegraph` + +As you may have guessed, `flamegraph` will produce a [flame graph] of the profiling data. +To run the tool, we'll pass just the filename without a file extension like we did for `summarize`: + +```sh +$ flamegraph regex-17088 +``` + +This will create a file called `rustc.svg` in the working directory: + +[![Image of flamegraph output][flame graph img]][flame graph img] + +[Click here] to try the interactive svg. + +### `crox` + +This tool processes self-profiling data into the JSON format that the Chromium profiler understands. +You can use it to create a graphical timeline showing exactly when various traced events occurred. + +In this section, we'll cover a few different modes `crox` can run in such as profiling an entire crate compilation including dependencies and filtering out small events. +Let's get started with the basics! + +#### Basic usage + +To run the tool, we'll just pass the filename without a file extension like we've done before: + +```sh +$ crox regex-17088 +``` + +This creates a file called `chrome_profiler.json` in the working directory. +To open it, we'll use the regular Chromium performance tools you might already be familiar with: + +1. Open Chrome +2. Open the Developer Tools console by pressing `Ctrl` + `Shift` + `i` (Windows/Linux) or `Cmd` + `Option` + `i` (macOS) +3. Click the Performance tab at the top of the console. +4. Click the "Load profile" button which looks like an arrow pointing up. +5. Select the `chrome_profiler.json` file we created. + +You should now see something similar to this: + +[![Image of chrome profiler][chrome profiler img1]][chrome profiler img1] + +You can use the scroll wheel on a mouse or the appropriate gesture on a touchpad to zoom in or out of the timeline. + +#### Filtering short events + +If the `chrome_profiler.json` file gets too large, the normal Chromium performance tools have issues opening the file. +One easy way to deal with this is to tell `crox` to remove events shorter than a chosen duration: + +```sh +$ crox --minumum-duration 2 regex-17088 +``` + +Filtering out events less than 2 microseconds shrinks our `chrome_profiler.js` file from 27mb to 11mb. + +#### Capturing event arguments + +The self-profiler can be configured to record event arguments during compilation. +For example, queries will include their query key. +This functionality is turned off by default because it increases the self-profiler overhead. + +To turn this feature on, we'll need to record a new compilation, passing an additional argument to `rustc`: + +```sh +$ cargo clean +$ cargo rustc -- -Zself-profile -Zself-profile-events=default,args +``` + +And then process the new output files: + +```sh +$ crox regex-23649 +``` + +Now in the Chromium profiler, if you click on a node, you can see additional data about many of the events at the bottom of the screen: + +[![Image of Chrome profiler details][chrome profiler img2]][chrome profiler img2] + +Which shows this `optimized_mir` query was processing the `regex::compile::{{impl}}::new` function body. + +#### Profiling an entire crate graph + +By using the `RUSTFLAGS` environment variable, we can profile every `rustc` invocation, not just the final crate's. +`crox` can then combine all of the profiles together into one output file. +Since this will create a lot of files, we'll first create a folder to put all the traces in. + +```sh +$ rm regex-17088.* regex-23649.* # clean up the old trace files since we're done with them +$ mkdir profiles +$ cargo clean +$ RUSTFLAGS="-Zself-profile=./profiles -Zself-profile-events=default,args" cargo build +``` + +This creates quite a few trace files in the working directory. +Now, we'll tell `crox` to combine all of the trace files in the current directory together: + +```sh +$ crox --dir . +``` + +Opening this file shows all of the crates compiled: + +[![Image of Chrome profiler with all crates][chrome profiler img3]][chrome profiler img3] + +Clicking on a crate will expand it to show the threads and event data inside it: + +[![Image of Chrome profiler with a crate expanded][chrome profiler img4]][chrome profiler img4] + +Thanks for reading! + +We've been using these tools extensively ourselves over the last few months and they've helped us tremendously in understanding where the compiler spends its time. +In the future we'll be adding more features and we'll work on making the tooling easier to use. +If you have questions or would like to get involved with the Self-Profile Working Group, please check out the [measureme repository] or stop by our [Zulip stream]. + +[chrome profiler img1]: /images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler1.png +[chrome profiler img2]: /images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler2.png +[chrome profiler img3]: /images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler3.png +[chrome profiler img4]: /images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler4.png +[Click here]: /images/inside-rust/2020-02-25-intro-rustc-self-profile/rustc.svg +[documentation for this feature]: https://doc.rust-lang.org/unstable-book/compiler-flags/self-profile.html +[flame graph]: http://www.brendangregg.com/flamegraphs.html +[flame graph img]: /images/inside-rust/2020-02-25-intro-rustc-self-profile/flamegraph_image.png +[measureme repository]: https://github.com/rust-lang/measureme +[Self-Profile Working Group]: https://rust-lang.github.io/compiler-team/working-groups/self-profile/ +[Zulip stream]: https://rust-lang.zulipchat.com/#narrow/stream/187831-t-compiler.2Fwg-self-profile diff --git a/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler1.png b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler1.png new file mode 100644 index 000000000..4964d8a23 Binary files /dev/null and b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler1.png differ diff --git a/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler2.png b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler2.png new file mode 100644 index 000000000..87ba101b2 Binary files /dev/null and b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler2.png differ diff --git a/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler3.png b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler3.png new file mode 100644 index 000000000..a76f9b7d9 Binary files /dev/null and b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler3.png differ diff --git a/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler4.png b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler4.png new file mode 100644 index 000000000..ec991bac6 Binary files /dev/null and b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/chrome_profiler4.png differ diff --git a/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/flamegraph_image.png b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/flamegraph_image.png new file mode 100644 index 000000000..dbbed7126 Binary files /dev/null and b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/flamegraph_image.png differ diff --git a/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/rustc.svg b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/rustc.svg new file mode 100644 index 000000000..086b7b21e --- /dev/null +++ b/static/images/inside-rust/2020-02-25-intro-rustc-self-profile/rustc.svg @@ -0,0 +1,419 @@ +Flame Graph Reset ZoomSearch mir_borrowck (18,228,971 samples, 0.16%)is_freeze_raw (22,543,989 samples, 0.20%)evaluate_obligation (15,619,755 samples, 0.14%)is_copy_raw (11,460,926 samples, 0.10%)needs_drop_raw (21,529,709 samples, 0.19%)mir_built (160,164,161 samples, 1.40%)mir_const (215,178,469 samples, 1.88%)m..unsafety_check_result (204,635,053 samples, 1.79%)u..mir_validated (235,295,886 samples, 2.06%)m..type_op_prove_predicate (30,501,338 samples, 0.27%)MIR_borrow_checking (524,702,964 samples, 4.59%)MIR_b..mir_borrowck (522,615,259 samples, 4.57%)mir_b..misc_checking_1 (13,523,529 samples, 0.12%)liveness_and_intrinsic_checking (15,996,252 samples, 0.14%)check_mod_liveness (13,935,980 samples, 0.12%)misc_checking_2 (46,889,373 samples, 0.41%)match_checking (30,890,300 samples, 0.27%)check_match (30,051,172 samples, 0.26%)lint_checking (20,451,292 samples, 0.18%)privacy_checking_modules (22,786,254 samples, 0.20%)check_mod_privacy (22,745,555 samples, 0.20%)misc_checking_3 (60,093,514 samples, 0.53%)adt_def (14,686,192 samples, 0.13%)metadata_decode_entry (12,634,663 samples, 0.11%)trait_impls_of (37,417,591 samples, 0.33%)type_of (28,001,003 samples, 0.24%)metadata_decode_entry (22,442,529 samples, 0.20%)coherent_trait (75,794,955 samples, 0.66%)specialization_graph_of (72,652,346 samples, 0.64%)coherence_checking (76,837,763 samples, 0.67%)dropck_outlives (11,808,617 samples, 0.10%)evaluate_obligation (45,218,474 samples, 0.40%)method_autoderef_steps (12,329,102 samples, 0.11%)item_bodies_checking (664,759,057 samples, 5.81%)item_bo..typeck_item_bodies (664,749,797 samples, 5.81%)typeck_..typeck_tables_of (662,584,947 samples, 5.79%)typeck_..evaluate_obligation (20,096,437 samples, 0.18%)item_types_checking (43,967,664 samples, 0.38%)check_mod_item_types (43,893,073 samples, 0.38%)typeck_tables_of (26,688,100 samples, 0.23%)generics_of (17,336,676 samples, 0.15%)predicates_of (16,996,637 samples, 0.15%)predicates_defined_on (11,983,303 samples, 0.10%)type_collecting (50,290,293 samples, 0.44%)collect_mod_item_types (50,209,680 samples, 0.44%)region_scope_tree (16,844,722 samples, 0.15%)check_impl_item_well_formed (77,142,384 samples, 0.67%)check_item_well_formed (47,453,817 samples, 0.41%)analysis (1,616,260,722 samples, 14.13%)analysistype_check_crate (970,555,916 samples, 8.48%)type_check_c..wf_checking (130,771,210 samples, 1.14%)const_caller_location (13,508,496 samples, 0.12%)is_freeze_raw (36,630,109 samples, 0.32%)evaluate_obligation (27,617,770 samples, 0.24%)layout_raw (34,130,142 samples, 0.30%)codegen_module (1,373,123,189 samples, 12.00%)codegen_moduleincr_comp_encode_serialized_dep_graph (14,663,951 samples, 0.13%)incr_comp_persist_dep_graph (33,536,636 samples, 0.29%)incr_comp_encode_dep_graph (26,500,914 samples, 0.23%)incr_comp_serialize_dep_graph (11,570,514 samples, 0.10%)encode_query_results (28,587,528 samples, 0.25%)serialize_dep_graph (66,773,928 samples, 0.58%)incr_comp_persist_result_cache (33,163,498 samples, 0.29%)incr_comp_serialize_result_cache (30,464,627 samples, 0.27%)codegen_crate (1,583,096,052 samples, 13.84%)codegen_crateLLVM_module_codegen_emit_obj (4,891,194,022 samples, 42.75%)LLVM_module_codegen_emit_objLLVM_module_codegen (6,259,936,731 samples, 54.72%)LLVM_module_codegenLLVM_module_codegen_make_bitcode (960,732,475 samples, 8.40%)LLVM_module_..LLVM_module_codegen_emit_compressed_bitcode (174,167,805 samples, 1.52%)codegen_module_optimize (7,422,035,896 samples, 64.87%)codegen_module_optimizeLLVM_module_optimize (1,161,064,635 samples, 10.15%)LLVM_module_opt..LLVM_module_optimize_module_passes (1,146,561,711 samples, 10.02%)LLVM_module_op..macro_expand_crate (48,538,161 samples, 0.42%)expand_crate (48,518,187 samples, 0.42%)configure_and_expand (82,431,245 samples, 0.72%)resolve_crate (25,377,918 samples, 0.22%)build_hir_map (31,467,333 samples, 0.28%)create_global_ctxt (35,287,114 samples, 0.31%)free_global_ctxt (34,335,543 samples, 0.30%)codegen_fulfill_obligation (45,192,642 samples, 0.40%)needs_drop_raw (14,000,886 samples, 0.12%)metadata_decode_entry (17,139,726 samples, 0.15%)optimized_mir (180,770,717 samples, 1.58%)monomorphization_collector_graph_walk (343,999,688 samples, 3.01%)mon..monomorphization_collector (345,253,780 samples, 3.02%)mon..assert_symbols_are_distinct (37,527,866 samples, 0.33%)symbol_name (30,513,088 samples, 0.27%)collect_and_partition_mono_items (400,143,006 samples, 3.50%)col..partition_and_assert_distinct_symbols (53,314,159 samples, 0.47%)cgu_partitioning (15,765,770 samples, 0.14%)exported_symbols (405,898,080 samples, 3.55%)expo..optimized_mir (21,097,420 samples, 0.18%)generate_crate_metadata (447,315,859 samples, 3.91%)gene..hir_lowering (17,579,232 samples, 0.15%)incr_comp_copy_cgu_workproducts (64,209,377 samples, 0.56%)finish_ongoing_codegen (64,223,943 samples, 0.56%)link_binary_remove_temps (26,234,888 samples, 0.23%)link_crate (71,704,288 samples, 0.63%)link_binary (71,700,171 samples, 0.63%)link_rlib (43,213,260 samples, 0.38%)link (149,682,085 samples, 1.31%)parse_crate (27,595,844 samples, 0.24%)all (11,440,758,871 samples, 100%)rustc (11,440,758,871 samples, 100.00%)rustc \ No newline at end of file