Tool for exploring performance by varying JIT behavior (#381)
Initial version of a tool that can run BenchmarkDotNet (BDN) over a set
of benchmarks in a feedback loop. The tool can vary JIT behavior,
observe the impact of each variation on jitted code or benchmark perf,
and then plan and try out further variations in pursuit of some goal
(say higher perf, or smaller code, etc).
Requires access to InstructionsRetiredExplorer as a helper tool for
parsing the ETW that BDN produces. Also requires a local enlistment of
the performance repo. You will need to modify file paths within the
source to adapt all this to your local setup. Must be run with admin
privileges so that BDN can collect ETW.
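
As a rough illustration, a check along the following lines can verify the tool is running elevated before any BDN runs are started (Windows-only, like ETW itself); the exact check the tool uses may differ:

```csharp
using System.Security.Principal;

static class Elevation
{
    // Sketch: verify the process is elevated before starting any BDN runs,
    // since ETW collection will otherwise fail.
    public static bool IsRunningAsAdmin()
    {
        using WindowsIdentity identity = WindowsIdentity.GetCurrent();
        return new WindowsPrincipal(identity).IsInRole(WindowsBuiltInRole.Administrator);
    }
}
```
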
The only supported variation right now is modification of which CSEs we
allow the JIT to perform for the hottest Tier-1 method in each
benchmark. If a benchmark does not have a sufficiently hot Tier-1
method, then it is effectively left out of the experiment.
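
How a CSE subset is conveyed to the JIT depends on the build; a checked JIT exposes debug-only config knobs for this. The sketch below assumes knob names along the lines of `DOTNET_JitCSEHash` / `DOTNET_JitCSEMask` (select the method, then bit-mask the allowed CSEs); check `jitconfigvalues.h` in the runtime repo for the exact switches your build supports.

```csharp
using System.Diagnostics;

static class BenchmarkLauncher
{
    // Sketch: pass a CSE subset to a checked JIT through environment variables on
    // the BenchmarkDotNet child process. The knob names below are assumptions;
    // see jitconfigvalues.h for what your JIT build actually exposes.
    public static ProcessStartInfo BuildBenchmarkRun(string benchmarkFilter, string methodHash, uint cseMask)
    {
        var psi = new ProcessStartInfo("dotnet")
        {
            // -p ETW asks BDN to collect the ETW trace that the tool parses later.
            Arguments = $"run -c Release -- --filter \"{benchmarkFilter}\" -p ETW",
            WorkingDirectory = @"C:\repos\performance\src\benchmarks\micro", // example path
            UseShellExecute = false
        };

        psi.Environment["DOTNET_JitCSEHash"] = methodHash;            // assumed knob: restrict the effect to the hot method
        psi.Environment["DOTNET_JitCSEMask"] = cseMask.ToString("X"); // assumed knob: bitmask of CSEs to allow (0 = none)
        return psi;
    }
}
```
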
The experiments on each benchmark are prioritized to explore variations
in performance for subsets of the currently performed CSEs. For methods with
many CSEs we can realistically afford to explore only a small fraction
of all possibilities, so we try to bias the exploration towards CSEs
that have higher performance impact.
Results are locally cached so that rerunning the tool will not rerun
experiments.
Experiments are summarized in a CSV file whose schema includes the
benchmark name, number of CSEs, code size, perf score, and measured perf.
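
The column names below are illustrative rather than the tool's exact headers, but the shape of a summary row is roughly:

```csharp
// Illustrative shape of one summary row; the exact column names and order
// come from the tool's output, not from this sketch.
record ExperimentSummary(
    string BenchmarkName,
    int NumCses,       // number of CSEs enabled in this experiment
    int CodeSize,      // jitted code size of the hot method, in bytes
    double PerfScore,  // the JIT's static perf-score estimate for the method
    double Perf);      // measured benchmark time
```
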
You will need to do both release and checked builds of the runtime repo, and create the associated test directories (aka Core_Roots).

You will need to build the instructions retired explorer.
You will need to modify file paths in the performance explorer code to refer to the locations of the above repos and builds, and to specify a results directory.
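
For instance, the kind of paths involved might look like the following; the names and values here are hypothetical examples, and the real fields live in the performance explorer sources:

```csharp
// Hypothetical names and example values; the real fields in the
// performance explorer sources will differ.
static class ExplorerPaths
{
    public const string CheckedCoreRoot  = @"C:\repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root";
    public const string ReleaseCoreRoot  = @"C:\repos\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root";
    public const string PerformanceRepo  = @"C:\repos\performance";
    public const string InstructionsRetiredExplorer = @"C:\repos\InstructionsRetiredExplorer\src\bin\Release\net8.0\InstructionsRetiredExplorer.dll";
    public const string ResultsDirectory = @"C:\perf-explorer-results";
}
```
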
Finally, you will likely want to customize the list of benchmarks to explore; the names of these are the names used in the performance repo. Note the names often contain quotes or other special characters, so you will likely need to read up on how to handle these when they appear in C# literal strings.
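
For example, names with embedded quotes are easiest to express as C# verbatim strings, where each embedded quote is doubled; the names below are illustrative, not taken from the tool's actual list:

```csharp
// Illustrative names only; take the real ones from the performance repo.
string[] benchmarks =
{
    // Typical name: angle brackets and parentheses are fine in a verbatim string.
    @"System.Collections.ContainsTrue<Int32>.Span(Size: 512)",
    // Name containing quotes: inside an @"..." verbatim string each " is written as "".
    @"System.Text.RegularExpressions.Tests.Perf_Regex_Common.Match(Pattern: ""ab+c"", Options: Compiled)",
};
```
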
Once you have made these modifications, you can then build the performance explorer.

The tool must be run as admin, in order to perform the necessary profiling.
### How It Works
For each benchmark in the list, performance explorer will:

* run the benchmark from the perf directory, with `-p ETW` so that profile data is collected
* parse the profile data using instructions retired explorer to find the hot methods
* also parse the BenchmarkDotNet json to determine the performance of the benchmark
* determine if there's a hot method that would be a good candidate for exploration. Currently we look for a Tier-1 method that accounts for at least 20% of the benchmark time.
* if there is a suitable hot method:
  * run an SPMI collection for that benchmark
  * use that SPMI collection to get an assembly listing for the hot method
  * determine from that listing how many CSEs were performed (the "default set" of N CSEs)
  * if there were any CSEs, start the experimentation process (sketched in code after this list):
    * run the benchmark with all CSEs disabled (0 CSEs), measure perf, and add the result to the exploration queue
    * then, repeatedly, until we have run out of experiments to try, or hit some predetermined limit:
      * pick the best performing experiment from the queue
      * determine which CSEs in the default set were not done in that experiment; say there are M (<= N) of these
      * run M more experiments, each adding one of the missing CSEs
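
A rough sketch of that loop in C# (the `Experiment` record, the `runExperiment` callback, and the limit handling are invented for illustration; the real tool's code differs in detail):

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-in for the tool's experiment data: the CSE subset is a
// bitmask over the N candidates in the default set (this sketch assumes N <= 32).
record Experiment(uint CseMask, double Perf);

static class ExplorationLoop
{
    // runExperiment runs the benchmark with the given CSE mask and returns measured perf.
    public static List<Experiment> Explore(int n, Func<uint, double> runExperiment, int experimentLimit)
    {
        var results = new List<Experiment>();
        var tried = new HashSet<uint>();
        // Lower measured time is better, so it dequeues first.
        var queue = new PriorityQueue<Experiment, double>();

        // Baseline: all CSEs disabled.
        var baseline = new Experiment(0, runExperiment(0));
        results.Add(baseline);
        tried.Add(0);
        queue.Enqueue(baseline, baseline.Perf);

        while (queue.Count > 0 && results.Count < experimentLimit)
        {
            Experiment best = queue.Dequeue();

            // For each CSE in the default set that `best` did not perform,
            // run one new experiment that adds just that CSE.
            for (int i = 0; i < n && results.Count < experimentLimit; i++)
            {
                uint mask = best.CseMask | (1u << i);
                if (!tried.Add(mask))
                    continue; // CSE i already enabled in best, or subset already tried

                var next = new Experiment(mask, runExperiment(mask));
                results.Add(next);
                queue.Enqueue(next, next.Perf);
            }
        }

        return results;
    }
}
```
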
Each benchmark's data is stored in a subfolder of the results directory; we also create disassembly for all the experiments tried, and copies of all the intermediate files.

There is also a master results.csv that has data from all experiments in all benchmarks, suitable for use in Excel or as input to a machine learning algorithm.
If you re-run the tool with the same benchmark list and results directory, it will use the cached copies of the data and won't re-run the experiments.

If anything goes wrong along the way, an "error.txt" file is added to the results subdirectory for that benchmark, and future runs will skip that benchmark.
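
In code terms, the per-benchmark gate amounts to something like the sketch below; the helper names are invented, and only "error.txt" and the one-subfolder-per-benchmark layout come from the tool's description:

```csharp
using System.IO;
using System.Linq;

static class ResultsCache
{
    // Sketch: a prior failure (error.txt) causes the benchmark to be skipped on
    // later runs; otherwise its results subfolder is created (or reused, which
    // is where the cached data lives).
    public static bool ShouldRunBenchmark(string resultsDirectory, string benchmarkName, out string benchmarkDir)
    {
        benchmarkDir = Path.Combine(resultsDirectory, SanitizeForPath(benchmarkName));

        if (File.Exists(Path.Combine(benchmarkDir, "error.txt")))
            return false; // a previous run failed on this benchmark; skip it

        Directory.CreateDirectory(benchmarkDir);
        return true;
    }

    // Benchmark names contain characters that are not legal in directory names.
    private static string SanitizeForPath(string name) =>
        new string(name.Select(c => Path.GetInvalidFileNameChars().Contains(c) ? '_' : c).ToArray());
}
```
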
So say there are 2 CSEs by default. The explorer will run:

* one experiment with 0 CSEs
* two experiments, each with 1 CSE
* one experiment with 2 CSEs

and then stop, as all possibilities have been explored.
For larger values of N the number of possible experiments (2^N) grows rapidly, and we cannot hope to explore the full space. The exploration process is intended to prioritize the experiments that are likely to have the largest impact on performance.
### Future Enhancements
* add option to offload benchmark runs to the perf lab
* capture more details about CSEs so we can use the data to develop better CSE heuristics
* generalize the experiment processing to allow other kinds of experiments
* parameterize the config settings so we don't need to modify the sources
* add options to characterize the noise level of benchmarks and (perhaps) do more runs if noisy
* leverage SPMI instead of perf runs, if we can trust perf scores