I am despairing at the noisiness of our benchmarks. For example, here are two different runs of identical code. (I forgot to run pystats, so I re-ran the benchmark; since yesterday's run, one commit was added and then reverted.)
The merge base is the same for both runs, too.
How can we derive useful signal from this data?