Change the 'significance' filter on the comparison page to a 'relevance' filter #1277

rylev · 2022-04-05T09:32:10Z

Since we really care about helping the user focus on "relevant" changes (and not just significance which is only a part of relevance), we should provide a filter for relevance instead of a filter for significance. Since relevance also includes magnitude, we don't need to have a filter for small results either.

Additionally, the filters usually follow a pattern of showing more data when activated. This change also makes it so that the filter is for "showing non-relevant data" which is off by default.

As a next step, I'd love to have a detail view for each test case comparison which shows the details of how we arrive at the conclusion that a result is relevant perhaps by showing things like historical data and where this change fits in the historical trend.

Kobzol · 2022-04-05T09:50:08Z

Nice change, I was quite confused by the previous two checkboxes. And I also agree with the second part, it would be nice to (optionally) show the user the full data that was used to compute the result desription instead of saying "this result is quite significant, although only partly relevant, but it is definitely dodgy with a medium magnitude".

…e filter

rylev · 2022-04-05T12:02:52Z

The CI failure is due to the fact that hyper no longer builds. This has already been reported here.

nnethercote · 2022-04-06T02:07:33Z

This is a nice simplification. One after-the-fact nitpick: in some places the code has a notion of high, medium, and low relevance. But here relevance is just a binary yes/no thing. Is this difference meaningful? Can it be removed? E.g. is "non-relevant" actually equal to "low relevance"?

rylev · 2022-04-06T10:22:02Z

@nnethercote I think there's still confusion caused by the fact that we talk about "relevancy" in two places:

The relevancy of a single test result comparison (i.e., whether the two tests of serde-1.0.139 doc full for the two artifacts being compared produces a relevant result). This is a bool - the test result comparison is either relevant or not. It's calculated here
The relevancy of an entire artifact comparison (which is composed of many individual test result comparisons discussed in the first bullet point). This is an enum of high, medium, and low relevancy. It is calculated here.

How we ended up here is long story. We are now much better at calculating the relevancy of individual result comparisons now. The reason we have a relevancy for the entire artifact comparison is because historically we've struggled to answer the question "should the user pay attention to the results of an artifact comparison?".

We could adopt the strategy of saying as long as an artifact comparison has 1 or more test result comparisons that are deemed relevant, the entire artifact comparison is relevant. However, in practice this means there are a lot artifact comparisons with only a few relatively small test result comparisons.

For example, this artifact comparison has 2 relevant results but they're rather small and they're for secondary benchmarks. Should we signal to the user that they don't need to look at this comparison or should we always encourage users to look at comparisons even if they're this small?

Kobzol · 2022-04-06T10:27:48Z

This is not strictly relevant to the current discussion, but maybe we could alter the relevancy calculation logic for the whole artifact comparison to take benchmark category into account? For example in the case that you have showed, we could make comparison relevancy smaller, since only secondary benchmarks are relevant on their own account.

Then we could design some rule like "the whole comparison is relevant if there is at least a single primary benchmark relevant result", which might be slightly better than just taking all benchmark results into account.

nnethercote · 2022-04-06T11:06:52Z

Some principles that seem good to me:

Don't use the same word in subtly different ways.
If in doubt, prefer simplicity.

We could satisfy the first principle by more clearly distinguishing the two kinds of relevance. E.g. perhaps "test relevance" vs "artifact relevance", or even a different word from "relevance" for one of the concepts.

We could satisfy both principles by going with your suggestion "as long as an artifact comparison has 1 or more test result comparisons that are deemed relevant, the entire artifact comparison is relevant".

Of these two I would lean towards the latter. I would hope that people would look at the full results more often than not -- it's a single click to get there, and the summary is never going to paint the full picture. So pushing people more in that direction seems fine to me.

rylev requested a review from Kobzol April 5, 2022 09:32

Kobzol approved these changes Apr 5, 2022

View reviewed changes

Change the 'significance' filter on the comparison page to a relevanc…

a13d12c

…e filter

rylev force-pushed the relevance-filter branch from 5d0d640 to a13d12c Compare April 5, 2022 11:33

rylev merged commit 04fccd8 into rust-lang:master Apr 5, 2022

rylev deleted the relevance-filter branch April 5, 2022 12:02

rylev mentioned this pull request Apr 6, 2022

Simplify artifact comparison relevance #1281

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Change the 'significance' filter on the comparison page to a 'relevance' filter #1277

Change the 'significance' filter on the comparison page to a 'relevance' filter #1277

Uh oh!

rylev commented Apr 5, 2022

Uh oh!

Kobzol commented Apr 5, 2022

Uh oh!

rylev commented Apr 5, 2022

Uh oh!

nnethercote commented Apr 6, 2022

Uh oh!

rylev commented Apr 6, 2022 •

edited

Loading

Uh oh!

Kobzol commented Apr 6, 2022

Uh oh!

nnethercote commented Apr 6, 2022

Uh oh!

Uh oh!

Change the 'significance' filter on the comparison page to a 'relevance' filter #1277

Change the 'significance' filter on the comparison page to a 'relevance' filter #1277

Uh oh!

Conversation

rylev commented Apr 5, 2022

Uh oh!

Kobzol commented Apr 5, 2022

Uh oh!

rylev commented Apr 5, 2022

Uh oh!

nnethercote commented Apr 6, 2022

Uh oh!

rylev commented Apr 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kobzol commented Apr 6, 2022

Uh oh!

nnethercote commented Apr 6, 2022

Uh oh!

Uh oh!

rylev commented Apr 6, 2022 •

edited

Loading