Change the 'significance' filter on the comparison page to a 'relevance' filter #1277
Conversation
Nice change, I was quite confused by the previous two checkboxes. And I also agree with the second part: it would be nice to (optionally) show the user the full data that was used to compute the result description, instead of saying "this result is quite significant, although only partly relevant, but it is definitely dodgy with a medium magnitude".
The CI failure is because hyper no longer builds. This has already been reported here.
This is a nice simplification. One after-the-fact nitpick: in some places the code has a notion of high, medium, and low relevance. But here relevance is just a binary yes/no thing. Is this difference meaningful? Can it be removed? E.g. is "non-relevant" actually equal to "low relevance"?
@nnethercote I think there's still confusion caused by the fact that we talk about "relevancy" in two places:
- the relevancy of an individual test result comparison, and
- the relevancy of the entire artifact comparison.
How we ended up here is a long story. We are now much better at calculating the relevancy of individual test result comparisons. The reason we have a relevancy for the entire artifact comparison is that historically we've struggled to answer the question "should the user pay attention to the results of an artifact comparison?". We could adopt the strategy of saying that as long as an artifact comparison has 1 or more test result comparisons that are deemed relevant, the entire artifact comparison is relevant. However, in practice this means there are a lot of artifact comparisons with only a few relatively small test result comparisons. For example, this artifact comparison has 2 relevant results, but they're rather small and they're for secondary benchmarks. Should we signal to the user that they don't need to look at this comparison, or should we always encourage users to look at comparisons even if they're this small?
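For concreteness, here is a minimal sketch of that simpler strategy. The type and method names (`TestResultComparison`, `ArtifactComparison`, `is_relevant`) are illustrative assumptions, not the actual rustc-perf API:

```rust
// Illustrative sketch only: these types and names are made up and do not
// mirror the real rustc-perf code.
struct TestResultComparison {
    relevant: bool,
}

impl TestResultComparison {
    fn is_relevant(&self) -> bool {
        self.relevant
    }
}

struct ArtifactComparison {
    comparisons: Vec<TestResultComparison>,
}

impl ArtifactComparison {
    // "As long as an artifact comparison has 1 or more test result
    // comparisons that are deemed relevant, the entire artifact
    // comparison is relevant."
    fn is_relevant(&self) -> bool {
        self.comparisons.iter().any(|c| c.is_relevant())
    }
}
```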
This is not strictly relevant to the current discussion, but maybe we could alter the relevancy calculation logic for the whole artifact comparison to take benchmark category into account? For example, in the case that you showed, we could make the comparison relevancy smaller, since only secondary benchmarks have relevant results. Then we could design a rule like "the whole comparison is relevant if there is at least a single relevant result from a primary benchmark", which might be slightly better than just taking all benchmark results into account.
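A hedged sketch of that category-aware rule, building on the illustrative types above (`Category` and the field names are assumptions made up for this example):

```rust
// Hypothetical sketch of a category-aware relevancy rule, mirroring the
// primary/secondary split used on the comparison page.
#[derive(PartialEq)]
enum Category {
    Primary,
    Secondary,
}

struct CategorizedComparison {
    category: Category,
    relevant: bool,
}

// "The whole comparison is relevant if there is at least a single
// relevant result from a primary benchmark."
fn artifact_is_relevant(comparisons: &[CategorizedComparison]) -> bool {
    comparisons
        .iter()
        .any(|c| c.category == Category::Primary && c.relevant)
}
```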
Some principles that seem good to me:
We could satisfy the first principle by more clearly distinguishing the two kinds of relevance. E.g. perhaps "test relevance" vs "artifact relevance", or even a different word from "relevance" for one of the concepts. We could satisfy both principles by going with your suggestion "as long as an artifact comparison has 1 or more test result comparisons that are deemed relevant, the entire artifact comparison is relevant". Of these two I would lean towards the latter. I would hope that people would look at the full results more often than not -- it's a single click to get there, and the summary is never going to paint the full picture. So pushing people more in that direction seems fine to me.
Since we really care about helping the user focus on "relevant" changes (and not just significance, which is only a part of relevance), we should provide a filter for relevance instead of a filter for significance. Since relevance also includes magnitude, we don't need to have a filter for small results either.
Additionally, filters usually follow a pattern of showing more data when activated. This change also makes it so that the filter is for "showing non-relevant data", which is off by default.
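As a rough sketch of how that filter state could look (the field and function names here are assumptions for illustration, not the actual implementation):

```rust
// Illustrative filter state for the comparison page.
struct Filter {
    show_non_relevant: bool,
}

impl Default for Filter {
    fn default() -> Self {
        // Off by default: only relevant results are shown unless the
        // user explicitly opts in to seeing non-relevant ones.
        Filter { show_non_relevant: false }
    }
}

// Following the "filters show more data when activated" pattern.
fn should_show(filter: &Filter, is_relevant: bool) -> bool {
    is_relevant || filter.show_non_relevant
}
```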
As a next step, I'd love to have a detail view for each test case comparison which shows how we arrive at the conclusion that a result is relevant, perhaps by showing things like historical data and where this change fits in the historical trend.