Short circuit non-tagged unions #430

adriangb · 2023-03-11T06:55:03Z

The idea here is to do a first pass where we do not do exhaustive validation and instead try to optimistically find a match. Then if we don't find one we go back and do exhaustive validation so that we can build up the errors and such. In the example benchmark (which I know is a list but I think is representative of more complex objects) this is a 2x speedup (and it would be a 100x speedup if there were 100 options in the union).

The penalty of this is the same as doing the strict mode pass first (although I understand there's non-performance reasons to do that): we have to re-validate in an exhaustive match later. I think it would be interesting to explore some sort of "state cache" in the future that can be used for this as well as the strict mode first pass so that a validator can "pick up where it left off". I think the easiest way to do this would be to convert:

fn validate(...) -> ValResult<...>

Into something like:

fn validate(..., state: Option<ValStateToken>) -> (Option<ValStateToken>, ValResult<...>)

And then each Validator can come up with it's own state to store if it makes sense for it. I think the only ones that would make sense are lists and typed-dict where it'd be trivial to store "the last field/index that passed validation" or something like that.

codspeed-hq · 2023-03-11T06:58:54Z

CodSpeed Performance Report

Merging #430 short-circuit (d86254a) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 93 untouched benchmarks

🆕 1 new benchmarks
⁉️ 0 dropped benchmarks

Benchmarks breakdown

	Benchmark	`main`	`short-circuit`	Change
🆕	`test_smart_union_deep`	N/A	1.2 ms	N/A

codecov-commenter · 2023-03-11T06:59:43Z

Codecov Report

Merging #430 (d86254a) into main (e0008e9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #430   +/-   ##
=======================================
  Coverage   95.50%   95.50%           
=======================================
  Files          90       90           
  Lines       10735    10761   +26     
  Branches       10       10           
=======================================
+ Hits        10252    10277   +25     
- Misses        482      483    +1     
  Partials        1        1

Impacted Files	Coverage Δ
src/input/return_enums.rs	`97.89% <100.00%> (+0.02%)`	⬆️
src/validators/generator.rs	`91.90% <100.00%> (+0.04%)`	⬆️
src/validators/mod.rs	`98.76% <100.00%> (+0.04%)`	⬆️
src/validators/typed_dict.rs	`98.95% <100.00%> (+<0.01%)`	⬆️
src/validators/union.rs	`98.09% <100.00%> (-0.28%)`	⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0008e9...d86254a. Read the comment docs.

adriangb · 2023-03-11T07:02:00Z

@art049 I just added a benchmark here. It would be really cool if I could hit a button and say "run these benchmarks on main and show me the difference". Obviously this would fail in a lot of scenarios but I think it would work in quite a few, at least for optimization work like this where one should not be changing public APIs, adding features, etc.

adriangb · 2023-03-11T07:10:21Z

src/input/return_enums.rs

+                if !extra.exhaustive {
+                    return Err(ValError::Omit);
+                }
                errors.extend(line_errors.into_iter().map(|err| err.with_outer_location(index.into())));
            }
-            Err(ValError::Omit) => (),
+            Err(ValError::Omit) => {
+                if !extra.exhaustive {
+                    return Err(ValError::Omit);
+                }
+            }


This is just an example, I think we'd have to implement this in other places, wherever it makes sense.

adriangb · 2023-03-11T18:26:39Z

We should figure out at build time if exhaustiveness applies to the child validators. E.g. a StringValidator would say no and then the caller would know to not do a first exhaustive pass.

dmontagu · 2023-03-11T18:42:41Z

@art049 I just added a benchmark here. It would be really cool if I could hit a button and say "run these benchmarks on main and show me the difference". Obviously this would fail in a lot of scenarios but I think it would work in quite a few, at least for optimization work like this where one should not be changing public APIs, adding features, etc.

For what it's worth @adriangb, might make sense to make a separate PR to main to add this benchmark to make the before/after more obvious.

adriangb · 2023-03-11T18:44:26Z

For what it's worth @adriangb, might make sense to make a separate PR to main to add this benchmark to make the before/after more obvious.

There's 2 cons to that:

Manual work, this one is obvious
If this change doesn't end up landing that benchmark in main may not make sense. In other words, we want to know how that benchmark performs on main but maybe not necessarily add it to main.

adriangb · 2023-03-11T18:46:12Z

From discussion today: this is an interesting idea, but we have to:

Avoid doing an first non-exhaustive pass for types that don't support it. This means either (1) adding code to immediately return for those types or (2) doing something at validator build time to detect if that validator supports these checks or not.
We should implement the state thing first so that we don't add more passes.

art049 · 2023-03-11T19:51:29Z

@art049 I just added a benchmark here. It would be really cool if I could hit a button and say "run these benchmarks on main and show me the difference". Obviously this would fail in a lot of scenarios but I think it would work in quite a few, at least for optimization work like this where one should not be changing public APIs, adding features, etc.

Super interesting @adriangb, I think this could fit well with the historical run feature(basically running the latest benches on older revisions). It might just be a bit complicated regarding dependency management though. But I'll think about it when I work on it!

adriangb · 2023-03-13T13:30:13Z

Closing this PR for now

adriangb added 2 commits March 11, 2023 00:15

short circuit validation for faster union matching

c9aeba3

add benchmark

d86254a

adriangb requested a review from samuelcolvin March 11, 2023 06:58

adriangb self-assigned this Mar 11, 2023

adriangb commented Mar 11, 2023

View reviewed changes

adriangb mentioned this pull request Mar 11, 2023

Allow validators to keep state during validation #432

Open

adriangb mentioned this pull request Mar 11, 2023

Add a "cost" concept to validations #433

Open

adriangb closed this Mar 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Short circuit non-tagged unions #430

Short circuit non-tagged unions #430

Uh oh!

adriangb commented Mar 11, 2023 •

edited

Loading

Uh oh!

codspeed-hq bot commented Mar 11, 2023

Uh oh!

codecov-commenter commented Mar 11, 2023 •

edited

Loading

Uh oh!

adriangb commented Mar 11, 2023 •

edited

Loading

Uh oh!

adriangb Mar 11, 2023

Uh oh!

adriangb commented Mar 11, 2023

Uh oh!

dmontagu commented Mar 11, 2023 •

edited

Loading

Uh oh!

adriangb commented Mar 11, 2023

Uh oh!

adriangb commented Mar 11, 2023

Uh oh!

art049 commented Mar 11, 2023

Uh oh!

adriangb commented Mar 13, 2023

Uh oh!

Uh oh!

Short circuit non-tagged unions #430

Short circuit non-tagged unions #430

Uh oh!

Conversation

adriangb commented Mar 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Mar 11, 2023

CodSpeed Performance Report

Summary

Benchmarks breakdown

Uh oh!

codecov-commenter commented Mar 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

adriangb commented Mar 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adriangb Mar 11, 2023

Choose a reason for hiding this comment

Uh oh!

adriangb commented Mar 11, 2023

Uh oh!

dmontagu commented Mar 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adriangb commented Mar 11, 2023

Uh oh!

adriangb commented Mar 11, 2023

Uh oh!

art049 commented Mar 11, 2023

Uh oh!

adriangb commented Mar 13, 2023

Uh oh!

Uh oh!

adriangb commented Mar 11, 2023 •

edited

Loading

codecov-commenter commented Mar 11, 2023 •

edited

Loading

adriangb commented Mar 11, 2023 •

edited

Loading

dmontagu commented Mar 11, 2023 •

edited

Loading