Skip to content

Short circuit non-tagged unions #430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Short circuit non-tagged unions #430

wants to merge 2 commits into from

Conversation

adriangb
Copy link
Member

@adriangb adriangb commented Mar 11, 2023

The idea here is to do a first pass where we do not do exhaustive validation and instead try to optimistically find a match. Then if we don't find one we go back and do exhaustive validation so that we can build up the errors and such. In the example benchmark (which I know is a list but I think is representative of more complex objects) this is a 2x speedup (and it would be a 100x speedup if there were 100 options in the union).

The penalty of this is the same as doing the strict mode pass first (although I understand there's non-performance reasons to do that): we have to re-validate in an exhaustive match later. I think it would be interesting to explore some sort of "state cache" in the future that can be used for this as well as the strict mode first pass so that a validator can "pick up where it left off". I think the easiest way to do this would be to convert:

fn validate(...) -> ValResult<...>

Into something like:

fn validate(..., state: Option<ValStateToken>) -> (Option<ValStateToken>, ValResult<...>)

And then each Validator can come up with it's own state to store if it makes sense for it. I think the only ones that would make sense are lists and typed-dict where it'd be trivial to store "the last field/index that passed validation" or something like that.

@adriangb adriangb requested a review from samuelcolvin March 11, 2023 06:58
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 11, 2023

CodSpeed Performance Report

Merging #430 short-circuit (d86254a) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 93 untouched benchmarks

🆕 1 new benchmarks
⁉️ 0 dropped benchmarks

Benchmarks breakdown

Benchmark main short-circuit Change
🆕 test_smart_union_deep N/A 1.2 ms N/A

@codecov-commenter
Copy link

codecov-commenter commented Mar 11, 2023

Codecov Report

Merging #430 (d86254a) into main (e0008e9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #430   +/-   ##
=======================================
  Coverage   95.50%   95.50%           
=======================================
  Files          90       90           
  Lines       10735    10761   +26     
  Branches       10       10           
=======================================
+ Hits        10252    10277   +25     
- Misses        482      483    +1     
  Partials        1        1           
Impacted Files Coverage Δ
src/input/return_enums.rs 97.89% <100.00%> (+0.02%) ⬆️
src/validators/generator.rs 91.90% <100.00%> (+0.04%) ⬆️
src/validators/mod.rs 98.76% <100.00%> (+0.04%) ⬆️
src/validators/typed_dict.rs 98.95% <100.00%> (+<0.01%) ⬆️
src/validators/union.rs 98.09% <100.00%> (-0.28%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0008e9...d86254a. Read the comment docs.

@adriangb
Copy link
Member Author

adriangb commented Mar 11, 2023

@art049 I just added a benchmark here. It would be really cool if I could hit a button and say "run these benchmarks on main and show me the difference". Obviously this would fail in a lot of scenarios but I think it would work in quite a few, at least for optimization work like this where one should not be changing public APIs, adding features, etc.

@adriangb adriangb self-assigned this Mar 11, 2023
Comment on lines +67 to +76
if !extra.exhaustive {
return Err(ValError::Omit);
}
errors.extend(line_errors.into_iter().map(|err| err.with_outer_location(index.into())));
}
Err(ValError::Omit) => (),
Err(ValError::Omit) => {
if !extra.exhaustive {
return Err(ValError::Omit);
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just an example, I think we'd have to implement this in other places, wherever it makes sense.

@adriangb
Copy link
Member Author

We should figure out at build time if exhaustiveness applies to the child validators. E.g. a StringValidator would say no and then the caller would know to not do a first exhaustive pass.

@dmontagu
Copy link
Collaborator

dmontagu commented Mar 11, 2023

@art049 I just added a benchmark here. It would be really cool if I could hit a button and say "run these benchmarks on main and show me the difference". Obviously this would fail in a lot of scenarios but I think it would work in quite a few, at least for optimization work like this where one should not be changing public APIs, adding features, etc.

For what it's worth @adriangb, might make sense to make a separate PR to main to add this benchmark to make the before/after more obvious.

@adriangb
Copy link
Member Author

For what it's worth @adriangb, might make sense to make a separate PR to main to add this benchmark to make the before/after more obvious.

There's 2 cons to that:

  • Manual work, this one is obvious
  • If this change doesn't end up landing that benchmark in main may not make sense. In other words, we want to know how that benchmark performs on main but maybe not necessarily add it to main.

@adriangb
Copy link
Member Author

From discussion today: this is an interesting idea, but we have to:

  1. Avoid doing an first non-exhaustive pass for types that don't support it. This means either (1) adding code to immediately return for those types or (2) doing something at validator build time to detect if that validator supports these checks or not.
  2. We should implement the state thing first so that we don't add more passes.

@art049
Copy link
Contributor

art049 commented Mar 11, 2023

@art049 I just added a benchmark here. It would be really cool if I could hit a button and say "run these benchmarks on main and show me the difference". Obviously this would fail in a lot of scenarios but I think it would work in quite a few, at least for optimization work like this where one should not be changing public APIs, adding features, etc.

Super interesting @adriangb, I think this could fit well with the historical run feature(basically running the latest benches on older revisions). It might just be a bit complicated regarding dependency management though. But I'll think about it when I work on it!

@adriangb
Copy link
Member Author

Closing this PR for now

@adriangb adriangb closed this Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants