[SPARK-27105][SQL][test-hadoop3.2] Optimize away exponential complexity in ORC predicate conversion #24783
|
Test build #106119 has finished for PR 24783 at commit
|
|
I have updated the benchmark result. This PR is ready for review. |
|
It's true that this PR results in a smaller code change because it reuses the existing Namely, I mentioned a few benefits to the "filter-and-build in the same case-match" approach here: #24068 (comment) :
So, while I'm obviously biased, I still think that the code in the other PR results in a better end state for the implementation, despite the change being a bit larger. It also does exactly what your PR does, but it's structured in a different way (which I think has the benefits I mentioned above). |
|
@IvanVergiliev I think the code in this PR is much simpler and more readable. The PR #24068 introduces two
This PR builds a fully convertible tree first, and then converts that tree to a SearchArgument in a straightforward way. Putting the two procedures into two functions makes the logic cleaner. We can also see that the method With respect, this PR uses the benchmark in #24068, and it will be co-authored with you. I know there is a lot of work in #24068, but I prefer the simple implementation in this one. |
| saveAsTable(df, dir) | ||
| val benchmark = | ||
| new Benchmark("Select data with filters", numRows, minNumIters = 5, output = output) | ||
| Seq(100, 500, 1000).foreach { numFilter => |
I tried with 5000 filters, and the execution becomes very slow. For end-to-end tests, we need a smaller size here compared to the benchmark "Convert filters to ORC filter".
|
Test build #106145 has finished for PR 24783 at commit
|
|
retest this please. |
|
Test build #106232 has finished for PR 24783 at commit
|
| Parquet Vectorized 10561 / 10565 1.5 671.4 1.0X | ||
| Parquet Vectorized (Pushdown) 711 / 716 22.1 45.2 14.9X | ||
| Native ORC Vectorized 6791 / 6806 2.3 431.8 1.6X | ||
| Native ORC Ve |
Create a separate file?
All the benchmark results of FilterPushdownBenchmark will be in this file, unless we move the new benchmarks to another micro benchmark.
|
theoretically #24068 has better perf because it builds the |
Sure, I am fine with that. |
|
Hi, @gengliangwang. Are you going to use this PR for the follow-up after #24068? |
|
@dongjoon-hyun Yes, I think so. |
|
@cloud-fan cool, this sounds good to me too! I can also bring my PR back to a state similar to before I merged https://github.com/IvanVergiliev/spark/pull/2/files - with |
|
I have created a new PR for this: #24910 |
What changes were proposed in this pull request?
In #24068, @IvanVergiliev reports that OrcFilters.createBuilder has exponential complexity in the height of the filter tree due to the way the check-and-build pattern is implemented.
This is because the same method createBuilder is called twice recursively for any children under And/Or/Not nodes, so that the second call is also made inside the first call (see the description in #24068 for details).
Compared to the approach in #24068, I propose a very simple solution for the issue. We can rely on the result of convertibleFilters, which can build a fully convertible tree. With it, createBuilder no longer needs to handle the case where a child of a node is not convertible.
How was this patch tested?
Unit test
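The difference between the check-and-build pattern and the two-phase approach from the description can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's code: the `Filter` types, `naiveBuild`, `convertible`, `build`, the `canPartial` flag, and the `calls` counter are all hypothetical names, and a `String` stands in for the ORC SearchArgument builder.

```scala
// Toy filter tree. Illustrative stand-ins, NOT Spark's data source Filter classes.
sealed trait Filter
case class Leaf(name: String, convertible: Boolean) extends Filter
case class And(left: Filter, right: Filter) extends Filter
case class Or(left: Filter, right: Filter) extends Filter
case class Not(child: Filter) extends Filter

object OrcConversionSketch {
  // Counts how many tree nodes are visited (hypothetical instrumentation).
  var calls = 0L

  // Check-and-build: every child is visited once to check convertibility
  // and once more to build, so the work doubles at each level of the tree.
  def naiveBuild(f: Filter): Option[String] = {
    calls += 1
    f match {
      case Leaf(n, ok) => if (ok) Some(n) else None
      case And(l, r) =>
        for {
          _ <- naiveBuild(l); _ <- naiveBuild(r) // check
          a <- naiveBuild(l); b <- naiveBuild(r) // build
        } yield s"($a AND $b)"
      case Or(l, r) =>
        for {
          _ <- naiveBuild(l); _ <- naiveBuild(r) // check
          a <- naiveBuild(l); b <- naiveBuild(r) // build
        } yield s"($a OR $b)"
      case Not(c) =>
        for { _ <- naiveBuild(c); a <- naiveBuild(c) } yield s"NOT $a"
    }
  }

  // Phase 1: trim the tree so that only convertible nodes remain.
  // Assumption: one side of an And may be dropped, except underneath a Not.
  def convertible(f: Filter, canPartial: Boolean = true): Option[Filter] = {
    calls += 1
    f match {
      case l @ Leaf(_, ok) => if (ok) Some(l) else None
      case And(l, r) =>
        (convertible(l, canPartial), convertible(r, canPartial)) match {
          case (Some(a), Some(b)) => Some(And(a, b))
          case (Some(a), None) if canPartial => Some(a)
          case (None, Some(b)) if canPartial => Some(b)
          case _ => None
        }
      case Or(l, r) =>
        for {
          a <- convertible(l, canPartial)
          b <- convertible(r, canPartial)
        } yield Or(a, b)
      case Not(c) => convertible(c, canPartial = false).map(t => Not(t))
    }
  }

  // Phase 2: the input is fully convertible, so each node is built exactly once.
  def build(f: Filter): String = {
    calls += 1
    f match {
      case Leaf(n, _) => n
      case And(l, r) => s"(${build(l)} AND ${build(r)})"
      case Or(l, r) => s"(${build(l)} OR ${build(r)})"
      case Not(c) => s"NOT ${build(c)}"
    }
  }
}
```

In this model, a chain of 10 nested `Not` nodes over one convertible leaf costs 2047 visits with `naiveBuild`, while `convertible` followed by `build` costs 22 visits (11 per phase). The trimming phase is also the only place that decides what can be dropped, which keeps the build phase free of convertibility checks.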