Skip to content

Conversation

@gengliangwang
Copy link
Member

What changes were proposed in this pull request?

In #24068, @IvanVergiliev fixes the issue that OrcFilters.createBuilder has exponential complexity in the height of the filter tree due to the way the check-and-build pattern is implemented.

Comparing to the approach in #24068, I propose a simple solution for the issue:

  1. separate the logic of building a convertible filter tree and the actual SearchArgument builder, since the two procedures are different and their return types are different. Thus the new introduced class ActionType,TrimUnconvertibleFilters and BuildSearchArgument in [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion #24068 can be dropped. The code is more readable.
  2. For most of the leaf nodes, the convertible result is always Some(node), we can abstract it like this PR.
  3. The code is actually small changes on the previous code. See [SPARK-27105][SQL][test-hadoop3.2] Optimize away exponential complexity in ORC predicate conversion #24783

How was this patch tested?

Run the benchmark provided in #24068:

val schema = StructType.fromDDL("col INT")
(20 to 30).foreach { width =>
  val whereFilter = (1 to width).map(i => EqualTo("col", i)).reduceLeft(Or)
  val start = System.currentTimeMillis()
  OrcFilters.createFilter(schema, Seq(whereFilter))
  println(s"With $width filters, conversion takes ${System.currentTimeMillis() - start} ms")
}

Result:

With 20 filters, conversion takes 6 ms
With 21 filters, conversion takes 0 ms
With 22 filters, conversion takes 0 ms
With 23 filters, conversion takes 0 ms
With 24 filters, conversion takes 0 ms
With 25 filters, conversion takes 0 ms
With 26 filters, conversion takes 0 ms
With 27 filters, conversion takes 0 ms
With 28 filters, conversion takes 0 ms
With 29 filters, conversion takes 0 ms
With 30 filters, conversion takes 0 ms

Also verified with Unit tests.

@gengliangwang
Copy link
Member Author

None
} else {
Some(Or(leftResultOptional.get, rightResultOptional.get))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part can be simplified to:

for {
  lhs <- convertibleFiltersHelper(left, canPartialPushDown)
  rhs <- convertibleFiltersHelper(right, canPartialPushDown)
} yield Or(lhs, rhs)

Some(other)
} else {
None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to:

for (_ <- buildSearchArgument(dataTypeMap, other, newBuilder())) yield other

@SparkQA
Copy link

SparkQA commented Jun 19, 2019

Test build #106679 has finished for PR 24910 at commit c0cd6f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@IvanVergiliev
Copy link
Contributor

I think we can still benefit from some of the naming and code structure we decided on in the other PR. Will comment with 1-2 specific small suggestions tomorrow.

@SparkQA
Copy link

SparkQA commented Jun 19, 2019

Test build #106684 has finished for PR 24910 at commit ce08f04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cloud-fan commented Jun 20, 2019

separate the logic of building a convertible filter tree and the actual SearchArgument builder, since the two procedures are different and their return types are different... The code is more readable.

I don't quite agree with this. The benefit of putting them together is: they share a similar procedure and it's easy to maintain and read if these 2 are combined. For example, when we want to support a new leaf predicate, we will always support it in both "building a convertible filter tree" and the "actual SearchArgument builder".

One idea: we can separate the And/Or/Not part, and keep the leaf predicate part combined.

def buildLeafSearchArguments... = {
  case EqualTo(...) => Some(...)
  ...
  case _ => None
}

def trimUnconvertibleFilters... = {
  def helper()... = {
    case And...
    case Or...
    ...
    case other => if (buildLeafSearchArguments(other).isDefined) Some(other) else None
  }
  ...
}

def buildSearchArgument... = {
  case And...
  ...
  case other => buildLeafSearchArguments(other)
}

Then we can still remove ActionType stuff.

@SparkQA
Copy link

SparkQA commented Jun 20, 2019

Test build #106709 has finished for PR 24910 at commit f2051ec.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 20, 2019

Test build #106708 has finished for PR 24910 at commit 77eaff3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Jun 20, 2019

Test build #106714 has finished for PR 24910 at commit f2051ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 23, 2019

Test build #106802 has finished for PR 24910 at commit 1792bb6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Jun 23, 2019

Test build #106803 has finished for PR 24910 at commit 1792bb6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang gengliangwang changed the title [SPARK-28108][SQL] Simplify OrcFilters [SPARK-28108][SQL][test-hadoop3.2] Simplify OrcFilters Jun 23, 2019
@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Jun 23, 2019

Test build #106808 has finished for PR 24910 at commit 1792bb6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in b5e183c Jun 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants