
Conversation

@dongjoon-hyun (Member) commented Sep 1, 2018

What changes were proposed in this pull request?

In both ORC data sources, the createFilter function has exponential time complexity due to its skewed filter tree generation. This PR aims to fix it by using a new buildTree function.
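The idea of buildTree, roughly, is to combine the convertible filters with a balanced divide-and-conquer instead of a left-deep fold, so the resulting And tree has logarithmic depth. A minimal illustrative sketch (the name follows this PR, but the exact implementation may differ):

import org.apache.spark.sql.sources.{And, Filter}

// Build a balanced And-tree: split the filters in half, recurse on each half,
// and join the halves with a single And. The result has O(log n) depth instead
// of the O(n) depth produced by reduceOption(And).
def buildTree(filters: Seq[Filter]): Option[Filter] = filters match {
  case Seq() => None
  case Seq(filter) => Some(filter)
  case _ =>
    val (left, right) = filters.splitAt(filters.length / 2)
    Some(And(buildTree(left).get, buildTree(right).get))
}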

REPRODUCE

// Create and read 1 row table with 1000 columns
sql("set spark.sql.orc.filterPushdown=true")
val selectExpr = (1 to 1000).map(i => s"id c$i")
spark.range(1).selectExpr(selectExpr: _*).write.mode("overwrite").orc("/tmp/orc")
print(s"With 0 filters, ")
spark.time(spark.read.orc("/tmp/orc").count)

// Increase the number of filters
(20 to 30).foreach { width =>
  val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
  print(s"With $width filters, ")
  spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
}

RESULT

With 0 filters, Time taken: 653 ms                                              
With 20 filters, Time taken: 962 ms
With 21 filters, Time taken: 1282 ms
With 22 filters, Time taken: 1982 ms
With 23 filters, Time taken: 3855 ms
With 24 filters, Time taken: 6719 ms
With 25 filters, Time taken: 12669 ms
With 26 filters, Time taken: 25032 ms
With 27 filters, Time taken: 49585 ms
With 28 filters, Time taken: 98980 ms    // over 1 min 38 seconds
With 29 filters, Time taken: 198368 ms   // over 3 mins
With 30 filters, Time taken: 393744 ms   // over 6 mins

AFTER THIS PR

With 0 filters, Time taken: 774 ms
With 20 filters, Time taken: 601 ms
With 21 filters, Time taken: 399 ms
With 22 filters, Time taken: 679 ms
With 23 filters, Time taken: 363 ms
With 24 filters, Time taken: 342 ms
With 25 filters, Time taken: 336 ms
With 26 filters, Time taken: 352 ms
With 27 filters, Time taken: 322 ms
With 28 filters, Time taken: 302 ms
With 29 filters, Time taken: 307 ms
With 30 filters, Time taken: 301 ms

How was this patch tested?

Pass the Jenkins with newly added test cases.

@dongjoon-hyun changed the title from "[SPARK-25306][SQL] Use cache to speed up createFilter" to "[SPARK-25306][SQL] Use cache to speed up createFilter in ORC" on Sep 1, 2018
@SparkQA commented Sep 2, 2018

Test build #95583 has finished for PR 22313 at commit ac06b0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class FilterWithTypeMap(filter: Filter, typeMap: Map[String, DataType])
  • case class FilterWithTypeMap(filter: Filter, typeMap: Map[String, DataType])

@dongjoon-hyun (Member Author):

Could you review this PR, @gatorsmile and @cloud-fan ?

}
})

private def getOrBuildSearchArgumentWithNewBuilder(
Member:

Just a small question: is it possible to reuse code with https://github.com/apache/spark/pull/22313/files#diff-224b8cbedf286ecbfdd092d1e2e2f237R61?

@dongjoon-hyun (Member Author) commented Sep 2, 2018

@xuanyuanking, this already reuses cacheExpireTimeout.

For the cache value, SearchArgument, SearchArgumentFactory and Builder are different classes. (They only share the same names.)

  • Here, they come from org.apache.hadoop.hive.ql.io.sarg.*.
  • There, they come from org.apache.orc.storage.ql.io.sarg.*.

The only exception I made is FilterWithTypeMap. I wanted to keep them separate since it's also related to the cache key.

@kiszk (Member) commented Sep 2, 2018

General question: why do we use time instead of entry size to control the cache? I am neutral on this decision; I would just like to hear the reason for it.

@dongjoon-hyun (Member Author) commented Sep 2, 2018

Thank you for the review, @kiszk.

First, I don't want to hold on to the memory after query completion. If we did, it would be a regression. So I preferred a time-based policy.

Second, it's difficult to estimate a sufficient limit for the number of filters.

  • As we know from several codegen JVM limit issues, there are attempts to generate a single complex query over wide tables (thousands of columns).
  • Spark optimizer rules like InferFiltersFromConstraints add more constraints such as `IsNotNull(col1)`. Usually the number of filters doubles here.
  • Also, it's not a good design if we need to raise this limit whenever we add a new optimizer rule like InferFiltersFromConstraints.
  • If the limit is too high, we waste memory; if it is too small, eviction bites us again.

In short, a time-based policy was sufficient and the simplest option for this purpose.
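To make the trade-off concrete, here is a minimal sketch of what such a time-based cache could look like. This is illustrative only: it assumes Guava's LoadingCache, the FilterWithTypeMap key from this PR's early revision, a cacheExpireTimeout in seconds, and the existing buildSearchArgument helper; the final version of the PR dropped the cache in favor of buildTree.

import java.util.concurrent.TimeUnit

import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

import org.apache.hadoop.hive.ql.io.sarg.SearchArgument.Builder
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.DataType

// Hypothetical cache key: a pushed-down filter plus the column types it refers to.
case class FilterWithTypeMap(filter: Filter, typeMap: Map[String, DataType])

// Entries expire `cacheExpireTimeout` seconds after the last access, so memory is
// released soon after a query finishes instead of being bounded by an entry count.
// `buildSearchArgument(dataTypeMap, filter, builder): Option[Builder]` is assumed to be
// the existing OrcFilters helper shown in the diffs above.
def buildSearchArgumentCache(
    cacheExpireTimeout: Long): LoadingCache[FilterWithTypeMap, Option[Builder]] = {
  CacheBuilder.newBuilder()
    .expireAfterAccess(cacheExpireTimeout, TimeUnit.SECONDS)
    .build(new CacheLoader[FilterWithTypeMap, Option[Builder]]() {
      override def load(key: FilterWithTypeMap): Option[Builder] =
        buildSearchArgument(key.typeMap, key.filter, SearchArgumentFactory.newBuilder())
    })
}

With spark.sql.orc.cache.sarg.timeout=0, the loading cache would simply not be created at all, as discussed below.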

if (cacheExpireTimeout > 0) {
searchArgumentCache.get(FilterWithTypeMap(expression, dataTypeMap))
} else {
buildSearchArgument(dataTypeMap, expression, SearchArgumentFactory.newBuilder())
Member:

When we set the timeout to zero on the cache, the loaded element can be removed immediately. Maybe we don't need to check the timeout like this, and we can simplify the code.

@dongjoon-hyun (Member Author):

Ya, it's possible. But if we create a Guava loading cache and go through all the cache management logic in Guava, it means more overhead than this PR. In this PR, spark.sql.orc.cache.sarg.timeout=0 means not creating the loading cache at all.

if (cacheExpireTimeout > 0) {
// Build in a bottom-up manner
getOrBuildSearchArgumentWithNewBuilder(dataTypeMap, newFilter)
}
Member:

Why do we need to cache all sub-filters? Don't we just need to cache the final conjunction?

@dongjoon-hyun (Member Author):

Final conjunction? All sub-function results will be cached in the end.

@cloud-fan (Contributor):

Do you know why the createFilter function has exponential time complexity? Let's make sure the algorithm is good before adding a cache.

@dongjoon-hyun (Member Author) commented Sep 3, 2018

Thank you for the review and advice, @cloud-fan. It turns out that my initial assessment was not sufficient.

First of all, from the beginning, SPARK-2883 was designed as a recursive function like the following. Please see tryLeft and tryRight. It is a pure computation to check whether it succeeds; there is no reuse here. So I tried to cache the first two tryLeft and tryRight operations since they can be reused.

val tryLeft = buildSearchArgument(left, newBuilder)
val tryRight = buildSearchArgument(right, newBuilder)
val conjunction = for {
  _ <- tryLeft
  _ <- tryRight
  lhs <- buildSearchArgument(left, builder.startAnd())
  rhs <- buildSearchArgument(right, lhs)
} yield rhs.end()

However, before that, createFilter generates the target tree with reduceOption(And) as a deeply skewed tree. That was the root cause. I'll update this PR.
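For context, a rough back-of-the-envelope recurrence (based only on the quoted code above, so treat it as a sketch rather than a precise analysis): an And node runs buildSearchArgument twice on each child, once to probe whether the child is convertible (tryLeft/tryRight) and once to actually build it, so its cost is roughly 2·T(left) + 2·T(right). On the left-deep tree produced by reduceOption(And) with n filters, this gives T(n) ≈ 2·T(n−1) + c, i.e. O(2^n), which matches the benchmark where each additional filter roughly doubles the time. On a balanced tree, the same recurrence becomes T(n) ≈ 4·T(n/2) + c, i.e. O(n²).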

@dongjoon-hyun changed the title from "[SPARK-25306][SQL] Use cache to speed up createFilter in ORC" to "[SPARK-25306][SQL] Avoid skewed filter trees to speed up createFilter in ORC" on Sep 3, 2018
@SparkQA commented Sep 4, 2018

Test build #95637 has finished for PR 22313 at commit 4acbaf8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val schema = new StructType(Array(StructField("a", IntegerType, nullable = true)))
val filters = (1 to 2000).map(LessThan("a", _)).toArray[Filter]
failAfter(2 seconds) {
OrcFilters.createFilter(schema, filters)
Contributor:

This test looks tricky... It's bad practice to assume some code will return within a certain time. Can we just add a microbenchmark for it?

@dongjoon-hyun (Member Author) commented Sep 4, 2018

Sure.

  1. Something like the test code in the PR description? And marked as ignore(...) instead of test(...) here?
  2. Or, do you want another test case in FilterPushdownBenchmark?

@dongjoon-hyun (Member Author):

I'll choose (2), @cloud-fan.

for {
// Combines all convertible filters using `And` to produce a single conjunction
conjunction <- convertibleFilters.reduceOption(org.apache.spark.sql.sources.And)
conjunction <- buildTree(convertibleFilters)
Contributor:

Does Parquet have the same problem?

Contributor:

In Parquet, this is done as

filters
  .flatMap(ParquetFilters.createFilter(requiredSchema, _))
  .reduceOption(FilterApi.and)

can we follow it?

@dongjoon-hyun (Member Author) commented Sep 4, 2018

For the first question, I don't think Parquet has the same issue, because Parquet uses canMakeFilterOn while ORC tries to build a full result (with a fresh builder) to check whether it's okay or not.

For the second question,

  1. In ORC, we did the first half (flatMap) to compute convertibleFilters, but we can change it to filters.filter. I'll update it like that:
val convertibleFilters = for {
    filter <- filters
    _ <- buildSearchArgument(dataTypeMap, filter, SearchArgumentFactory.newBuilder())
} yield filter
  2. The second half, reduceOption(FilterApi.and), was the original ORC code which generated a skewed tree with exponential time complexity. We need to use buildTree.

@dongjoon-hyun (Member Author):

BTW, Parquet has another issue here due to .reduceOption(FilterApi.and). When I was making a benchmark, Parquet seemed to be unable to handle 1000 filters, @cloud-fan.

withTempPath { dir =>
val columns = (1 to width).map(i => s"id c$i")
val df = spark.range(1).selectExpr(columns: _*)
withTempTable("orcTable", "patquetTable") {
Member:

nit: a typo, patquetTable.

@dongjoon-hyun (Member Author):

Oh, thanks!

@SparkQA commented Sep 4, 2018

Test build #95651 has finished for PR 22313 at commit 5c46693.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 4, 2018

Test build #95652 has finished for PR 22313 at commit 4a372a3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member):

retest this please

} yield builder.build()
buildTree(filters.filter(buildSearchArgument(dataTypeMap, _, newBuilder).isDefined))
.flatMap(buildSearchArgument(dataTypeMap, _, newBuilder))
.map(_.build)
Contributor:

Ah, I see what you mean now. Can we restore the previous version? That seems better. Sorry for the back and forth!

@dongjoon-hyun (Member Author):

Sure. No problem, @cloud-fan. :)

@SparkQA commented Sep 4, 2018

Test build #95658 has finished for PR 22313 at commit 4a372a3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 4, 2018

Test build #95665 has finished for PR 22313 at commit 3cd4443.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member Author):

Retest this please.

@SparkQA commented Sep 4, 2018

Test build #95669 has finished for PR 22313 at commit 3cd4443.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member Author):

Retest this please.

@dongjoon-hyun (Member Author):

The previous failures are irrelevant to this PR.

  • org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.HDFSMetadataLog: metadata directory collision
  • org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.SuiteSelector)
  • org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.SuiteSelector)

@SparkQA commented Sep 4, 2018

Test build #95680 has finished for PR 22313 at commit 3cd4443.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member Author):

Retest this please.

@SparkQA commented Sep 5, 2018

Test build #95685 has finished for PR 22313 at commit 3cd4443.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor):

thanks, merging to master!

@cloud-fan (Contributor):

@dongjoon-hyun please also update the title of the JIRA ticket, thanks!

@asfgit closed this in 103f513 on Sep 5, 2018
@dongjoon-hyun (Member Author):

Thank you, @cloud-fan. Sure, I'll update them.

@dongjoon-hyun deleted the SPARK-25306 branch on September 5, 2018 at 02:36
@dongjoon-hyun (Member Author):

Also, thank you for the review, @xuanyuanking, @kiszk, @viirya, @HyukjinKwon.
