Skip to content

Conversation

@mgaido91
Copy link
Contributor

What changes were proposed in this pull request?

As outlined in the JIRA by @JoshRosen, our conversion mechanism from catalyst types to scala ones is pretty inefficient for primitive data types. Indeed, in these cases, most of the times we are adding useless calls to identity function or anyway to functions which return the same value. Using the information we have when we generate the code, we can avoid most of these overheads.

How was this patch tested?

Here is a simple test which shows the benefit that this PR can bring:

test("SPARK-27684: perf evaluation") {
    val intLongUdf = ScalaUDF(
      (a: Int, b: Long) => a + b, LongType,
      Literal(1) :: Literal(1L) :: Nil,
      true :: true :: Nil,
      nullable = false)

    val plan = generateProject(
      MutableProjection.create(Alias(intLongUdf, s"udf")() :: Nil),
      intLongUdf)
    plan.initialize(0)

    var i = 0
    val N = 100000000
    val t0 = System.nanoTime()
    while(i < N) {
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      i += 1
    }
    val t1 = System.nanoTime()
    println(s"Avg time: ${(t1 - t0).toDouble / N} ns")
  }

The output before the patch is:

Avg time: 51.27083294 ns

after, we get:

Avg time: 11.85874227 ns

which is ~5X faster.

Moreover a benchmark has been added for Scala UDF. The output after the patch can be seen in this PR, before the patch, the output was:

================================================================================================
UDF with mixed input types
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
long/nullable int/string to string:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
long/nullable int/string to string wholestage off            257            287          42          0,4        2569,5       1,0X
long/nullable int/string to string wholestage on            158            172          18          0,6        1579,0       1,6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
long/nullable int/string to option:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
long/nullable int/string to option wholestage off            104            107           5          1,0        1037,9       1,0X
long/nullable int/string to option wholestage on             80             92          12          1,2         804,0       1,3X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
long/nullable int to primitive:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
long/nullable int to primitive wholestage off             71             76           7          1,4         712,1       1,0X
long/nullable int to primitive wholestage on             64             71           6          1,6         636,2       1,1X


================================================================================================
UDF with primitive types
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
long/nullable int to string:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
long/nullable int to string wholestage off             60             60           0          1,7         600,3       1,0X
long/nullable int to string wholestage on             55             64           8          1,8         551,2       1,1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
long/nullable int to option:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
long/nullable int to option wholestage off             66             73           9          1,5         663,0       1,0X
long/nullable int to option wholestage on             30             32           2          3,3         300,7       2,2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
long/nullable int/string to primitive:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
long/nullable int/string to primitive wholestage off             32             35           5          3,2         316,7       1,0X
long/nullable int/string to primitive wholestage on             41             68          17          2,4         414,0       0,8X

The improvements are particularly visible in the second case, ie. when only primitive types are used as inputs.

@SparkQA
Copy link

SparkQA commented May 18, 2019

Test build #105515 has finished for PR 24636 at commit 37ced27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val initArg = if (CatalystTypeConverters.isPrimitive(dt)) {
val convertedTerm = ctx.freshName("conv")
s"""
|${CodeGenerator.boxedType(dt)} $convertedTerm = ${eval.value};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why do we need this extra convertedTerm for the boxing? Could you instead do

Object $argTerm = ${eval.isNull} ? null : ${eval.value};

and avoid the use of an extra variable name? Or if you want more typechecking, do

${CodeGenerator.boxedType(dt)} $argTerm = ${eval.isNull} ? null : ${eval.value};

and used the boxed type as $argTerm's type?

To avoid repetition and more tightly scope the conditional part of the argument convert logic, we even might consider something like this

val boxedType = CodeGenerator.boxedType(dt)
val maybeConverted = if (CatalystTypeConverters.isPrimitive(dt)) {
  eval.value
} else {
  "$convertersTerm[$i].apply(${eval.value})"
}
s"$boxedType $argTerm = ${eval.isNull} ? null : $maybeConverted;"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually my first trial was exactly what you are suggesting here, but it didn't work: indeed it can cause compilation error (the error message is something like no common type for void and int). Then, I also tried:

val boxedType = CodeGenerator.boxedType(dt)
val maybeConverted = if (CatalystTypeConverters.isPrimitive(dt)) {
  s"(${boxedType}) eval.value"
} else {
  "$convertersTerm[$i].apply(${eval.value})"
}
s"$boxedType $argTerm = ${eval.isNull} ? null : $maybeConverted;"

but this fails too with a confusing error message. Honestly, I am not sure why this 2nd solution doesn't work, since I tried taking the code and compiling it with jdk and it worked. My best guess is that it is a janino bug which doesn't support it.
I did several trials but I haven't found any better alternative as this seemed the only syntax working with janino.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

I've filed a bug against Janino to report this issue: janino-compiler/janino#90

val (funcArgs, initArgs) = evals.zipWithIndex.zip(children.map(_.dataType)).map {
case ((eval, i), dt) =>
val argTerm = ctx.freshName("arg")
val initArg = if (CatalystTypeConverters.isPrimitive(dt)) {
Copy link
Member

@viirya viirya May 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use CodeGenerator.isPrimitiveType? We can save the change to CatalystTypeConverters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't work, you can see the UT failures. For types like timestamp it is not the same.

@SparkQA
Copy link

SparkQA commented May 19, 2019

Test build #105532 has finished for PR 24636 at commit 3df235d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91 mgaido91 force-pushed the SPARK-27684 branch 2 times, most recently from caea444 to ad57acf Compare May 19, 2019 14:59
@SparkQA
Copy link

SparkQA commented May 19, 2019

Test build #105533 has finished for PR 24636 at commit ad57acf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

This reverts commit ad57acf.
@SparkQA
Copy link

SparkQA commented May 19, 2019

Test build #105537 has finished for PR 24636 at commit fead323.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 20, 2019

Test build #105562 has finished for PR 24636 at commit fead323.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented May 21, 2019

Good PR. I will review this carefully.

One minor comment: I like performance improvements using Benchmark class. On the other hand, I am not convinced by the first experiment in the description. Since there is no warm-up time. it may include the execution time on interpreter instead of native code.

@mgaido91
Copy link
Contributor Author

Thanks for you comment @kiszk .

I added the warm-up, and the result is barely the same. Here is the code:

test("SPARK-27684: perf evaluation") {
    val intLongUdf = ScalaUDF(
      (a: Int, b: Long) => a + b, LongType,
      Literal(1) :: Literal(1L) :: Nil,
      true :: true :: Nil,
      nullable = false)

    val plan = generateProject(
      MutableProjection.create(Alias(intLongUdf, s"udf")() :: Nil),
      intLongUdf)
    plan.initialize(0)

    var i = 0
    val N = 100000000
    while(i < 1000) {
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      i += 1
    }
    i = 0
    val t0 = System.nanoTime()
    while(i < N) {
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      plan(EmptyRow).get(0, intLongUdf.dataType)
      i += 1
    }
    val t1 = System.nanoTime()
    println(s"Avg time: ${(t1 - t0).toDouble / N} ns")
  }

and the results are:

Old Avg time: 49.58303799 ns
New Avg time: 12.66588096 ns

Any more comments/concerns?

doRunBenchmarkWithPrimitiveTypes(sampleUDF, cardinality)
}

codegenBenchmark("long/nullable int/string to primitive", cardinality) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a typo -- should this be "long/nullable int to primitive"

doRunBenchmarkWithMixedTypes(sampleUDF, cardinality)
}

codegenBenchmark("long/nullable int to primitive", cardinality) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"long/nullable int to primitive" , change to "long/nullable int/string to primitive"

* Results will be written to "benchmarks/UDFBenchmark-results.txt".
* }}}
*/
object UDFBenchmark extends SqlBasedBenchmark {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mgaido91 for your work on this performance improvement. I'm curious if you tried the JIRA test case from @JoshRosen with your changes. How close does this get us? Also do you think it might be worthwhile to add that test in this benchmark suite as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, everything can be added. If you think it is critical, we can add it. Indeed, the test reported in the description is very similar, as it is doing a + 1, which is not so different from an identity. I think the point here is to identify how much from the overhead is saved, and the tests performed show that the overhead is reduced significantly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway I added it, as shown in the results the overhead is now ~ 20% instead of ~50%

@SparkQA
Copy link

SparkQA commented May 22, 2019

Test build #105673 has finished for PR 24636 at commit 74e70f5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 22, 2019

Test build #105675 has finished for PR 24636 at commit 010b3d4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 22, 2019

Test build #105690 has finished for PR 24636 at commit 010b3d4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 22, 2019

Test build #105693 has finished for PR 24636 at commit 010b3d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val convertedTerm = ctx.freshName("conv")
s"""
|${CodeGenerator.boxedType(dt)} $convertedTerm = ${eval.value};
|Object $argTerm = ${eval.isNull} ? null : $convertedTerm;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove ${eval.isNull} ? ... if ${eval.isNull} is compile-time constant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO maybe we can do this in a separate PR?

I wouldn't be surprised if there's other places in Spark (beyond this method / file) where we could apply similar fixes (and if we're going to apply this in a lot of places then it might even be nice to write some sort of helper for generating / managing null checks.

Does this PR look good to you otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @JoshRosen . It would be better to define a generic approach for addressing this and there are several instances of it. I can create another JIRA and start a PR for that if you're ok with it.

@gatorsmile
Copy link
Member

cc @ueshin

@JoshRosen
Copy link
Contributor

@kiszk @ueshin @gatorsmile, does this PR now look good to you? If so, I'd like to get this merged soon so that it doesn't go stale.

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for the delay.
LGTM.

@JoshRosen
Copy link
Contributor

I've merged this to master. Thanks @mgaido91 (and to everyone who helped with review)!

@JoshRosen JoshRosen closed this in 93db7b8 May 31, 2019
@mgaido91
Copy link
Contributor Author

thank you all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants