Conversation

@cloud-fan
Contributor

@cloud-fan cloud-fan commented Aug 27, 2016

What changes were proposed in this pull request?

If `ScalaUDF` throws an exception while executing user code, it is sometimes hard for users to figure out what went wrong, especially when they use the Spark shell. An example:

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 12 in stage 325.0 failed 4 times, most recent failure: Lost task 12.3 in stage 325.0 (TID 35622, 10.0.207.202): java.lang.NullPointerException
    at line8414e872fb8b42aba390efc153d1611a12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:40)
    at line8414e872fb8b42aba390efc153d1611a12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:40)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
...
```

We should catch these exceptions and rethrow them with a better error message that says the exception happened in the Scala UDF.

This PR also does some cleanup of `ScalaUDF` and adds a unit test suite for it.
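The idea can be sketched as follows (a minimal sketch with hypothetical names — `UdfErrorWrapping` and `callWithBetterError` are illustrations, not the actual Spark code):

```scala
// Minimal sketch of the idea (hypothetical wrapper, not the actual Spark code):
// wrap the invocation of user code and rethrow any failure with a message that
// names the UDF, keeping the original exception as the cause.
object UdfErrorWrapping {
  def callWithBetterError[A, B](udfName: String, f: A => B)(input: A): B = {
    try {
      f(input)
    } catch {
      case e: Exception =>
        throw new RuntimeException(
          s"Failed to execute user defined function ($udfName)", e)
    }
  }
}
```

With this wrapping, the failed task's exception message names the UDF instead of surfacing only an anonymous-function stack frame.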

How was this patch tested?

The new test suite.

@cloud-fan
Contributor Author

cc @yhuai

@SparkQA

SparkQA commented Aug 27, 2016

Test build #64528 has finished for PR 14850 at commit 3c7bdff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
  f(input)
} catch {
  case e: NullPointerException =>
    throw new RuntimeException(npeErrorMessage, e)
```
Contributor


Can we still use `NullPointerException`? A `NullPointerException` can carry a specific message. Then you can use `setStackTrace` to preserve the original stack trace.
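The suggestion can be sketched like this (a hypothetical helper, assuming the original NPE's stack trace is copied onto the rethrown one):

```scala
// Sketch of the reviewer's suggestion (hypothetical helper): rethrow a new
// NullPointerException carrying a custom message, but keep the original
// stack trace so the reported code lines still point at the user's UDF.
def rethrowNpeWithMessage[A](message: String)(body: => A): A = {
  try {
    body
  } catch {
    case e: NullPointerException =>
      val npe = new NullPointerException(message)
      npe.setStackTrace(e.getStackTrace) // preserve where the NPE really happened
      throw npe
  }
}
```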

@SparkQA

SparkQA commented Aug 29, 2016

Test build #64562 has finished for PR 14850 at commit 1bd8382.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 30, 2016

Test build #64652 has finished for PR 14850 at commit 9a82d67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
val result = try {
  f(input)
} catch {
  case e: NullPointerException =>
```
Contributor


It is a bit hacky to set the stack trace like this: `npe.setStackTrace(e.getStackTrace)`

If users search for the code line reported in the stack trace, they may not be able to find the code that matches the error message.

Contributor

@clockfly clockfly Sep 1, 2016


  1. For the `eval(input: InternalRow)` code branch, the existing NPE message should be clear enough when there is a full stack trace, since the stack contains the method of the UDF.
  2. The error message you proposed can be totally wrong:
    "Given UDF throws NPE during execution, please check the UDF to make sure it handles null parameters correctly".

What if the NPE is not caused by a null parameter? Prompting this message would be misleading.

@clockfly
Contributor

clockfly commented Sep 1, 2016

@cloud-fan

There are two branches for executing a UDF.

  1. Calling `override def eval(input: InternalRow): Any` directly. For this branch, I don't think there is any barrier to a user knowing it is a UDF problem, as `ScalaUDF` appears in the stack trace.
  2. The other branch is codegen. This branch is confusing, in my opinion, since code from different expressions is fused into one code block; when an exception is thrown, we don't know which expression owns that block.
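The difference between the two branches can be illustrated with a simplified stand-in (assumption: `ScalaUDFLike` is an illustration, not the real `ScalaUDF`):

```scala
// Simplified stand-in for the interpreted eval branch: because eval is an
// ordinary method, it shows up in the stack trace of any exception thrown by
// the user function, so the failure is easy to attribute to the UDF. Fused
// generated code has no such frame, which is what makes codegen confusing.
class ScalaUDFLike(f: Int => Int) {
  def eval(input: Int): Int = f(input)
}
```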

@cloud-fan cloud-fan changed the title [SPARK-17279][SQL] better error message for NPE during ScalaUDF execution [SPARK-17279][SQL] better error message for exceptions during ScalaUDF execution Sep 2, 2016
```
s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
val getFuncResult = s"$funcTerm.apply(${funcArguments.mkString(", ")})"
val rethrowException = "throw new org.apache.spark.SparkException" +
  """("Exception happens when execute user code in Scala UDF.", e);"""
```
Contributor


`Exception happens when executing user defined function (className: input argument type => output argument type)`
Or
`Failed to execute user defined function (className: input argument type => output argument type)`
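Either message would be spliced into the generated Java source. A simplified sketch of assembling such a wrapper (hypothetical helper `wrapUdfCall`, much simpler than the real codegen):

```scala
// Hypothetical, simplified version of the codegen wrapping: build Java source
// that invokes the UDF and rethrows any failure as a SparkException carrying a
// description of the function.
def wrapUdfCall(funcTerm: String, args: Seq[String], udfDesc: String): String = {
  val call = s"$funcTerm.apply(${args.mkString(", ")})"
  s"""try {
     |  result = $call;
     |} catch (Exception e) {
     |  throw new org.apache.spark.SparkException(
     |    "Failed to execute user defined function ($udfDesc)", e);
     |}""".stripMargin
}
```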

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64829 has finished for PR 14850 at commit b7c459b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64833 has finished for PR 14850 at commit 4efb6fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64840 has finished for PR 14850 at commit c9cd5e0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64845 has finished for PR 14850 at commit c6284bd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 3, 2016

Test build #64889 has finished for PR 14850 at commit bf786e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@clockfly
Contributor

clockfly commented Sep 5, 2016

+1

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@asfgit asfgit closed this in 8d08f43 Sep 6, 2016
@cloud-fan
Contributor Author

also backport it to 2.0

asfgit pushed a commit that referenced this pull request Sep 7, 2016
…F execution

Author: Wenchen Fan <[email protected]>

Closes #14850 from cloud-fan/npe.

(cherry picked from commit 8d08f43)
Signed-off-by: Wenchen Fan <[email protected]>