-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26560][SQL]:Repeat select on HiveUDF fails #23921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can you fix the PR title? It sounds like we have a problem in |
|
Also, please describe how the current PR fixes the issue in the PR description. |
|
I have updated the PR description and title, please review. @HyukjinKwon |
|
@HyukjinKwon This issue happens only in Spark-shell, that's why I added in the title. Is that not required ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull up the flower bracket inline with finally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this try block is required now? please check once again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this inside try and put the catch block to the outer try.
|
@nivo091 Please correct the PR title format seems to be not in same as standard mentioned by the community. |
|
@sujith71955 : Handled, thanks |
|
cc @xiao Li |
|
Test build #4621 has finished for PR 23921 at commit
|
|
Can you make the PR description concise? It's difficult to read and follow. Why does the classloader change in the second time? Can you also point out the related codes? |
|
Spark-shell have the IMainTranslatingClassLoader which is loaded from IMain (scala.tools.nsc.interpreter). spark/sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala Lines 168 to 169 in e402de5
|
|
Can one of the admins verify this patch? |
| Try(super.makeFunctionExpression(name, clazz, input)).getOrElse { | ||
| var udfExpr: Option[Expression] = None | ||
| try { | ||
| val originalClassLoader = Thread.currentThread().getContextClassLoader() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't need to modify original code but just switch context classloader. Then, use Utils.withContextClassLoader and just pass original code into fn.
|
Btw, please follow the template of PR description. And we describe PR title for what the patch fixes instead of what was the bug. |
|
I'm taking this over, as it's a bit old and the test should be added as well. Please refer #27025. |
…ardless of current thread context classloader ### What changes were proposed in this pull request? This patch is based on #23921 but revised to be simpler, as well as adds UT to test the behavior. (This patch contains the commit from #23921 to retain credit.) Spark loads new JARs for `ADD JAR` and `CREATE FUNCTION ... USING JAR` into jar classloader in shared state, and changes current thread's context classloader to jar classloader as many parts of remaining codes rely on current thread's context classloader. This would work if the further queries will run in same thread and there's no change on context classloader for the thread, but once the context classloader of current thread is switched back by various reason, Spark fails to create instance of class for the function. This bug mostly affects spark-shell, as spark-shell will roll back current thread's context classloader at every prompt. But it may also affects the case of job-server, where the queries may be running in multiple threads. This patch fixes the issue via switching the context classloader to the classloader which loads the class. Hopefully FunctionBuilder created by `makeFunctionBuilder` has the information of Class as a part of closure, hence the Class itself can be provided regardless of current thread's context classloader. ### Why are the changes needed? Without this patch, end users cannot execute Hive UDF using JAR twice in spark-shell. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? New UT. Closes #27025 from HeartSaVioR/SPARK-26560-revised. Lead-authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Co-authored-by: nivo091 <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ardless of current thread context classloader This patch is based on apache#23921 but revised to be simpler, as well as adds UT to test the behavior. (This patch contains the commit from apache#23921 to retain credit.) Spark loads new JARs for `ADD JAR` and `CREATE FUNCTION ... USING JAR` into jar classloader in shared state, and changes current thread's context classloader to jar classloader as many parts of remaining codes rely on current thread's context classloader. This would work if the further queries will run in same thread and there's no change on context classloader for the thread, but once the context classloader of current thread is switched back by various reason, Spark fails to create instance of class for the function. This bug mostly affects spark-shell, as spark-shell will roll back current thread's context classloader at every prompt. But it may also affects the case of job-server, where the queries may be running in multiple threads. This patch fixes the issue via switching the context classloader to the classloader which loads the class. Hopefully FunctionBuilder created by `makeFunctionBuilder` has the information of Class as a part of closure, hence the Class itself can be provided regardless of current thread's context classloader. Without this patch, end users cannot execute Hive UDF using JAR twice in spark-shell. No. New UT. Closes apache#27025 from HeartSaVioR/SPARK-26560-revised. Lead-authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Co-authored-by: nivo091 <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…r regardless of current thread context classloader ### What changes were proposed in this pull request? This patch is based on #23921 but revised to be simpler, as well as adds UT to test the behavior. (This patch contains the commit from #23921 to retain credit.) Spark loads new JARs for `ADD JAR` and `CREATE FUNCTION ... USING JAR` into jar classloader in shared state, and changes current thread's context classloader to jar classloader as many parts of remaining codes rely on current thread's context classloader. This would work if the further queries will run in same thread and there's no change on context classloader for the thread, but once the context classloader of current thread is switched back by various reason, Spark fails to create instance of class for the function. This bug mostly affects spark-shell, as spark-shell will roll back current thread's context classloader at every prompt. But it may also affects the case of job-server, where the queries may be running in multiple threads. This patch fixes the issue via switching the context classloader to the classloader which loads the class. Hopefully FunctionBuilder created by `makeFunctionBuilder` has the information of Class as a part of closure, hence the Class itself can be provided regardless of current thread's context classloader. ### Why are the changes needed? Without this patch, end users cannot execute Hive UDF using JAR twice in spark-shell. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? New UT. Closes #27075 from HeartSaVioR/SPARK-26560-branch-2.4. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
The classloader of spark-shell is IMainsTranslatingClassLoader which is loaded from IMain (scala.tools.nsc.interpreter). But on the first select, it registers the function and loads the hiveUDF jar to sparkContext classloader which is NonClosableMutuableClassLoader. While selecting the function on the second time, function is already registered so it tries to fetch from IMainTranslatingClassLoader which is giving analysis exception.
Changing of the classLoader of currentThread to NonClosableMutuableclassLoader will solve this issue.