[SPARK-32677][SQL] Load function resource before create #29502
Conversation
Test build #127726 has finished for PR 29502 at commit
Test build #127740 has finished for PR 29502 at commit
This change seems to break many things; updating the title to WIP.
sql(s"CREATE FUNCTION ${udfInfo.funcName} AS '${udfInfo.className}' USING JAR '$jarUrl'")
assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
We need to load the resource when creating the function.
This test was created in #27025.
Test build #128043 has finished for PR 29502 at commit
Test build #128044 has finished for PR 29502 at commit
Test build #128047 has finished for PR 29502 at commit
Test build #128058 has finished for PR 29502 at commit
Test build #128060 has finished for PR 29502 at commit
Test build #128071 has finished for PR 29502 at commit
Test build #128090 has finished for PR 29502 at commit
Test build #128100 has finished for PR 29502 at commit
Test build #128122 has finished for PR 29502 at commit
Test build #128123 has finished for PR 29502 at commit
It's ready for review. cc @cloud-fan
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
Test build #128217 has finished for PR 29502 at commit
val className = funcDefinition.className
if (!Utils.classIsLoadable(className)) {
  throw new AnalysisException(s"Can not load class '$className' when registering " +
    s"the function '${funcDefinition.identifier}', please make sure it is on the classpath")
how about funcDefinition.identifier.unquotedString?
sounds good.
FunctionIdentifier already overrides toString to return unquotedString, but it's fine to invoke unquotedString explicitly.
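The check discussed in this thread can be sketched in isolation. This is a minimal stand-alone sketch, not the PR's code: `ClassCheckSketch` and `FuncId` are hypothetical names, and `IllegalArgumentException` stands in for Spark's `AnalysisException` so that the example runs without Spark on the classpath. It resolves the class through the thread's context class loader without initializing it, and builds the error message from an explicit `unquotedString`.

```scala
import scala.util.Try

object ClassCheckSketch {
  // Hypothetical stand-in for Spark's FunctionIdentifier (assumption, simplified).
  final case class FuncId(funcName: String, database: Option[String] = None) {
    // "db.func" when a database is set, otherwise just "func".
    def unquotedString: String =
      database.map(db => s"$db.$funcName").getOrElse(funcName)
    override def toString: String = unquotedString
  }

  // True if the class can be resolved by the context class loader
  // (falling back to this object's loader). The class is not initialized.
  def classIsLoadable(className: String): Boolean = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Try(Class.forName(className, false, loader)).isSuccess
  }

  // Fails fast with a message that names both the class and the function.
  def requireFunctionClassExists(id: FuncId, className: String): Unit = {
    if (!classIsLoadable(className)) {
      throw new IllegalArgumentException(
        s"Can not load class '$className' when registering " +
          s"the function '${id.unquotedString}', please make sure it is on the classpath")
    }
  }
}
```

Because `toString` delegates to `unquotedString` in this sketch, interpolating the identifier directly would give the same text; calling `unquotedString` explicitly only makes the intent visible at the call site.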
val func = CatalogFunction(FunctionIdentifier(functionName, databaseName), className, resources)
catalog.loadFunctionResources(resources)
// We fail fast if function class is not exists.
catalog.requireFunctionClassExists(func)
I think we only need to do this for non-temp functions.
Yeah, I was thinking the same.
val className = funcDefinition.className
if (!Utils.classIsLoadable(className)) {
  throw new AnalysisException(s"Can not load class '$className' when registering " +
    s"the function '${funcDefinition.identifier.unquotedString}', please make sure " +
one last thing: shall we fill the default database before putting the function identifier in the error message?
Not needed.
This method is used by both temporary and permanent functions. A temporary function has no database name, so we can't fill one in. A permanent function keeps the identifier the user provided at creation.
Test build #128230 has finished for PR 29502 at commit
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
Can not load class 'test.non.existent.udaf' when registering the function 'default.udaf1', please make sure it is on the classpath; line 1 pos 94
Here it was 'default.udaf1' previously. But the new message is 'udaf1': https://github.com/apache/spark/pull/29502/files#diff-2d3d6b47e6044b4d44590c5a73b7cd8bR46
Do you know why?
For a permanent function, the old code path is:
create function // no class check
query on function -> lookup function -> fill database name if permanent -> register function -> check class
So the previous message always contained the database name.
Now the code path is:
create function -> check class
then shall we fill in the default database when creating permanent functions, to make the error message better?
A permanent function always needs a database, so it's OK to fill in the database name when creating it.
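The difference between the two messages can be reproduced with a small model of the code paths described in this thread. This is only a sketch under stated assumptions: `FuncId`, `qualify`, `currentDatabase`, and the three `*Message` helpers are hypothetical names, not Spark APIs. It shows why checking the class at creation time prints 'udaf1' unless the identifier is qualified first, and why only permanent functions can be qualified.

```scala
object ErrorMessagePathSketch {
  // Hypothetical, simplified identifier: the database is optional.
  final case class FuncId(funcName: String, database: Option[String] = None) {
    def unquotedString: String =
      database.map(db => s"$db.$funcName").getOrElse(funcName)
  }

  // Assumed session state for the sketch.
  val currentDatabase = "default"

  // Fill in the current database when the user did not give one.
  def qualify(id: FuncId): FuncId =
    if (id.database.isDefined) id else id.copy(database = Some(currentDatabase))

  def classCheckMessage(id: FuncId, className: String): String =
    s"Can not load class '$className' when registering " +
      s"the function '${id.unquotedString}', please make sure it is on the classpath"

  // Old path: the class is checked at lookup time, after the database was filled in.
  def oldPathMessage(id: FuncId, className: String): String =
    classCheckMessage(qualify(id), className)

  // New path as first written: the class is checked at create time, identifier as typed.
  def newPathMessage(id: FuncId, className: String): String =
    classCheckMessage(id, className)

  // Suggested fix: qualify permanent functions at create time; temporary ones have no database.
  def newPathQualifiedMessage(id: FuncId, className: String, isTemp: Boolean): String =
    classCheckMessage(if (isTemp) id else qualify(id), className)
}
```

With an unqualified identifier such as `FuncId("udaf1")`, the old path and the qualified new path both name `default.udaf1`, while the unqualified new path names only `udaf1`.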
Test build #128320 has finished for PR 29502 at commit
thanks, merging to master!
thanks for merging!
Sorry, I have to revert it. This breaks a use case: the UDF class is not accessible in the cluster that creates the function, but people use the function in another cluster that can access the UDF class. I'll open a PR to write the use case down in a code comment so that we won't forget.
Never mind. So the scenario is: there are two Spark clusters (A and B) sharing one Hive metastore, and the function is created from Spark A but references a path that is local to Spark B. It seems strange, but it actually exists.
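The use case behind the revert can be modeled with a toy shared metastore. This is a sketch under stated assumptions, not Spark code: `Metastore`, `Cluster`, `CatalogFunction` (a simplified stand-in for Spark's class of the same name), the `eagerCheck` flag, and the class name `com.example.MyUdf` are all hypothetical. Each cluster can load only the classes available on its own paths, and `eagerCheck = true` models the reverted behavior of checking the class at creation time.

```scala
import scala.collection.mutable

object SharedMetastoreSketch {
  // Simplified function definition: a name plus the implementing class.
  final case class CatalogFunction(name: String, className: String)

  // One metastore shared by every cluster.
  final class Metastore {
    private val functions = mutable.Map.empty[String, CatalogFunction]
    def put(f: CatalogFunction): Unit = functions(f.name) = f
    def get(name: String): Option[CatalogFunction] = functions.get(name)
  }

  // A cluster can load only the classes reachable from its own local paths.
  final class Cluster(val name: String, loadableClasses: Set[String], metastore: Metastore) {
    def canLoad(className: String): Boolean = loadableClasses.contains(className)

    // eagerCheck = true rejects the function if this cluster cannot load the class.
    def createFunction(f: CatalogFunction, eagerCheck: Boolean): Either[String, Unit] =
      if (eagerCheck && !canLoad(f.className)) {
        Left(s"Can not load class '${f.className}' on cluster $name")
      } else {
        metastore.put(f)
        Right(())
      }

    // The class is always needed at use time, on the cluster that runs the query.
    def useFunction(fname: String): Either[String, String] =
      metastore.get(fname) match {
        case None => Left(s"Undefined function: $fname")
        case Some(f) if !canLoad(f.className) =>
          Left(s"Can not load class '${f.className}' on cluster $name")
        case Some(f) => Right(f.className)
      }
  }
}
```

In this model, cluster A (which cannot load the class) can register the function only when `eagerCheck` is false, and cluster B (which can load it) can then use the function through the shared metastore. An eager check on A would block that workflow even though the function is valid where it is used.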
…nCommand
### What changes were proposed in this pull request?
We made a mistake in #29502, as there is no code comment to explain why we can't load the UDF class when creating functions. This PR improves the code comment.
### Why are the changes needed?
To avoid making the same mistake.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
N/A
Closes #29713 from cloud-fan/comment.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 328d81a)
Signed-off-by: Takeshi Yamamuro <[email protected]>
What changes were proposed in this pull request?
Change the CreateFunctionCommand code to add a class check before creating the function.
Why are the changes needed?
Creating a permanent function and creating a temporary function behave differently when the function class is invalid: creating the temporary function fails immediately, while the permanent function is created without a class check and fails only when a query looks it up.
Hive fails in both cases.
Does this PR introduce any user-facing change?
Yes, users will get an exception when creating an invalid UDF.
How was this patch tested?
New test.