Skip to content

Conversation

@ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Aug 21, 2020

What changes were proposed in this pull request?

Change CreateFunctionCommand code that add class check before create function.

Why are the changes needed?

We have different behavior between create permanent function and temporary function when function class is invaild. e.g.,

create function f as 'test.non.exists.udf';
-- Time taken: 0.104 seconds

create temporary function f as 'test.non.exists.udf'
-- Error in query: Can not load class 'test.non.exists.udf' when registering the function 'f', please make sure it is on the classpath;

And Hive also fails both of them.

Does this PR introduce any user-facing change?

Yes, user will get exception when create a invalid udf.

How was this patch tested?

New test.

@SparkQA
Copy link

SparkQA commented Aug 21, 2020

Test build #127726 has finished for PR 29502 at commit ce83487.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 21, 2020

Test build #127740 has finished for PR 29502 at commit def8c4e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

This action seems break many things, update title to WIP.

@ulysses-you ulysses-you changed the title [SPARK-32677][SQL] Cache function directly after create [WIP][SPARK-32677][SQL] Cache function directly after create Aug 30, 2020

sql(s"CREATE FUNCTION ${udfInfo.funcName} AS '${udfInfo.className}' USING JAR '$jarUrl'")

assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need load resource during create function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was created in #27025.

@SparkQA
Copy link

SparkQA commented Aug 30, 2020

Test build #128043 has finished for PR 29502 at commit 3e6320d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you ulysses-you changed the title [WIP][SPARK-32677][SQL] Cache function directly after create [WIP][SPARK-32677][SQL] Load function resource before create Aug 30, 2020
@SparkQA
Copy link

SparkQA commented Aug 30, 2020

Test build #128044 has finished for PR 29502 at commit ba248c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 30, 2020

Test build #128047 has finished for PR 29502 at commit 9fb3c13.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128058 has finished for PR 29502 at commit fe98b71.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128060 has finished for PR 29502 at commit 8b3a509.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128071 has finished for PR 29502 at commit fe77698.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128090 has finished for PR 29502 at commit 4750e2e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 31, 2020

Test build #128100 has finished for PR 29502 at commit 882942f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 1, 2020

Test build #128122 has finished for PR 29502 at commit d55b8bc.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 1, 2020

Test build #128123 has finished for PR 29502 at commit 75e61e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you ulysses-you changed the title [WIP][SPARK-32677][SQL] Load function resource before create [SPARK-32677][SQL] Load function resource before create Sep 1, 2020
@ulysses-you
Copy link
Contributor Author

It's ok to review. cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Sep 3, 2020

Test build #128217 has finished for PR 29502 at commit bb6d742.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val className = funcDefinition.className
if (!Utils.classIsLoadable(className)) {
throw new AnalysisException(s"Can not load class '$className' when registering " +
s"the function '${funcDefinition.identifier}', please make sure it is on the classpath")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about funcDefinition.identifier.unquotedString?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FunctionIdentifier already override toString = unquotedString, but it's fine to invoke unquotedString explicit.

val func = CatalogFunction(FunctionIdentifier(functionName, databaseName), className, resources)
catalog.loadFunctionResources(resources)
// We fail fast if function class is not exists.
catalog.requireFunctionClassExists(func)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need to do this for non-temp functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I also think about it.

val className = funcDefinition.className
if (!Utils.classIsLoadable(className)) {
throw new AnalysisException(s"Can not load class '$className' when registering " +
s"the function '${funcDefinition.identifier.unquotedString}', please make sure " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one last thing: shall we fill the default database before putting the function identifier in the error message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

This method is used by both temporary and permanent function. The temporary has no database name and we can't fill the database name. The permanent follows user created.

@SparkQA
Copy link

SparkQA commented Sep 3, 2020

Test build #128230 has finished for PR 29502 at commit 6584f39.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

struct<>
-- !query output
org.apache.spark.sql.AnalysisException
Can not load class 'test.non.existent.udaf' when registering the function 'default.udaf1', please make sure it is on the classpath; line 1 pos 94
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it was 'default.udaf1' previously. But the new message is 'udaf1': https://github.com/apache/spark/pull/29502/files#diff-2d3d6b47e6044b4d44590c5a73b7cd8bR46

Do you know why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For permanent, the old code path is:

create function // no class check
query on function -> lookup function -> fill database name if permanent -> register function -> check class

So the previously msg always contains database name.

Now the code path is:

create function -> check class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then shall we fill in the default database when creating permanent functions, to make the error message better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Permanent function always need a database, it's ok to fill database name when creating.

@SparkQA
Copy link

SparkQA commented Sep 5, 2020

Test build #128320 has finished for PR 29502 at commit 3cf971b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 05fcf26 Sep 7, 2020
@ulysses-you
Copy link
Contributor Author

thanks for merging !

@cloud-fan
Copy link
Contributor

Sorry I have to revert it. This breaks a use case that: the udf class is not accessible in the cluster that creates the function, but people use this function in another cluster that can access the udf class.

I'll open a PR to write the use case down into a comment, so that we won't forget.

@ulysses-you
Copy link
Contributor Author

Never mind.

That said, there are two Spark cluster(A, B) and one Hive metastore then create function with Spark A but using Spark B's local path. Seems strange but it actual exists.

maropu pushed a commit that referenced this pull request Sep 11, 2020
…nCommand

### What changes were proposed in this pull request?

We made a mistake in #29502, as there is no code comment to explain why we can't load the UDF class when creating functions. This PR improves the code comment.

### Why are the changes needed?

To avoid making the same mistake.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #29713 from cloud-fan/comment.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
maropu pushed a commit that referenced this pull request Sep 11, 2020
…nCommand

### What changes were proposed in this pull request?

We made a mistake in #29502, as there is no code comment to explain why we can't load the UDF class when creating functions. This PR improves the code comment.

### Why are the changes needed?

To avoid making the same mistake.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #29713 from cloud-fan/comment.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 328d81a)
Signed-off-by: Takeshi Yamamuro <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…nCommand

### What changes were proposed in this pull request?

We made a mistake in apache#29502, as there is no code comment to explain why we can't load the UDF class when creating functions. This PR improves the code comment.

### Why are the changes needed?

To avoid making the same mistake.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes apache#29713 from cloud-fan/comment.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 328d81a)
Signed-off-by: Takeshi Yamamuro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants