@HyukjinKwon
Member

What changes were proposed in this pull request?

After this PR, we can test Pandas and Python UDFs on the Scala side as below:

import IntegratedUDFTestUtils._
val pandasTestUDF = TestScalarPandasUDF("udf")
spark.range(10).select(pandasTestUDF($"id")).show()

How was this patch tested?

Manually tested.

@HyukjinKwon
Member Author

cc @viirya, @dongjoon-hyun, @BryanCutler

sys.props("spark.test.home")
assert(sys.props.contains("spark.test.home") ||
sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not set.")
sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
Member Author

This is for the IDE case. `spark.test.home` can be missing if we run the tests in an IDE without any other settings. In that case, it falls back to `SPARK_HOME`.
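The fallback described above can be sketched in isolation. This is a minimal illustration rather than the code in the PR: `TestHomeResolver` and `resolve` are hypothetical names, and the system properties and environment are passed in as plain maps so the behavior can be exercised without touching the real JVM settings.

```scala
object TestHomeResolver {
  // Hypothetical helper (not part of Spark): resolves the test home directory,
  // preferring the "spark.test.home" property and falling back to SPARK_HOME.
  def resolve(props: Map[String, String], env: Map[String, String]): String = {
    // Fail early with a readable message when neither source is available.
    assert(
      props.contains("spark.test.home") || env.contains("SPARK_HOME"),
      "spark.test.home or SPARK_HOME is not set.")
    // The default argument of getOrElse is by-name, so env("SPARK_HOME") is
    // only looked up when the property is missing.
    props.getOrElse("spark.test.home", env("SPARK_HOME"))
  }

  def main(args: Array[String]): Unit = {
    // Simulates an IDE run where only SPARK_HOME is set (paths are made up).
    println(resolve(Map.empty, Map("SPARK_HOME" -> "/opt/spark")))
    // When both are set, the system property takes precedence.
    println(resolve(
      Map("spark.test.home" -> "/work/spark"),
      Map("SPARK_HOME" -> "/opt/spark")))
  }
}
```

In the actual change, the same lookup is done directly against `sys.props` and `sys.env`.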

Member

Should we add a comment for this reason?

Member Author

Oops, I missed this. Actually, there are multiple places like this. Let me fix them all together in a separate follow-up.

@SparkQA

SparkQA commented Jun 24, 2019

Test build #106814 has finished for PR 24945 at commit 2e939f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • sealed trait TestUDF

* sql("SELECT udf_name(1)")
* spark.select(expr("udf_name(1)"))
* spark.range(10).select(expr("udf_name(id)"))
* spark.range(10).select(pandasTestUDF($"id"))
Member

Will we use it? In SQLQueryTestSuite, I think the UDFs are all registered for UDFTestCase?

Member Author

Ah, this one will be used in #24946.

@SparkQA

SparkQA commented Jun 24, 2019

Test build #106843 has finished for PR 24945 at commit 8fe2474.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Thank you @viirya. This is not invasive at all. Let me merge it.

Merged to master.

@BryanCutler
Member

Late +1, very nice!

HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 15, 2019
…h in EpochTracker (to support Python UDFs)

This PR proposes to use `InheritableThreadLocal` instead of `ThreadLocal` for the current epoch in `EpochTracker`. Python UDFs need separate threads to write to and read from the Python processes, and a previously set epoch is lost in those new threads.

After this PR, Python UDFs can be used in Structured Streaming with the continuous mode.
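The difference between the two thread-local types can be seen in a small standalone sketch. This is not the `EpochTracker` code. `EpochInheritanceDemo` and `readFromChild` are hypothetical names used only for illustration, and the only APIs involved are the standard `java.lang.ThreadLocal` and `java.lang.InheritableThreadLocal`.

```scala
object EpochInheritanceDemo {
  // Plain ThreadLocal: a newly created thread starts with no value (null here).
  private val plainEpoch = new ThreadLocal[java.lang.Long]()
  // InheritableThreadLocal: a child thread receives the parent's value
  // at the moment the child Thread object is created.
  private val inheritedEpoch = new InheritableThreadLocal[java.lang.Long]()

  /** Sets both locals in the calling thread, then reads them from a new child thread. */
  def readFromChild(epoch: Long): (Option[Long], Option[Long]) = {
    plainEpoch.set(epoch)
    inheritedEpoch.set(epoch)

    var plainSeen: Option[Long] = None
    var inheritedSeen: Option[Long] = None
    val child = new Thread(() => {
      // Option(null) is None, so a missing value shows up as None.
      plainSeen = Option(plainEpoch.get()).map(_.longValue)
      inheritedSeen = Option(inheritedEpoch.get()).map(_.longValue)
    })
    child.start()
    // join() makes the child's writes visible to the calling thread.
    child.join()
    (plainSeen, inheritedSeen)
  }

  def main(args: Array[String]): Unit = {
    val (plain, inherited) = readFromChild(42L)
    println(s"ThreadLocal in child: $plain")
    println(s"InheritableThreadLocal in child: $inherited")
  }
}
```

In the sketch, the child thread sees no value through the plain `ThreadLocal` but does see the parent's value through the `InheritableThreadLocal`, which matches the lost-epoch problem described above.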

The test cases were written on top of apache#24945.
Unit tests were added.

Manual tests.

Closes apache#24946 from HyukjinKwon/SPARK-27234.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@HyukjinKwon HyukjinKwon deleted the SPARK-27893-followup branch March 3, 2020 01:19