Add UDFRegister functions for Spark 2.4 #67
Conversation
```kotlin
/**
 * A shortcut for [KSparkSession.spark].udf()
 */
inline fun KSparkSession.udf(): UDFRegistration = spark.udf()
```
We definitely don't need `inline`.
And why don't we put it inside `KSparkSession`?
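The suggestion could look roughly like this (a sketch only: `KSparkSession` is simplified here to a minimal stand-in, and the member property is an assumption, not the merged implementation):

```kotlin
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.UDFRegistration

// Minimal stand-in for the project's KSparkSession wrapper.
class KSparkSession(val spark: SparkSession) {
    // A plain member property replaces the inline top-level extension:
    // `inline` saves nothing in a one-line accessor.
    val udf: UDFRegistration get() = spark.udf()
}
```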
```kotlin
@file:Suppress("NOTHING_TO_INLINE", "DuplicatedCode", "MemberVisibilityCanBePrivate")
```
If this file is generated (I really hope it is), please provide us with the generating code in comments. Thanks!
Sorry for not answering for a long time, I was busy with another project. It's a very good PR, but it needs several improvements.
No problem, and I thought so (especially regarding the improvements). This was only some kind of strange brainwave of mine, and I hacked something together.
I will provide the generator code after some clean-up; I only hacked it down for my own project and thought you might like/need it. :D
If you have further suggestions, please let me know; I'll try to add them to the implementation.
(Sorry that I can't do it today, but it's 00:30 in Germany and I have a meeting in the morning.)
There is no hurry at all, thank you for your effort.
What do you think: is there any way to support other classes natively supported by Spark?
In my fork there is a branch for calling the Scala-Java conversion wrappers (I didn't open a merge request because I'm not really satisfied with the tests for it. I have problems testing the mutable Scala types for changes; if you could help me with that [or give me some tips on how to write the tests], I would be thankful).
I think we can achieve some kind of "auto-conversion" by using reified type parameters and calling these wrapper functions (at least for the function wrappers).
I wrote this text on my smartphone, please excuse the typing errors.
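The reified "auto-conversion" idea could be sketched as follows. Spark's `UDFRegistration.register(name, UDF1, DataType)` overload is real, but `schemaFor` below is a hypothetical helper standing in for the project's type-to-`DataType` machinery, so this is an illustration of the shape, not working code:

```kotlin
import org.apache.spark.sql.UDFRegistration
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.types.DataType

// Hypothetical helper: derive a Spark DataType from a reified Kotlin type.
// The real project would route this through its schema/encoder machinery.
inline fun <reified R> schemaFor(): DataType = TODO("project-specific mapping")

// Reified registration: the caller writes a plain Kotlin lambda and the
// wrapper converts it through Spark's Java UDF interface.
inline fun <reified T, reified R> UDFRegistration.registerTyped(
    name: String,
    noinline func: (T) -> R,
) {
    register(name, UDF1<T, R> { t -> func(t) }, schemaFor<R>())
}
```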
I think if you would file a PR we at least could think on how to test something :)
I'll take care of the code around Christmas. Right now I don't have enough time to work on my PRs besides my current tasks.
I also committed the PR for the wrappers (see #72).
Thank you in advance for your feedback! 😄
> Sorry for not answering for a long time, I was busy with another project. It's a very good PR, but it needs several improvements.
Please forward-port this to version 3.0.
I have the following trouble: I've tried to write a more complex test:

```kotlin
should("also work with datasets") {
    listOf("a" to 1, "b" to 2).toDS().toDF().createOrReplaceTempView("test1")
    udf.register<String, Int, Int>("stringIntDiff") { a, b ->
        a[0].toInt() - b
    }
    spark.sql("select stringIntDiff(first, second) from test1").show()
}
```

and it fails. Looks like encoders won't work for our primitive type wrappers.
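A possible workaround while the wrapper encoders are sorted out (a sketch, not the project's eventual fix) is to register through Spark's plain Java UDF interface with an explicit return type, bypassing the typed wrappers entirely:

```kotlin
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.api.java.UDF2
import org.apache.spark.sql.types.DataTypes

fun registerStringIntDiff(spark: SparkSession) {
    // An explicit DataTypes.IntegerType sidesteps the failing encoder
    // inference for the primitive type wrappers.
    spark.udf().register(
        "stringIntDiff",
        UDF2<String, Int, Int> { a, b -> a[0].toInt() - b },
        DataTypes.IntegerType,
    )
}
```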
```
# Conflicts:
#	kotlin-spark-api/2.4/src/main/kotlin/org/jetbrains/kotlinx/spark/api/UDFRegister.kt
#	kotlin-spark-api/2.4/src/test/kotlin/org/jetbrains/kotlinx/spark/api/UDFRegisterTest.kt
```
Force-pushed from db91ee0 to aa11744.
Force-pushed from a9da30c to 8e7523a.
* copy code from #67
* remove hacked RowEncoder.scala
* replace all schema(typeOf<R>()) to DataType.fromJson((schema(typeOf<R>()) as DataTypeWithClass).dt().json())
* add return udf data class test
* change test
* add in dataset test for calling the UDF-Wrapper
* add the same exception link
* refactor unWrapper
* add test for udf return a List
* make the test simpler
* add License
* add UDFRegister for 3.0
* remove useless import
* resolved deprecated method
* [experimental] add CatalystTypeConverters.scala for hacked it to implement UDF return data class
* [experimental] implement UDF return data class
* fix code inspection issue
* Adds suppre unused

Co-authored-by: can wang <[email protected]>
Co-authored-by: Pasha Finkelshteyn <[email protected]>
Closing this. We don't support Spark 2 anymore, and based on the UDFs for Spark 3 we implemented the registration: #152 (released in v1.2.0).
Changes to old pull request
Correct branching.
Short description
Add a Kotlin-styled way to create and call UDFs in a more type-safe manner.
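For illustration, intended usage might look like this (hypothetical call shapes, inferred from the test snippet earlier in the thread; the final API shipped in #152 may differ):

```kotlin
withSpark {
    // Typed registration: the type parameters pin down argument and return
    // types at compile time instead of relying on runtime reflection.
    udf.register<String, Int>("strLen") { s -> s.length }
    spark.sql("SELECT strLen('hello')").show()
}
```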
Possible improvements
Future plans