Add UDF support #106
…>()) as DataTypeWithClass).dt().json())
@asm0dey @FelixEngl
Looks like returning complex classes works in Scala (https://stackoverflow.com/a/59293056). Are you sure it can't be supported in Kotlin?
*/
@OptIn(ExperimentalStdlibApi::class)
inline fun <reified R> UDFRegistration.register(name: String, noinline func: () -> R): UDFWrapper0 {
register(name, UDF0(func), schema(typeOf<R>()).unWrapper())
It looks like it would make more sense to work with the schema from an encoder rather than our generated schema, wouldn't it?
Yes. Actually, that's better than hacking `RowEncoder` to adapt our generated schema when registering a UDF. Reasons:
1. `RowEncoder` only needs a correct `DataType`, not the other information in our generated schema.
2. Some Spark actions that consume a UDF's result (e.g. `.as<Int>()`) don't go through `RowEncoder`. If we keep hacking classes, we'll have to hack many of them, and there's no guarantee the hacked classes will work well.
I will move this to another file.
I think this is a big undertaking. Without the Java implementation as a reference, this feature would require hacking many classes. I suggest splitting this feature into a separate PR.
No, no, I meant that we have
But if it works in the Scala API, should we really provide something so limited? People will just use the Scala API instead, because ours will throw weird exceptions at runtime, which is an extremely bad UX. It's like saying "we support UDFs, but only for a couple of classes". And BTW, do we throw an exception if the provided class is just a regular data class or a Java bean class? Or will those work?
I created a simple example for this in my repo spark-udf-explore and reached some conclusions:
Nevertheless, I still want to do my best to give users a complete API. I will try to hack the related classes to support returning data classes in the Kotlin API.
But our
And that's why I think we should potentially use the schema obtained from the encoder rather than our own schema.
What the
You're calling
Ah... Because this code was copied, and I always thought that this was the
I believe copying code is usually just a bad idea. It increases WTFps dramatically.
After testing, calling
so
it always wraps a …, so I think we should call
On the other hand, we could unwrap this struct, right? That way we could avoid Spark dealing with our own
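A minimal sketch of that unwrapping follows. It assumes a hypothetical `SqlType` model in which the schema generator marks the single-field struct it adds around a non-struct return type with an `isSyntheticWrapper` flag. The flag, the types, and `unwrap` are illustrative only and are not the project's actual `unWrapper()`. Spark's own encoders for primitive types also expose a single-field struct (a field named `value`), so this shape is common. The flag keeps a genuine one-field data class from being unwrapped by mistake.

```kotlin
// Hypothetical, simplified schema model used only for this sketch.
sealed class SqlType {
    object IntT : SqlType()
    object StringT : SqlType()

    // isSyntheticWrapper marks a struct added by the schema generator around a
    // non-struct return type, as opposed to a struct that mirrors a real data class.
    data class StructT(
        val fields: List<SqlField>,
        val isSyntheticWrapper: Boolean = false,
    ) : SqlType()
}

data class SqlField(val name: String, val type: SqlType, val nullable: Boolean)

// Return the wrapped type if this is a synthetic single-field wrapper;
// otherwise return the schema unchanged.
fun SqlType.unwrap(): SqlType =
    if (this is SqlType.StructT && isSyntheticWrapper && fields.size == 1) fields.single().type
    else this
```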
Spark does not directly use our
Should it work with deeply nested structures?
OK, I'm fine with the design. Let's port it to 3.0 now!
…ement UDF return data class
Wowzer, let's just suppress "unused" inspections.
Or just ignore it and let me know when you're ready.
This was revamped in #152. See https://github.com/Kotlin/kotlin-spark-api/tree/main#user-defined-functions for details. The original notation was kept intact; we just built around it :)
This PR is a complement to #67.
Solution
After many attempts, I gave up on hacking `RowEncoder` because it kept producing inexplicable bugs. When registering a UDF, I directly convert `DataTypeWithClass` to a Spark `DataType`, and this solution seems to work well.
Problem
Please see `UDFRegisterTest.kt` line 134: when a complex type is used as the return value of a UDF, we get an exception (the same one as at the line 134 link). This exception also occurs in Java (I tested locally on Spark 3.1.1). So for this bug we either need to wait for upstream Spark to fix it, or try to hack the related classes to fix it ourselves (probably more effort).
Possible improvements