-
Notifications
You must be signed in to change notification settings - Fork 3
Add common functions from sqrl repo, unify function package naming
#142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f2d81af to
abd4a39
Compare
|
The We need to document that somewhere in this repository as a convention to not break it in the DataSQRL repo ;-) Unrelated, should we use |
|
I like the As far as I can tell I am not necessarily against But we may double down on it, and we could create an
Maybe I miss something here, but since we create an uber JAR, I do not really see the benefit of grouping everything under specific maven sub-modules, other than separating well-defined groups, but IMO curerently that's not true. |
|
You are correct @ferenc-csaky. We could use "functions". The reason we are not is because I've seen multiple users create a "functions" folder to store their UDFs in for a DataSQRL project and then the import mechanism prioritizes the folder "functions" over the classpath resolution. Which means they are no longer accessible. That issue also exists for stdlib, but that name is less likely to be chosen as a folder name based on evidence to date. Yes, it makes sense to me to keep the type and functions together. The vector functions don't make sense without the type. And vice versa. Same for jsonb. The only reason for the sub-modules is that it allows users to pick and chose if they want to build their own flink sql runner, e.g. they could import the vector type and functions only. |
|
That's a valid point regarding sub-modules, I was not sure we wanna try shoot for that or not. Considering everything, WDYT about the following:
From a strict module naming perspective, not duplicating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR centralizes common UDFs into a shared module and standardizes package names across function libraries.
- Introduced three new system functions (
serialize_to_bytes,noop,hash_columns) infunctions-common. - Renamed packages from
stdlibtofunctionsin OpenAI and math modules. - Added an
auto-servicedependency (missing version) and updated the README import path.
Reviewed Changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| functions/openai-functions/src/main/java/com/datasqrl/flinkrunner/functions/openai/OpenAICompletions.java | Renamed package from stdlib.openai to functions.openai. |
| functions/math-functions/src/main/java/com/datasqrl/flinkrunner/functions/math/* | Renamed packages from stdlib.math to functions.math. |
| functions/functions-common/src/main/java/com/datasqrl/flinkrunner/functions/common/serialize_to_bytes.java | New UDF to serialize objects to bytes. |
| functions/functions-common/src/main/java/com/datasqrl/flinkrunner/functions/common/noop.java | New no-op UDF that always returns true. |
| functions/functions-common/src/main/java/com/datasqrl/flinkrunner/functions/common/hash_columns.java | New UDF to MD5-hash multiple column values. |
| functions/functions-common/pom.xml | Added com.google.auto-service:auto-service dependency without a version. |
| README.md | Updated SQL import syntax from stdlib.[library-name].* to functions.[library-name].*. |
Comments suppressed due to low confidence (2)
functions/functions-common/src/main/java/com/datasqrl/flinkrunner/functions/common/serialize_to_bytes.java:31
- There are no unit tests for this serialization function. Consider adding tests that verify correct byte output across a variety of input types and error scenarios.
public byte[] eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object object) {
functions/functions-common/src/main/java/com/datasqrl/flinkrunner/functions/common/hash_columns.java:40
- [nitpick] Instantiating a new
MessageDigestper call can be costly. Consider reusing aThreadLocal<MessageDigest>or pooling instances to reduce GC pressure.
var digest = MessageDigest.getInstance("MD5");
...tions-common/src/main/java/com/datasqrl/flinkrunner/functions/common/serialize_to_bytes.java
Show resolved
Hide resolved
velo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I expect a follow up PR on sqrl, right?
I would change the snashot version to make sure sqrl won't be using an old library.
Might also be the case here, might be downloading an snapshot from github packages.
functions/functions-common/src/main/java/com/datasqrl/flinkrunner/functions/common/noop.java
Show resolved
Hide resolved
stdlib/stdlib-commons/src/main/java/com/datasqrl/flinkrunner/stdlib/commons/noop.java
Show resolved
Hide resolved
Correct, these changes has to be reflected on the |
6342137 to
e05bac8
Compare
…arrange sub-modules accordingly
e05bac8 to
02f9f43
Compare
Added the functions to the
function-commonmodule, under a newcom.datasqrl.flinkrunner.functions.commonpackage.Also unified the package naming in every function module:
stdlibwithfunctionsin themathandopenaimodulesflinkrunnerin thetextmodule