-
Couldn't load subscription status.
- Fork 1.7k
feat: support Utf8View for more args of regexp_replace
#17195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| let patterns = StringArray::from(patterns); | ||
| let replacements = StringArray::from(replacement); | ||
| let patterns = <$U>::from(patterns); | ||
| let replacements = <$U>::from(replacement); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I change all of the other arg string types at once since they hit the same code path (fetch_string_arg) so I don't think we need to test them independently.
regexp_replace, signature cleanupregexp_replace, signature cleanup
regexp_replace, signature cleanupregexp_replace, signature cleanup
regexp_replace, signature cleanupregexp_replace
regexp_replaceUtf8View for more args of regexp_replace
|
I'm afraid I might have lost the plot with all of those string args. I looked at other string functions to see if a huge |
|
I'm not proud of the readability of the change, but at least there doesn't seem to be a meaningful performance regression: main PR |
…anymore, update the .slt test for string_view.slt, and understand why String(3) and String(4) is not equivalent to this.
|
@mbutrovich -- thank you for this PR. I still can't help but feel there is a mismatch between what this PR is doing (trying expand the types of string arguments this function supports) and what you are actually seeing in Comet. Can you explain what is happening in comet? For example, is regexp_replace being called with I ask because a mismatch between the actual argument types, and the types a function accepts can be fixed in at least 2 ways:
Typically coercion is done in the LogicalPlan, and I know that Comet does not use LogicalPlans. However, @adriangb and @kosiew have been working on a version of this for I also reviewed the documentation on types and type signatures and I think there is some ambiguity about what a function "supporting argument types" means. I will make a PR to improve the documentation in this area |
|
I haven't forgotten about this. I need to generate a new .slt file for the relevant tests. |
In my experimental branch adding I am increasingly of the mind that Comet needs to start doing some passes over the physical plan, and type coercion like this might be one reason. I think this PR is good to go, but also am okay if we think it's needless complexity. |
| Self { | ||
| signature: Signature::one_of( | ||
| vec![ | ||
| TypeSignature::Exact(vec![Utf8, Utf8, Utf8]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an option too?
Uniform(3, vec![Utf8View, LargeUtf8, Utf8]),
Uniform(4, vec![Utf8View, LargeUtf8, Utf8]),I tested it on a checked out copy of this PR and the slt tests passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Which issue does this PR close?
Rationale for this change
#11667 added some
Uft8Viewsupport forregexp_replace, but only for the first argument. The other args only supportUft8, which likely isn't a huge deal since they're probably short literals, but adding support for other string types makes it easier to simplify the signature. Systems like Comet (which may emit string literals asUtf8View) will also benefit from being able to use this function.What changes are included in this PR?
Are these changes tested?
Yes, there are new tests. See above.
Are there any user-facing changes?
The signature for
regexp_replacechanged, but it was maybe wrong before (did not haveLargeUtf8).