Skip to content

Conversation

dharanad
Copy link
Contributor

@dharanad dharanad commented Aug 3, 2024

Which issue does this PR close?

Partially make #11766 work.

Rationale for this change

What changes are included in this PR?

  • coerced Utf8View to Utf8 make concat work
  • Added slt tests cases

Are these changes tested?

Existing test cases

carog test

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Aug 3, 2024
@dharanad dharanad changed the title [WIP] Cast Utf8View to Utf8 to support || from StringViewArray Cast Utf8View to Utf8 to support || from StringViewArray Aug 5, 2024
@dharanad dharanad marked this pull request as ready for review August 5, 2024 10:21
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dharanad -- this is ceratinly better than what happens on main (which is the query doesn't run)

I do think the plans could be improved if they cast to Utf8Vew instead, but we can do that as a follow on PR as well

(LargeUtf8, from_type) | (from_type, LargeUtf8) => {
string_concat_internal_coercion(from_type, &LargeUtf8)
match (lhs_type, rhs_type) {
// If Utf8View is in any side, we coerce to Utf8.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to coerce to Utf8View as that coercsion will often be faster (it is faster to cast Utf8 -> Utf8View than the other way around)

Is that possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, similar to this policy:

fn string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
use arrow::datatypes::DataType::*;
match (lhs_type, rhs_type) {
// If Utf8View is in any side, we coerce to Utf8View.
(Utf8View, Utf8View | Utf8 | LargeUtf8) | (Utf8 | LargeUtf8, Utf8View) => {
Some(Utf8View)
}
// Then, if LargeUtf8 is in any side, we coerce to LargeUtf8.
(LargeUtf8, Utf8 | LargeUtf8) | (Utf8, LargeUtf8) => Some(LargeUtf8),
(Utf8, Utf8) => Some(Utf8),
_ => None,
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to coerce to Utf8View as that coercsion will often be faster (it is faster to cast Utf8 -> Utf8View than the other way around)

Is that possible?

How about we do in a seperate PR. Previously, we were coerced to Utf8View, so concat was failing. As a temporary workaround to resolve the issue, I've coerce Utf8 instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filed #11881 to track

select column2||' is fast' from temp;
----
rust is fast
datafusion is fast
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

explain select column2 || 'is' || column3 from temp;
----
logical_plan
01)Projection: CAST(temp.column2 AS Utf8) || Utf8("is") || CAST(temp.column3 AS Utf8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a better plan (likely faster) if the casting was to Utf8View rather than Utf8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

explain select column2||' is fast' from temp;
----
logical_plan
01)Projection: CAST(temp.column2 AS Utf8) || Utf8(" is fast")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise here it would be better to use Utf8View

@dharanad
Copy link
Contributor Author

dharanad commented Aug 7, 2024

@alamb Are we good here ? Or do you have more suggestions for this PR
I'm actively investigating alternative solutions for this problem here #11836

@alamb
Copy link
Contributor

alamb commented Aug 7, 2024

@alamb Are we good here ? Or do you have more suggestions for this PR I'm actively investigating alternative solutions for this problem here #11836

Nope, sorry @dharanad -- this is good. I am behind on merging / upstream reviews.

@alamb
Copy link
Contributor

alamb commented Aug 7, 2024

Thanks again @dharanad and @XiangpengHao -- sorry for the delay

@dharanad dharanad deleted the fix/11766 branch August 13, 2024 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants