Contributor

@xinlifoobar xinlifoobar commented Aug 8, 2024

Which issue does this PR close?

Closes #11853. Part of #11790.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 8, 2024
Ok(Arc::new(result) as ArrayRef)
}

fn initcap_utf8view<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
Contributor Author

I think making this a macro would be a bit heavy, since we won't have many initcap_* functions like this.

let result = string_view_array
.iter()
.map(initcap_string)
.collect::<GenericStringArray<T>>();
Contributor Author

The return type is a StringArray rather than a StringViewArray. Should I change this behavior? Previously it was defined here:

https://github.com/apache/datafusion/blob/2521043ddcb3895a2010b8e328f3fa10f77fc094/datafusion/functions/src/utils.rs#L45C1-L46C1

Contributor

The current utf8_to_str_type only returns Utf8 or LargeUtf8. Ideally we should support returning Utf8View, but since we are recreating the strings anyway, I'm not sure StringView would help here.
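
To make the return-type question concrete, here is a minimal sketch of the mapping being discussed. It is an illustration only: `StrType` and `return_type_for` are hypothetical stand-ins, not arrow's `DataType` or DataFusion's `utf8_to_str_type`, and the `Utf8View` arm assumes the behavior described above, where a view input still produces a plain string output.

```rust
/// Hypothetical stand-in for the string variants of arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Hypothetical mapping from argument type to return type.
/// Assumption: only Utf8 or LargeUtf8 is ever returned, as described
/// in the comment above, so a Utf8View input yields Utf8.
fn return_type_for(arg: &StrType) -> StrType {
    match arg {
        StrType::LargeUtf8 => StrType::LargeUtf8,
        // The output strings are rebuilt from scratch, so nothing from
        // the input view is reused when producing the result.
        StrType::Utf8 | StrType::Utf8View => StrType::Utf8,
    }
}
```

Returning a view type would mean adding a third arm that maps `Utf8View` to itself. Whether that pays off depends on how the output is built, which is the open question in this thread.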

Contributor

@alamb alamb left a comment

Thank you @xinlifoobar and @XiangpengHao


fn initcap_string(string: Option<&str>) -> Option<String> {
string.map(|string: &str| {
let mut char_vector = Vec::<char>::new();
Contributor

I suspect you could make this faster by creating the vector once and then resetting on each loop -- like

    let mut char_vector = Vec::<char>::new();
    string.map(|string: &str| {
        char_vector.clear();
        ...
    }
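
A standalone sketch of the buffer-reuse idea, using only the standard library, might look like the following. The names `initcap_into` and `initcap_all` are hypothetical and not part of the PR, and the word-boundary rule (a word is a run of ASCII letters or digits) is an assumption made for illustration. The scratch vector is created once outside the per-value loop and cleared for each value, so its allocation is kept across iterations.

```rust
/// Write the initcap form of `input` into `buf`.
/// Assumption: a "word" is a run of ASCII alphanumeric characters.
fn initcap_into(input: &str, buf: &mut Vec<char>) {
    buf.clear(); // drops old contents but keeps the allocated capacity
    let mut prev_alnum = false;
    for c in input.chars() {
        buf.push(if prev_alnum {
            c.to_ascii_lowercase()
        } else {
            c.to_ascii_uppercase()
        });
        prev_alnum = c.is_ascii_alphanumeric();
    }
}

/// Apply `initcap_into` to every value, sharing one scratch buffer.
fn initcap_all<'a>(
    values: impl IntoIterator<Item = Option<&'a str>>,
) -> Vec<Option<String>> {
    let mut scratch = Vec::<char>::new(); // allocated once, outside the loop
    values
        .into_iter()
        .map(|value| {
            value.map(|s| {
                initcap_into(s, &mut scratch);
                scratch.iter().collect::<String>()
            })
        })
        .collect()
}
```

For example, `initcap_all(vec![Some("hello wORLD"), None])` returns `[Some("Hello World"), None]`. The reuse only helps when the buffer lives outside the per-value call. Declaring the vector inside a function that is invoked once per value would still create a new vector on every call.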

@alamb alamb merged commit f2685d3 into apache:main Aug 12, 2024
Contributor

alamb commented Aug 12, 2024

Thanks again @xinlifoobar and @XiangpengHao

Development

Successfully merging this pull request may close these issues.

Update INITCAP scalar function to support Utf8View

3 participants