Skip to content

String cloning is not optimized the same way as String construction #88905

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SadiinsoSnowfall opened this issue Sep 13, 2021 · 7 comments
Closed
Assignees
Labels
C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@SadiinsoSnowfall
Copy link

SadiinsoSnowfall commented Sep 13, 2021

I don't know if it is really worth looking into, but I think I found some weird behavior in the rust compiler:
Let's consider the two following functions (godbolt link):

pub fn nop(a: &String) {
    a.as_str().to_owned();
}

pub fn complex(a: &String) {
    a.to_owned();
}

They both allocate a String struct but do not use or return it. Rustc is able to optimize away the first one, but not the second one.
If we dig deeper, we can observe that the compiler is not able to optimize away the to_owned and clone method when called on a string. Let's look at the following functions (godbolt link):

pub fn a() {
    "ok".to_owned();
}

pub fn b() {
    "ok".to_owned().clone();
}

pub fn c() {
    "ok".to_owned().as_str().to_owned();
}

pub fn d() {
    "ok".to_owned().to_owned();
}

pub fn e() {
    String::from("ok");
}

The fonctions a, c and e are optimized away even on opt-level 2. Whereas the functions b and d are not and still produce calls to alloc, dealloc, drop_in_place, etc...

Looking at the stdlib code, <&str>::to_owned internally calls <&[u8]>::to_owned, which calls <&[u8]>::to_vec, which calls <&[u8]>::to_vec_in which ends up calling hack::to_vec.

String::to_string calls String::to_owned which calls String::clone which calls Vec<u8>::clone which calls <&[u8]>::to_vec_in with the only difference from the <&str>::to_owned path being that the possibly custom allocator of the Vec internal buffer is cloned and passed to <&[u8]>::to_vec_in in Vec<u8>::clone.

Is it possible that rustc is not able to correctly inline the allocator functions when the default allocator is not directly assumed ?

This was tested on Rustc 1.55 and `Rustc 1.57.0-nightly (b69fe57 2021-09-10)

This issue has been assigned to @notriddle via this comment.

@SadiinsoSnowfall SadiinsoSnowfall added the C-bug Category: This is a bug. label Sep 13, 2021
@nagisa
Copy link
Member

nagisa commented Sep 13, 2021

Related #24194

@SadiinsoSnowfall
Copy link
Author

SadiinsoSnowfall commented Sep 13, 2021

While trying to understand why this is, I found another strange behavior (godbolt link): cloning a Vec containing elements with more than 1 byte wide cannot be optimized away by the compiler. I think this is because of Layout::repeat (called by Layout::array, in turn called by RawVec::allocate_in) which cannot fail for u8 (because (std::mem::size_of::<u8>(), std::mem::align_of::<u8>()) == (1, 1)) but can fail for any larger type (even though it should not in this case because an already existing slice is passed to the function). And this (checking the layout validity) might prevent the compiler from completly optimizing the clone operation.

However, I do not think this is related because all the previous tests used u8 slices.

@the8472
Copy link
Member

the8472 commented Sep 13, 2021

This looks like it's a matter of inlining. String::clone is neither a generic method nor #[inline] thus it's not eligible for cross-crate inlining, which means it won't be optimized out because the optimizer doesn't know it won't have side-effects without inlining.
It does get eliminated when building a bin with LTO.

@SadiinsoSnowfall
Copy link
Author

Hum, yes, I think this is it. Trying not to use String::clone (godbolt link) gives something much closer to the expected result but not perfect. Note that the resulting assembly is way better in the current nightly than it is in the current stable version, 1.55.

If it does get eliminated during LTO, it is not a problem. Should I close this issue ?

notriddle added a commit to notriddle/rust that referenced this issue Sep 13, 2021
It calls `Vec::clone` and performs a newtype-wrap around it,
so it should be trivial enough that it doesn't cause major size
regressions. It also fixes rust-lang#88905
@camelid
Copy link
Member

camelid commented Sep 13, 2021

@rustbot assign @notriddle

@rustbot rustbot self-assigned this Sep 13, 2021
@camelid camelid added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 13, 2021
@notriddle
Copy link
Contributor

@SadiinsoSnowfall is this example reduced from a realistic use case?

@SadiinsoSnowfall
Copy link
Author

@notriddle No, I was just playing around with compiler explorer. As the8472 said, this can be optimized during LTO so it might be a non-issue (as shown in the bench run in your PR).
I am closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants