Function calls between methods with the same __attribute__((target)) are not resolved to the target-specific clone #78416

Open
DaMatrix opened this issue Jan 17, 2024 · 1 comment

Comments

@DaMatrix
Contributor

DaMatrix commented Jan 17, 2024

Godbolt example

In the following example:

__attribute__((target("default")))
static int ctz(unsigned i) { return __builtin_ctz(i); }

__attribute__((target("arch=skylake")))
static int ctz(unsigned i) { return __builtin_ctz(i); }

__attribute__((target("default")))
int indirect_ctz(unsigned i) { return ctz(i); }

__attribute__((target("arch=skylake")))
int indirect_ctz(unsigned i) { return ctz(i); }

I would expect the calls in indirect_ctz [default] and indirect_ctz [clone .arch_skylake] to be optimized into static calls to ctz [default] and ctz [clone .arch_skylake], respectively. As can be seen on the Godbolt link above, GCC performs this optimization (and then inlines the callees). With clang, however, both versions of indirect_ctz simply call the ifunc-resolved version of ctz, which prevents inlining.
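To make the expected resolution concrete, here is a hand-written sketch of what each version would effectively reduce to if the call were bound statically. It is not what either compiler emits. The suffixed names (ctz_default, ctz_skylake, indirect_ctz_default, indirect_ctz_skylake) and the TARGET_SKYLAKE macro are hypothetical and used only for illustration. Because each callee has a distinct name and the same target as its caller, no ifunc resolver is involved and the compiler is free to inline.

```cpp
// Illustrative sketch only: distinct (hypothetical) names stand in for the
// target-specific clones that multiversioning would generate.
#if defined(__x86_64__) || defined(__i386__)
#define TARGET_SKYLAKE __attribute__((target("arch=skylake")))
#else
#define TARGET_SKYLAKE  // no-op on non-x86 targets so the sketch still compiles
#endif

// Stand-in for ctz [default].
static int ctz_default(unsigned i) { return __builtin_ctz(i); }

// Stand-in for ctz [clone .arch_skylake].
TARGET_SKYLAKE
static int ctz_skylake(unsigned i) { return __builtin_ctz(i); }

// Stand-in for indirect_ctz [default]: a direct call to the default callee.
int indirect_ctz_default(unsigned i) { return ctz_default(i); }

// Stand-in for indirect_ctz [clone .arch_skylake]: caller and callee share
// the same target, so the call can be bound directly and inlined.
TARGET_SKYLAKE
int indirect_ctz_skylake(unsigned i) { return ctz_skylake(i); }
```

Note that __builtin_ctz is undefined for an input of 0, so any caller of this sketch should pass a non-zero value.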

Additionally, it seems that clang is also unable to perform this optimization when __attribute__((target_clones)) is used for either or both of ctz and indirect_ctz:
Example 1
Example 2
Example 3
(In examples 2 and 3, GCC optimizes the call into a static call to the target-specific implementation, but fails to inline it.)

@jroelofs
Contributor

related: #71714