Function calls between methods with the same __attribute__((target)) are not resolved to the target-specific clone #78416

Open
@DaMatrix

Godbolt example

In the following example:

__attribute__((target("default")))
static int ctz(unsigned i) { return __builtin_ctz(i); }

__attribute__((target("arch=skylake")))
static int ctz(unsigned i) { return __builtin_ctz(i); }

__attribute__((target("default")))
int indirect_ctz(unsigned i) { return ctz(i); }

__attribute__((target("arch=skylake")))
int indirect_ctz(unsigned i) { return ctz(i); }

I would expect indirect_ctz [default] and indirect_ctz [clone .arch_skylake] to be optimized into static calls to ctz [default] and ctz [clone .arch_skylake], respectively, since each caller has the same target as the matching callee. As the Godbolt link above shows, GCC performs this optimization and then inlines the calls. With clang, however, both versions of indirect_ctz simply call the ifunc-resolved version of ctz, which prevents inlining.
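To make the expectation concrete, the desired result is roughly what one would get by writing the versions out by hand under distinct names, so that each caller names its same-target callee directly. This is only a sketch: the `_default` and `_skylake` suffixed names are hypothetical and not part of the example above, and the baseline versions carry no target attribute here.

```cpp
// Hypothetical hand-expanded versions; the suffixed names are illustrative only.

// Baseline helper, compiled for the translation unit's default target.
static int ctz_default(unsigned i) { return __builtin_ctz(i); }

// Skylake helper, compiled with the arch=skylake feature set.
__attribute__((target("arch=skylake")))
static int ctz_skylake(unsigned i) { return __builtin_ctz(i); }

// Baseline caller: a direct call to the baseline helper, no ifunc involved.
int indirect_ctz_default(unsigned i) { return ctz_default(i); }

// Skylake caller: caller and callee share a target, so the call is direct
// and the compiler is free to inline it.
__attribute__((target("arch=skylake")))
int indirect_ctz_skylake(unsigned i) { return ctz_skylake(i); }
```

With this shape there is no resolver between caller and callee, which is the behavior I would hope multiversioned functions with matching targets get automatically.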

Additionally, clang does not seem to perform this optimization when __attribute__((target_clones)) is used for either or both of ctz and indirect_ctz:
Example 1
Example 2
Example 3
(In examples 2 and 3, GCC optimizes the call into a static call to the target-specific implementation, but does not inline it.)
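For reference, a sketch of what the target_clones form can look like when it is applied to both functions. This is an assumed reconstruction of one possible combination, not necessarily the code behind any of the linked examples, and ctz is left non-static here for simplicity.

```cpp
// Assumed shape of a target_clones variant; not taken from the linked examples.

// The compiler emits one clone of ctz per listed target plus a resolver.
__attribute__((target_clones("default", "arch=skylake")))
int ctz(unsigned i) { return __builtin_ctz(i); }

// Likewise for the caller. Ideally each clone of indirect_ctz would call the
// ctz clone built for the same target instead of going through the resolver.
__attribute__((target_clones("default", "arch=skylake")))
int indirect_ctz(unsigned i) { return ctz(i); }
```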
