Godbolt example
In the following example:
I would expect that `indirect_ctz [default]` and `indirect_ctz [clone .arch_skylake]` would be able to be optimized into static calls to `ctz [default]` and `ctz [clone .arch_skylake]`, respectively. As can be seen on the Godbolt link above, GCC is able to perform this optimization (and then inline them). However, with clang both of the `indirect_ctz` versions simply call the `ifunc`-resolved version of `ctz`, which prevents inlining optimizations from taking effect.

Additionally, it seems that clang is also not able to perform this optimization if `__attribute__((target_clones))` is used for either one or both of `ctz` or `indirect_ctz`:
Example 1
Example 2
Example 3
(GCC is able to optimize it into a static call to the target-specific implementation in examples 2 and 3, but fails to inline)
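
For readers without access to the linked examples, here is a minimal sketch of the kind of code being discussed, for the variant where both functions carry `target_clones`. It is not the code from the Godbolt links: the function bodies, the use of `__builtin_ctz`, and the `SKYLAKE_CLONES` macro (a hypothetical helper that keeps the snippet compiling on non-x86-64 or non-Linux targets) are assumptions made for illustration.

```cpp
// Sketch only: assumes GCC or a recent clang, compiling as C++.
// SKYLAKE_CLONES is a hypothetical helper macro, not part of the report.
#if defined(__x86_64__) && defined(__linux__)
// Emit a "default" clone and an "arch=skylake" clone of each function,
// plus an ifunc resolver that picks one at load time.
#define SKYLAKE_CLONES __attribute__((target_clones("default", "arch=skylake")))
#else
// No multiversioning on other targets: plain functions.
#define SKYLAKE_CLONES
#endif

// Count trailing zero bits. Precondition: x != 0 (__builtin_ctz(0) is undefined).
SKYLAKE_CLONES int ctz(unsigned x) {
    return __builtin_ctz(x);
}

// Each clone of indirect_ctz is compiled for the same target as one clone of
// ctz, so the hoped-for code calls that matching clone directly. The report
// says clang instead calls the ifunc-resolved ctz from both clones.
SKYLAKE_CLONES int indirect_ctz(unsigned x) {
    return ctz(x);
}
```

Compiling this with optimizations enabled (for example `-O2`) and comparing the assembly for the two `indirect_ctz` clones should show whether each one calls its matching `ctz` clone directly or goes through the resolver.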