Description
Consider this code:
#include <arm_acle.h>
#include <stdint.h>
#ifdef __aarch64__
#define REG "cntvct_el0"
#else
#define REG "cp15:1:c14"
#endif
uint64_t get_cntvct_xor() {
  uint64_t v1 = __arm_rsr64(REG);
  uint64_t v2 = __arm_rsr64(REG);
  return v1 ^ v2;
}
When compiled for aarch64, it produces two mrs instructions, as expected.
For example, clang++ --target=aarch64-fuchsia -S -o - -O2 rsr.cc produces (trimmed):
_Z14get_cntvct_xorv: // @_Z14get_cntvct_xorv
        .cfi_startproc
// %bb.0:
        mrs x8, CNTVCT_EL0
        mrs x9, CNTVCT_EL0
        eor x0, x9, x8
        ret
However, when compiled for aarch32, the compiler treats the intrinsic as if it had "non-volatile" semantics, presuming that two calls return the same value. The two reads are merged and the XOR folds to a constant zero.
For example, clang++ --target=armv7-linux-gnueabihf -S -o - -O2 rsr.cc produces (trimmed):
_Z14get_cntvct_xorv: @ @_Z14get_cntvct_xorv
        .fnstart
@ %bb.0:
        mov r0, #0
        mov r1, #0
        bx lr
(The result is similar with -mthumb added.)
The ARM ACLE spec does not say whether the __arm_rsr64 lowering should have "volatile" (non-CSE'able) or "non-volatile" (CSE'able) semantics. But for aarch64, both LLVM and GCC agree that it has the "volatile" semantics, and users now rely on that.
This example is exercising the aarch64 and aarch32 spellings of the exact same hardware access. IMHO they should definitely be treated consistently between the two backends. (GCC does not support the same intrinsics for aarch32 targets as for aarch64, so we don't have that precedent to refer to here.)
That seems to be the intent of the LLVM code too. To wit, in both cases above with -emit-llvm added, the IR is basically the same:
define dso_local noundef i64 @_Z14get_cntvct_xorv() local_unnamed_addr #0 {
  %1 = tail call i64 @llvm.read_volatile_register.i64(metadata !5)
  %2 = tail call i64 @llvm.read_volatile_register.i64(metadata !5)
  %3 = xor i64 %2, %1
  ret i64 %3
}
It certainly seems wrong that llvm.read_volatile_register.i64 is being lowered on aarch32 as CSE'able. The "volatile" in the name really suggests the contrary.
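Until the aarch32 lowering is fixed, one possible workaround is to read the counter through inline assembly marked volatile, which the optimizer should not merge or drop. This is only a sketch: the helper name read_cntvct is hypothetical (not part of ACLE), the final fallback branch exists solely so the snippet also compiles on non-ARM hosts, and it assumes the OS permits user-mode reads of the virtual counter.

```cpp
#include <stdint.h>
#include <chrono>

// Hypothetical helper, not an ACLE intrinsic. Each call is meant to
// perform a fresh read of the virtual counter.
static inline uint64_t read_cntvct() {
#if defined(__aarch64__)
  uint64_t v;
  // Same register that __arm_rsr64("cntvct_el0") names.
  __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
  return v;
#elif defined(__arm__)
  uint32_t lo, hi;
  // Same encoding as "cp15:1:c14": coprocessor 15, opc1 = 1, CRm = c14.
  __asm__ volatile("mrrc p15, 1, %0, %1, c14" : "=r"(lo), "=r"(hi));
  return (static_cast<uint64_t>(hi) << 32) | lo;
#else
  // Assumption: host-only stand-in so the sketch builds elsewhere.
  return static_cast<uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

uint64_t get_cntvct_xor_workaround() {
  uint64_t v1 = read_cntvct();
  uint64_t v2 = read_cntvct();
  return v1 ^ v2;
}
```

With this, both targets would be expected to emit two separate register reads at -O2, though that is worth confirming in the generated assembly for the toolchain in use.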