__arm_rsr64 treated as CSE'able on arm32 #144845

@frobtech

Consider this code:

#include <arm_acle.h>

#include <stdint.h>

#ifdef __aarch64__
#define REG "cntvct_el0"
#else
#define REG "cp15:1:c14"
#endif

uint64_t get_cntvct_xor() {
  uint64_t v1 = __arm_rsr64(REG);
  uint64_t v2 = __arm_rsr64(REG);
  return v1 ^ v2;
}

When compiled for aarch64, it produces two mrs instructions as expected.
For example, clang++ --target=aarch64-fuchsia -S -o - -O2 rsr.cc produces (trimmed):

_Z14get_cntvct_xorv:                    // @_Z14get_cntvct_xorv
        .cfi_startproc
// %bb.0:
        mrs     x8, CNTVCT_EL0
        mrs     x9, CNTVCT_EL0
        eor     x0, x9, x8
        ret

However, when compiled for aarch32, the compiler acts as if the intrinsic has "non-volatile" semantics and can be presumed to return the same value when called twice, so the two reads are merged and the XOR folds to a constant zero.
For example, clang++ --target=armv7-linux-gnueabihf -S -o - -O2 rsr.cc produces (trimmed):

_Z14get_cntvct_xorv:                    @ @_Z14get_cntvct_xorv
        .fnstart
@ %bb.0:
        mov     r0, #0
        mov     r1, #0
        bx      lr

(It's similar with -mthumb added.)

The ARM ACLE spec does not say whether the __arm_rsr64 lowering should have "volatile" (non-CSE'able) or "non-volatile" (CSE'able) semantics. But for aarch64, both LLVM and GCC agree that it has the "volatile" semantics, and users now rely on that.

This example is exercising the aarch64 and aarch32 spellings of the exact same hardware access. IMHO they should definitely be treated consistently between the two backends. (GCC does not support the same intrinsics for aarch32 targets as for aarch64, so we don't have that precedent to refer to here.)

That seems to be the intent of the LLVM code too. To wit, in both cases above with -emit-llvm added, the IR is basically the same:

define dso_local noundef i64 @_Z14get_cntvct_xorv() local_unnamed_addr #0 {
  %1 = tail call i64 @llvm.read_volatile_register.i64(metadata !5)
  %2 = tail call i64 @llvm.read_volatile_register.i64(metadata !5)
  %3 = xor i64 %2, %1
  ret i64 %3
}

It certainly seems wrong that llvm.read_volatile_register.i64 is being lowered on aarch32 as CSE'able. The "volatile" in the name really suggests the contrary.
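
For anyone who needs the two-reads behavior on aarch32 in the meantime, here is a rough sketch of a possible workaround that bypasses the intrinsic and uses `asm volatile` directly. The helper names `read_cntvct_volatile` and `get_cntvct_xor_workaround` are made up for illustration, and the `std::chrono` branch is only an assumption so the snippet also builds on non-Arm hosts. I have not checked the generated code across compiler versions, so treat it as a sketch rather than a verified fix.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of ACLE or LLVM): reads the virtual counter
// through `asm volatile`, which the compiler must not merge or drop.
inline uint64_t read_cntvct_volatile() {
#if defined(__aarch64__)
  uint64_t v;
  asm volatile("mrs %0, cntvct_el0" : "=r"(v));
  return v;
#elif defined(__arm__)
  uint32_t lo, hi;
  // 64-bit coprocessor read matching the "cp15:1:c14" spelling above.
  asm volatile("mrrc p15, 1, %0, %1, c14" : "=r"(lo), "=r"(hi));
  return (static_cast<uint64_t>(hi) << 32) | lo;
#else
  // Assumption for non-Arm hosts only: a monotonic clock stands in for the
  // hardware counter so the example still compiles and runs.
  return static_cast<uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Same shape as get_cntvct_xor() in the report, but each read is a separate
// volatile asm statement.
uint64_t get_cntvct_xor_workaround() {
  uint64_t v1 = read_cntvct_volatile();
  uint64_t v2 = read_cntvct_volatile();
  return v1 ^ v2;
}
```

This only sidesteps the problem at the source level. The lowering of llvm.read_volatile_register.i64 on aarch32 would still need fixing for `__arm_rsr64` itself to behave consistently.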
