GlobalISel: Fix combine duplicating atomic loads #111730


Merged

arsenm merged 2 commits into main from users/arsenm/globalisel-fix-duplicating-atomic-loads on Oct 31, 2024

Conversation

@arsenm (Contributor) commented Oct 9, 2024

The sext_inreg (load) combine was not deleting the old load instruction; it was left for later dead-code elimination, which never removes a load that is volatile or atomic.
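
A minimal sketch of the fixed apply step, mirroring the one-line change in the diff below (only the final erase is new; everything else was already there):

```cpp
// Tail of CombinerHelper::applySextInRegOfLoad after this patch (sketch;
// body abbreviated to the relevant lines). The combine emits a single
// G_SEXTLOAD in place of sext_inreg(load), then erases both old
// instructions. Without the second erase, a volatile or atomic load has
// side effects that dead-code elimination refuses to touch, so the stale
// load survived and was emitted alongside the new sextload.
Builder.buildLoadInstr(TargetOpcode::G_SEXTLOAD, MI.getOperand(0).getReg(),
                       LoadDef->getPointerReg(), *NewMMO);
MI.eraseFromParent();       // the G_SEXT_INREG
LoadDef->eraseFromParent(); // the original G_LOAD -- the fix
```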

@llvmbot (Member) commented Oct 9, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Matt Arsenault (arsenm)

Changes

The sext_inreg (load) combine was not deleting the old load instruction; it was left for later dead-code elimination, which never removes a load that is volatile or atomic.


Full diff: https://github.com/llvm/llvm-project/pull/111730.diff

5 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll (+18-78)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll (+9-30)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll (+9-27)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir (-2)
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 14e94d48bf8362..535c827f6a8223 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -1110,6 +1110,7 @@ void CombinerHelper::applySextInRegOfLoad(
   Builder.buildLoadInstr(TargetOpcode::G_SEXTLOAD, MI.getOperand(0).getReg(),
                          LoadDef->getPointerReg(), *NewMMO);
   MI.eraseFromParent();
+  LoadDef->eraseFromParent();
 }
 
 /// Return true if 'MI' is a load or a store that may be fold it's address
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll
index 788fb04e842b4e..fc3bc09cf8e3e1 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll
@@ -27,32 +27,12 @@ define i32 @atomic_load_flat_monotonic_i8_zext_to_i32(ptr %ptr) {
 }
 
 define i32 @atomic_load_flat_monotonic_i8_sext_to_i32(ptr %ptr) {
-; GFX7-LABEL: atomic_load_flat_monotonic_i8_sext_to_i32:
-; GFX7:       ; %bb.0:
-; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT:    flat_load_ubyte v0, v[0:1] glc
-; GFX7-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v2
-; GFX7-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-LABEL: atomic_load_flat_monotonic_i8_sext_to_i32:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT:    flat_load_ubyte v0, v[0:1] glc
-; GFX8-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v2
-; GFX8-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-LABEL: atomic_load_flat_monotonic_i8_sext_to_i32:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX9-NEXT:    flat_load_ubyte v3, v[0:1] glc
-; GFX9-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v2
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GCN-LABEL: atomic_load_flat_monotonic_i8_sext_to_i32:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    flat_load_sbyte v0, v[0:1] glc
+; GCN-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GCN-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i8, ptr %ptr monotonic, align 1
   %ext = sext i8 %load to i32
   ret i32 %ext
@@ -71,32 +51,12 @@ define i16 @atomic_load_flat_monotonic_i8_zext_to_i16(ptr %ptr) {
 }
 
 define i16 @atomic_load_flat_monotonic_i8_sext_to_i16(ptr %ptr) {
-; GFX7-LABEL: atomic_load_flat_monotonic_i8_sext_to_i16:
-; GFX7:       ; %bb.0:
-; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT:    flat_load_ubyte v0, v[0:1] glc
-; GFX7-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v2
-; GFX7-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-LABEL: atomic_load_flat_monotonic_i8_sext_to_i16:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT:    flat_load_ubyte v0, v[0:1] glc
-; GFX8-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v2
-; GFX8-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-LABEL: atomic_load_flat_monotonic_i8_sext_to_i16:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX9-NEXT:    flat_load_ubyte v3, v[0:1] glc
-; GFX9-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v2
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GCN-LABEL: atomic_load_flat_monotonic_i8_sext_to_i16:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    flat_load_sbyte v0, v[0:1] glc
+; GCN-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GCN-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i8, ptr %ptr monotonic, align 1
   %ext = sext i8 %load to i16
   ret i16 %ext
@@ -126,32 +86,12 @@ define i32 @atomic_load_flat_monotonic_i16_zext_to_i32(ptr %ptr) {
 }
 
 define i32 @atomic_load_flat_monotonic_i16_sext_to_i32(ptr %ptr) {
-; GFX7-LABEL: atomic_load_flat_monotonic_i16_sext_to_i32:
-; GFX7:       ; %bb.0:
-; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT:    flat_load_ushort v0, v[0:1] glc
-; GFX7-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v2
-; GFX7-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-LABEL: atomic_load_flat_monotonic_i16_sext_to_i32:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT:    flat_load_ushort v0, v[0:1] glc
-; GFX8-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v2
-; GFX8-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-LABEL: atomic_load_flat_monotonic_i16_sext_to_i32:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX9-NEXT:    flat_load_ushort v3, v[0:1] glc
-; GFX9-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v2
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GCN-LABEL: atomic_load_flat_monotonic_i16_sext_to_i32:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    flat_load_sbyte v0, v[0:1] glc
+; GCN-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GCN-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i16, ptr %ptr monotonic, align 2
   %ext = sext i16 %load to i32
   ret i32 %ext
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll
index 139d841590f85a..7a5e83868fd4b0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll
@@ -58,28 +58,21 @@ define i32 @atomic_load_global_monotonic_i8_sext_to_i32(ptr addrspace(1) %ptr) {
 ; GFX7-LABEL: atomic_load_global_monotonic_i8_sext_to_i32:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; GFX7-NEXT:    flat_load_sbyte v0, v[0:1] glc
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v2
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: atomic_load_global_monotonic_i8_sext_to_i32:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; GFX8-NEXT:    flat_load_sbyte v0, v[0:1] glc
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v2
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: atomic_load_global_monotonic_i8_sext_to_i32:
 ; GFX9:       ; %bb.0:
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    global_load_sbyte v2, v[0:1], off glc
-; GFX9-NEXT:    global_load_ubyte v3, v[0:1], off glc
-; GFX9-NEXT:    s_waitcnt vmcnt(1)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v2
+; GFX9-NEXT:    global_load_sbyte v0, v[0:1], off glc
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i8, ptr addrspace(1) %ptr monotonic, align 1
@@ -117,28 +110,21 @@ define i16 @atomic_load_global_monotonic_i8_sext_to_i16(ptr addrspace(1) %ptr) {
 ; GFX7-LABEL: atomic_load_global_monotonic_i8_sext_to_i16:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; GFX7-NEXT:    flat_load_sbyte v0, v[0:1] glc
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v2
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: atomic_load_global_monotonic_i8_sext_to_i16:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; GFX8-NEXT:    flat_load_sbyte v0, v[0:1] glc
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v2
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: atomic_load_global_monotonic_i8_sext_to_i16:
 ; GFX9:       ; %bb.0:
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    global_load_sbyte v2, v[0:1], off glc
-; GFX9-NEXT:    global_load_ubyte v3, v[0:1], off glc
-; GFX9-NEXT:    s_waitcnt vmcnt(1)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v2
+; GFX9-NEXT:    global_load_sbyte v0, v[0:1], off glc
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i8, ptr addrspace(1) %ptr monotonic, align 1
@@ -201,28 +187,21 @@ define i32 @atomic_load_global_monotonic_i16_sext_to_i32(ptr addrspace(1) %ptr)
 ; GFX7-LABEL: atomic_load_global_monotonic_i16_sext_to_i32:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT:    flat_load_ushort v0, v[0:1] glc
+; GFX7-NEXT:    flat_load_sbyte v0, v[0:1] glc
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v2
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: atomic_load_global_monotonic_i16_sext_to_i32:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT:    flat_load_ushort v0, v[0:1] glc
+; GFX8-NEXT:    flat_load_sbyte v0, v[0:1] glc
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v2
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: atomic_load_global_monotonic_i16_sext_to_i32:
 ; GFX9:       ; %bb.0:
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    global_load_sbyte v2, v[0:1], off glc
-; GFX9-NEXT:    global_load_ushort v3, v[0:1], off glc
-; GFX9-NEXT:    s_waitcnt vmcnt(1)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v2
+; GFX9-NEXT:    global_load_sbyte v0, v[0:1], off glc
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i16, ptr addrspace(1) %ptr monotonic, align 2
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll
index 5823bc3dfd3f5b..bad6f3643462c5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll
@@ -65,29 +65,23 @@ define i32 @atomic_load_local_monotonic_i8_sext_to_i32(ptr addrspace(3) %ptr) {
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX7-NEXT:    s_mov_b32 m0, -1
-; GFX7-NEXT:    ds_read_i8 v1, v0
-; GFX7-NEXT:    ds_read_u8 v0, v0
+; GFX7-NEXT:    ds_read_i8 v0, v0
 ; GFX7-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: atomic_load_local_monotonic_i8_sext_to_i32:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    s_mov_b32 m0, -1
-; GFX8-NEXT:    ds_read_i8 v1, v0
-; GFX8-NEXT:    ds_read_u8 v0, v0
+; GFX8-NEXT:    ds_read_i8 v0, v0
 ; GFX8-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: atomic_load_local_monotonic_i8_sext_to_i32:
 ; GFX9:       ; %bb.0:
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    ds_read_i8 v1, v0
-; GFX9-NEXT:    ds_read_u8 v0, v0
+; GFX9-NEXT:    ds_read_i8 v0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i8, ptr addrspace(3) %ptr monotonic, align 1
   %ext = sext i8 %load to i32
@@ -127,29 +121,23 @@ define i16 @atomic_load_local_monotonic_i8_sext_to_i16(ptr addrspace(3) %ptr) {
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX7-NEXT:    s_mov_b32 m0, -1
-; GFX7-NEXT:    ds_read_i8 v1, v0
-; GFX7-NEXT:    ds_read_u8 v0, v0
+; GFX7-NEXT:    ds_read_i8 v0, v0
 ; GFX7-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: atomic_load_local_monotonic_i8_sext_to_i16:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    s_mov_b32 m0, -1
-; GFX8-NEXT:    ds_read_i8 v1, v0
-; GFX8-NEXT:    ds_read_u8 v0, v0
+; GFX8-NEXT:    ds_read_i8 v0, v0
 ; GFX8-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: atomic_load_local_monotonic_i8_sext_to_i16:
 ; GFX9:       ; %bb.0:
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    ds_read_i8 v1, v0
-; GFX9-NEXT:    ds_read_u8 v0, v0
+; GFX9-NEXT:    ds_read_i8 v0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i8, ptr addrspace(3) %ptr monotonic, align 1
   %ext = sext i8 %load to i16
@@ -216,29 +204,23 @@ define i32 @atomic_load_local_monotonic_i16_sext_to_i32(ptr addrspace(3) %ptr) {
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX7-NEXT:    s_mov_b32 m0, -1
-; GFX7-NEXT:    ds_read_i16 v1, v0
-; GFX7-NEXT:    ds_read_u16 v0, v0
+; GFX7-NEXT:    ds_read_i16 v0, v0
 ; GFX7-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: atomic_load_local_monotonic_i16_sext_to_i32:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    s_mov_b32 m0, -1
-; GFX8-NEXT:    ds_read_i16 v1, v0
-; GFX8-NEXT:    ds_read_u16 v0, v0
+; GFX8-NEXT:    ds_read_i16 v0, v0
 ; GFX8-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: atomic_load_local_monotonic_i16_sext_to_i32:
 ; GFX9:       ; %bb.0:
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    ds_read_i8 v1, v0
-; GFX9-NEXT:    ds_read_u16 v0, v0
+; GFX9-NEXT:    ds_read_i8 v0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v0, v1
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %load = load atomic i16, ptr addrspace(3) %ptr monotonic, align 2
   %ext = sext i16 %load to i32
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir
index afa81980ebd621..23b80528c80a98 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir
@@ -133,7 +133,6 @@ body: |
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
     ; CHECK-NEXT: [[SEXTLOAD:%[0-9]+]]:_(s32) = G_SEXTLOAD [[COPY]](p1) :: (volatile load (s8), addrspace 1)
-    ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p1) :: (volatile load (s8), addrspace 1)
     ; CHECK-NEXT: $vgpr0 = COPY [[SEXTLOAD]](s32)
     %0:_(p1) = COPY $vgpr0_vgpr1
     %1:_(s32) = G_LOAD %0 :: (volatile load (s8), align 1, addrspace 1)
@@ -172,7 +171,6 @@ body: |
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
     ; CHECK-NEXT: [[SEXTLOAD:%[0-9]+]]:_(s32) = G_SEXTLOAD [[COPY]](p1) :: (volatile load (s16), addrspace 1)
-    ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p1) :: (volatile load (s16), addrspace 1)
     ; CHECK-NEXT: $vgpr0 = COPY [[SEXTLOAD]](s32)
     %0:_(p1) = COPY $vgpr0_vgpr1
     %1:_(s32) = G_LOAD %0 :: (volatile load (s16), align 2, addrspace 1)

@arsenm arsenm marked this pull request as ready for review October 9, 2024 18:08
@tschuett commented Oct 9, 2024

The original test files look unaffected? https://reviews.llvm.org/D85966

@arsenm (Contributor, Author) commented Oct 9, 2024

> The original test files look unaffected? https://reviews.llvm.org/D85966

I don't think AArch64 will pass the legality check for the atomic sextload
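
A hedged sketch of the legality gate in question; the real matcher builds its LegalityQuery from the load's memory operand, and the variable names here (RegTy, PtrTy, MMDesc) are illustrative:

```cpp
// The combine only fires when G_SEXTLOAD with this result type, pointer
// type, and memory descriptor (which carries the atomic ordering) is
// legal, or the legalizer has not yet run. A target that does not mark
// atomic sextloads legal never performs the fold, so it never had an
// extra load to begin with.
if (!isLegalOrBeforeLegalizer(
        {TargetOpcode::G_SEXTLOAD, {RegTy, PtrTy}, {MMDesc}}))
  return false;
```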

@tschuett commented Oct 9, 2024

It is a standard prelegalizer combiner used for testing, and it checks isLegalOrBeforeLegalizer.

@arsenm (Contributor, Author) commented Oct 9, 2024

> It is a standard prelegalizer combiner used for testing, and it checks isLegalOrBeforeLegalizer.

The AArch64 test was only a negative test: it did not perform the fold in the atomic case, so there was no extra load to remove.

@tschuett commented Oct 9, 2024

Standard question: could you add or extend a MIR file showing the different cases, and should the erase be conditional on the type?

@arsenm (Contributor, Author) commented Oct 9, 2024

> Standard question: could you add or extend a MIR file showing the different cases, and should the erase be conditional on the type?

The type doesn't matter; the original load always has to be removed. The combine is only applied in the one-use case anyway.
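
Roughly what that one-use guard looks like on the match side (a sketch; `LoadReg` is an illustrative name for the load's result register):

```cpp
// If anything besides the sext_inreg reads the load's result, erasing the
// load would strand those users, so the combine must not fire.
// hasOneNonDBGUse counts uses while ignoring debug instructions.
if (!MRI.hasOneNonDBGUse(LoadReg))
  return false;
```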

@tschuett commented Oct 9, 2024

I meant: if atomic ...

@tschuett commented Oct 9, 2024

The duplication is only due to the atomicity of the load?

; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[SEXTLOAD]](s16)

Then I vote for:

if (atomic)
  EraseFromParent();

for documentation.

@arsenm (Contributor, Author) commented Oct 9, 2024

> The duplication is only due to the atomicity of the load?

The duplication is the apparent effect because the non-atomic load can be deleted. There's no benefit to keeping it around.

> if (atomic)
>   EraseFromParent();

That just leaves extra work for later passes to delete. The equivalent zextload combine already deletes the load directly.
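
In other words, both erases stay unconditional in the landed patch; a sketch of the contrast with the `if (atomic)` guard suggested above (same names as the diff; the guard shown is hypothetical):

```cpp
// As landed -- unconditional; the replaced instructions are dead in
// every case, atomic or not:
MI.eraseFromParent();        // the G_SEXT_INREG
LoadDef->eraseFromParent();  // the original G_LOAD

// The suggested alternative (hypothetical) would only re-route the
// non-atomic dead loads through a later DCE pass for the same result:
// if (LoadDef->getMMO().isAtomic())
//   LoadDef->eraseFromParent();
```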

@arsenm arsenm force-pushed the users/arsenm/amdgpu-global-isel-fix-atomic-extloads branch from 81dad07 to f61f760 on October 10, 2024 10:03
@arsenm arsenm force-pushed the users/arsenm/globalisel-fix-duplicating-atomic-loads branch from 5a23c27 to 8c60413 on October 10, 2024 10:04
@arsenm arsenm force-pushed the users/arsenm/amdgpu-global-isel-fix-atomic-extloads branch from f61f760 to b2937eb on October 10, 2024 10:44
@arsenm arsenm force-pushed the users/arsenm/globalisel-fix-duplicating-atomic-loads branch from 8c60413 to 3ab6aae on October 10, 2024 10:44
@arsenm arsenm force-pushed the users/arsenm/amdgpu-global-isel-fix-atomic-extloads branch from b2937eb to bdd2a6b on October 16, 2024 04:56
@arsenm arsenm force-pushed the users/arsenm/globalisel-fix-duplicating-atomic-loads branch from 3ab6aae to 2039073 on October 16, 2024 04:57
@aemerson (Contributor) left a comment

LGTM.

@arsenm arsenm force-pushed the users/arsenm/amdgpu-global-isel-fix-atomic-extloads branch from bdd2a6b to 7d5e8ec on October 17, 2024 13:05
@arsenm arsenm force-pushed the users/arsenm/globalisel-fix-duplicating-atomic-loads branch from 2039073 to 02d2bcc on October 17, 2024 13:06
@arsenm arsenm force-pushed the users/arsenm/amdgpu-global-isel-fix-atomic-extloads branch from 7d5e8ec to 5c0c290 on October 30, 2024 22:04
@arsenm arsenm force-pushed the users/arsenm/globalisel-fix-duplicating-atomic-loads branch from 02d2bcc to ecd0db9 on October 30, 2024 22:04
@arsenm (Contributor, Author) commented Oct 31, 2024

Merge activity

  • Oct 31, 10:38 AM EDT: A user started a stack merge that includes this pull request via Graphite.
  • Oct 31, 10:47 AM EDT: Graphite rebased this pull request as part of a merge.
  • Oct 31, 10:50 AM EDT: Graphite rebased this pull request as part of a merge.
  • Oct 31, 10:53 AM EDT: Graphite rebased this pull request as part of a merge.
  • Oct 31, 10:55 AM EDT: A user merged this pull request with Graphite.

@arsenm arsenm force-pushed the users/arsenm/amdgpu-global-isel-fix-atomic-extloads branch 2 times, most recently from 2548945 to d8e49cc on October 31, 2024 14:42
Base automatically changed from users/arsenm/amdgpu-global-isel-fix-atomic-extloads to main on October 31, 2024 14:44
@arsenm arsenm force-pushed the users/arsenm/globalisel-fix-duplicating-atomic-loads branch 2 times, most recently from ececb36 to 1278dd9 on October 31, 2024 14:49
The sext_inreg (load) combine was not deleting the old load instruction,
and it would never be deleted if volatile or atomic.
@arsenm arsenm force-pushed the users/arsenm/globalisel-fix-duplicating-atomic-loads branch from 1278dd9 to ae28b67 on October 31, 2024 14:52
@arsenm arsenm merged commit db5bcb2 into main Oct 31, 2024
4 of 5 checks passed
@arsenm arsenm deleted the users/arsenm/globalisel-fix-duplicating-atomic-loads branch October 31, 2024 14:55
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
The sext_inreg (load) combine was not deleting the old load instruction,
and it would never be deleted if volatile or atomic.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
The sext_inreg (load) combine was not deleting the old load instruction,
and it would never be deleted if volatile or atomic.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 4, 2024
This fixes a bug that started triggering after llvm#111730, where we could remove a
load with multiple uses. It looks like the match should be checking the other
register.

  %SrcReg = load..
  %DstReg = sign_extend_inreg %SrcReg
davemgreen added a commit that referenced this pull request Nov 5, 2024
…14763)

This fixes a bug that started triggering after #111730, where we could
remove a load with multiple uses. It looks like the match should be
checking the other register in a one-use check.

  %SrcReg = load..
  %DstReg = sign_extend_inreg %SrcReg
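
A hedged sketch of the corrected check, using the register names from the commit message above (the actual matcher code may differ in detail):

```cpp
// For the pattern
//   %SrcReg = G_LOAD ...
//   %DstReg = G_SEXT_INREG %SrcReg, ...
// it is the load that gets erased, so the one-use requirement belongs on
// %SrcReg, the load's result. Checking %DstReg instead allowed the
// combine to erase a load that still had other users.
if (!MRI.hasOneNonDBGUse(SrcReg))
  return false;
```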
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
…vm#114763)

This fixes a bug that started triggering after llvm#111730, where we could
remove a load with multiple uses. It looks like the match should be
checking the other register in a one-use check.

  %SrcReg = load..
  %DstReg = sign_extend_inreg %SrcReg