GlobalISel: Fix combine duplicating atomic loads #111730
Conversation
This stack of pull requests is managed by Graphite.
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-globalisel

Author: Matt Arsenault (arsenm)

Changes: The sext_inreg (load) combine was not deleting the old load instruction, and it would never be deleted if volatile or atomic.

Full diff: https://github.com/llvm/llvm-project/pull/111730.diff

5 Files Affected:
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 14e94d48bf8362..535c827f6a8223 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -1110,6 +1110,7 @@ void CombinerHelper::applySextInRegOfLoad(
Builder.buildLoadInstr(TargetOpcode::G_SEXTLOAD, MI.getOperand(0).getReg(),
LoadDef->getPointerReg(), *NewMMO);
MI.eraseFromParent();
+ LoadDef->eraseFromParent();
}
/// Return true if 'MI' is a load or a store that may be fold it's address
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll
index 788fb04e842b4e..fc3bc09cf8e3e1 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll
@@ -27,32 +27,12 @@ define i32 @atomic_load_flat_monotonic_i8_zext_to_i32(ptr %ptr) {
}
define i32 @atomic_load_flat_monotonic_i8_sext_to_i32(ptr %ptr) {
-; GFX7-LABEL: atomic_load_flat_monotonic_i8_sext_to_i32:
-; GFX7: ; %bb.0:
-; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT: flat_load_ubyte v0, v[0:1] glc
-; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v2
-; GFX7-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-LABEL: atomic_load_flat_monotonic_i8_sext_to_i32:
-; GFX8: ; %bb.0:
-; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT: flat_load_ubyte v0, v[0:1] glc
-; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v2
-; GFX8-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX9-LABEL: atomic_load_flat_monotonic_i8_sext_to_i32:
-; GFX9: ; %bb.0:
-; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX9-NEXT: flat_load_ubyte v3, v[0:1] glc
-; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_mov_b32_e32 v0, v2
-; GFX9-NEXT: s_setpc_b64 s[30:31]
+; GCN-LABEL: atomic_load_flat_monotonic_i8_sext_to_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: flat_load_sbyte v0, v[0:1] glc
+; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i8, ptr %ptr monotonic, align 1
%ext = sext i8 %load to i32
ret i32 %ext
@@ -71,32 +51,12 @@ define i16 @atomic_load_flat_monotonic_i8_zext_to_i16(ptr %ptr) {
}
define i16 @atomic_load_flat_monotonic_i8_sext_to_i16(ptr %ptr) {
-; GFX7-LABEL: atomic_load_flat_monotonic_i8_sext_to_i16:
-; GFX7: ; %bb.0:
-; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT: flat_load_ubyte v0, v[0:1] glc
-; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v2
-; GFX7-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-LABEL: atomic_load_flat_monotonic_i8_sext_to_i16:
-; GFX8: ; %bb.0:
-; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT: flat_load_ubyte v0, v[0:1] glc
-; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v2
-; GFX8-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX9-LABEL: atomic_load_flat_monotonic_i8_sext_to_i16:
-; GFX9: ; %bb.0:
-; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX9-NEXT: flat_load_ubyte v3, v[0:1] glc
-; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_mov_b32_e32 v0, v2
-; GFX9-NEXT: s_setpc_b64 s[30:31]
+; GCN-LABEL: atomic_load_flat_monotonic_i8_sext_to_i16:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: flat_load_sbyte v0, v[0:1] glc
+; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i8, ptr %ptr monotonic, align 1
%ext = sext i8 %load to i16
ret i16 %ext
@@ -126,32 +86,12 @@ define i32 @atomic_load_flat_monotonic_i16_zext_to_i32(ptr %ptr) {
}
define i32 @atomic_load_flat_monotonic_i16_sext_to_i32(ptr %ptr) {
-; GFX7-LABEL: atomic_load_flat_monotonic_i16_sext_to_i32:
-; GFX7: ; %bb.0:
-; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT: flat_load_ushort v0, v[0:1] glc
-; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v2
-; GFX7-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-LABEL: atomic_load_flat_monotonic_i16_sext_to_i32:
-; GFX8: ; %bb.0:
-; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT: flat_load_ushort v0, v[0:1] glc
-; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v2
-; GFX8-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX9-LABEL: atomic_load_flat_monotonic_i16_sext_to_i32:
-; GFX9: ; %bb.0:
-; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX9-NEXT: flat_load_ushort v3, v[0:1] glc
-; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_mov_b32_e32 v0, v2
-; GFX9-NEXT: s_setpc_b64 s[30:31]
+; GCN-LABEL: atomic_load_flat_monotonic_i16_sext_to_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: flat_load_sbyte v0, v[0:1] glc
+; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GCN-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i16, ptr %ptr monotonic, align 2
%ext = sext i16 %load to i32
ret i32 %ext
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll
index 139d841590f85a..7a5e83868fd4b0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll
@@ -58,28 +58,21 @@ define i32 @atomic_load_global_monotonic_i8_sext_to_i32(ptr addrspace(1) %ptr) {
; GFX7-LABEL: atomic_load_global_monotonic_i8_sext_to_i32:
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT: flat_load_ubyte v0, v[0:1] glc
+; GFX7-NEXT: flat_load_sbyte v0, v[0:1] glc
; GFX7-NEXT: s_waitcnt vmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v2
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: atomic_load_global_monotonic_i8_sext_to_i32:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT: flat_load_ubyte v0, v[0:1] glc
+; GFX8-NEXT: flat_load_sbyte v0, v[0:1] glc
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v2
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: atomic_load_global_monotonic_i8_sext_to_i32:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: global_load_sbyte v2, v[0:1], off glc
-; GFX9-NEXT: global_load_ubyte v3, v[0:1], off glc
-; GFX9-NEXT: s_waitcnt vmcnt(1)
-; GFX9-NEXT: v_mov_b32_e32 v0, v2
+; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i8, ptr addrspace(1) %ptr monotonic, align 1
@@ -117,28 +110,21 @@ define i16 @atomic_load_global_monotonic_i8_sext_to_i16(ptr addrspace(1) %ptr) {
; GFX7-LABEL: atomic_load_global_monotonic_i8_sext_to_i16:
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT: flat_load_ubyte v0, v[0:1] glc
+; GFX7-NEXT: flat_load_sbyte v0, v[0:1] glc
; GFX7-NEXT: s_waitcnt vmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v2
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: atomic_load_global_monotonic_i8_sext_to_i16:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT: flat_load_ubyte v0, v[0:1] glc
+; GFX8-NEXT: flat_load_sbyte v0, v[0:1] glc
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v2
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: atomic_load_global_monotonic_i8_sext_to_i16:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: global_load_sbyte v2, v[0:1], off glc
-; GFX9-NEXT: global_load_ubyte v3, v[0:1], off glc
-; GFX9-NEXT: s_waitcnt vmcnt(1)
-; GFX9-NEXT: v_mov_b32_e32 v0, v2
+; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i8, ptr addrspace(1) %ptr monotonic, align 1
@@ -201,28 +187,21 @@ define i32 @atomic_load_global_monotonic_i16_sext_to_i32(ptr addrspace(1) %ptr)
; GFX7-LABEL: atomic_load_global_monotonic_i16_sext_to_i32:
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX7-NEXT: flat_load_ushort v0, v[0:1] glc
+; GFX7-NEXT: flat_load_sbyte v0, v[0:1] glc
; GFX7-NEXT: s_waitcnt vmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v2
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: atomic_load_global_monotonic_i16_sext_to_i32:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_load_sbyte v2, v[0:1] glc
-; GFX8-NEXT: flat_load_ushort v0, v[0:1] glc
+; GFX8-NEXT: flat_load_sbyte v0, v[0:1] glc
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v2
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: atomic_load_global_monotonic_i16_sext_to_i32:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: global_load_sbyte v2, v[0:1], off glc
-; GFX9-NEXT: global_load_ushort v3, v[0:1], off glc
-; GFX9-NEXT: s_waitcnt vmcnt(1)
-; GFX9-NEXT: v_mov_b32_e32 v0, v2
+; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i16, ptr addrspace(1) %ptr monotonic, align 2
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll
index 5823bc3dfd3f5b..bad6f3643462c5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll
@@ -65,29 +65,23 @@ define i32 @atomic_load_local_monotonic_i8_sext_to_i32(ptr addrspace(3) %ptr) {
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX7-NEXT: s_mov_b32 m0, -1
-; GFX7-NEXT: ds_read_i8 v1, v0
-; GFX7-NEXT: ds_read_u8 v0, v0
+; GFX7-NEXT: ds_read_i8 v0, v0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v1
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: atomic_load_local_monotonic_i8_sext_to_i32:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: s_mov_b32 m0, -1
-; GFX8-NEXT: ds_read_i8 v1, v0
-; GFX8-NEXT: ds_read_u8 v0, v0
+; GFX8-NEXT: ds_read_i8 v0, v0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v1
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: atomic_load_local_monotonic_i8_sext_to_i32:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: ds_read_i8 v1, v0
-; GFX9-NEXT: ds_read_u8 v0, v0
+; GFX9-NEXT: ds_read_i8 v0, v0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
-; GFX9-NEXT: v_mov_b32_e32 v0, v1
; GFX9-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i8, ptr addrspace(3) %ptr monotonic, align 1
%ext = sext i8 %load to i32
@@ -127,29 +121,23 @@ define i16 @atomic_load_local_monotonic_i8_sext_to_i16(ptr addrspace(3) %ptr) {
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX7-NEXT: s_mov_b32 m0, -1
-; GFX7-NEXT: ds_read_i8 v1, v0
-; GFX7-NEXT: ds_read_u8 v0, v0
+; GFX7-NEXT: ds_read_i8 v0, v0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v1
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: atomic_load_local_monotonic_i8_sext_to_i16:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: s_mov_b32 m0, -1
-; GFX8-NEXT: ds_read_i8 v1, v0
-; GFX8-NEXT: ds_read_u8 v0, v0
+; GFX8-NEXT: ds_read_i8 v0, v0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v1
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: atomic_load_local_monotonic_i8_sext_to_i16:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: ds_read_i8 v1, v0
-; GFX9-NEXT: ds_read_u8 v0, v0
+; GFX9-NEXT: ds_read_i8 v0, v0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
-; GFX9-NEXT: v_mov_b32_e32 v0, v1
; GFX9-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i8, ptr addrspace(3) %ptr monotonic, align 1
%ext = sext i8 %load to i16
@@ -216,29 +204,23 @@ define i32 @atomic_load_local_monotonic_i16_sext_to_i32(ptr addrspace(3) %ptr) {
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX7-NEXT: s_mov_b32 m0, -1
-; GFX7-NEXT: ds_read_i16 v1, v0
-; GFX7-NEXT: ds_read_u16 v0, v0
+; GFX7-NEXT: ds_read_i16 v0, v0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)
-; GFX7-NEXT: v_mov_b32_e32 v0, v1
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: atomic_load_local_monotonic_i16_sext_to_i32:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: s_mov_b32 m0, -1
-; GFX8-NEXT: ds_read_i16 v1, v0
-; GFX8-NEXT: ds_read_u16 v0, v0
+; GFX8-NEXT: ds_read_i16 v0, v0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)
-; GFX8-NEXT: v_mov_b32_e32 v0, v1
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: atomic_load_local_monotonic_i16_sext_to_i32:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: ds_read_i8 v1, v0
-; GFX9-NEXT: ds_read_u16 v0, v0
+; GFX9-NEXT: ds_read_i8 v0, v0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
-; GFX9-NEXT: v_mov_b32_e32 v0, v1
; GFX9-NEXT: s_setpc_b64 s[30:31]
%load = load atomic i16, ptr addrspace(3) %ptr monotonic, align 2
%ext = sext i16 %load to i32
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir
index afa81980ebd621..23b80528c80a98 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-sextload-from-sextinreg.mir
@@ -133,7 +133,6 @@ body: |
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
; CHECK-NEXT: [[SEXTLOAD:%[0-9]+]]:_(s32) = G_SEXTLOAD [[COPY]](p1) :: (volatile load (s8), addrspace 1)
- ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p1) :: (volatile load (s8), addrspace 1)
; CHECK-NEXT: $vgpr0 = COPY [[SEXTLOAD]](s32)
%0:_(p1) = COPY $vgpr0_vgpr1
%1:_(s32) = G_LOAD %0 :: (volatile load (s8), align 1, addrspace 1)
@@ -172,7 +171,6 @@ body: |
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
; CHECK-NEXT: [[SEXTLOAD:%[0-9]+]]:_(s32) = G_SEXTLOAD [[COPY]](p1) :: (volatile load (s16), addrspace 1)
- ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p1) :: (volatile load (s16), addrspace 1)
; CHECK-NEXT: $vgpr0 = COPY [[SEXTLOAD]](s32)
%0:_(p1) = COPY $vgpr0_vgpr1
%1:_(s32) = G_LOAD %0 :: (volatile load (s16), align 2, addrspace 1)
The original test files look unaffected? https://reviews.llvm.org/D85966
I don't think AArch64 will pass the legality check for the atomic sextload |
It is a standard prelegalizer combiner for testing and |
The AArch64 test was only negative: it did not perform the fold in the atomic case, so there is no extra load to remove now.
Standard question: could you add/extend a MIR file showing the different cases, and should the erase be conditional on the type?
The type doesn't matter; the original load always has to be removed. This is only done for hasOneUse anyway.
I meant: if atomic ...
The duplicating is only due to atomicity of the load? (See llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-sextload-from-sextinreg.mir, line 17 at 4605ba0.)
Then I vote for:
for documentation.
The duplicating is the apparent effect, because the non-atomic load could otherwise be deleted. There's no benefit to keeping it around; it just adds extra work for later code to delete. The zextload equivalent of this combine already deletes the load directly.
LGTM.
The sext_inreg (load) combine was not deleting the old load instruction, and it would never be deleted if volatile or atomic.
…vm#114763) This fixes a bug that started triggering after llvm#111730, where we could remove a load with multiple uses. It looks like the match should be checking the other register in a one-use check.

    %SrcReg = load..
    %DstReg = sign_extend_inreg %SrcReg