-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[ConstantFold] Remove non-trivial gep-of-gep fold #93823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-ir Author: Nikita Popov (nikic) ChangesThis fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs. Patch is 47.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/93823.diff 9 Files Affected:
diff --git a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
index 4033adc8f0390..6495fdebacf02 100644
--- a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
+++ b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
@@ -22,5 +22,5 @@ struct S {
// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i32 0, i32 1), ptr noundef @.str)
// CHECK: store i32 1, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1)
// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 1, i32 1), ptr noundef @.str.1)
-// CHECK: store i32 2, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 2)
+// CHECK: store i32 2, ptr getelementptr inbounds (%struct.S, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1), i64 1)
// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 2, i32 1), ptr noundef @.str.2)
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
index 6fbe4c7fd17a7..4f4c88d9e96a5 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
@@ -83,8 +83,8 @@ std::initializer_list<std::initializer_list<int>> nested = {
// CHECK-DYNAMIC-BL: store i32 5, ptr @_ZGR6nested2_
// CHECK-DYNAMIC-BL: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
// CHECK-DYNAMIC-BL: store ptr @_ZGR6nested2_,
-// CHECK-DYNAMIC-BL: ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
-// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BL: ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i64 1), align 8
+// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i64 1, i32 1), align 8
// CHECK-DYNAMIC-BL: store ptr @_ZGR6nested_,
// CHECK-DYNAMIC-BL: ptr @nested, align 8
// CHECK-DYNAMIC-BL: store i64 3, ptr getelementptr inbounds ({{.*}}, ptr @nested, i32 0, i32 1), align 8
@@ -123,9 +123,9 @@ std::initializer_list<std::initializer_list<int>> nested = {
// CHECK-DYNAMIC-BE: store i32 5, ptr @_ZGR6nested2_
// CHECK-DYNAMIC-BE: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
// CHECK-DYNAMIC-BE: store ptr @_ZGR6nested2_,
-// CHECK-DYNAMIC-BE: ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
+// CHECK-DYNAMIC-BE: ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i64 1), align 8
// CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([2 x i32], ptr @_ZGR6nested2_, i64 0, i64 2),
-// CHECK-DYNAMIC-BE: ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BE: ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i64 1, i32 1), align 8
// CHECK-DYNAMIC-BE: store ptr @_ZGR6nested_,
// CHECK-DYNAMIC-BE: ptr @nested, align 8
// CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([3 x {{.*}}], ptr @_ZGR6nested_, i64 0, i64 3),
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
index aa2f078a5fb0c..ac05910151a46 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
@@ -370,8 +370,8 @@ namespace partly_constant {
//
// Third init list.
// CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
- // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_THIRD]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2)
- // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2, i32 1)
+ // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_THIRD]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1), i64 1)
+ // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1), i64 1, i32 1)
// CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
//
// Outer init list.
diff --git a/clang/test/OpenMP/threadprivate_codegen.cpp b/clang/test/OpenMP/threadprivate_codegen.cpp
index b27783be829d6..7094decd41041 100644
--- a/clang/test/OpenMP/threadprivate_codegen.cpp
+++ b/clang/test/OpenMP/threadprivate_codegen.cpp
@@ -1260,8 +1260,8 @@ int foobar() {
// CHECK1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
// CHECK1-NEXT: to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
// CHECK1: invoke.cont2:
-// CHECK1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// CHECK1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// CHECK1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// CHECK1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
// CHECK1-NEXT: to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
// CHECK1: invoke.cont3:
// CHECK1-NEXT: store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -1273,8 +1273,8 @@ int foobar() {
// CHECK1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
// CHECK1-NEXT: to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
// CHECK1: invoke.cont8:
-// CHECK1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
-// CHECK1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// CHECK1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
+// CHECK1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
// CHECK1-NEXT: to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]]
// CHECK1: invoke.cont9:
// CHECK1-NEXT: [[TMP0:%.*]] = call i32 @__cxa_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]]
@@ -1783,8 +1783,8 @@ int foobar() {
// CHECK2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
// CHECK2-NEXT: to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
// CHECK2: invoke.cont2:
-// CHECK2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// CHECK2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// CHECK2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// CHECK2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
// CHECK2-NEXT: to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
// CHECK2: invoke.cont3:
// CHECK2-NEXT: store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -1796,8 +1796,8 @@ int foobar() {
// CHECK2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
// CHECK2-NEXT: to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
// CHECK2: invoke.cont8:
-// CHECK2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
-// CHECK2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// CHECK2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
+// CHECK2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
// CHECK2-NEXT: to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]]
// CHECK2: invoke.cont9:
// CHECK2-NEXT: [[TMP0:%.*]] = call i32 @__cxa_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]]
@@ -2458,8 +2458,8 @@ int foobar() {
// SIMD1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
// SIMD1-NEXT: to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
// SIMD1: invoke.cont2:
-// SIMD1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// SIMD1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// SIMD1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// SIMD1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
// SIMD1-NEXT: to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
// SIMD1: invoke.cont3:
// SIMD1-NEXT: store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -2471,8 +2471,8 @@ int foobar() {
// SIMD1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
// SIMD1-NEXT: to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
// SIMD1: invoke.cont8:
-// SIMD1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
-// SIMD1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// SIMD1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
+// SIMD1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
// SIMD1-NEXT: to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]]
// SIMD1: invoke.cont9:
// SIMD1-NEXT: [[TMP0:%.*]] = call i32 @__cxa_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]]
@@ -2922,8 +2922,8 @@ int foobar() {
// SIMD2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
// SIMD2-NEXT: to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]], !dbg [[DBG158:![0-9]+]]
// SIMD2: invoke.cont2:
-// SIMD2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8, !dbg [[DBG156]]
-// SIMD2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// SIMD2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8, !dbg [[DBG156]]
+// SIMD2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
// SIMD2-NEXT: to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]], !dbg [[DBG159:![0-9]+]]
// SIMD2: invoke.cont3:
// SIMD2-NEXT: store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8, !dbg [[DBG154]]
@@ -2935,8 +2935,8 @@ int foobar() {
// SIMD2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
// SIMD2-NEXT: to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]], !dbg [[DBG162:![0-9]+]]
// SIMD2: invoke.cont8:
-// SIMD2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8, !dbg [[DBG160]]
-// SIMD2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// SIMD2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8, !dbg [[DBG160]]
+// SIMD2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
// SIMD2-NEXT: to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]], !dbg [[DBG163:![0-9]+]]
// SIMD2: invoke.cont9:
// SIMD2-NEXT: [[TMP0:%.*]] = call i32 @__cxa_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]], !dbg [[DBG164:![0-9]+]]
@@ -3451,8 +3451,8 @@ int foobar() {
// CHECK-TLS1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
// CHECK-TLS1-NEXT: to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
// CHECK-TLS1: invoke.cont2:
-// CHECK-TLS1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// CHECK-TLS1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// CHECK-TLS1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// CHECK-TLS1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
// CHECK-TLS1-NEXT: to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
// CHECK-TLS1: invoke.cont3:
// CHECK-TLS1-NEXT: store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -3464,8 +3464,8 @@ int foobar() {
// CHECK-TLS1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
// CHECK-TLS1-NEXT: to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
// CHECK-TLS1: invoke.cont8:
-// CHECK-TLS1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
-// CHECK-TLS1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// CHECK-TLS1-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
+// CHECK-TLS1-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
// CHECK-TLS1-NEXT: to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]]
// CHECK-TLS1: invoke.cont9:
// CHECK-TLS1-NEXT: [[TMP0:%.*]] = call i32 @__cxa_thread_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]]
@@ -4178,8 +4178,8 @@ int foobar() {
// CHECK-TLS2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
// CHECK-TLS2-NEXT: to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
// CHECK-TLS2: invoke.cont2:
-// CHECK-TLS2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// CHECK-TLS2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// CHECK-TLS2-NEXT: store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// CHECK-TLS2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
// CHECK-TLS2-NEXT: to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
// CHECK-TLS2: invoke.cont3:
// CHECK-TLS2-NEXT: store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -4191,8 +4191,8 @@ int foobar() {
// CHECK-TLS2-NEXT: invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
// CHECK-TLS2-NEXT: to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
// CHECK-TLS2: invoke.con...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the DL-aware constant folding will take care of this anyway.
Can you point out where we do this fold?
I've only kept the straightforward zero-index case, where we just concatenate two GEPs.
Is there a test for this case?
It looks like we don't have an LLVM test, but it affects a few clang tests:
I kind of expected more clang fallout, so I'd be open to just dropping this fold from the DL-unaware folder completely, even the basic case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR looks good, and further dropping the fold is also good
This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the `@constexpr_gep_of_gep_with_narrow_type()` for a miscompile with the current implementation. One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.
This reverts commit e1cc9e4. This causes some non-trivial text size increases in unoptimized builds for Bullet. Revert while I investigate.
This makes codegen for array initialization simpler in two ways: 1. Drop the zero-index GEP at the start, which is no longer needed with opaque pointers. 2. Emit GEPs directly to the correct element, instead of having a long chain of +1 GEPs. This is more canonical, and also avoids regressions in unoptimized builds from llvm#93823.
This makes codegen for array initialization simpler in two ways: 1. Drop the zero-index GEP at the start, which is no longer needed with opaque pointers. 2. Emit GEPs directly to the correct element, instead of having a long chain of +1 GEPs. This is more canonical, and also avoids regressions in unoptimized builds from #93823.
This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the `@constexpr_gep_of_gep_with_narrow_type()` for a miscompile with the current implementation. One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.
Reapply after #93956, which changed clang array initialization codegen to avoid size regressions for unoptimized builds. ----- This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the `@constexpr_gep_of_gep_with_narrow_type()` for a miscompile with the current implementation. One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.
This is a followup to llvm#93823 and drop the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP. We didn't have any test coverage for this fold in LLVM, but some Clang tests change.
This is a followup to #93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP. We didn't have any test coverage for this fold in LLVM, but some Clang tests change.
Reapplying without changes. The flang+openmp buildbot failure should be addressed by #94541. ----- This is a followup to #93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP. We didn't have any test coverage for this fold in LLVM, but some Clang tests change.
Reapplying without changes. The flang+openmp buildbot failure should be addressed by llvm#94541. ----- This is a followup to llvm#93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP. We didn't have any test coverage for this fold in LLVM, but some Clang tests change.
This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the
@constexpr_gep_of_gep_with_narrow_type()
for a miscompile with the current implementation.One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.