TTI: Fix special casing vectorization costs of saturating add/sub #97463

arsenm · 2024-07-02T19:38:52Z

No description provided.

arsenm · 2024-07-02T19:39:06Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @arsenm and the rest of your teammates on Graphite

llvmbot · 2024-07-02T19:41:41Z

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-analysis

Author: Matt Arsenault (arsenm)

Changes

Patch is 1.06 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97463.diff

22 Files Affected:

(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+10-38)
(modified) llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll (+16-16)
(modified) llvm/test/Analysis/CostModel/AArch64/arith-usat.ll (+16-16)
(modified) llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll (+4-4)
(modified) llvm/test/Analysis/CostModel/ARM/active_lane_mask.ll (+12-12)
(modified) llvm/test/Analysis/CostModel/ARM/arith-ssat.ll (+214-214)
(modified) llvm/test/Analysis/CostModel/ARM/arith-usat.ll (+202-202)
(modified) llvm/test/Analysis/CostModel/RISCV/int-sat-math.ll (+16-16)
(modified) llvm/test/Analysis/CostModel/X86/arith-ssat-codesize.ll (+350-350)
(modified) llvm/test/Analysis/CostModel/X86/arith-ssat-latency.ll (+352-352)
(modified) llvm/test/Analysis/CostModel/X86/arith-ssat-sizelatency.ll (+351-351)
(modified) llvm/test/Analysis/CostModel/X86/arith-ssat.ll (+220-220)
(modified) llvm/test/Analysis/CostModel/X86/arith-usat-codesize.ll (+352-352)
(modified) llvm/test/Analysis/CostModel/X86/arith-usat-latency.ll (+352-352)
(modified) llvm/test/Analysis/CostModel/X86/arith-usat-sizelatency.ll (+352-352)
(modified) llvm/test/Analysis/CostModel/X86/arith-usat.ll (+148-148)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/intrinsiccost.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/ARM/mve-saddsatcost.ll (+2-2)
(modified) llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll (+309-109)
(modified) llvm/test/Transforms/SLPVectorizer/X86/arith-add-usat.ll (-54)
(modified) llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssat.ll (+309-109)
(modified) llvm/test/Transforms/SLPVectorizer/X86/arith-sub-usat.ll (+16-32)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 4f1dc9f991c06..2e0f5ee6ea6bd 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2129,45 +2129,17 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       return Cost;
     }
     case Intrinsic::sadd_sat:
-    case Intrinsic::ssub_sat: {
-      Type *CondTy = RetTy->getWithNewBitWidth(1);
-
-      Type *OpTy = StructType::create({RetTy, CondTy});
-      Intrinsic::ID OverflowOp = IID == Intrinsic::sadd_sat
-                                     ? Intrinsic::sadd_with_overflow
-                                     : Intrinsic::ssub_with_overflow;
-      CmpInst::Predicate Pred = CmpInst::ICMP_SGT;
-
-      // SatMax -> Overflow && SumDiff < 0
-      // SatMin -> Overflow && SumDiff >= 0
-      InstructionCost Cost = 0;
-      IntrinsicCostAttributes Attrs(OverflowOp, OpTy, {RetTy, RetTy}, FMF,
-                                    nullptr, ScalarizationCostPassed);
-      Cost += thisT()->getIntrinsicInstrCost(Attrs, CostKind);
-      Cost += thisT()->getCmpSelInstrCost(BinaryOperator::ICmp, RetTy, CondTy,
-                                          Pred, CostKind);
-      Cost += 2 * thisT()->getCmpSelInstrCost(BinaryOperator::Select, RetTy,
-                                              CondTy, Pred, CostKind);
-      return Cost;
-    }
+      ISD = ISD::SADDSAT;
+      break;
+    case Intrinsic::ssub_sat:
+      ISD = ISD::SSUBSAT;
+      break;
     case Intrinsic::uadd_sat:
-    case Intrinsic::usub_sat: {
-      Type *CondTy = RetTy->getWithNewBitWidth(1);
-
-      Type *OpTy = StructType::create({RetTy, CondTy});
-      Intrinsic::ID OverflowOp = IID == Intrinsic::uadd_sat
-                                     ? Intrinsic::uadd_with_overflow
-                                     : Intrinsic::usub_with_overflow;
-
-      InstructionCost Cost = 0;
-      IntrinsicCostAttributes Attrs(OverflowOp, OpTy, {RetTy, RetTy}, FMF,
-                                    nullptr, ScalarizationCostPassed);
-      Cost += thisT()->getIntrinsicInstrCost(Attrs, CostKind);
-      Cost +=
-          thisT()->getCmpSelInstrCost(BinaryOperator::Select, RetTy, CondTy,
-                                      CmpInst::BAD_ICMP_PREDICATE, CostKind);
-      return Cost;
-    }
+      ISD = ISD::UADDSAT;
+      break;
+    case Intrinsic::usub_sat:
+      ISD = ISD::USUBSAT;
+      break;
     case Intrinsic::smul_fix:
     case Intrinsic::umul_fix: {
       unsigned ExtSize = RetTy->getScalarSizeInBits() * 2;
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll b/llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll
index 2267d9b88c970..d79eee53ecb48 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll
@@ -33,23 +33,23 @@ declare <64 x i8>  @llvm.sadd.sat.v64i8(<64 x i8>, <64 x i8>)
 
 define i32 @add(i32 %arg) {
 ; RECIP-LABEL: 'add'
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = call i64 @llvm.sadd.sat.i64(i64 undef, i64 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I64 = call i64 @llvm.sadd.sat.i64(i64 undef, i64 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.sadd.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I32 = call i32 @llvm.sadd.sat.i32(i32 undef, i32 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I32 = call i32 @llvm.sadd.sat.i32(i32 undef, i32 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.sadd.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V3I32 = call <3 x i32> @llvm.sadd.sat.v3i32(<3 x i32> undef, <3 x i32> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I16 = call i16 @llvm.sadd.sat.i16(i16 undef, i16 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I16 = call i16 @llvm.sadd.sat.i16(i16 undef, i16 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.sadd.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.sadd.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I8 = call i8 @llvm.sadd.sat.i8(i8 undef, i8 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I8 = call i8 @llvm.sadd.sat.i8(i8 undef, i8 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.sadd.sat.v2i8(<2 x i8> undef, <2 x i8> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.sadd.sat.v4i8(<4 x i8> undef, <4 x i8> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = call <8 x i8> @llvm.sadd.sat.v8i8(<8 x i8> undef, <8 x i8> undef)
@@ -59,23 +59,23 @@ define i32 @add(i32 %arg) {
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
 ; SIZE-LABEL: 'add'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = call i64 @llvm.sadd.sat.i64(i64 undef, i64 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I64 = call i64 @llvm.sadd.sat.i64(i64 undef, i64 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.sadd.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I32 = call i32 @llvm.sadd.sat.i32(i32 undef, i32 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = call i32 @llvm.sadd.sat.i32(i32 undef, i32 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.sadd.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V3I32 = call <3 x i32> @llvm.sadd.sat.v3i32(<3 x i32> undef, <3 x i32> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I16 = call i16 @llvm.sadd.sat.i16(i16 undef, i16 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = call i16 @llvm.sadd.sat.i16(i16 undef, i16 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.sadd.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.sadd.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I8 = call i8 @llvm.sadd.sat.i8(i8 undef, i8 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = call i8 @llvm.sadd.sat.i8(i8 undef, i8 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.sadd.sat.v2i8(<2 x i8> undef, <2 x i8> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.sadd.sat.v4i8(<4 x i8> undef, <4 x i8> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = call <8 x i8> @llvm.sadd.sat.v8i8(<8 x i8> undef, <8 x i8> undef)
@@ -142,22 +142,22 @@ declare <64 x i8>  @llvm.ssub.sat.v64i8(<64 x i8>, <64 x i8>)
 
 define i32 @sub(i32 %arg) {
 ; RECIP-LABEL: 'sub'
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = call i64 @llvm.ssub.sat.i64(i64 undef, i64 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I64 = call i64 @llvm.ssub.sat.i64(i64 undef, i64 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.ssub.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.ssub.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I32 = call i32 @llvm.ssub.sat.i32(i32 undef, i32 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I32 = call i32 @llvm.ssub.sat.i32(i32 undef, i32 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.ssub.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.ssub.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I16 = call i16 @llvm.ssub.sat.i16(i16 undef, i16 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I16 = call i16 @llvm.ssub.sat.i16(i16 undef, i16 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.ssub.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.ssub.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.ssub.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I8 = call i8 @llvm.ssub.sat.i8(i8 undef, i8 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I8 = call i8 @llvm.ssub.sat.i8(i8 undef, i8 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> undef, <2 x i8> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.ssub.sat.v4i8(<4 x i8> undef, <4 x i8> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = call <8 x i8> @llvm.ssub.sat.v8i8(<8 x i8> undef, <8 x i8> undef)
@@ -167,22 +167,22 @@ define i32 @sub(i32 %arg) {
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
 ; SIZE-LABEL: 'sub'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = call i64 @llvm.ssub.sat.i64(i64 undef, i64 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I64 = call i64 @llvm.ssub.sat.i64(i64 undef, i64 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.ssub.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.ssub.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I32 = call i32 @llvm.ssub.sat.i32(i32 undef, i32 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = call i32 @llvm.ssub.sat.i32(i32 undef, i32 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.ssub.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.ssub.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I16 = call i16 @llvm.ssub.sat.i16(i16 undef, i16 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = call i16 @llvm.ssub.sat.i16(i16 undef, i16 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.ssub.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.ssub.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.ssub.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I8 = call i8 @llvm.ssub.sat.i8(i8 undef, i8 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = call i8 @llvm.ssub.sat.i8(i8 undef, i8 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> undef, <2 x i8> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.ssub.sat.v4i8(<4 x i8> undef, <4 x i8> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = call <8 x i8> @llvm.ssub.sat.v8i8(<8 x i8> undef, <8 x i8> undef)
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-usat.ll b/llvm/test/Analysis/CostModel/AArch64/arith-usat.ll
index 5a131f23847b1..c022a79ec2f38 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-usat.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-usat.ll
@@ -32,22 +32,22 @@ declare <64 x i8>  @llvm.uadd.sat.v64i8(<64 x i8>, <64 x i8>)
 
 define i32 @add(i32 %arg) {
 ; RECIP-LABEL: 'add'
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %I64 = call i64 @llvm.uadd.sat.i64(i64 undef, i64 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I64 = call i64 @llvm.uadd.sat.i64(i64 undef, i64 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.uadd.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.uadd.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.uadd.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %I32 = call i32 @llvm.uadd.sat.i32(i32 undef, i32 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I32 = call i32 @llvm.uadd.sat.i32(i32 undef, i32 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.uadd.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.uadd.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.uadd.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I16 = call i16 @llvm.uadd.sat.i16(i16 undef, i16 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I16 = call i16 @llvm.uadd.sat.i16(i16 undef, i16 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.uadd.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.uadd.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.uadd.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I8 = call i8 @llvm.uadd.sat.i8(i8 undef, i8 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I8 = call i8 @llvm.uadd.sat.i8(i8 undef, i8 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.uad...
[truncated]

llvmbot · 2024-07-02T19:41:42Z

@llvm/pr-subscribers-llvm-transforms

Author: Matt Arsenault (arsenm)

Changes

Patch is 1.06 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97463.diff

22 Files Affected:

(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+10-38)
(modified) llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll (+16-16)
(modified) llvm/test/Analysis/CostModel/AArch64/arith-usat.ll (+16-16)
(modified) llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll (+4-4)
(modified) llvm/test/Analysis/CostModel/ARM/active_lane_mask.ll (+12-12)
(modified) llvm/test/Analysis/CostModel/ARM/arith-ssat.ll (+214-214)
(modified) llvm/test/Analysis/CostModel/ARM/arith-usat.ll (+202-202)
(modified) llvm/test/Analysis/CostModel/RISCV/int-sat-math.ll (+16-16)
(modified) llvm/test/Analysis/CostModel/X86/arith-ssat-codesize.ll (+350-350)
(modified) llvm/test/Analysis/CostModel/X86/arith-ssat-latency.ll (+352-352)
(modified) llvm/test/Analysis/CostModel/X86/arith-ssat-sizelatency.ll (+351-351)
(modified) llvm/test/Analysis/CostModel/X86/arith-ssat.ll (+220-220)
(modified) llvm/test/Analysis/CostModel/X86/arith-usat-codesize.ll (+352-352)
(modified) llvm/test/Analysis/CostModel/X86/arith-usat-latency.ll (+352-352)
(modified) llvm/test/Analysis/CostModel/X86/arith-usat-sizelatency.ll (+352-352)
(modified) llvm/test/Analysis/CostModel/X86/arith-usat.ll (+148-148)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/intrinsiccost.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/ARM/mve-saddsatcost.ll (+2-2)
(modified) llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll (+309-109)
(modified) llvm/test/Transforms/SLPVectorizer/X86/arith-add-usat.ll (-54)
(modified) llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssat.ll (+309-109)
(modified) llvm/test/Transforms/SLPVectorizer/X86/arith-sub-usat.ll (+16-32)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 4f1dc9f991c06..2e0f5ee6ea6bd 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2129,45 +2129,17 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       return Cost;
     }
     case Intrinsic::sadd_sat:
-    case Intrinsic::ssub_sat: {
-      Type *CondTy = RetTy->getWithNewBitWidth(1);
-
-      Type *OpTy = StructType::create({RetTy, CondTy});
-      Intrinsic::ID OverflowOp = IID == Intrinsic::sadd_sat
-                                     ? Intrinsic::sadd_with_overflow
-                                     : Intrinsic::ssub_with_overflow;
-      CmpInst::Predicate Pred = CmpInst::ICMP_SGT;
-
-      // SatMax -> Overflow && SumDiff < 0
-      // SatMin -> Overflow && SumDiff >= 0
-      InstructionCost Cost = 0;
-      IntrinsicCostAttributes Attrs(OverflowOp, OpTy, {RetTy, RetTy}, FMF,
-                                    nullptr, ScalarizationCostPassed);
-      Cost += thisT()->getIntrinsicInstrCost(Attrs, CostKind);
-      Cost += thisT()->getCmpSelInstrCost(BinaryOperator::ICmp, RetTy, CondTy,
-                                          Pred, CostKind);
-      Cost += 2 * thisT()->getCmpSelInstrCost(BinaryOperator::Select, RetTy,
-                                              CondTy, Pred, CostKind);
-      return Cost;
-    }
+      ISD = ISD::SADDSAT;
+      break;
+    case Intrinsic::ssub_sat:
+      ISD = ISD::SSUBSAT;
+      break;
     case Intrinsic::uadd_sat:
-    case Intrinsic::usub_sat: {
-      Type *CondTy = RetTy->getWithNewBitWidth(1);
-
-      Type *OpTy = StructType::create({RetTy, CondTy});
-      Intrinsic::ID OverflowOp = IID == Intrinsic::uadd_sat
-                                     ? Intrinsic::uadd_with_overflow
-                                     : Intrinsic::usub_with_overflow;
-
-      InstructionCost Cost = 0;
-      IntrinsicCostAttributes Attrs(OverflowOp, OpTy, {RetTy, RetTy}, FMF,
-                                    nullptr, ScalarizationCostPassed);
-      Cost += thisT()->getIntrinsicInstrCost(Attrs, CostKind);
-      Cost +=
-          thisT()->getCmpSelInstrCost(BinaryOperator::Select, RetTy, CondTy,
-                                      CmpInst::BAD_ICMP_PREDICATE, CostKind);
-      return Cost;
-    }
+      ISD = ISD::UADDSAT;
+      break;
+    case Intrinsic::usub_sat:
+      ISD = ISD::USUBSAT;
+      break;
     case Intrinsic::smul_fix:
     case Intrinsic::umul_fix: {
       unsigned ExtSize = RetTy->getScalarSizeInBits() * 2;
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll b/llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll
index 2267d9b88c970..d79eee53ecb48 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-ssat.ll
@@ -33,23 +33,23 @@ declare <64 x i8>  @llvm.sadd.sat.v64i8(<64 x i8>, <64 x i8>)
 
 define i32 @add(i32 %arg) {
 ; RECIP-LABEL: 'add'
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = call i64 @llvm.sadd.sat.i64(i64 undef, i64 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I64 = call i64 @llvm.sadd.sat.i64(i64 undef, i64 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.sadd.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I32 = call i32 @llvm.sadd.sat.i32(i32 undef, i32 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I32 = call i32 @llvm.sadd.sat.i32(i32 undef, i32 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.sadd.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V3I32 = call <3 x i32> @llvm.sadd.sat.v3i32(<3 x i32> undef, <3 x i32> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I16 = call i16 @llvm.sadd.sat.i16(i16 undef, i16 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I16 = call i16 @llvm.sadd.sat.i16(i16 undef, i16 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.sadd.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.sadd.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I8 = call i8 @llvm.sadd.sat.i8(i8 undef, i8 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I8 = call i8 @llvm.sadd.sat.i8(i8 undef, i8 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.sadd.sat.v2i8(<2 x i8> undef, <2 x i8> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.sadd.sat.v4i8(<4 x i8> undef, <4 x i8> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = call <8 x i8> @llvm.sadd.sat.v8i8(<8 x i8> undef, <8 x i8> undef)
@@ -59,23 +59,23 @@ define i32 @add(i32 %arg) {
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
 ; SIZE-LABEL: 'add'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = call i64 @llvm.sadd.sat.i64(i64 undef, i64 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I64 = call i64 @llvm.sadd.sat.i64(i64 undef, i64 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.sadd.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I32 = call i32 @llvm.sadd.sat.i32(i32 undef, i32 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = call i32 @llvm.sadd.sat.i32(i32 undef, i32 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.sadd.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V3I32 = call <3 x i32> @llvm.sadd.sat.v3i32(<3 x i32> undef, <3 x i32> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I16 = call i16 @llvm.sadd.sat.i16(i16 undef, i16 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = call i16 @llvm.sadd.sat.i16(i16 undef, i16 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.sadd.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.sadd.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I8 = call i8 @llvm.sadd.sat.i8(i8 undef, i8 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = call i8 @llvm.sadd.sat.i8(i8 undef, i8 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.sadd.sat.v2i8(<2 x i8> undef, <2 x i8> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.sadd.sat.v4i8(<4 x i8> undef, <4 x i8> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = call <8 x i8> @llvm.sadd.sat.v8i8(<8 x i8> undef, <8 x i8> undef)
@@ -142,22 +142,22 @@ declare <64 x i8>  @llvm.ssub.sat.v64i8(<64 x i8>, <64 x i8>)
 
 define i32 @sub(i32 %arg) {
 ; RECIP-LABEL: 'sub'
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = call i64 @llvm.ssub.sat.i64(i64 undef, i64 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I64 = call i64 @llvm.ssub.sat.i64(i64 undef, i64 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.ssub.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.ssub.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I32 = call i32 @llvm.ssub.sat.i32(i32 undef, i32 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I32 = call i32 @llvm.ssub.sat.i32(i32 undef, i32 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.ssub.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.ssub.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I16 = call i16 @llvm.ssub.sat.i16(i16 undef, i16 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I16 = call i16 @llvm.ssub.sat.i16(i16 undef, i16 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.ssub.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.ssub.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.ssub.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I8 = call i8 @llvm.ssub.sat.i8(i8 undef, i8 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I8 = call i8 @llvm.ssub.sat.i8(i8 undef, i8 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> undef, <2 x i8> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.ssub.sat.v4i8(<4 x i8> undef, <4 x i8> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = call <8 x i8> @llvm.ssub.sat.v8i8(<8 x i8> undef, <8 x i8> undef)
@@ -167,22 +167,22 @@ define i32 @sub(i32 %arg) {
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
 ; SIZE-LABEL: 'sub'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = call i64 @llvm.ssub.sat.i64(i64 undef, i64 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I64 = call i64 @llvm.ssub.sat.i64(i64 undef, i64 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.ssub.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.ssub.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I32 = call i32 @llvm.ssub.sat.i32(i32 undef, i32 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = call i32 @llvm.ssub.sat.i32(i32 undef, i32 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.ssub.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.ssub.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I16 = call i16 @llvm.ssub.sat.i16(i16 undef, i16 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = call i16 @llvm.ssub.sat.i16(i16 undef, i16 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.ssub.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.ssub.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.ssub.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %I8 = call i8 @llvm.ssub.sat.i8(i8 undef, i8 undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = call i8 @llvm.ssub.sat.i8(i8 undef, i8 undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> undef, <2 x i8> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.ssub.sat.v4i8(<4 x i8> undef, <4 x i8> undef)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = call <8 x i8> @llvm.ssub.sat.v8i8(<8 x i8> undef, <8 x i8> undef)
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-usat.ll b/llvm/test/Analysis/CostModel/AArch64/arith-usat.ll
index 5a131f23847b1..c022a79ec2f38 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-usat.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-usat.ll
@@ -32,22 +32,22 @@ declare <64 x i8>  @llvm.uadd.sat.v64i8(<64 x i8>, <64 x i8>)
 
 define i32 @add(i32 %arg) {
 ; RECIP-LABEL: 'add'
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %I64 = call i64 @llvm.uadd.sat.i64(i64 undef, i64 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I64 = call i64 @llvm.uadd.sat.i64(i64 undef, i64 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = call <2 x i64> @llvm.uadd.sat.v2i64(<2 x i64> undef, <2 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I64 = call <4 x i64> @llvm.uadd.sat.v4i64(<4 x i64> undef, <4 x i64> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8I64 = call <8 x i64> @llvm.uadd.sat.v8i64(<8 x i64> undef, <8 x i64> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %I32 = call i32 @llvm.uadd.sat.i32(i32 undef, i32 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I32 = call i32 @llvm.uadd.sat.i32(i32 undef, i32 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = call <2 x i32> @llvm.uadd.sat.v2i32(<2 x i32> undef, <2 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.uadd.sat.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.uadd.sat.v16i32(<16 x i32> undef, <16 x i32> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I16 = call i16 @llvm.uadd.sat.i16(i16 undef, i16 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I16 = call i16 @llvm.uadd.sat.i16(i16 undef, i16 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I16 = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> undef, <2 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = call <4 x i16> @llvm.uadd.sat.v4i16(<4 x i16> undef, <4 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.uadd.sat.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V32I16 = call <32 x i16> @llvm.uadd.sat.v32i16(<32 x i16> undef, <32 x i16> undef)
-; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I8 = call i8 @llvm.uadd.sat.i8(i8 undef, i8 undef)
+; RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I8 = call i8 @llvm.uadd.sat.i8(i8 undef, i8 undef)
 ; RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.uad...
[truncated]

sparker-arm

I think the BasicTTI changes make sense. But for any backends, such as AArch64, which have already made an attempt to cost the saturating intrinsics, it looks like we're now losing out.

I'm wondering if using getArithmeticInstrCost would help in the general sense even though, from a brief look, the AArch64 backend wouldn't handle it properly there either.

arsenm · 2024-07-03T09:00:59Z

or any backends, such as AArch64, which have already made an attempt to cost the saturating intrinsics, it looks like we're now losing out.

How is this attempting to cost these? This was just adding up the cost of the assumed expansion. I don't think any of this code here adding up a hypothetical expansions is reasonable to have as a default

I'm wondering if using getArithmeticInstrCost would help in the general sense even though, from a brief look, the AArch64 backend wouldn't handle it properly there either.

I have no idea what's going on with the soup of TTI cost functions. It's like there are 3-10 competing APIs intermixed

davemgreen · 2024-07-03T09:19:20Z

We have overrides for legal operations in ARMTTIImpl::getIntrinsicInstrCost and AArch64TTIImpl::getIntrinsicInstrCost. If the operation gets expanded we should be producing some cost of what the (default) expansion cost is likely to be though, much like abs or min/max above.

I believe the logic should be:

If the ISD operations is legal or promote -> treat it as cheap.
else use the cost of the expanded form.

It is similar to how fmuladd is performed at the moment. For more precise costs the target can override getIntrinsicInstrCost.

sparker-arm · 2024-07-03T09:50:24Z

I have no idea what's going on with the soup of TTI cost functions.

Fair! The glories of mangling IR instructions, intrinsic calls and ISD nodes all in one, multi-layered, mess :)

For more precise costs the target can override getIntrinsicInstrCost.

I'm not sure I follow why intrinsics would/should be more precise? In this case, doesn't the intrinsic map directly to the ISD node?

davemgreen · 2024-07-03T10:55:50Z

I'm not sure I follow why intrinsics would/should be more precise? In this case, doesn't the intrinsic map directly to the ISD node?

It can if it is legal (say a v4i32 sadd_sat under aarch64), but if it is expand (say a i32 sadd_sat under AArch64) the cost model will just say return SingleCallCost. It scalarizes vectors, and if the node is custom lowered it will just return "LT.first * 2".

llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h

Line 2330 in 298e292

} else if (!TLI->isOperationExpand(ISD, LT.second)) {

So that's not very accurate, and having a fallback to at least some expansion would seem better for any operation that isn't transformed into a libcall. If an individual target needed something more precise for what it will do, that can be overridden in the target cost model.

RKSimon · 2024-07-03T11:32:13Z

Can we keep the existing cost estimate code, but instead of returning Cost just stash it as a optional value and fallback to it if the ISD legality fails?

arsenm · 2024-07-24T20:17:53Z

For more precise costs the target can override getIntrinsicInstrCost.

I'm not sure I follow why intrinsics would/should be more precise? In this case, doesn't the intrinsic map directly to the ISD node?

Because to compute an exact cost you need to know the exact expansion is going to use. The current code assumes a specific expansion, adds up the costs of those pieces. This overestimate the cost when the underlying operation is actually legal. The default cost for an arbitrary expand ISD opcode is some fixed constant

arsenm · 2024-07-24T20:19:54Z

Can we keep the existing cost estimate code, but instead of returning Cost just stash it as a optional value and fallback to it if the ISD legality fails?

I'm not sure what it means for the ISD legality to fail? I also don't know why some intrinsics have this expansion testing, and others don't

davemgreen

The llvm.sadd.sat.i64 Arm costs are a bit odd - I assume because getTypeLegalizationCost(i64) will be {2, i32}, and it uses the LT.first * 2 as the i32 is legal. The other results seem OK but that one doesn't sound right for this operation. I could add an override in the Arm backend if you think that is the best way forward - the other results look better, including the SVE versions that we don't currently have tests for.

Summary: Fixes regressions from #97463 due to missing costs for custom lowered ops Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250540

Summary: Noticed due to x86 changes in #97463 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250555

davemgreen · 2024-07-26T12:07:16Z

How do we want to handle for things like @llvm.sadd.sat.i64 (on Arm, or i128 on 64bit systems)? With this patch they will be costed as 2 * cost(i32). I don't mind adding custom cost model rules if needed, but it feels like something that the generic cost model should be getting more correct.

arsenm · 2024-07-26T12:28:35Z

How do we want to handle for things like @llvm.sadd.sat.i64 (on Arm, or i128 on 64bit systems)? With this patch they will be costed as 2 * cost(i32). I don't mind adding custom cost model rules if needed, but it feels like something that the generic cost model should be getting more correct.

It's probably not the best default for custom lowering, but once something is custom, backend intervention is kind of unavoidable. If we want to change the default custom cost heuristic, that would be a bigger and separate change.

davemgreen · 2024-07-26T15:26:30Z

I think I agree for custom, but AFAIU this isn't about that exactly (I should probably have picked a better word). It's about getTypeLegalizationCost(i64) == {i32, 2}, and whether it is sensible to use an i32 cost for a i64 operation.

We would hit the same problem on an i64 system if getTypeLegalizationCost(i128) == {i64, 2}. The same is true for any other larger-than-legal type and I'm not sure if the best answer is to try and get ever target to add the correct costs for every type that isn't legal. It's OK to override custom costs for custom operations but for these it feels like there should be a more generic way to handle them.

I haven't tried it with this patch (and it probably deserves to be a separate revision), but something like this might work, even if it is still not perfect.

    const TargetLoweringBase *TLI = getTLI();
    std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(RetTy);

    if (LT.second.getScalarSizeInBits() >= RetTy->getScalarSizeInBits() &&
        TLI->isOperationLegalOrPromote(ISD, LT.second)) {
      if (IID == Intrinsic::fabs && LT.second.isFloatingPoint() &&
          TLI->isFAbsFree(LT.second)) {
        return 0;
      }

      // The operation is legal. Assume it costs 1.
      return LT.first;
    }

    // If we can't lower fmuladd into an FMA estimate the cost as a floating
    // point mul followed by an add.
    if (IID == Intrinsic::fmuladd)
      return thisT()->getArithmeticInstrCost(BinaryOperator::FMul, RetTy,
                                             CostKind) +
             thisT()->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy,
                                             CostKind);
    if (IID == Intrinsic::experimental_constrained_fmuladd) {
      IntrinsicCostAttributes FMulAttrs(
        Intrinsic::experimental_constrained_fmul, RetTy, Tys);
      IntrinsicCostAttributes FAddAttrs(
        Intrinsic::experimental_constrained_fadd, RetTy, Tys);
      return thisT()->getIntrinsicInstrCost(FMulAttrs, CostKind) +
             thisT()->getIntrinsicInstrCost(FAddAttrs, CostKind);
    }

    // More of these..

    if (!TLI->isOperationExpand(ISD, LT.second)) {
      // If the operation is custom lowered or .. then assume
      // that the code is twice as expensive.
      return (LT.first * 2);
    }

    // Else, assume that we...

arsenm · 2024-07-26T19:44:02Z

We would hit the same problem on an i64 system if getTypeLegalizationCost(i128) == {i64, 2}. The same is true for any other larger-than-legal type and I'm not sure if the best answer is to try and get ever target to add the correct costs for every type that isn't legal. It's OK to override custom costs for custom operations but for these it feels like there should be a more generic way to handle them.

So it's a question of what the default custom lowered cost is in the presence of type legalization. The code here is also ignoring the possibility of targets custom lowering with illegal types. It's pretty terrible that this needs to guess at the all the steps legalization might take later

davemgreen · 2024-07-29T09:17:07Z

So it's a question of what the default custom lowered cost is in the presence of type legalization. The code here is also ignoring the possibility of targets custom lowering with illegal types. It's pretty terrible that this needs to guess at the all the steps legalization might take later

That's kind of the point of the cost-model. I've updated some of these costs in #100988.

IMO it is easy for a target to get the legal operations correct as when people implement support for them they are already thinking about them. Custom should be OK too for the most part. But for expanded nodes the target might not have even thought about them, and the surface area is so much larger. The generic cost model should give a decent cost if it can. Obviously it is better for it to give decent costs for as many cases as it can.

I believe if #100988 goes in then what remains here should LGTM.

RKSimon

I've no objections to this if the ARM guys are OK with it - we're setting a precedent that custom costs should be set in the TTI, so we should probably make that clear with an extra comment in the "!TLI->isOperationExpand(ISD, LT.second)" description and that "cost * 2" is just a fallback. Once we're agreed on this on the other intrinsic costs PRs should be straightforward. I'm also going to make a bit more of an effort to avoid unnecessary custom lowerings and use the TargetLowering generic expansions more.

arsenm · 2024-08-06T05:54:54Z

I've no objections to this if the ARM guys are OK with it - we're setting a precedent that custom costs should be set in the TTI

I don't see how this is setting a precedent. This is normalizing what happened for most opcodes, other than this set that didn't map the intrinsic to the ISD opcode. Conceptually custom would always require custom handling for an accurate result

davemgreen

I think this unfortunately also sets the precedent that some nodes that are not legal/custom also need to be handled by the target.

Having said that the Arm/AArch64 costs look OK to me. LGTM

arsenm · 2024-08-06T07:07:17Z

It doesn't really make sense to me to assume custom is cheaper than expand. I would probably just change the default custom cost to be the same

RKSimon · 2024-08-06T07:42:25Z

in which case why not just drop the !TLI->isOperationExpand(ISD, LT.second) case entirely?

arsenm · 2024-08-06T08:00:59Z

in which case why not just drop the !TLI->isOperationExpand(ISD, LT.second) case entirely?

That's a broader scoped change than this patch, which is just trying to send the intrinsic down the normal ISD path.

RKSimon

OK - cheers

arsenm added the vectorizers label Jul 2, 2024 — with Graphite App

arsenm requested review from alexey-bataev, nikic, RKSimon, rotateright, sdesmalen-arm and sparker-arm July 2, 2024 19:41

arsenm marked this pull request as ready for review July 2, 2024 19:41

llvmbot added backend:X86 llvm:analysis llvm:transforms labels Jul 2, 2024

sparker-arm reviewed Jul 3, 2024

View reviewed changes

arsenm force-pushed the users/arsenm/add-sub-sat-cost-model-cleanup branch from fd197af to 9b608af Compare July 24, 2024 20:42

This was referenced Jul 25, 2024

AMDGPU: Add baseline test for vectorize of integer min/max #100513

Merged

TTI: Check legalization cost of min/max ISD nodes #100514

Merged

davemgreen reviewed Jul 25, 2024

View reviewed changes

arsenm force-pushed the users/arsenm/add-sub-sat-cost-model-cleanup branch from ebc2c5b to dd9c04c Compare July 25, 2024 07:03

This was referenced Jul 25, 2024

TTI: Check legalization cost of add/sub overflow ISD nodes #100518

Merged

TTI: Check legalization cost of mul overflow ISD nodes #100519

Merged

TTI: Check legalization cost of mulfix ISD nodes #100520

Merged

arsenm force-pushed the users/arsenm/amdgpu-add-baseline-tti-cost-abs branch 2 times, most recently from eb121f6 to c9b0504 Compare July 25, 2024 20:49

Base automatically changed from users/arsenm/amdgpu-add-baseline-tti-cost-abs to main July 25, 2024 20:51

arsenm force-pushed the users/arsenm/add-sub-sat-cost-model-cleanup branch from ae93c88 to 879d467 Compare July 25, 2024 20:53

arsenm force-pushed the users/arsenm/add-sub-sat-cost-model-cleanup branch from 879d467 to 0d70311 Compare July 26, 2024 19:59

arsenm mentioned this pull request Jul 26, 2024

AMDGPU: Correct costs of saturating add/sub intrinsics #100808

Merged

arsenm force-pushed the users/arsenm/add-sub-sat-cost-model-cleanup branch from 0d70311 to 1030345 Compare July 28, 2024 13:48

arsenm force-pushed the users/arsenm/add-sub-sat-cost-model-cleanup branch from 1030345 to 2bfe33a Compare August 2, 2024 16:24

arsenm added 3 commits August 6, 2024 01:06

TTI: Fix special casing vectorization costs of saturating add/sub

e3c26b1

compute approx default expansion cost in expand case

d643c8c

Use switch

8317027

arsenm force-pushed the users/arsenm/add-sub-sat-cost-model-cleanup branch from 2bfe33a to 8317027 Compare August 5, 2024 21:06

RKSimon approved these changes Aug 5, 2024

View reviewed changes

davemgreen approved these changes Aug 6, 2024

View reviewed changes

RKSimon approved these changes Aug 6, 2024

View reviewed changes

arsenm merged commit 4f067dc into main Aug 6, 2024
7 checks passed

arsenm deleted the users/arsenm/add-sub-sat-cost-model-cleanup branch August 6, 2024 13:33

TTI: Fix special casing vectorization costs of saturating add/sub #97463

TTI: Fix special casing vectorization costs of saturating add/sub #97463

Uh oh!

Conversation

arsenm commented Jul 2, 2024

Uh oh!

arsenm commented Jul 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jul 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jul 2, 2024

Uh oh!

sparker-arm left a comment

Choose a reason for hiding this comment

Uh oh!

arsenm commented Jul 3, 2024

Uh oh!

davemgreen commented Jul 3, 2024

Uh oh!

sparker-arm commented Jul 3, 2024

Uh oh!

davemgreen commented Jul 3, 2024

Uh oh!

RKSimon commented Jul 3, 2024

Uh oh!

arsenm commented Jul 24, 2024

Uh oh!

arsenm commented Jul 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

davemgreen commented Jul 26, 2024

Uh oh!

arsenm commented Jul 26, 2024

Uh oh!

davemgreen commented Jul 26, 2024

Uh oh!

arsenm commented Jul 26, 2024

Uh oh!

davemgreen commented Jul 29, 2024

Uh oh!

RKSimon left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arsenm commented Aug 6, 2024

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

arsenm commented Aug 6, 2024

Uh oh!

RKSimon commented Aug 6, 2024

Uh oh!

arsenm commented Aug 6, 2024

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

arsenm commented Jul 2, 2024 •

edited

Loading

llvmbot commented Jul 2, 2024 •

edited

Loading

arsenm commented Jul 24, 2024 •

edited

Loading

RKSimon left a comment •

edited

Loading