[RISCV][TTI] Implement vector costs for `llvm.fpto{u|s}i.sat()`. #143655

ElvisWang123 · 2025-06-11T05:26:05Z

This patch implement vector costs for llvm.fptoui.sat() in RISCV TTI.

llvmbot · 2025-06-11T05:26:32Z

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-analysis

Author: Elvis Wang (ElvisWang123)

Changes

This patch implement vector costs for llvm.fptoui.sat() in RISCV TTI that prevent it fall back to basic TTI.

Without this patch, the cost is cheap because it will fall back to the BasicTTIImpl::getTypeBasedIntrinsicInstrCost()
Every intrinsics with legal operations will return a very cheap cost for expansion.

llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h

Lines 2511 to 2529 in a3201ce

    
           if (TLI->isOperationLegalOrPromote(ISD, LT.second)) { 
        
             if (IID == Intrinsic::fabs && LT.second.isFloatingPoint() && 
        
                 TLI->isFAbsFree(LT.second)) { 
        
               return 0; 
        
             } 
        
             // The operation is legal. Assume it costs 1. 
        
             // If the type is split to multiple registers, assume that there is some 
        
             // overhead to this. 
        
             // TODO: Once we have extract/insert subvector cost we need to use them. 
        
             if (LT.first > 1) 
        
               return (LT.first * 2); 
        
             else 
        
               return (LT.first * 1); 
        
           } else if (TLI->isOperationCustom(ISD, LT.second)) { 
        
             // If the operation is custom lowered then assume 
        
             // that the code is twice as expensive. 
        
             return (LT.first * 2); 
        
           }

Based on the discussions in #97463, IIUC I implement this in target TTI.

This also fixes #142973.

Patch is 78.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143655.diff

2 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+27)
(added) llvm/test/Analysis/CostModel/RISCV/cast-sat.ll (+583)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index d5ea0c5d52293..a48358167c2bb 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1497,6 +1497,33 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
                           cast<VectorType>(ICA.getArgTypes()[0]), {}, CostKind,
                           0, cast<VectorType>(ICA.getReturnType()));
   }
+  case Intrinsic::fptoui_sat:
+  case Intrinsic::fptosi_sat: {
+    InstructionCost Cost = 0;
+    bool IsSigned = ICA.getID() == Intrinsic::fptosi_sat;
+    Type *SrcTy = ICA.getArgTypes()[0];
+
+    auto SrcLT = getTypeLegalizationCost(SrcTy);
+    auto DstLT = getTypeLegalizationCost(RetTy);
+    if (!SrcLT.first.isValid() || !DstLT.first.isValid())
+      return InstructionCost::getInvalid();
+
+    IntrinsicCostAttributes Attrs1(Intrinsic::minnum, SrcTy, {SrcTy, SrcTy});
+    Cost += getIntrinsicInstrCost(Attrs1, CostKind);
+    IntrinsicCostAttributes Attrs2(Intrinsic::maxnum, SrcTy, {SrcTy, SrcTy});
+    Cost += getIntrinsicInstrCost(Attrs2, CostKind);
+    Cost +=
+        getCastInstrCost(IsSigned ? Instruction::FPToSI : Instruction::FPToUI,
+                         RetTy, SrcTy, TTI::CastContextHint::None, CostKind);
+    if (IsSigned) {
+      Type *CondTy = RetTy->getWithNewBitWidth(1);
+      Cost += getCmpSelInstrCost(BinaryOperator::FCmp, SrcTy, CondTy,
+                                 CmpInst::FCMP_UNO, CostKind);
+      Cost += getCmpSelInstrCost(BinaryOperator::Select, RetTy, CondTy,
+                                 CmpInst::FCMP_UNO, CostKind);
+    }
+    return Cost;
+  }
   }
 
   if (ST->hasVInstructions() && RetTy->isVectorTy()) {
diff --git a/llvm/test/Analysis/CostModel/RISCV/cast-sat.ll b/llvm/test/Analysis/CostModel/RISCV/cast-sat.ll
new file mode 100644
index 0000000000000..52dd102f89da1
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/RISCV/cast-sat.ll
@@ -0,0 +1,583 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -mtriple=riscv64 -mattr=+zve32f,+zvl128b,+f,+d,+zfh,+zvfh -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s --check-prefixes=CHECK,RV64ZVE32F
+; RUN: opt < %s -mtriple=riscv64 -mattr=+v,+zvl128b,+f,+d,+zfh,+zvfh -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s --check-prefixes=CHECK,RV64V
+
+define void @fptoui_sat() {
+; RV64ZVE32F-LABEL: 'fptoui_sat'
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v1f32_v1i8 = call <1 x i8> @llvm.fptoui.sat.v1i8.v1f32(<1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f64_v1i8 = call <1 x i8> @llvm.fptoui.sat.v1i8.v1f64(<1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f32_v1i16 = call <1 x i16> @llvm.fptoui.sat.v1i16.v1f32(<1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f64_v1i16 = call <1 x i16> @llvm.fptoui.sat.v1i16.v1f64(<1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f32_v1i32 = call <1 x i32> @llvm.fptoui.sat.v1i32.v1f32(<1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f64_v1i32 = call <1 x i32> @llvm.fptoui.sat.v1i32.v1f64(<1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f32_v1i64 = call <1 x i64> @llvm.fptoui.sat.v1i64.v1f32(<1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v1f64_v1i64 = call <1 x i64> @llvm.fptoui.sat.v1i64.v1f64(<1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v1f32_v1i1 = call <1 x i1> @llvm.fptoui.sat.v1i1.v1f32(<1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: %v1f64_v1i1 = call <1 x i1> @llvm.fptoui.sat.v1i1.v1f64(<1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v2f32_v2i8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f32(<2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %v2f64_v2i8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f64(<2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2f32_v2i16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f32(<2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %v2f64_v2i16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f64(<2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2f32_v2i32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f32(<2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %v2f64_v2i32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f64(<2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v2f32_v2i64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f32(<2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %v2f64_v2i64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f64(<2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v2f32_v2i1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f32(<2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 27 for instruction: %v2f64_v2i1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f64(<2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v4f32_v4i8 = call <4 x i8> @llvm.fptoui.sat.v4i8.v4f32(<4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 31 for instruction: %v4f64_v4i8 = call <4 x i8> @llvm.fptoui.sat.v4i8.v4f64(<4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v4f32_v4i16 = call <4 x i16> @llvm.fptoui.sat.v4i16.v4f32(<4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 31 for instruction: %v4f64_v4i16 = call <4 x i16> @llvm.fptoui.sat.v4i16.v4f64(<4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v4f32_v4i32 = call <4 x i32> @llvm.fptoui.sat.v4i32.v4f32(<4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 31 for instruction: %v4f64_v4i32 = call <4 x i32> @llvm.fptoui.sat.v4i32.v4f64(<4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: %v4f32_v4i64 = call <4 x i64> @llvm.fptoui.sat.v4i64.v4f32(<4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %v4f64_v4i64 = call <4 x i64> @llvm.fptoui.sat.v4i64.v4f64(<4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v4f32_v4i1 = call <4 x i1> @llvm.fptoui.sat.v4i1.v4f32(<4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 55 for instruction: %v4f64_v4i1 = call <4 x i1> @llvm.fptoui.sat.v4i1.v4f64(<4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v8f32_v8i8 = call <8 x i8> @llvm.fptoui.sat.v8i8.v8f32(<8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 63 for instruction: %v8f64_v8i8 = call <8 x i8> @llvm.fptoui.sat.v8i8.v8f64(<8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v8f32_v8i16 = call <8 x i16> @llvm.fptoui.sat.v8i16.v8f32(<8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 63 for instruction: %v8f64_v8i16 = call <8 x i16> @llvm.fptoui.sat.v8i16.v8f64(<8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v8f32_v8i32 = call <8 x i32> @llvm.fptoui.sat.v8i32.v8f32(<8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 63 for instruction: %v8f64_v8i32 = call <8 x i32> @llvm.fptoui.sat.v8i32.v8f64(<8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 19 for instruction: %v8f32_v8i64 = call <8 x i64> @llvm.fptoui.sat.v8i64.v8f32(<8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %v8f64_v8i64 = call <8 x i64> @llvm.fptoui.sat.v8i64.v8f64(<8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v8f32_v8i1 = call <8 x i1> @llvm.fptoui.sat.v8i1.v8f32(<8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 111 for instruction: %v8f64_v8i1 = call <8 x i1> @llvm.fptoui.sat.v8i1.v8f64(<8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nxv1f32_nxv1i8 = call <vscale x 1 x i8> @llvm.fptoui.sat.nxv1i8.nxv1f32(<vscale x 1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv1f64_nxv1i8 = call <vscale x 1 x i8> @llvm.fptoui.sat.nxv1i8.nxv1f64(<vscale x 1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nxv1f32_nxv1i16 = call <vscale x 1 x i16> @llvm.fptoui.sat.nxv1i16.nxv1f32(<vscale x 1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv1f64_nxv1i16 = call <vscale x 1 x i16> @llvm.fptoui.sat.nxv1i16.nxv1f64(<vscale x 1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nxv1f32_nxv1i32 = call <vscale x 1 x i32> @llvm.fptoui.sat.nxv1i32.nxv1f32(<vscale x 1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv1f64_nxv1i32 = call <vscale x 1 x i32> @llvm.fptoui.sat.nxv1i32.nxv1f64(<vscale x 1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv1f32_nxv1i64 = call <vscale x 1 x i64> @llvm.fptoui.sat.nxv1i64.nxv1f32(<vscale x 1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv1f64_nxv1i64 = call <vscale x 1 x i64> @llvm.fptoui.sat.nxv1i64.nxv1f64(<vscale x 1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv1f32_nxv1i1 = call <vscale x 1 x i1> @llvm.fptoui.sat.nxv1i1.nxv1f32(<vscale x 1 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv1f64_nxv1i1 = call <vscale x 1 x i1> @llvm.fptoui.sat.nxv1i1.nxv1f64(<vscale x 1 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nxv2f32_nxv2i8 = call <vscale x 2 x i8> @llvm.fptoui.sat.nxv2i8.nxv2f32(<vscale x 2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv2f64_nxv2i8 = call <vscale x 2 x i8> @llvm.fptoui.sat.nxv2i8.nxv2f64(<vscale x 2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nxv2f32_nxv2i16 = call <vscale x 2 x i16> @llvm.fptoui.sat.nxv2i16.nxv2f32(<vscale x 2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv2f64_nxv2i16 = call <vscale x 2 x i16> @llvm.fptoui.sat.nxv2i16.nxv2f64(<vscale x 2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nxv2f32_nxv2i32 = call <vscale x 2 x i32> @llvm.fptoui.sat.nxv2i32.nxv2f32(<vscale x 2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv2f64_nxv2i32 = call <vscale x 2 x i32> @llvm.fptoui.sat.nxv2i32.nxv2f64(<vscale x 2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv2f32_nxv2i64 = call <vscale x 2 x i64> @llvm.fptoui.sat.nxv2i64.nxv2f32(<vscale x 2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv2f64_nxv2i64 = call <vscale x 2 x i64> @llvm.fptoui.sat.nxv2i64.nxv2f64(<vscale x 2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv2f32_nxv2i1 = call <vscale x 2 x i1> @llvm.fptoui.sat.nxv2i1.nxv2f32(<vscale x 2 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv2f64_nxv2i1 = call <vscale x 2 x i1> @llvm.fptoui.sat.nxv2i1.nxv2f64(<vscale x 2 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nxv4f32_nxv4i8 = call <vscale x 4 x i8> @llvm.fptoui.sat.nxv4i8.nxv4f32(<vscale x 4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv4f64_nxv4i8 = call <vscale x 4 x i8> @llvm.fptoui.sat.nxv4i8.nxv4f64(<vscale x 4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nxv4f32_nxv4i16 = call <vscale x 4 x i16> @llvm.fptoui.sat.nxv4i16.nxv4f32(<vscale x 4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv4f64_nxv4i16 = call <vscale x 4 x i16> @llvm.fptoui.sat.nxv4i16.nxv4f64(<vscale x 4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nxv4f32_nxv4i32 = call <vscale x 4 x i32> @llvm.fptoui.sat.nxv4i32.nxv4f32(<vscale x 4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv4f64_nxv4i32 = call <vscale x 4 x i32> @llvm.fptoui.sat.nxv4i32.nxv4f64(<vscale x 4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv4f32_nxv4i64 = call <vscale x 4 x i64> @llvm.fptoui.sat.nxv4i64.nxv4f32(<vscale x 4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv4f64_nxv4i64 = call <vscale x 4 x i64> @llvm.fptoui.sat.nxv4i64.nxv4f64(<vscale x 4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv4f32_nxv4i1 = call <vscale x 4 x i1> @llvm.fptoui.sat.nxv4i1.nxv4f32(<vscale x 4 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv4f64_nxv4i1 = call <vscale x 4 x i1> @llvm.fptoui.sat.nxv4i1.nxv4f64(<vscale x 4 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv8f32_nxv8i8 = call <vscale x 8 x i8> @llvm.fptoui.sat.nxv8i8.nxv8f32(<vscale x 8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv8f64_nxv8i8 = call <vscale x 8 x i8> @llvm.fptoui.sat.nxv8i8.nxv8f64(<vscale x 8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nxv8f32_nxv8i16 = call <vscale x 8 x i16> @llvm.fptoui.sat.nxv8i16.nxv8f32(<vscale x 8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv8f64_nxv8i16 = call <vscale x 8 x i16> @llvm.fptoui.sat.nxv8i16.nxv8f64(<vscale x 8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %nxv8f32_nxv8i32 = call <vscale x 8 x i32> @llvm.fptoui.sat.nxv8i32.nxv8f32(<vscale x 8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv8f64_nxv8i32 = call <vscale x 8 x i32> @llvm.fptoui.sat.nxv8i32.nxv8f64(<vscale x 8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv8f32_nxv8i64 = call <vscale x 8 x i64> @llvm.fptoui.sat.nxv8i64.nxv8f32(<vscale x 8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv8f64_nxv8i64 = call <vscale x 8 x i64> @llvm.fptoui.sat.nxv8i64.nxv8f64(<vscale x 8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %nxv8f32_nxv8i1 = call <vscale x 8 x i1> @llvm.fptoui.sat.nxv8i1.nxv8f32(<vscale x 8 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv8f64_nxv8i1 = call <vscale x 8 x i1> @llvm.fptoui.sat.nxv8i1.nxv8f64(<vscale x 8 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %nxv16f32_nxv16i8 = call <vscale x 16 x i8> @llvm.fptoui.sat.nxv16i8.nxv16f32(<vscale x 16 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv16f64_nxv16i8 = call <vscale x 16 x i8> @llvm.fptoui.sat.nxv16i8.nxv16f64(<vscale x 16 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %nxv16f32_nxv16i16 = call <vscale x 16 x i16> @llvm.fptoui.sat.nxv16i16.nxv16f32(<vscale x 16 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv16f64_nxv16i16 = call <vscale x 16 x i16> @llvm.fptoui.sat.nxv16i16.nxv16f64(<vscale x 16 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %nxv16f32_nxv16i32 = call <vscale x 16 x i32> @llvm.fptoui.sat.nxv16i32.nxv16f32(<vscale x 16 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv16f64_nxv16i32 = call <vscale x 16 x i32> @llvm.fptoui.sat.nxv16i32.nxv16f64(<vscale x 16 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv16f32_nxv16i64 = call <vscale x 16 x i64> @llvm.fptoui.sat.nxv16i64.nxv16f32(<vscale x 16 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv16f64_nxv16i64 = call <vscale x 16 x i64> @llvm.fptoui.sat.nxv16i64.nxv16f64(<vscale x 16 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %nxv16f32_nxv16i1 = call <vscale x 16 x i1> @llvm.fptoui.sat.nxv16i1.nxv16f32(<vscale x 16 x float> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Invalid cost for instruction: %nxv16f64_nxv16i1 = call <vscale x 16 x i1> @llvm.fptoui.sat.nxv16i1.nxv16f64(<vscale x 16 x double> undef)
+; RV64ZVE32F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64V-LABEL: 'fptoui_sat'
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v1f32_v1i8 = call <1 x i8> @llvm.fptoui.sat.v1i8.v1f32(<1 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v1f64_v1i8 = call <1 x i8> @llvm.fptoui.sat.v1i8.v1f64(<1 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f32_v1i16 = call <1 x i16> @llvm.fptoui.sat.v1i16.v1f32(<1 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v1f64_v1i16 = call <1 x i16> @llvm.fptoui.sat.v1i16.v1f64(<1 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f32_v1i32 = call <1 x i32> @llvm.fptoui.sat.v1i32.v1f32(<1 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f64_v1i32 = call <1 x i32> @llvm.fptoui.sat.v1i32.v1f64(<1 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f32_v1i64 = call <1 x i64> @llvm.fptoui.sat.v1i64.v1f32(<1 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v1f64_v1i64 = call <1 x i64> @llvm.fptoui.sat.v1i64.v1f64(<1 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v1f32_v1i1 = call <1 x i1> @llvm.fptoui.sat.v1i1.v1f32(<1 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v1f64_v1i1 = call <1 x i1> @llvm.fptoui.sat.v1i1.v1f64(<1 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v2f32_v2i8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f32(<2 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v2f64_v2i8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f64(<2 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2f32_v2i16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f32(<2 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v2f64_v2i16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f64(<2 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2f32_v2i32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f32(<2 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2f64_v2i32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f64(<2 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2f32_v2i64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f32(<2 x float> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2f64_v2i64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f64(<2 x double> undef)
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 7 for i...
[truncated]

github-actions · 2025-06-11T05:28:19Z

✅ With the latest revision this PR passed the undef deprecator.

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

topperc · 2025-06-11T18:30:02Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@ -1497,6 +1497,33 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
                          cast<VectorType>(ICA.getArgTypes()[0]), {}, CostKind,
                          0, cast<VectorType>(ICA.getReturnType()));
  }
+  case Intrinsic::fptoui_sat:


Can we do this in target independent code? The missing scalarization for the source type isn't just a RISC-V bug.

I think it is not only missing the scalarization for the source type, the cost of fptoui_sat is much lower than expect.

This can be implemented in BasicTTI::getIntrinsicInstrCost(). But when using type-based query it will try to get cost from BasicTTI::getTypeBasedIntrinsicInstrCost() which will return the cost different from the value-based query.

IIUC #97463 make the precedent that cost of intrinsics in getTypeBasedIntrinsicInstrCost() and not implemented before

llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h

Line 2511 in a3201ce

if (TLI->isOperationLegalOrPromote(ISD, LT.second)) {

should be handled by target.

I don't really understand this giant mess of TTI interfaces. The default should include an estimate based on the default expansion, but that doesn't mean the default should be assume the cost is the default expansion. It needs to account for the target making the base operation legal

Move the part of invalid cost in the BasicTTI (#147657).
And leave cost correction in RISCVTTI in this PR.

ElvisWang123 · 2025-06-18T07:48:53Z

Gentle ping @arsenm

ElvisWang123 · 2025-06-24T09:13:04Z

Gentle ping @arsenm

Your previous changes (#97463) make saturating operation return the expansion cost. If the target TTI doesn't implement their own implementation of the saturating operations, the cost of vector saturating operation will be under estimate and wrong.

Although the issue of that the wrong cost of the saturating fptoint is target independent. I think after #97463, target TTI need to handle some of the cases of invalid cost instead of just lower to BasicTTI.

I think implement this change in the target TTI is better to prevent moving the cost calculation in the BasicTTI before the expansion cost, which basically revert #100521 and add the legal check of source type.

arsenm · 2025-07-02T10:26:42Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+    IntrinsicCostAttributes Attrs1(Intrinsic::minnum, SrcTy, {SrcTy, SrcTy});
+    Cost += getIntrinsicInstrCost(Attrs1, CostKind);
+    IntrinsicCostAttributes Attrs2(Intrinsic::maxnum, SrcTy, {SrcTy, SrcTy});
+    Cost += getIntrinsicInstrCost(Attrs2, CostKind);


We probably should be checking / using something other than minnum/maxnum here but I haven't thought too deeply about it

Dropped. The generated ASM doesn't contains these intrinsics.

ElvisWang123 · 2025-07-14T00:01:31Z

Ping @topperc :)

After #147657, the invalid cost for llvm.fpto{u|s}i is handled by basic TTI.
This patch only correct the cost of llvm.fpto{u|s}i for RISCV target.

topperc · 2025-07-22T06:40:01Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+    Cost +=
+        getCastInstrCost(IsSigned ? Instruction::FPToSI : Instruction::FPToUI,
+                         RetTy, SrcTy, TTI::CastContextHint::None, CostKind);
+    if (IsSigned) {


Don't we always generate a select and a compare? Our hardware instructions produce all 1s for nan and we need to convert it to 0 for both signed and unsigned.

define void @fp2si_v2f32_v2i32(ptr %x, ptr %y) { ; CHECK-LABEL: fp2si_v2f32_v2i32: ; CHECK: # %bb.0: ; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma ; CHECK-NEXT: vle32.v v8, (a0) ; CHECK-NEXT: vmfne.vv v0, v8, v8 ; CHECK-NEXT: vfcvt.rtz.x.f.v v8, v8 ; CHECK-NEXT: vmerge.vim v8, v8, 0, v0 ; CHECK-NEXT: vse32.v v8, (a1) ; CHECK-NEXT: ret %a = load <2 x float>, ptr %x %d = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f32(<2 x float> %a) store <2 x i32> %d, ptr %y ret void } declare <2 x i32> @llvm.fptosi.sat.v2i32.v2f32(<2 x float>) define void @fp2ui_v2f32_v2i32(ptr %x, ptr %y) { ; CHECK-LABEL: fp2ui_v2f32_v2i32: ; CHECK: # %bb.0: ; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma ; CHECK-NEXT: vle32.v v8, (a0) ; CHECK-NEXT: vmfne.vv v0, v8, v8 ; CHECK-NEXT: vfcvt.rtz.xu.f.v v8, v8 ; CHECK-NEXT: vmerge.vim v8, v8, 0, v0 ; CHECK-NEXT: vse32.v v8, (a1) ; CHECK-NEXT: ret %a = load <2 x float>, ptr %x %d = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f32(<2 x float> %a) store <2 x i32> %d, ptr %y ret void } declare <2 x i32> @llvm.fptoui.sat.v2i32.v2f32(<2 x float>)

Yes, I used the really old version (which is wrong) to show the asm. 😢
Updated, thanks!

topperc

LGTM

…m#143655) This patch implement vector costs for `llvm.fptoui.sat()` in RISCV TTI.

ElvisWang123 requested review from preames, lukel97, arcbbb, topperc and wangpc-pp June 11, 2025 05:26

llvmbot added backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding labels Jun 11, 2025

arcbbb reviewed Jun 11, 2025

View reviewed changes

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp Show resolved Hide resolved

topperc reviewed Jun 11, 2025

View reviewed changes

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp Show resolved Hide resolved

topperc reviewed Jun 11, 2025

View reviewed changes

topperc requested a review from arsenm June 12, 2025 01:00

arsenm reviewed Jul 2, 2025

View reviewed changes

ElvisWang123 force-pushed the fix-cast-sat-cost-RISCV branch from c96dbe2 to e98cc39 Compare July 9, 2025 06:22

topperc reviewed Jul 22, 2025

View reviewed changes

ElvisWang123 added 2 commits July 22, 2025 00:25

[RISCV][TTI] Implement cost for llvm.fpto{u|s}i.sat.

d5a0cec

!fixup, always calculate cost of vfcmp and vselect.

35cb961

ElvisWang123 force-pushed the fix-cast-sat-cost-RISCV branch from e98cc39 to 35cb961 Compare July 22, 2025 07:35

topperc approved these changes Jul 22, 2025

View reviewed changes

Merge branch 'main' into fix-cast-sat-cost-RISCV

3ed8264

ElvisWang123 merged commit 324773e into llvm:main Jul 23, 2025
9 checks passed

ElvisWang123 deleted the fix-cast-sat-cost-RISCV branch July 23, 2025 01:52

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025

[RISCV][TTI] Implement vector costs for llvm.fpto{u|s}i.sat(). (llv…

ff55cdb

…m#143655) This patch implement vector costs for `llvm.fptoui.sat()` in RISCV TTI.

[RISCV][TTI] Implement vector costs for llvm.fpto{u|s}i.sat(). #143655

[RISCV][TTI] Implement vector costs for llvm.fpto{u|s}i.sat(). #143655

Uh oh!

Conversation

ElvisWang123 commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ElvisWang123 commented Jun 18, 2025

Uh oh!

ElvisWang123 commented Jun 24, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ElvisWang123 commented Jul 14, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

topperc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

[RISCV][TTI] Implement vector costs for `llvm.fpto{u|s}i.sat()`. #143655

[RISCV][TTI] Implement vector costs for `llvm.fpto{u|s}i.sat()`. #143655

ElvisWang123 commented Jun 11, 2025 •

edited

Loading

llvmbot commented Jun 11, 2025 •

edited

Loading

github-actions bot commented Jun 11, 2025 •

edited

Loading