[AMDGPU] Baseline gfx1250 speed model. #145217

Merged

rampitec
Collaborator

No description provided.

Collaborator Author

This stack of pull requests is managed by Graphite.

@rampitec rampitec marked this pull request as ready for review June 22, 2025 07:35
@llvmbot
Member

llvmbot commented Jun 22, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/145217.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNProcessors.td (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/SISchedule.td (+33)
diff --git a/llvm/lib/Target/AMDGPU/GCNProcessors.td b/llvm/lib/Target/AMDGPU/GCNProcessors.td
index 0b331bd3f3fb6..b5ffa64c3a4b4 100644
--- a/llvm/lib/Target/AMDGPU/GCNProcessors.td
+++ b/llvm/lib/Target/AMDGPU/GCNProcessors.td
@@ -326,6 +326,6 @@ def : ProcessorModel<"gfx12-generic", GFX12SpeedModel,
   FeatureISAVersion12_Generic.Features
 >;
 
-def : ProcessorModel<"gfx1250", GFX12SpeedModel,
+def : ProcessorModel<"gfx1250", GFX1250SpeedModel,
   FeatureISAVersion12_50.Features
 >;
diff --git a/llvm/lib/Target/AMDGPU/SISchedule.td b/llvm/lib/Target/AMDGPU/SISchedule.td
index 2a374b360b04a..1679cee320067 100644
--- a/llvm/lib/Target/AMDGPU/SISchedule.td
+++ b/llvm/lib/Target/AMDGPU/SISchedule.td
@@ -99,6 +99,7 @@ def SIDPGFX950FullSpeedModel : SISchedMachineModel;
 def GFX10SpeedModel : SISchedMachineModel;
 def GFX11SpeedModel : SISchedMachineModel;
 def GFX12SpeedModel : SISchedMachineModel;
+def GFX1250SpeedModel : SISchedMachineModel;
 
 // XXX: Are the resource counts correct?
 def HWBranch : ProcResource<1> {
@@ -455,3 +456,35 @@ def : HWWriteRes<WriteBarrier,           [HWBranch],       2000>;
 def : InstRW<[WriteCopy], (instrs COPY)>;
 
 }  // End SchedModel = GFX12SpeedModel
+
+multiclass GFX125xCommonWriteRes {
+
+def : HWWriteRes<Write32Bit,             [HWVALU, HWRC],   5>;
+def : HWWriteRes<WriteFloatCvt,          [HWVALU, HWRC],   5>;
+def : HWWriteRes<WriteTrans32,           [HWTransVALU, HWRC],   7>;
+def : HWWriteRes<WriteQuarterRate32,     [HWVALU, HWRC],   6>;
+def : HWWriteRes<WriteFloatFMA,          [HWVALU, HWRC],   5>;
+def : HWWriteRes<WritePseudoScalarTrans, [HWVALU, HWRC],   8>;
+
+def : HWWriteRes<WriteBranch,            [HWBranch],       32>;
+def : HWWriteRes<WriteExport,            [HWExport, HWRC], 16>;
+def : HWWriteRes<WriteLDS,               [HWLGKM,   HWRC], 20>;
+def : HWWriteRes<WriteSALU,              [HWSALU,   HWRC], 2>;
+def : HWWriteRes<WriteSFPU,              [HWSALU,   HWRC], 4>;
+def : HWWriteRes<WriteSMEM,              [HWLGKM,   HWRC], 20>;
+def : HWWriteRes<WriteVMEM,              [HWVMEM,   HWRC], 320>;
+def : HWWriteRes<WriteBarrier,           [HWBranch],       2000>;
+
+def : InstRW<[WriteCopy], (instrs COPY)>;
+} // End GFX125xCommonWriteRes
+
+let SchedModel = GFX1250SpeedModel in {
+defm : GFX125xCommonWriteRes;
+
+def : HWWriteRes<Write64Bit,             [HWVALU, HWRC],   7>;
+def : HWWriteRes<WriteIntMul,            [HWVALU, HWRC],   11>;
+def : HWWriteRes<WriteDouble,            [HWVALU, HWRC],   32>;
+def : HWWriteRes<WriteDoubleAdd,         [HWVALU, HWRC],   32>;
+def : HWWriteRes<WriteDoubleCvt,         [HWVALU, HWRC],   32>;
+def : HWWriteRes<WriteTrans64,           [HWVALU, HWTransVALU, HWRC], 38>;
+} // SchedModel = GFX1250SpeedModel

Comment on lines +464 to +467
def : HWWriteRes<WriteTrans32, [HWTransVALU, HWRC], 7>;
def : HWWriteRes<WriteQuarterRate32, [HWVALU, HWRC], 6>;
def : HWWriteRes<WriteFloatFMA, [HWVALU, HWRC], 5>;
def : HWWriteRes<WritePseudoScalarTrans, [HWVALU, HWRC], 8>;

Why do WriteTrans32 and WritePseudoScalarTrans use different resources? It also seems unintuitive that the pseudo-scalar trans latency (8) is higher than the trans32 latency (7); is that correct?

Collaborator Author

According to the spec, WritePseudoScalarTrans uses the VALU pipeline. I suspect it actually uses both pipelines, but that is what the spec says. It is also inherited from the gfx12 baseline.

Collaborator Author

And yes, the higher latency is correct, because the data also has to be moved to the pipeline.
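
To make the comparison in this thread easier to see, here is a minimal Python sketch that tabulates resources and latencies from a few `HWWriteRes` lines copied out of the diff. It assumes the third template argument of `HWWriteRes` is the latency, which matches how the thread discusses the values. The names `DIFF_LINES`, `PATTERN`, and `parse_write_res` are made up for this illustration and are not part of LLVM.

```python
import re

# A few lines copied verbatim from the gfx1250 part of the diff.
DIFF_LINES = """\
def : HWWriteRes<Write32Bit,             [HWVALU, HWRC],   5>;
def : HWWriteRes<WriteTrans32,           [HWTransVALU, HWRC],   7>;
def : HWWriteRes<WritePseudoScalarTrans, [HWVALU, HWRC],   8>;
def : HWWriteRes<WriteTrans64,           [HWVALU, HWTransVALU, HWRC], 38>;
"""

# Hypothetical helper pattern: write class, resource list, trailing integer
# (assumed here to be the latency in cycles).
PATTERN = re.compile(r"HWWriteRes<\s*(\w+)\s*,\s*\[([^\]]*)\]\s*,\s*(\d+)\s*>")


def parse_write_res(text):
    """Return {write_class: (resources, latency)} for each HWWriteRes line."""
    table = {}
    for match in PATTERN.finditer(text):
        name, resources, latency = match.groups()
        table[name] = ([r.strip() for r in resources.split(",")], int(latency))
    return table


table = parse_write_res(DIFF_LINES)
for name, (resources, latency) in table.items():
    print(f"{name:24} {latency:4}  {', '.join(resources)}")
```

Read this way, WritePseudoScalarTrans is modeled on HWVALU with a latency one cycle above WriteTrans32, which is modeled on HWTransVALU. This is only a reading aid for the numbers in the patch; it says nothing about how the hardware behaves beyond what the thread states.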

Contributor

@arsenm arsenm left a comment

llvm-mca tests would be good

@rampitec rampitec merged commit 89c6144 into main Jun 23, 2025
11 checks passed
@rampitec rampitec deleted the users/rampitec/06-22-_amdgpu_baseline_gfx1250_speed_model branch June 23, 2025 03:26
miguelcsx pushed a commit to miguelcsx/llvm-project that referenced this pull request Jun 23, 2025
Jaddyen pushed a commit to Jaddyen/llvm-project that referenced this pull request Jun 23, 2025