
[mlir][llvm] Add experimental.vector.interleave2 intrinsic #79270


Merged

merged 5 commits into llvm:main from c-rhodes:mlir-arm-sve-zip1-intrinsic on Jan 29, 2024

Conversation

c-rhodes
Collaborator

No description provided.

@llvmbot
Member

llvmbot commented Jan 24, 2024

@llvm/pr-subscribers-mlir-sve

@llvm/pr-subscribers-mlir

Author: Cullen Rhodes (c-rhodes)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/79270.diff

2 Files Affected:

  • (modified) mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td (+4)
  • (modified) mlir/test/Target/LLVMIR/arm-sve.mlir (+7)
diff --git a/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td b/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
index e3f3d9e62e8fb39..754413a1ad491ec 100644
--- a/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
+++ b/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
@@ -410,4 +410,8 @@ def ConvertToSvboolIntrOp :
     /*overloadedResults=*/[]>,
     Arguments<(ins SVEPredicate:$mask)>;
 
+def Zip1IntrOp :
+  ArmSVE_IntrBinaryOverloadedOp<"zip1">,
+  Arguments<(ins AnyScalableVector, AnyScalableVector)>;
+
 #endif // ARMSVE_OPS
diff --git a/mlir/test/Target/LLVMIR/arm-sve.mlir b/mlir/test/Target/LLVMIR/arm-sve.mlir
index b63d3f06515690a..002b1f9d804a7ce 100644
--- a/mlir/test/Target/LLVMIR/arm-sve.mlir
+++ b/mlir/test/Target/LLVMIR/arm-sve.mlir
@@ -314,3 +314,10 @@ llvm.func @arm_sve_convert_to_svbool(
     : (vector<[1]xi1>) -> vector<[16]xi1>
   llvm.return
 }
+
+// CHECK-LABEL: @arm_sve_zip1
+// CHECK-NEXT: call <vscale x 8 x half> @llvm.aarch64.sve.zip1.nxv8f16(<vscale x 8 x half> %{{.*}}, <vscale x 8 x half> {{.*}})
+llvm.func @arm_sve_zip1(%arg0 : vector<[8]xf16>) -> vector<[8]xf16> {
+  %0 = "arm_sve.intr.zip1"(%arg0, %arg0) : (vector<[8]xf16>, vector<[8]xf16>) -> vector<[8]xf16>
+  llvm.return %0 : vector<[8]xf16>
+}

@llvmbot
Member

llvmbot commented Jan 24, 2024

@llvm/pr-subscribers-mlir-llvm

@MacDue
Member

MacDue commented Jan 24, 2024

Maybe add zip2 as well?
(In the 2-/4-way lowerings I think zip1, zip2 would use half the registers compared to two zip1s, which might be something we'd like to try.)
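
For illustration, a minimal LLVM IR sketch of that pairing (the AArch64 zip2 intrinsic exists in LLVM, but only zip1 gets a dialect op in this patch):

```llvm
; Two-way interleave via a zip1/zip2 pair: zip1 interleaves the low halves
; of the two inputs, zip2 the high halves.
%lo = call <vscale x 8 x half> @llvm.aarch64.sve.zip1.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)
%hi = call <vscale x 8 x half> @llvm.aarch64.sve.zip2.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)
```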

@dcaballe
Contributor

I've been successfully using vector.shuffle to model this pattern for Neon, and the zip instructions were generated accordingly, so I'm wondering whether we could do the same for scalable vectors. What is the way to model scalable shuffles in LLVM?
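
For fixed-width vectors the pattern in question looks something like this (a sketch; types chosen for illustration):

```mlir
// Fixed-width 2-way interleave expressed as a shuffle with a literal mask;
// on Neon this can lower to zip instructions.
%il = vector.shuffle %a, %b [0, 4, 1, 5, 2, 6, 3, 7] : vector<4xf16>, vector<4xf16>
```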

@banach-space
Contributor

I've been successfully using vector.shuffle to model this pattern for Neon, and the zip instructions were generated accordingly, so I'm wondering whether we could do the same for scalable vectors. What is the way to model scalable shuffles in LLVM?

IIUC, that's not possible for SVE. From https://llvm.org/docs/LangRef.html#id189:

For scalable vectors, the only valid mask values at present are zeroinitializer, undef and poison, since we cannot write all indices as literals for a vector with a length unknown at compile time.
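
Concretely, the fixed-width form below has no scalable counterpart, since the mask indices cannot be written out for an element count unknown at compile time (a sketch, not from the PR):

```llvm
; Valid for fixed vectors: the interleave mask is spelled out index by index.
%il = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
; For <vscale x 4 x i32> only zeroinitializer, undef, and poison masks are
; accepted, none of which can express an interleave.
```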

@MacDue
Member

MacDue commented Jan 24, 2024

IIUC, that's not possible for SVE. From https://llvm.org/docs/LangRef.html#id189:

For SVE/scalable vectors there is @llvm.experimental.vector.interleave2, which is a slightly higher-level abstraction for this:
https://llvm.org/docs/LangRef.html#llvm-experimental-vector-interleave2-intrinsic
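
For reference, a sketch of the intrinsic's shape (note the result is twice the length of each input; on SVE it lowers to a zip1/zip2 pair):

```llvm
; Target-agnostic 2-way interleave of two scalable vectors into one
; double-length result.
%res = call <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)
```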

The MLIR vector.shuffle operation currently cannot model a scalable zip, though, so in MLIR this would probably be a vector.scalable.interleave op (following vector.scalable.insert/extract, which also map to experimental LLVM intrinsics).

@c-rhodes
Collaborator Author

Maybe add zip2 as well? (In the 2-/4-way lowerings I think zip1, zip2 would use half the registers compared to two zip1s, which might be something we'd like to try.)

I tried that already but couldn't get it to work.

If we can get that to work it could be a nice future improvement, but I'd rather not add an intrinsic until we're sure it's OK, and it's not necessary for this first widening outer-product support.

@c-rhodes
Collaborator Author

IIUC, that's not possible for SVE. From https://llvm.org/docs/LangRef.html#id189:

For SVE/scalable vectors there is @llvm.experimental.vector.interleave2, which is a slightly higher-level abstraction for this: https://llvm.org/docs/LangRef.html#llvm-experimental-vector-interleave2-intrinsic

Oh cool, this is nicer, I'll use this intrinsic instead.

The MLIR vector.shuffle operation currently cannot model a scalable zip, though, so in MLIR this would probably be a vector.scalable.interleave op (following vector.scalable.insert/extract, which also map to experimental LLVM intrinsics).

Sounds good 👍 I'll update the patches to use the target-agnostic interleave intrinsic, and we can look to add an op later.
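
A rough sketch of how that might look in the LLVM dialect (the op name is assumed from the PR title; the merged form may differ):

```mlir
// Hypothetical use of the target-agnostic intrinsic from the LLVM dialect;
// the result is twice the length of each input.
llvm.func @interleave(%a : vector<[8]xf16>, %b : vector<[8]xf16>) -> vector<[16]xf16> {
  %0 = "llvm.intr.experimental.vector.interleave2"(%a, %b)
      : (vector<[8]xf16>, vector<[8]xf16>) -> vector<[16]xf16>
  llvm.return %0 : vector<[16]xf16>
}
```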

@c-rhodes changed the title from "[mlir][ArmSVE] add zip1 intrinsic" to "[mlir][llvm] add experimental.vector.interleave2 intrinsic" on Jan 26, 2024
Contributor

@banach-space left a comment

LGTM, thanks!

Contributor

@dcaballe left a comment

Cool!

@c-rhodes
Collaborator Author

@MacDue pointed out that the input type LLVM_AnyVector also admits LLVMFixedVectorType and LLVMScalableVectorType, for which cast<VectorType> will crash. I've updated the constraint to fix this.

Member

@MacDue left a comment

LGTM, thanks

@MacDue changed the title from "[mlir][llvm] add experimental.vector.interleave2 intrinsic" to "[mlir][llvm] Add experimental.vector.interleave2 intrinsic" on Jan 29, 2024
@c-rhodes c-rhodes merged commit 754a8ad into llvm:main Jan 29, 2024
@c-rhodes c-rhodes deleted the mlir-arm-sve-zip1-intrinsic branch January 29, 2024 13:22