
[MLIR] Add allow Insert/extract slice option to pack/unpack op #117340


Merged
merged 7 commits into llvm:main on Dec 10, 2024

Conversation

jerryyin
Member

@jerryyin jerryyin commented Nov 22, 2024

This PR adds the options below. The new options default to true and do not change the original lowering behavior of the pack and unpack ops.

  • lowerPadLikeWithInsertSlice to packOp (with default = true)
  • lowerUnpadLikeWithExtractSlice to unPackOp (with default = true)

The motivation for this PR is finer-grained control over the lowering of pack and unpack ops. This is useful in particular when we want to guarantee that no additional insert_slice and extract_slice ops interfere with tiling. With the original lowering pipeline, packOp and unPackOp may be lowered to insert_slice and extract_slice when the high dimensions are unit dimensions and no transpose is involved. Under such circumstances, these insert and extract slice ops will block producer/consumer fusion in tile-and-fuse transforms. With this PR, we can disable that lowering path and allow consumer fusion to go through as expected.
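
For example, disabling the pad-like shortcut from the transform dialect might look like this (a sketch, assuming the new flag is passed through the op's attribute dictionary):

%pad, %expand, %transpose = transform.structured.lower_pack %pack {lowerPadLikeWithInsertSlice = false}
    : (!transform.op<"tensor.pack">)
    -> (!transform.op<"tensor.pad">, !transform.op<"tensor.expand_shape">, !transform.op<"linalg.transpose">)

With the flag set to false, a pad-like pack is still lowered to pad + expand_shape + transpose rather than pad + insert_slice, so tile-and-fuse transforms are not blocked.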

@llvmbot
Member

llvmbot commented Nov 22, 2024

@llvm/pr-subscribers-mlir

Author: Zhuoran Yin (jerryyin)

Changes

This PR adds the options below. The new options default to true and do not change the original lowering behavior of the pack and unpack ops.

  • allowInsertSliceLowering to packOp (with default = true)
  • allowExtractSliceLowering to unPackOp (with default = true)

The motivation for this PR is finer-grained control over the lowering of pack and unpack ops. This is useful in particular when we want to guarantee that no additional insert_slice and extract_slice ops interfere with tiling. With the original lowering pipeline, packOp and unPackOp may be lowered to insert_slice and extract_slice when the high dimensions are unit dimensions and no transpose is involved. Under such circumstances, these insert and extract slice ops will block consumer fusion in the TileAndFuse pipeline. With this PR, we can disable that lowering path and allow consumer fusion to go through as expected.


Full diff: https://github.com/llvm/llvm-project/pull/117340.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td (+4-2)
  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h (+5-3)
  • (modified) mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp (+6-2)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp (+7-5)
diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
index e3084530bd11b5..ea96da77b6c331 100644
--- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
@@ -548,7 +548,8 @@ def LowerPackOp : Op<Transform_Dialect, "structured.lower_pack", [
     Return handles to the newly produced pad, expand_shape and transpose ops.
   }];
 
-  let arguments = (ins Transform_ConcreteOpType<"tensor.pack">:$target);
+  let arguments = (ins Transform_ConcreteOpType<"tensor.pack">:$target,
+                       DefaultValuedAttr<BoolAttr, "true">:$allowInsertSliceLowering);
   let results = (outs Transform_ConcreteOpType<"tensor.pad">:$pad_op,
                       Transform_ConcreteOpType<"tensor.expand_shape">:$expand_shape_op,
                       Transform_ConcreteOpType<"linalg.transpose">:$transpose_op);
@@ -588,7 +589,8 @@ def LowerUnPackOp : Op<Transform_Dialect, "structured.lower_unpack", [
     Return handles to the newly produced empty, transpose, collapse_shape and extract_slice ops.
   }];
 
-  let arguments = (ins Transform_ConcreteOpType<"tensor.unpack">:$target);
+  let arguments = (ins Transform_ConcreteOpType<"tensor.unpack">:$target,
+                       DefaultValuedAttr<BoolAttr, "true">:$allowExtractSliceLowering);
   let results = (outs Transform_ConcreteOpType<"tensor.empty">:$empty_op,
                       Transform_ConcreteOpType<"linalg.transpose">:$transpose_op,
                       Transform_ConcreteOpType<"tensor.collapse_shape">:$collapse_shape_op,
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
index 51967f83fee377..fd27e7929764d3 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -1121,7 +1121,8 @@ struct LowerPackResult {
 
 /// Rewrite pack as pad + reshape + transpose.
 FailureOr<LowerPackResult> lowerPack(RewriterBase &rewriter,
-                                     tensor::PackOp packOp);
+                                     tensor::PackOp packOp,
+                                     bool allowInsertSliceLowering = true);
 
 struct LowerUnPackOpResult {
   tensor::EmptyOp emptyOp;
@@ -1131,8 +1132,9 @@ struct LowerUnPackOpResult {
 };
 
 /// Rewrite pack as empty + transpose + reshape + extract_slice.
-FailureOr<LowerUnPackOpResult> lowerUnPack(RewriterBase &rewriter,
-                                           tensor::UnPackOp unPackOp);
+FailureOr<LowerUnPackOpResult>
+lowerUnPack(RewriterBase &rewriter, tensor::UnPackOp unPackOp,
+            bool allowExtractSliceLowering = true);
 
 /// Struct to hold the result of a `pack` call.
 struct PackResult {
diff --git a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
index ada80deacfdbfe..5117a5c58c381d 100644
--- a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
+++ b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
@@ -1171,7 +1171,9 @@ DiagnosedSilenceableFailure transform::LowerPackOp::applyToOne(
     transform::ApplyToEachResultList &transformResults,
     transform::TransformState &state) {
   rewriter.setInsertionPoint(target);
-  FailureOr<LowerPackResult> res = lowerPack(rewriter, target);
+  bool allowInsertSliceLowering = getAllowInsertSliceLowering();
+  FailureOr<LowerPackResult> res =
+      lowerPack(rewriter, target, allowInsertSliceLowering);
   if (failed(res)) {
     return mlir::emitSilenceableFailure(target->getLoc())
            << "cannot lower to pad + expand + transpose";
@@ -1191,7 +1193,9 @@ DiagnosedSilenceableFailure transform::LowerUnPackOp::applyToOne(
     transform::ApplyToEachResultList &transformResults,
     transform::TransformState &state) {
   rewriter.setInsertionPoint(target);
-  FailureOr<LowerUnPackOpResult> res = lowerUnPack(rewriter, target);
+  bool allowExtractSliceLowering = getAllowExtractSliceLowering();
+  FailureOr<LowerUnPackOpResult> res =
+      lowerUnPack(rewriter, target, allowExtractSliceLowering);
   if (failed(res)) {
     DiagnosedSilenceableFailure diag =
         emitSilenceableError()
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index d92543d7264625..0717dad4c2852f 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -217,7 +217,8 @@ struct PackedOperandsDimList {
 } // namespace
 
 FailureOr<LowerPackResult> linalg::lowerPack(RewriterBase &rewriter,
-                                             tensor::PackOp packOp) {
+                                             tensor::PackOp packOp,
+                                             bool allowInsertSliceLowering) {
   // 1. Filter out NYI cases.
   auto packedTensorType =
       cast<RankedTensorType>(packOp->getResultTypes().front());
@@ -295,7 +296,7 @@ FailureOr<LowerPackResult> linalg::lowerPack(RewriterBase &rewriter,
       llvm::interleaveComma(stripMinedShape, DBGS() << "stripMinedShape: ");
       DBGSNL(); DBGS() << "collapsed type: " << collapsed; DBGSNL(););
 
-  if (packOp.isLikePad()) {
+  if (allowInsertSliceLowering && packOp.isLikePad()) {
     // Pack ops which operate as simple pads may not produce legal
     // tensor.insert_slice operations when the packed type does not rank reduce
     // to the padded type.
@@ -351,8 +352,9 @@ FailureOr<LowerPackResult> linalg::lowerPack(RewriterBase &rewriter,
   return LowerPackResult{padOp, reshapeOp, transposeOp};
 }
 
-FailureOr<LowerUnPackOpResult> linalg::lowerUnPack(RewriterBase &rewriter,
-                                                   tensor::UnPackOp unPackOp) {
+FailureOr<LowerUnPackOpResult>
+linalg::lowerUnPack(RewriterBase &rewriter, tensor::UnPackOp unPackOp,
+                    bool allowExtractSliceLowering) {
   Location loc = unPackOp->getLoc();
   OpBuilder::InsertionGuard g(rewriter);
   rewriter.setInsertionPoint(unPackOp);
@@ -362,7 +364,7 @@ FailureOr<LowerUnPackOpResult> linalg::lowerUnPack(RewriterBase &rewriter,
 
   OpFoldResult zero = rewriter.getIndexAttr(0), one = rewriter.getIndexAttr(1);
   auto destTensorType = cast<RankedTensorType>(unPackOp.getDest().getType());
-  if (unPackOp.isLikeUnPad()) {
+  if (allowExtractSliceLowering && unPackOp.isLikeUnPad()) {
     // This unpack is just a plain unpad.
     // Just extract the slice from the higher ranked tensor.
     ArrayRef<int64_t> destShape = destTensorType.getShape();
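
For C++ callers, the new parameters are trailing arguments with defaults, so existing call sites compile unchanged. Opting out of the slice-based shortcuts might look like this (a sketch using the parameter names from the diff above; rewriter, packOp and unPackOp are assumed to be in scope):

// Lower the pack as pad + expand_shape + transpose even when it is
// pad-like, instead of pad + tensor.insert_slice.
FailureOr<linalg::LowerPackResult> packRes =
    linalg::lowerPack(rewriter, packOp, /*allowInsertSliceLowering=*/false);

// Lower the unpack as empty + transpose + collapse_shape + extract_slice
// even when it is unpad-like, instead of a single tensor.extract_slice.
FailureOr<linalg::LowerUnPackOpResult> unpackRes =
    linalg::lowerUnPack(rewriter, unPackOp, /*allowExtractSliceLowering=*/false);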

@llvmbot
Member

llvmbot commented Nov 22, 2024

@llvm/pr-subscribers-mlir-linalg


@jerryyin jerryyin changed the title [NFC] Add allow Insert/extract slice option to pack/unpack op [MLIR][NFC] Add allow Insert/extract slice option to pack/unpack op Nov 22, 2024
@banach-space banach-space self-requested a review November 22, 2024 16:08
Contributor

@nirvedhmeshram nirvedhmeshram left a comment


LGTM!

Collaborator

@joker-eph joker-eph left a comment


Nit maybe, but the fact that you have tests in an NFC change is a bit surprising to me?

@banach-space
Contributor

Under such circumstances, these insert and extract slice ops will block producer/consumer fusion in tile-and-fuse transforms. With this PR, we can disable that lowering path and allow consumer fusion to go through as expected.

Why not add tests for that? I don't really know how fusion is tested, but this summary implies that we should be writing tests showing how these things compose? And demonstrating the impact of this new flag.

On a different note, there's lower{Un}Pack and there's DecomposeOuterUnitDimsPackOpPattern. We should unify these things at some point.

@Max191
Contributor

Max191 commented Nov 25, 2024

On a different note, there's lower{Un}Pack and there's DecomposeOuterUnitDimsPackOpPattern. We should unify these things at some point.

I agree it is good to unify and reduce the overall number of patterns, but just want to note that we should make sure we still have the same type of control over how/which patterns are applied (that's probably what you were suggesting anyway, but just wanted to add that comment).

@banach-space
Contributor

just want to note that we should make sure we still have the same type of control over how/which patterns are applied

Agreed - we shouldn't bundle things prematurely. However, if these patterns effectively duplicate one another ... That's a topic for another PR though ;-)

@Max191
Contributor

Max191 commented Nov 25, 2024

Why not add tests for that? I don't really know how fusion is tested, but this summary implies that we should be writing tests showing how these things compose? And demonstrating the impact of this new flag.

I'm not really sure where a test like this belongs. If you have a suggestion for where we could put it that would be very helpful!

Here is an example case where this helps:

%producer = tensor.pack %src into %dest ...
%consumer_loop = scf.for ... {
  %slice = tensor.extract_slice %producer ...
  %consumer = "some_op" %slice ...
  ...
}

After decomposition this becomes one of two options:

// With the slice lowering:
%pad = tensor.pad %src ...
%producer = tensor.insert_slice %pad into %dest ...
%consumer_loop = scf.for ... {
  %slice = tensor.extract_slice %producer ...
  %consumer = "some_op" %slice ...
  ...
}

// With the non-slice (expand_shape/transpose) lowering:
%pad = tensor.pad %src
%expand = tensor.expand_shape %pad ...
%producer = linalg.transpose ins(%expand ...
%consumer_loop = scf.for ... {
  %slice = tensor.extract_slice %producer ...
  %consumer = "some_op" %slice ...
  ...
}

If we then want to run fusion of %producer into the %consumer_loop, then this would be blocked by the insert_slice in the first (slice) case, but not in the second (non-slice) case.

So if we want to test this specific case, we would be testing the composition of these two transform ops, and I'm not sure where the correct test directory for this is. Alternatively, we can add a test for the fusion case above (which would be expected to fail fusion), but I'm not sure that's a very useful test.

@jerryyin
Member Author

jerryyin commented Nov 25, 2024

I see discussions converging here. My next commit is going to be:

  • Get rid of the unit test
  • Rename allowInsertSliceLowering to lowerPadLikeWithInsertSlice
  • Rename allowExtractSliceLowering to lowerUnpadLikeWithExtractSlice

- Renamed allowInsertSliceLowering to lowerPadLikeWithInsertSlice
- Renamed allowExtractSliceLowering to lowerUnpadLikeWithExtractSlice
- Removed the redundant unit test since this is an NFC change

This reverts commit 46b7202.
@hanhanW
Contributor

hanhanW commented Nov 25, 2024

On a different note, there's lower{Un}Pack and there's DecomposeOuterUnitDimsPackOpPattern. We should unify these things at some point.

It should be much easier today than before. People have filled the gaps over the past year.

@banach-space
Contributor

I see discussions converging here. My next commit is going to be:

  • Get rid of the unit test

I would keep the unit test.

When I asked about the test that you added (and subsequently removed), it just wasn't clear to me what to look for and what the difference with the existing tests should be. The comment from @Max191 makes that very clear:

// Current
func.func @pack_as_pad_default() {
  %pad = tensor.pad %src ...
  %producer = tensor.insert_slice %pad into %dest
}

// New
func.func @pack_as_pad_{disable|no|skip}_insert_slice() {
  %pad = tensor.pad %src
  %expand = tensor.expand_shape %pad ...
  %producer = linalg.transpose ins(%expand ..
}

Suggestion 1

  • This PR should include a test that follows the pattern above ^^^.

I think that the following test would be sufficient and very illustrative (trimmed for brevity - feel free to re-use/expand):

// CHECK-LABEL: func.func @pack_disallowed_as_pad(
func.func @pack_disallowed_as_pad(%arg0, %arg1) {

  //      CHECK: %[[PAD:.*]] = tensor.pad %[[SRC]] 
  //  CHECK-NOT: %[[RES:.*]] = tensor.insert_slice %[[PAD]]
  //      CHECK: %[[PAD_EXPANDED:.*]] = tensor.expand_shape %[[PAD]]
  //      CHECK: %[[RES:.*]] = linalg.transpose %[[PAD_EXPANDED]]
  %pack = tensor.pack %arg0 padding_value(%cst_0 : f32) inner_dims_pos = [0, 1, 2, 3] inner_tiles = [136, 64, 16, 16] into %arg1
    : tensor<129x47x16x16xf32> -> tensor<1x1x1x1x136x64x16x16xf32>
  return %pack :  tensor<1x1x1x1x136x64x16x16xf32>
}

IIUC, these are the existing tests that exercise a similar path:

// CHECK-LABEL: func.func @pack(
func.func @pack(%arg0: tensor<129x47x16x16xf32>, %arg1: tensor<17x2x16x16x32x8xf32>) -> tensor<17x2x16x16x32x8xf32> {
  %cst_0 = arith.constant 0.0 : f32
  // tensor.pack is lowered to tensor.pad + tensor.expand_shape + linalg.transpose
  // CHECK: tensor.pad {{.*}} low[0, 0, 0, 0]
  // CHECK:   : tensor<129x47x16x16xf32> to tensor<136x64x16x16xf32>
  // CHECK: tensor.expand_shape %{{.*}} [{{.*}}[0, 1], [2, 3], [4], [5]]
  // CHECK-SAME: : tensor<136x64x16x16xf32> into tensor<17x8x2x32x16x16xf32>
  // CHECK: linalg.transpose
  // CHECK-SAME: ins(%{{.*}} : tensor<17x8x2x32x16x16xf32>)
  // CHECK-SAME: outs(%{{.*}} : tensor<17x2x16x16x32x8xf32>)
  // CHECK-SAME: permutation = [0, 2, 4, 5, 3, 1]
  %pack = tensor.pack %arg0 padding_value(%cst_0 : f32) inner_dims_pos = [1, 0] inner_tiles = [32, 8] into %arg1
    : tensor<129x47x16x16xf32> -> tensor<17x2x16x16x32x8xf32>
  return %pack : tensor<17x2x16x16x32x8xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%module_op: !transform.any_op {transform.readonly}) {
    %pack = transform.structured.match ops{["tensor.pack"]} in %module_op
      : (!transform.any_op) -> !transform.op<"tensor.pack">
    transform.structured.lower_pack %pack : (!transform.op<"tensor.pack">)
      -> (!transform.op<"tensor.pad">, !transform.op<"tensor.expand_shape">, !transform.op<"linalg.transpose">)
    transform.yield
  }
}
// -----
// CHECK-LABEL: func.func @pack(
func.func @pack(%arg0: tensor<128x8xf32>, %arg1: tensor<8x8x16x1xf32>) -> tensor<8x8x16x1xf32> {
  // tensor.pack is lowered to tensor.pad + tensor.expand_shape + linalg.transpose
  // CHECK: tensor.pad {{.*}} low[0, 0]
  // CHECK:   : tensor<128x8xf32> to tensor<128x8xf32>
  // CHECK: tensor.expand_shape %{{.*}} [{{.*}}[0, 1], [2, 3]]
  // CHECK-SAME: : tensor<128x8xf32> into tensor<8x16x8x1xf32>
  // CHECK: linalg.transpose
  // CHECK-SAME: ins(%{{.*}} : tensor<8x16x8x1xf32>)
  // CHECK-SAME: outs(%{{.*}} : tensor<8x8x16x1xf32>)
  // CHECK-SAME: permutation = [0, 2, 1, 3]
  %pack = tensor.pack %arg0 inner_dims_pos = [0, 1] inner_tiles = [16, 1] into %arg1
    : tensor<128x8xf32> -> tensor<8x8x16x1xf32>
  return %pack : tensor<8x8x16x1xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%module_op: !transform.any_op {transform.readonly}) {
    %pack = transform.structured.match ops{["tensor.pack"]} in %module_op
      : (!transform.any_op) -> !transform.op<"tensor.pack">
    transform.structured.lower_pack %pack : (!transform.op<"tensor.pack">)
      -> (!transform.op<"tensor.pad">, !transform.op<"tensor.expand_shape">, !transform.op<"linalg.transpose">)
    transform.yield
  }
}

Both of these tests are called func.func @pack, so it's hard to tell what the difference is? Are they duplicating one another? IIUC, these are not "pad-like"?

Suggestion 2

  • Update the name of these tests ^^^ with something more descriptive so that it's easier to identify the edge cases being tested.

I'm not really sure where a test like this belongs. If you have a suggestion for where we could put it that would be very helpful!

Sure, happy to help! I see multiple options:

Option 1: Add a test file in test/Dialect/Linalg (e.g. "transform-lower-pack-and-fuse.mlir") and test it there. Looks like there's already transform-tile-and-fuse.mlir, so use that as your source of inspiration :)

Option 2: Some e2e tests exercise complex compilation pipelines, e.g. pack-dynamic-inner-tile.mlir. But that's quite a heavy hammer.

Option 3: Add a more powerful TD op that would include fusion and various related patterns that create fusion opportunities.

I'd start with Option 1 and leave Option 2 and Option 3 for the future. On a related note, we should start creating sub-directories in test/Dialect/Linalg/ - there are too many tests prefixed "transform-{}.mlir" 😅
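
For Option 1, the transform script in such a test might look roughly like this (a sketch; the fuse_into_containing_op step, the scf.forall consumer, and the exact types are assumptions for illustration, not something this PR adds):

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%module_op: !transform.any_op {transform.readonly}) {
    %pack = transform.structured.match ops{["tensor.pack"]} in %module_op
      : (!transform.any_op) -> !transform.op<"tensor.pack">
    // Lower the pack without the insert_slice shortcut, then fuse the
    // resulting transpose (the pack's replacement) into the consumer loop.
    %pad, %expand, %transpose = transform.structured.lower_pack %pack {lowerPadLikeWithInsertSlice = false}
      : (!transform.op<"tensor.pack">)
      -> (!transform.op<"tensor.pad">, !transform.op<"tensor.expand_shape">, !transform.op<"linalg.transpose">)
    %loop = transform.structured.match ops{["scf.forall"]} in %module_op
      : (!transform.any_op) -> !transform.any_op
    %fused, %new_loop = transform.structured.fuse_into_containing_op %transpose into %loop
      : (!transform.op<"linalg.transpose">, !transform.any_op) -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}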


Hope this helps and thanks for working on this 🙏🏻 !

@jerryyin
Member Author

jerryyin commented Dec 5, 2024

@banach-space Thanks for the review and suggestions. It's very helpful to see so much detail in your review!

For suggestion 1, please see commit 671829e that adds back the unit test with attributes.

  • Since the PR touches function interfaces, it isn't exactly an NFC PR, so I'm amending the title to drop [NFC].
  • Review-feedback-wise, while I can see that the additional CHECKs after the CHECK-NOT improve the readability of the unit test, I still think those CHECKs duplicate coverage from previous test cases. I'm following your suggestion here since I don't have a strong opinion anyway.

For suggestion 2, option 1, see commit 3ad8cd6 for your suggested test case to exercise consumer and producer fusion.

  • I picked Option 1 and created a bare-minimum example to illustrate the additional fusion enabled by the new attributes.
  • Note that I am skipping the comments (like the main body of Suggestion 2) on the test cases that existed before this PR, as I'd like this PR to stay focused on the new attributes and not get distracted by additional refactoring we may discover. I'm happy to file follow-up commits or PRs once we're on the same page about the main body of this PR.

Thanks again for your suggestions. Please take a second look at the PR and let me know if you have additional questions/concerns.

@jerryyin jerryyin changed the title [MLIR][NFC] Add allow Insert/extract slice option to pack/unpack op [MLIR] Add allow Insert/extract slice option to pack/unpack op Dec 5, 2024
@banach-space
Contributor

Thanks for the updates!

I've run out of cycles today, but this will be my priority for Monday morning. Apologies for the delay!

Contributor

@banach-space banach-space left a comment


Thank you for addressing my comments so comprehensively and for splitting your replies into separate commits - makes reviewing so much easier 🙏🏻

I've left a few minor comments inline, otherwise looks good to me.

Under such circumstances, these insert and extract slice ops will block producer/consumer fusion in tile-and-fuse transforms.

Somewhat tangential to this PR, but it would be good to document whether this limitation is meant to be lifted at some point?

- Added additional test cases to demonstrate that insert/extract slice will
  block producer/consumer fusion
- Readability enhancements
Contributor

@banach-space banach-space left a comment


LGTM % one final minor request.

Thanks for working on this!

This helps to clearly demonstrate producer fusion in the pack case and
consumer fusion in the unpack case.
@jerryyin jerryyin force-pushed the amend-packop-transform branch from fbd32d1 to f6c54e8 on December 9, 2024 22:06
@nirvedhmeshram nirvedhmeshram merged commit d6590c1 into llvm:main Dec 10, 2024
8 checks passed
nirvedhmeshram added a commit to iree-org/iree that referenced this pull request Dec 18, 2024
… shapes (#19484)

This PR does two things:
1. Allow all GEMM shapes to use the padded TileAndFuse Matmul configuration. This is still behind the `iree-codegen-llvmgpu-test-tile-and-fuse-matmul=false` flag by default and does not change the default behavior. However, the following PRs that landed in the past month make it possible to relax the guards we originally had on this:
#19196
#19307
llvm/llvm-project#117340
2. Allow fused producers to use the padded TileAndFuse Matmul configuration. The following PRs make this possible now:
#19399
llvm/llvm-project#119039

Together this allows us to do padded IGEMM with intrinsics for shapes unaligned to the intrinsic, which we use by default.
[Here](https://docs.google.com/spreadsheets/d/1O-SdUZCn5pHsxx7JTGjIIdH6PWCFnvlfe4XBbjEBaIM/edit?gid=0#gid=0)
is the performance difference observed in conv cases in
iree-kernel-benchmark-module that utilize this change. A median speedup
of 2.26x was observed.

The numeric changes I observed when enabling this path were the same for any aligned shape when comparing intrinsic vs. no-intrinsic use. Generally, some differences are noticed for narrow types like f16, but they are within a relative error of 0.001; since our tests use absolute errors, we may have to change some test values to account for this change.

The perf differences in CI seem to be within the noise margin compared to main:
https://github.com/iree-org/iree/actions/runs/12323399269/attempts/1#summary-34399247902

---------

Signed-off-by: Nirvedh <[email protected]>
nirvedhmeshram pushed a commit to iree-org/iree that referenced this pull request Jan 3, 2025
…lt (#19590)

This PR is a follow-up to
llvm/llvm-project#117340.

It disables `lowerPadLikeWithInsertSlice` and `lowerUnpadLikeWithExtractSlice`
so that `insert_slice` or `extract_slice` ops won't appear when the high
dimensions are unit dimensions.

---------

Signed-off-by: jerryyin <[email protected]>
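
For reference, the unpack-side counterpart of the transform-dialect example shown earlier, with the shortcut disabled (a sketch under the same syntax assumptions):

%empty, %transpose, %collapse, %extract = transform.structured.lower_unpack %unpack {lowerUnpadLikeWithExtractSlice = false}
    : (!transform.op<"tensor.unpack">)
    -> (!transform.op<"tensor.empty">, !transform.op<"linalg.transpose">, !transform.op<"tensor.collapse_shape">, !transform.op<"tensor.extract_slice">)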