
[MLIR] Add allow Insert/extract slice option to pack/unpack op #117340


Merged
merged 7 commits into llvm:main on Dec 10, 2024

Conversation

jerryyin
Member

@jerryyin jerryyin commented Nov 22, 2024

This PR adds the options below. The new options default to true and do not change the original lowering behavior of the pack and unpack ops.

  • lowerPadLikeWithInsertSlice to packOp (with default = true)
  • lowerUnpadLikeWithExtractSlice to unPackOp (with default = true)

The motivation for this PR is finer-grained control over the lowering of pack and unpack ops. This is useful in particular when we want to guarantee that no additional insert_slice and extract_slice ops interfere with tiling. With the original lowering pipeline, packOp and unPackOp may be lowered to insert_slice and extract_slice when the high dimensions are unit dimensions and no transpose is involved. Under such circumstances, these insert and extract slice ops will block producer/consumer fusion in tile-and-fuse transforms. With this PR, we can disable that lowering path and allow consumer fusion to go through as expected.
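
For example, disabling the pad-like shortcut from the transform dialect might look like this (a sketch, assuming the new flag is passed through the op's attribute dictionary):

%pad, %expand, %transpose = transform.structured.lower_pack %pack {lowerPadLikeWithInsertSlice = false}
    : (!transform.op<"tensor.pack">)
    -> (!transform.op<"tensor.pad">, !transform.op<"tensor.expand_shape">, !transform.op<"linalg.transpose">)

With the flag set to false, a pad-like pack is still lowered to pad + expand_shape + transpose rather than pad + insert_slice, so tile-and-fuse transforms are not blocked.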

@llvmbot
Member

llvmbot commented Nov 22, 2024

@llvm/pr-subscribers-mlir

Author: Zhuoran Yin (jerryyin)

Changes

This PR adds the options below. The new options default to true and do not change the original lowering behavior of the pack and unpack ops.

  • allowInsertSliceLowering to packOp (with default = true)
  • allowExtractSliceLowering to unPackOp (with default = true)

The motivation for this PR is finer-grained control over the lowering of pack and unpack ops. This is useful in particular when we want to guarantee that no additional insert_slice and extract_slice ops interfere with tiling. With the original lowering pipeline, packOp and unPackOp may be lowered to insert_slice and extract_slice when the high dimensions are unit dimensions and no transpose is involved. Under such circumstances, these insert and extract slice ops will block consumer fusion in the TileAndFuse pipeline. With this PR, we can disable that lowering path and allow consumer fusion to go through as expected.


Full diff: https://github.com/llvm/llvm-project/pull/117340.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td (+4-2)
  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h (+5-3)
  • (modified) mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp (+6-2)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp (+7-5)
diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
index e3084530bd11b5..ea96da77b6c331 100644
--- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
@@ -548,7 +548,8 @@ def LowerPackOp : Op<Transform_Dialect, "structured.lower_pack", [
     Return handles to the newly produced pad, expand_shape and transpose ops.
   }];
 
-  let arguments = (ins Transform_ConcreteOpType<"tensor.pack">:$target);
+  let arguments = (ins Transform_ConcreteOpType<"tensor.pack">:$target,
+                       DefaultValuedAttr<BoolAttr, "true">:$allowInsertSliceLowering);
   let results = (outs Transform_ConcreteOpType<"tensor.pad">:$pad_op,
                       Transform_ConcreteOpType<"tensor.expand_shape">:$expand_shape_op,
                       Transform_ConcreteOpType<"linalg.transpose">:$transpose_op);
@@ -588,7 +589,8 @@ def LowerUnPackOp : Op<Transform_Dialect, "structured.lower_unpack", [
     Return handles to the newly produced empty, transpose, collapse_shape and extract_slice ops.
   }];
 
-  let arguments = (ins Transform_ConcreteOpType<"tensor.unpack">:$target);
+  let arguments = (ins Transform_ConcreteOpType<"tensor.unpack">:$target,
+                       DefaultValuedAttr<BoolAttr, "true">:$allowExtractSliceLowering);
   let results = (outs Transform_ConcreteOpType<"tensor.empty">:$empty_op,
                       Transform_ConcreteOpType<"linalg.transpose">:$transpose_op,
                       Transform_ConcreteOpType<"tensor.collapse_shape">:$collapse_shape_op,
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
index 51967f83fee377..fd27e7929764d3 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -1121,7 +1121,8 @@ struct LowerPackResult {
 
 /// Rewrite pack as pad + reshape + transpose.
 FailureOr<LowerPackResult> lowerPack(RewriterBase &rewriter,
-                                     tensor::PackOp packOp);
+                                     tensor::PackOp packOp,
+                                     bool allowInsertSliceLowering = true);
 
 struct LowerUnPackOpResult {
   tensor::EmptyOp emptyOp;
@@ -1131,8 +1132,9 @@ struct LowerUnPackOpResult {
 };
 
 /// Rewrite pack as empty + transpose + reshape + extract_slice.
-FailureOr<LowerUnPackOpResult> lowerUnPack(RewriterBase &rewriter,
-                                           tensor::UnPackOp unPackOp);
+FailureOr<LowerUnPackOpResult>
+lowerUnPack(RewriterBase &rewriter, tensor::UnPackOp unPackOp,
+            bool allowExtractSliceLowering = true);
 
 /// Struct to hold the result of a `pack` call.
 struct PackResult {
diff --git a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
index ada80deacfdbfe..5117a5c58c381d 100644
--- a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
+++ b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
@@ -1171,7 +1171,9 @@ DiagnosedSilenceableFailure transform::LowerPackOp::applyToOne(
     transform::ApplyToEachResultList &transformResults,
     transform::TransformState &state) {
   rewriter.setInsertionPoint(target);
-  FailureOr<LowerPackResult> res = lowerPack(rewriter, target);
+  bool allowInsertSliceLowering = getAllowInsertSliceLowering();
+  FailureOr<LowerPackResult> res =
+      lowerPack(rewriter, target, allowInsertSliceLowering);
   if (failed(res)) {
     return mlir::emitSilenceableFailure(target->getLoc())
            << "cannot lower to pad + expand + transpose";
@@ -1191,7 +1193,9 @@ DiagnosedSilenceableFailure transform::LowerUnPackOp::applyToOne(
     transform::ApplyToEachResultList &transformResults,
     transform::TransformState &state) {
   rewriter.setInsertionPoint(target);
-  FailureOr<LowerUnPackOpResult> res = lowerUnPack(rewriter, target);
+  bool allowExtractSliceLowering = getAllowExtractSliceLowering();
+  FailureOr<LowerUnPackOpResult> res =
+      lowerUnPack(rewriter, target, allowExtractSliceLowering);
   if (failed(res)) {
     DiagnosedSilenceableFailure diag =
         emitSilenceableError()
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index d92543d7264625..0717dad4c2852f 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -217,7 +217,8 @@ struct PackedOperandsDimList {
 } // namespace
 
 FailureOr<LowerPackResult> linalg::lowerPack(RewriterBase &rewriter,
-                                             tensor::PackOp packOp) {
+                                             tensor::PackOp packOp,
+                                             bool allowInsertSliceLowering) {
   // 1. Filter out NYI cases.
   auto packedTensorType =
       cast<RankedTensorType>(packOp->getResultTypes().front());
@@ -295,7 +296,7 @@ FailureOr<LowerPackResult> linalg::lowerPack(RewriterBase &rewriter,
       llvm::interleaveComma(stripMinedShape, DBGS() << "stripMinedShape: ");
       DBGSNL(); DBGS() << "collapsed type: " << collapsed; DBGSNL(););
 
-  if (packOp.isLikePad()) {
+  if (allowInsertSliceLowering && packOp.isLikePad()) {
     // Pack ops which operate as simple pads may not produce legal
     // tensor.insert_slice operations when the packed type does not rank reduce
     // to the padded type.
@@ -351,8 +352,9 @@ FailureOr<LowerPackResult> linalg::lowerPack(RewriterBase &rewriter,
   return LowerPackResult{padOp, reshapeOp, transposeOp};
 }
 
-FailureOr<LowerUnPackOpResult> linalg::lowerUnPack(RewriterBase &rewriter,
-                                                   tensor::UnPackOp unPackOp) {
+FailureOr<LowerUnPackOpResult>
+linalg::lowerUnPack(RewriterBase &rewriter, tensor::UnPackOp unPackOp,
+                    bool allowExtractSliceLowering) {
   Location loc = unPackOp->getLoc();
   OpBuilder::InsertionGuard g(rewriter);
   rewriter.setInsertionPoint(unPackOp);
@@ -362,7 +364,7 @@ FailureOr<LowerUnPackOpResult> linalg::lowerUnPack(RewriterBase &rewriter,
 
   OpFoldResult zero = rewriter.getIndexAttr(0), one = rewriter.getIndexAttr(1);
   auto destTensorType = cast<RankedTensorType>(unPackOp.getDest().getType());
-  if (unPackOp.isLikeUnPad()) {
+  if (allowExtractSliceLowering && unPackOp.isLikeUnPad()) {
     // This unpack is just a plain unpad.
     // Just extract the slice from the higher ranked tensor.
     ArrayRef<int64_t> destShape = destTensorType.getShape();
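
For C++ callers, the new parameters are trailing arguments with defaults, so existing call sites compile unchanged. Opting out of the slice-based shortcuts might look like this (a sketch using the parameter names from the diff above; rewriter, packOp and unPackOp are assumed to be in scope):

// Lower the pack as pad + expand_shape + transpose even when it is
// pad-like, instead of pad + tensor.insert_slice.
FailureOr<linalg::LowerPackResult> packRes =
    linalg::lowerPack(rewriter, packOp, /*allowInsertSliceLowering=*/false);

// Lower the unpack as empty + transpose + collapse_shape + extract_slice
// even when it is unpad-like, instead of a single tensor.extract_slice.
FailureOr<linalg::LowerUnPackOpResult> unpackRes =
    linalg::lowerUnPack(rewriter, unPackOp, /*allowExtractSliceLowering=*/false);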

@llvmbot
Member

llvmbot commented Nov 22, 2024

@llvm/pr-subscribers-mlir-linalg


@jerryyin jerryyin changed the title [NFC] Add allow Insert/extract slice option to pack/unpack op [MLIR][NFC] Add allow Insert/extract slice option to pack/unpack op Nov 22, 2024
@banach-space banach-space self-requested a review November 22, 2024 16:08
Contributor

@nirvedhmeshram nirvedhmeshram left a comment


LGTM!

Collaborator

@joker-eph joker-eph left a comment


Nit maybe, but the fact that you have tests in an NFC change is a bit surprising to me?

@banach-space
Contributor

Under such circumstances, these insert and extract slice ops will block producer/consumer fusion in tile-and-fuse transforms. With this PR, we can disable that lowering path and allow consumer fusion to go through as expected.

Why not add tests for that? I don't really know how fusion is tested, but this summary implies that we should be writing tests showing how these things compose? And demonstrating the impact of this new flag.

On a different note, there's lower{Un}Pack and there's DecomposeOuterUnitDimsPackOpPattern. We should unify these things at some point.

@Max191
Contributor

Max191 commented Nov 25, 2024

On a different note, there's lower{Un}Pack and there's DecomposeOuterUnitDimsPackOpPattern. We should unify these things at some point.

I agree it is good to unify and reduce the overall number of patterns, but just want to note that we should make sure we still have the same type of control over how/which patterns are applied (that's probably what you were suggesting anyway, but just wanted to add that comment).

@banach-space
Contributor

just want to note that we should make sure we still have the same type of control over how/which patterns are applied

Agreed - we shouldn't bundle things prematurely. However, if these patterns effectively duplicate one another ... That's a topic for another PR though ;-)

@Max191
Contributor

Max191 commented Nov 25, 2024

Why not add tests for that? I don't really know how fusion is tested, but this summary implies that we should be writing tests showing how these things compose? And demonstrating the impact of this new flag.

I'm not really sure where a test like this belongs. If you have a suggestion for where we could put it that would be very helpful!

Here is an example case where this helps:

%producer = tensor.pack %src into %dest ...
%consumer_loop = scf.for ... {
  %slice = tensor.extract_slice %producer ...
  %consumer = "some_op" %slice ...
  ...
}

After decomposition this becomes one of two options:

// With the slice lowering:
%pad = tensor.pad %src ...
%producer = tensor.insert_slice %pad into %dest ...
%consumer_loop = scf.for ... {
  %slice = tensor.extract_slice %producer ...
  %consumer = "some_op" %slice ...
  ...
}

// With the non-slice (expand_shape/transpose) lowering:
%pad = tensor.pad %src
%expand = tensor.expand_shape %pad ...
%producer = linalg.transpose ins(%expand ...
%consumer_loop = scf.for ... {
  %slice = tensor.extract_slice %producer ...
  %consumer = "some_op" %slice ...
  ...
}

If we then want to run fusion of %producer into the %consumer_loop, then this would be blocked by the insert_slice in the first (slice) case, but not in the second (non-slice) case.

So if we want to test this specific case, we would be testing the composition of these two transform ops, and I'm not sure where the correct test directory for this is. Alternatively, we can add a test for the fusion case above (which would be expected to fail fusion), but I'm not sure that's a very useful test.

@jerryyin
Member Author

jerryyin commented Nov 25, 2024

I see discussions converging here. My next commit is going to be:

  • Get rid of the unit test
  • Rename allowInsertSliceLowering to lowerPadLikeWithInsertSlice
  • Rename allowExtractSliceLowering to lowerUnpadLikeWithExtractSlice

- Renamed allowInsertSliceLowering to lowerPadLikeWithInsertSlice
- Renamed allowExtractSliceLowering to lowerUnpadLikeWithExtractSlice
- Removed the redundant unit test since this is an NFC change

This reverts commit 46b7202.
@hanhanW
Contributor

hanhanW commented Nov 25, 2024

On a different note, there's lower{Un}Pack and there's DecomposeOuterUnitDimsPackOpPattern. We should unify these things at some point.

It should be much easier today than before. People have filled the gaps over the past year.

@banach-space
Contributor

I see discussions converging here. My next commit is going to be:

  • Get rid of the unit test

I would keep the unit test.

When I asked about the test that you added (and subsequently removed), it just wasn't clear to me what to look for and what the difference with the existing tests should be. The comment from @Max191 makes that very clear:

// Current
func.func @pack_as_pad_default() {
  %pad = tensor.pad %src ...
  %producer = tensor.insert_slice %pad into %dest
}

// New
func.func @pack_as_pad_{disable|no|skip}_insert_slice() {
  %pad = tensor.pad %src
  %expand = tensor.expand_shape %pad ...
  %producer = linalg.transpose ins(%expand ..
}

Suggestion 1

  • This PR should include a test that follows the pattern above ^^^.

I think that the following test would be sufficient and very illustrative (trimmed for brevity - feel free to re-use/expand):

// CHECK-LABEL: func.func @pack_disallowed_as_pad(
func.func @pack_disallowed_as_pad(%arg0, %arg1) {

  //      CHECK: %[[PAD:.*]] = tensor.pad %[[SRC]] 
  //  CHECK-NOT: %[[RES:.*]] = tensor.insert_slice %[[PAD]]
  //      CHECK: %[[PAD_EXPANDED:.*]] = tensor.expand_shape %[[PAD]]
  //      CHECK: %[[RES:.*]] = linalg.transpose %[[PAD_EXPANDED]]
  %pack = tensor.pack %arg0 padding_value(%cst_0 : f32) inner_dims_pos = [0, 1, 2, 3] inner_tiles = [136, 64, 16, 16] into %arg1
    : tensor<129x47x16x16xf32> -> tensor<1x1x1x1x136x64x16x16xf32>
  return %pack :  tensor<1x1x1x1x136x64x16x16xf32>
}

IIUC, these are the existing tests that exercise a similar path:

// CHECK-LABEL: func.func @pack(
func.func @pack(%arg0: tensor<129x47x16x16xf32>, %arg1: tensor<17x2x16x16x32x8xf32>) -> tensor<17x2x16x16x32x8xf32> {
  %cst_0 = arith.constant 0.0 : f32
  // tensor.pack is lowered to tensor.pad + tensor.expand_shape + linalg.transpose
  // CHECK: tensor.pad {{.*}} low[0, 0, 0, 0]
  // CHECK:   : tensor<129x47x16x16xf32> to tensor<136x64x16x16xf32>
  // CHECK: tensor.expand_shape %{{.*}} [{{.*}}[0, 1], [2, 3], [4], [5]]
  // CHECK-SAME: : tensor<136x64x16x16xf32> into tensor<17x8x2x32x16x16xf32>
  // CHECK: linalg.transpose
  // CHECK-SAME: ins(%{{.*}} : tensor<17x8x2x32x16x16xf32>)
  // CHECK-SAME: outs(%{{.*}} : tensor<17x2x16x16x32x8xf32>)
  // CHECK-SAME: permutation = [0, 2, 4, 5, 3, 1]
  %pack = tensor.pack %arg0 padding_value(%cst_0 : f32) inner_dims_pos = [1, 0] inner_tiles = [32, 8] into %arg1
    : tensor<129x47x16x16xf32> -> tensor<17x2x16x16x32x8xf32>
  return %pack : tensor<17x2x16x16x32x8xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%module_op: !transform.any_op {transform.readonly}) {
    %pack = transform.structured.match ops{["tensor.pack"]} in %module_op
      : (!transform.any_op) -> !transform.op<"tensor.pack">
    transform.structured.lower_pack %pack : (!transform.op<"tensor.pack">)
      -> (!transform.op<"tensor.pad">, !transform.op<"tensor.expand_shape">, !transform.op<"linalg.transpose">)
    transform.yield
  }
}
// -----
// CHECK-LABEL: func.func @pack(
func.func @pack(%arg0: tensor<128x8xf32>, %arg1: tensor<8x8x16x1xf32>) -> tensor<8x8x16x1xf32> {
  // tensor.pack is lowered to tensor.pad + tensor.expand_shape + linalg.transpose
  // CHECK: tensor.pad {{.*}} low[0, 0]
  // CHECK:   : tensor<128x8xf32> to tensor<128x8xf32>
  // CHECK: tensor.expand_shape %{{.*}} [{{.*}}[0, 1], [2, 3]]
  // CHECK-SAME: : tensor<128x8xf32> into tensor<8x16x8x1xf32>
  // CHECK: linalg.transpose
  // CHECK-SAME: ins(%{{.*}} : tensor<8x16x8x1xf32>)
  // CHECK-SAME: outs(%{{.*}} : tensor<8x8x16x1xf32>)
  // CHECK-SAME: permutation = [0, 2, 1, 3]
  %pack = tensor.pack %arg0 inner_dims_pos = [0, 1] inner_tiles = [16, 1] into %arg1
    : tensor<128x8xf32> -> tensor<8x8x16x1xf32>
  return %pack : tensor<8x8x16x1xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%module_op: !transform.any_op {transform.readonly}) {
    %pack = transform.structured.match ops{["tensor.pack"]} in %module_op
      : (!transform.any_op) -> !transform.op<"tensor.pack">
    transform.structured.lower_pack %pack : (!transform.op<"tensor.pack">)
      -> (!transform.op<"tensor.pad">, !transform.op<"tensor.expand_shape">, !transform.op<"linalg.transpose">)
    transform.yield
  }
}

Both of these tests are called func.func @pack, so it's hard to tell what the difference is? Are they duplicating one another? IIUC, these are not "pad-like"?

Suggestion 2

  • Update the name of these tests ^^^ with something more descriptive so that it's easier to identify the edge cases being tested.

I'm not really sure where a test like this belongs. If you have a suggestion for where we could put it that would be very helpful!

Sure, happy to help! I see multiple options:

Option 1: Add a test file in test/Dialect/Linalg (e.g. "transform-lower-pack-and-fuse.mlir") and test it there. Looks like there's already transform-tile-and-fuse.mlir, so use that as your source of inspiration :)

Option 2: Some e2e tests exercise complex compilation pipelines, e.g. pack-dynamic-inner-tile.mlir. But that's quite a heavy hammer.

Option 3: Add a more powerful TD op that would include fusion and various related patterns that create fusion opportunities.

I'd start with Option 1 and leave Option 2 and Option 3 for the future. On a related note, we should start creating sub-directories in test/Dialect/Linalg/ - there are too many tests prefixed "transform-{}.mlir" 😅
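
For Option 1, the transform script in such a test might look roughly like this (a sketch; the fuse_into_containing_op step, the scf.forall consumer, and the exact types are assumptions for illustration, not something this PR adds):

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%module_op: !transform.any_op {transform.readonly}) {
    %pack = transform.structured.match ops{["tensor.pack"]} in %module_op
      : (!transform.any_op) -> !transform.op<"tensor.pack">
    // Lower the pack without the insert_slice shortcut, then fuse the
    // resulting transpose (the pack's replacement) into the consumer loop.
    %pad, %expand, %transpose = transform.structured.lower_pack %pack {lowerPadLikeWithInsertSlice = false}
      : (!transform.op<"tensor.pack">)
      -> (!transform.op<"tensor.pad">, !transform.op<"tensor.expand_shape">, !transform.op<"linalg.transpose">)
    %loop = transform.structured.match ops{["scf.forall"]} in %module_op
      : (!transform.any_op) -> !transform.any_op
    %fused, %new_loop = transform.structured.fuse_into_containing_op %transpose into %loop
      : (!transform.op<"linalg.transpose">, !transform.any_op) -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}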


Hope this helps and thanks for working on this 🙏🏻 !

@jerryyin
Member Author

jerryyin commented Dec 5, 2024

@banach-space Thanks for the review and suggestions. It's very helpful to see so much detail in your review!

For suggestion 1, please see commit 671829e that adds back the unit test with attributes.

  • Since the PR touches function interfaces, it isn't exactly an NFC PR, so I'm amending the title to drop [NFC].
  • Review-feedback-wise, while I can see that the additional CHECKs after the CHECK-NOT improve the readability of the unit test, I still think those CHECKs duplicate coverage from previous test cases. I'm following your suggestion here since I don't have a strong opinion anyway.

For suggestion 2, option 1, see commit 3ad8cd6 for your suggested test case to exercise consumer and producer fusion.

  • I picked Option 1 and created a bare-minimum example to illustrate the additional fusion enabled by the new attributes.
  • Note that I am skipping the comments (like the main body of Suggestion 2) on the test cases that existed before this PR, as I'd like this PR to stay focused on the new attributes and not get distracted by additional refactoring we may discover. I'm happy to file follow-up commits or PRs once we're on the same page about the main body of this PR.

Thanks again for your suggestions. Please take a second look at the PR and let me know if you have additional questions/concerns.

@jerryyin jerryyin changed the title [MLIR][NFC] Add allow Insert/extract slice option to pack/unpack op [MLIR] Add allow Insert/extract slice option to pack/unpack op Dec 5, 2024
@banach-space
Contributor

Thanks for the updates!

I've run out of cycles today, but this will be my priority for Monday morning. Apologies for the delay!

Contributor

@banach-space banach-space left a comment


Thank you for addressing my comments so comprehensively and for splitting your replies into separate commits - makes reviewing so much easier 🙏🏻

I've left a few minor comments inline, otherwise looks good to me.

Under such circumstances, these insert and extract slice ops will block producer/consumer fusion in tile-and-fuse transforms.

Somewhat tangential to this PR, but it would be good to document whether this limitation is meant to be lifted at some point?

- Added additional test cases to demonstrate that insert/extract slice will
  block producer/consumer fusion
- Readability enhancements
Contributor

@banach-space banach-space left a comment


LGTM % one final minor request.

Thanks for working on this!

This helps to clearly demonstrate producer fusion in the pack case and
consumer fusion in the unpack case.
@jerryyin jerryyin force-pushed the amend-packop-transform branch from fbd32d1 to f6c54e8 on December 9, 2024 22:06
@nirvedhmeshram nirvedhmeshram merged commit d6590c1 into llvm:main Dec 10, 2024
8 checks passed
nirvedhmeshram added a commit to iree-org/iree that referenced this pull request Dec 18, 2024
… shapes (#19484)

This PR does two things:
1. Allow all GEMM shapes to use the padded TileAndFuse Matmul configuration. This is still behind the `iree-codegen-llvmgpu-test-tile-and-fuse-matmul=false` flag by default and does not change the default behavior. However, the following PRs that landed in the past month make it possible to relax the guards we originally had on this:
#19196
#19307
llvm/llvm-project#117340
2. Allow fused producers to use the padded TileAndFuse Matmul configuration. The following PRs make this possible now:
#19399
llvm/llvm-project#119039

Together this allows us to do padded IGEMM with intrinsics for shapes unaligned to the intrinsic, which we use by default.
[Here](https://docs.google.com/spreadsheets/d/1O-SdUZCn5pHsxx7JTGjIIdH6PWCFnvlfe4XBbjEBaIM/edit?gid=0#gid=0)
is the performance difference observed in conv cases in
iree-kernel-benchmark-module that utilize this change. A median speedup
of 2.26x was observed.

The numeric changes I observed when enabling this path were the same for any aligned shape when comparing intrinsic vs. no-intrinsic use. Generally, some differences are noticed for narrow types like f16, but they are within a relative error of 0.001; since our tests use absolute errors, we may have to change some test values to account for this change.

The perf differences in CI seem to be within the noise margin compared to main:
https://github.com/iree-org/iree/actions/runs/12323399269/attempts/1#summary-34399247902

---------

Signed-off-by: Nirvedh <[email protected]>
nirvedhmeshram pushed a commit to iree-org/iree that referenced this pull request Jan 3, 2025
…lt (#19590)

This PR is a follow-up to
llvm/llvm-project#117340.

It disables `lowerPadLikeWithInsertSlice` and `lowerUnpadLikeWithExtractSlice`
so that `insert_slice` or `extract_slice` ops won't appear when the high
dimensions are unit dimensions.

---------

Signed-off-by: jerryyin <[email protected]>
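
For reference, the unpack-side counterpart of the transform-dialect example shown earlier, with the shortcut disabled (a sketch under the same syntax assumptions):

%empty, %transpose, %collapse, %extract = transform.structured.lower_unpack %unpack {lowerUnpadLikeWithExtractSlice = false}
    : (!transform.op<"tensor.unpack">)
    -> (!transform.op<"tensor.empty">, !transform.op<"linalg.transpose">, !transform.op<"tensor.collapse_shape">, !transform.op<"tensor.extract_slice">)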