
[Tensor] Simplify tensor.pad tiling length calculations. #119039


Merged

merged 3 commits into llvm:main on Dec 12, 2024

Conversation

nirvedhmeshram
Contributor

@nirvedhmeshram nirvedhmeshram commented Dec 6, 2024

The current calculation computes the ending location of the new length and then subtracts the new offset from that location. It is possible to calculate the new length directly. Besides requiring fewer operations (which can matter in the dynamic case), this also has the advantage that the values are upper bounded by the tile length rather than the source size, which is friendlier for range analysis. I believe the change is already covered by
test/Dialect/Linalg/subtensor-of-padtensor.mlir and test/Dialect/Linalg/tile-and-fuse-tensors.mlir
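To make the equivalence concrete, here is a small scalar model of the index math (a sketch with hypothetical helper names, not the MLIR builder code; it assumes newOffset = min(max(offset - low, 0), srcSize) and newLow = max(low - offset, 0), with all quantities non-negative):

```cpp
#include <algorithm>

// Scalar model of the pad-tiling index math (a sketch, not the MLIR code).
// offset/length describe the tile of the padded result, low is the low
// padding amount, srcSize the size of the unpadded source dimension.
int newOffsetOf(int offset, int low, int srcSize) {
  return std::min(std::max(offset - low, 0), srcSize);
}
int newLowOf(int offset, int low) { return std::max(low - offset, 0); }

// Old formulation: compute the end location, then subtract the new offset.
int lengthViaEndLoc(int offset, int low, int length, int srcSize) {
  int endLoc = std::min(std::max(offset - low + length, 0), srcSize);
  return endLoc - newOffsetOf(offset, low, srcSize);
}

// New formulation: min(available length, wanted length), clamped at zero.
int lengthDirect(int offset, int low, int length, int srcSize) {
  int n = std::min(srcSize - newOffsetOf(offset, low, srcSize),
                   length - newLowOf(offset, low));
  return std::max(n, 0);
}
```

An exhaustive check over small non-negative inputs confirms the two formulations agree; the final max with zero in lengthDirect is only load-bearing when low > 0, which is what the hasLowPad guard in the patch exploits.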

@llvmbot
Member

llvmbot commented Dec 6, 2024

@llvm/pr-subscribers-mlir-tensor

@llvm/pr-subscribers-mlir

Author: Nirvedh Meshram (nirvedhmeshram)

Changes

The current calculation computes the ending location of the tiled length and then subtracts the new offset from that location. It is possible to calculate the new length directly. Besides requiring fewer operations (which can matter in the dynamic case), this also has the advantage that the values are upper bounded by the tile length rather than the source size, which is friendlier for range analysis. I believe the change is already covered by
test/Dialect/Linalg/subtensor-of-padtensor.mlir and test/Dialect/Linalg/tile-and-fuse-tensors.mlir


Full diff: https://github.com/llvm/llvm-project/pull/119039.diff

1 file affected:

  • (modified) mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp (+7-15)
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp b/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
index 68c3d1cabb11cb..e647e53bc33af6 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
@@ -746,11 +746,6 @@ FailureOr<TilingResult> tensor::bubbleUpPadSlice(OpBuilder &b,
   Location loc = padOp->getLoc();
   AffineExpr dim0, dim1;
   bindDims(b.getContext(), dim0, dim1);
-  // Add two integers.
-  auto addMap = AffineMap::get(2, 0, {dim0 + dim1});
-  auto add = [&](OpFoldResult v1, OpFoldResult v2) {
-    return affine::makeComposedFoldedAffineApply(b, loc, addMap, {v1, v2});
-  };
   // Subtract two integers.
   auto subMap = AffineMap::get(2, 0, {dim0 - dim1});
   auto sub = [&](OpFoldResult v1, OpFoldResult v2) {
@@ -825,16 +820,13 @@ FailureOr<TilingResult> tensor::bubbleUpPadSlice(OpBuilder &b,
     // The original read could also have stopped in the high padding zone.
    // In that case, the end position of the read should be the end of
     // the source tensor. (Similar to newOffset.)
-    //
-    // endLoc = min(max(offset - low + length, 0), srcSize)
-    //
-    // The new ExtractSliceOp length is `endLoc - newOffset`.
-    //
-    // Optimization: If low = 0, then the formula can be simplified.
-    OpFoldResult endLoc =
-        hasLowPad ? min(max(add(sub(offset, low), length), zero), srcSize)
-                  : min(add(offset, length), srcSize);
-    OpFoldResult newLength = sub(endLoc, newOffset);
+    // srcSize - newOffset represents how much length we have available
+    // and length - newLow represents how much length we want at most.
+    OpFoldResult newLength = min(sub(srcSize, newOffset), sub(length, newLow));
+    // Optimization: If low = 0, then newLow = 0. then newLength >= 0 assuming
+    // length >= 0.
+    if (hasLowPad)
+      newLength = max(newLength, zero);
     newLengths.push_back(newLength);
 
     // Check if newLength is zero. In that case, no SubTensorOp should be
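As a numeric illustration of the clamp in this hunk (hypothetical scalar values, modeling newOffset = min(max(offset - low, 0), srcSize) and newLow = max(low - offset, 0); a sketch, not the builder code):

```cpp
#include <algorithm>

// Pre-clamp value of the new length from the patch:
//   min(srcSize - newOffset, length - newLow)
// modeled with plain ints. The clamp max(., 0) is applied separately.
int preClampLength(int offset, int low, int length, int srcSize) {
  int newOffset = std::min(std::max(offset - low, 0), srcSize);
  int newLow = std::max(low - offset, 0);
  return std::min(srcSize - newOffset, length - newLow);
}
```

With low = 0, newLow = 0 and newOffset <= srcSize, so both operands of the min are non-negative whenever length >= 0 and the clamp folds away; that is why the code only emits the max when hasLowPad is true. With low padding the tile can fall entirely inside the padded region: offset = 0, low = 5, length = 3, srcSize = 10 gives min(10, -2) = -2 before the clamp.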

Contributor

@Max191 Max191 left a comment

This LGTM. It is a bit strange that there are no lit test changes, but I'll approve since it has been tested in a downstream project, and a lit test for all this indexing math is probably a bit of a change detector test anyway.

@nirvedhmeshram nirvedhmeshram force-pushed the simplify_pad_tiling_length branch from 1af0920 to 473c00b on December 12, 2024 17:01
Signed-off-by: Nirvedh <[email protected]>
@nirvedhmeshram nirvedhmeshram merged commit 3f136f7 into llvm:main Dec 12, 2024
3 of 6 checks passed

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff f9e11501841fc602488fea78b88910eab7d4d396 865958f0c9fc63aa28971d3387d16f2e5f8d84a7 --extensions cpp -- mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
View the diff from clang-format here.
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp b/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
index 3caf93b140..6e63cf068b 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
@@ -822,7 +822,13 @@ FailureOr<TilingResult> tensor::bubbleUpPadSlice(OpBuilder &b,
     // the source tensor. (Similar to newOffset.)
     // srcSize - newOffset represents how much length we have available
     // and length - newLow represents how much length we want at most.
-    // Note that there are many ways to order this indexing math to compute newLength, but we want to make sure that the final affine.min ops in the sequence are bounding the index to as small a value as possible. If ValueBoundsOpInterface is used, this calcuation will get upper bounds from the affine.min ops, so we want to use the smallest known value to set the bound at the end of the computation sequence. In this case, the index will be upper bounded by length - newLow.
+    // Note that there are many ways to order this indexing math to compute
+    // newLength, but we want to make sure that the final affine.min ops in the
+    // sequence are bounding the index to as small a value as possible. If
+    // ValueBoundsOpInterface is used, this calcuation will get upper bounds
+    // from the affine.min ops, so we want to use the smallest known value to
+    // set the bound at the end of the computation sequence. In this case, the
+    // index will be upper bounded by length - newLow.
     OpFoldResult newLength = min(sub(srcSize, newOffset), sub(length, newLow));
     // Optimization: If low = 0, then newLow = 0. then newLength >= 0 assuming
     // length >= 0.
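The ordering argument in this comment can be illustrated with a toy interval analysis (entirely hypothetical types and bounds, not MLIR's actual ValueBoundsOpInterface): the upper bound of a min is the min of the upper bounds, so finishing the computation with min(..., length - newLow) bounds the result by length, whereas the old endLoc - newOffset form only gets a bound of srcSize.

```cpp
#include <algorithm>

// Toy interval arithmetic mimicking what a range analysis could derive.
struct Interval { int lo, hi; };
Interval addI(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }
Interval subI(Interval a, Interval b) { return {a.lo - b.hi, a.hi - b.lo}; }
Interval minI(Interval a, Interval b) {
  return {std::min(a.lo, b.lo), std::min(a.hi, b.hi)};
}
Interval maxI(Interval a, Interval b) {
  return {std::max(a.lo, b.lo), std::max(a.hi, b.hi)};
}
```

For example, with offset and srcSize in [0, 100], low in [0, 10], and length in [0, 8], the old formulation's result interval has upper bound 100 (the srcSize bound), while the new formulation's has upper bound 8 (the length bound).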

nirvedhmeshram added a commit that referenced this pull request Dec 12, 2024
Missed committing clang-format in
[#119039](#119039)
nirvedhmeshram added a commit to iree-org/iree that referenced this pull request Dec 18, 2024
… shapes (#19484)

This PR does two things
1. Allow all GEMM shapes to use padded TileAndFuse Matmul configuration.
This is still behind the
`iree-codegen-llvmgpu-test-tile-and-fuse-matmul=false` flag by default
and does not change the default behavior. However following PRs that
have landed in the past month make it possible to relax the guards we
originally had on this.
#19196
#19307
llvm/llvm-project#117340
2. Allow fused producers to use padded TileAndFuse Matmul
configuration. Following PRs make this possible now
#19399
llvm/llvm-project#119039

Together this allows us to do padded IGEMM with intrinsics for shapes
unaligned to intrinsic which we use by default.
[Here](https://docs.google.com/spreadsheets/d/1O-SdUZCn5pHsxx7JTGjIIdH6PWCFnvlfe4XBbjEBaIM/edit?gid=0#gid=0)
is the performance difference observed in conv cases in
iree-kernel-benchmark-module that utilize this change. A median speedup
of 2.26x was observed.

The numeric changes I observed with this path enabled were the same
as between any aligned shape when comparing intrinsic vs. no-intrinsic use.
Generally some differences show up for narrow types like f16, but they
are within a relative error of 0.001; since our tests use absolute
errors, we may have to change some test values to account for this
change.

The perf differences in CI seem to be within the noise margin compared to
main:
https://github.com/iree-org/iree/actions/runs/12323399269/attempts/1#summary-34399247902

---------

Signed-off-by: Nirvedh <[email protected]>