[MLIR][NVGPU] Fix nvgpu_arrive syntax in matmulBuilder.py #113713
Conversation
This patch updates the syntax for the nvgpu_arrive op in matmulBuilder.py, which fixes the compilation error for this test. For the warp-specialized matmul_kernel implementation, removing the WgmmaWaitGroupSyncOp (after the MMA main loop) fixes the observed hang. With these two fixes, the test compiles and executes successfully on an sm90a machine. Signed-off-by: Durgadoss R <[email protected]>
@llvm/pr-subscribers-mlir

Author: Durgadoss R (durga4github)

Changes: This patch updates the syntax for the nvgpu_arrive op in matmulBuilder.py, which fixes the compilation error for this test. For the warp-specialized matmul_kernel implementation, removing the WgmmaWaitGroupSyncOp (after the MMA main loop) fixes the observed hang. With these two fixes, the test compiles and executes successfully on an sm90a machine.

Full diff: https://github.com/llvm/llvm-project/pull/113713.diff

1 file affected:
diff --git a/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py b/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py
index 75f0dc947e0681..5394d4a3272555 100644
--- a/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py
+++ b/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py
@@ -568,9 +568,7 @@ def generate_matmul_ws(
barId,
predicate=consumerPrimaryThread,
)
- nvgpu.mbarrier_arrive(
- ir.Type.parse("!nvgpu.mbarrier.token"), mbarDONE, barId
- )
+ nvgpu.mbarrier_arrive(mbarDONE, barId)
debug_print(
"[cons] iv={} | mbarDONE[{}] arrive [done]",
iv,
@@ -589,14 +587,9 @@ def generate_matmul_ws(
# Step 6.3.5. Yield
scf.yield_([new_acc, phaseParity])
- # Step 6.3. Wait All WGMMA
- nvvm.WgmmaWaitGroupSyncOp(0)
-
with ir.InsertionPoint(scf.IfOp(consumerPrimaryThread).then_block):
barId = c((K // BLOCK_K) % num_stages)
- nvgpu.mbarrier_arrive(
- ir.Type.parse("!nvgpu.mbarrier.token"), mbarDONE, barId
- )
+ nvgpu.mbarrier_arrive(mbarDONE, barId)
scf.yield_([])
# Step 6.4. Epilogue (registers --> shared memory)
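To summarize the call-site change outside the diff, here is a hypothetical stand-alone sketch. The two functions below are stubs, not the real MLIR nvgpu Python binding: they only model the signatures before and after this patch, where the old form required the caller to parse and pass the `!nvgpu.mbarrier.token` result type explicitly and the new form lets the op builder supply it.

```python
# Hypothetical stubs mimicking the nvgpu.mbarrier_arrive binding change.
# Neither function is the real MLIR Python API; they model only the
# call-site signatures before and after this patch.

TOKEN_TYPE = "!nvgpu.mbarrier.token"  # result type the op produces


def mbarrier_arrive_old(result_type, mbar_group, bar_id):
    """Old form: the caller parses and passes the result type explicitly."""
    return {"type": result_type, "mbar": mbar_group, "barId": bar_id}


def mbarrier_arrive_new(mbar_group, bar_id):
    """New form: the result type is inferred by the op builder."""
    return {"type": TOKEN_TYPE, "mbar": mbar_group, "barId": bar_id}


# Both forms describe the same op; only the call-site syntax differs.
old = mbarrier_arrive_old(TOKEN_TYPE, "mbarDONE", 0)
new = mbarrier_arrive_new("mbarDONE", 0)
assert old == new
```

The design point is that the result type is fixed for this op, so forcing every call site to `ir.Type.parse` it was redundant; dropping the argument is what the diff above does.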
@llvm/pr-subscribers-mlir-gpu
@grypp, please help with a review.
Can you change the prefix of the title to [MLIR]?
Sure, updated. Builds are clean, submitting this.