[mlir][emitc] Lower SCF using memrefs #93371

Open · wants to merge 2 commits into main

Conversation

aniragil
Contributor

The SCF-to-EmitC lowering of yielded values currently generates
emitc.{variable, assign} ops implementing the yield logic via memory. This
patch replaces those ops with memref.{alloca, load, store} ops in order
to separate the use of memory from its concrete modeling in EmitC,
which is left to the memref-to-emitc conversion pass.
Since EmitC currently doesn't support lowering zero-rank memref types,
this patch uses 1D memrefs, which translate to C arrays.
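
To illustrate, a minimal sketch of the new lowering for an scf.if with a single result, distilled from the patch's if.mlir test (function and value names are illustrative):

func.func @select(%cond : i1, %a : f32, %b : f32) -> f32 {
  %r = scf.if %cond -> (f32) {
    scf.yield %a : f32
  } else {
    scf.yield %b : f32
  }
  return %r : f32
}

becomes

func.func @select(%cond : i1, %a : f32, %b : f32) -> f32 {
  // The yielded value now travels through a rank-1 memref ...
  %var = memref.alloca() : memref<1xf32>
  emitc.if %cond {
    %c0 = arith.constant 0 : index
    memref.store %a, %var[%c0] : memref<1xf32>
  } else {
    %c0 = arith.constant 0 : index
    memref.store %b, %var[%c0] : memref<1xf32>
  }
  // ... and is read back after the emitc.if instead of the scf.if
  // result being replaced by an emitc.variable.
  %c0 = arith.constant 0 : index
  %r = memref.load %var[%c0] : memref<1xf32>
  return %r : f32
}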

@llvmbot
Member

llvmbot commented May 25, 2024

@llvm/pr-subscribers-mlir-emitc

@llvm/pr-subscribers-mlir

Author: Gil Rapaport (aniragil)

Changes

The SCF-to-EmitC lowering of yielded values currently generates
emitc.{variable, assign} ops implementing the yield logic via memory. This
patch replaces those ops with memref.{alloca, load, store} ops in order
to separate the use of memory from its concrete modeling in EmitC,
which is left to the memref-to-emitc conversion pass.
Since EmitC currently doesn't support lowering zero-rank memref types,
this patch uses 1D memrefs, which translate to C arrays.


Patch is 31.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/93371.diff

6 Files Affected:

  • (modified) mlir/include/mlir/Conversion/Passes.td (+1-1)
  • (modified) mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp (+40-12)
  • (modified) mlir/test/Conversion/SCFToEmitC/for.mlir (+38-26)
  • (modified) mlir/test/Conversion/SCFToEmitC/if.mlir (+13-8)
  • (modified) mlir/test/Target/Cpp/for.mlir (+117-87)
  • (modified) mlir/test/Target/Cpp/if.mlir (+73-41)
diff --git a/mlir/include/mlir/Conversion/Passes.td b/mlir/include/mlir/Conversion/Passes.td
index e6d678dc1b12b..18d7abe2d707c 100644
--- a/mlir/include/mlir/Conversion/Passes.td
+++ b/mlir/include/mlir/Conversion/Passes.td
@@ -976,7 +976,7 @@ def ConvertParallelLoopToGpu : Pass<"convert-parallel-loops-to-gpu"> {
 def SCFToEmitC : Pass<"convert-scf-to-emitc"> {
   let summary = "Convert SCF dialect to EmitC dialect, maintaining structured"
                 " control flow";
-  let dependentDialects = ["emitc::EmitCDialect"];
+  let dependentDialects = ["emitc::EmitCDialect", "memref::MemRefDialect"];
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp b/mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp
index 367142a520742..26fe8ba20c36f 100644
--- a/mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp
+++ b/mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp
@@ -14,11 +14,13 @@
 
 #include "mlir/Dialect/Arith/IR/Arith.h"
 #include "mlir/Dialect/EmitC/IR/EmitC.h"
+#include "mlir/Dialect/MemRef/IR/MemRef.h"
 #include "mlir/Dialect/SCF/IR/SCF.h"
 #include "mlir/IR/Builders.h"
 #include "mlir/IR/BuiltinOps.h"
 #include "mlir/IR/IRMapping.h"
 #include "mlir/IR/MLIRContext.h"
+#include "mlir/IR/OpDefinition.h"
 #include "mlir/IR/PatternMatch.h"
 #include "mlir/Transforms/DialectConversion.h"
 #include "mlir/Transforms/Passes.h"
@@ -63,21 +65,41 @@ static SmallVector<Value> createVariablesForResults(T op,
 
   for (OpResult result : op.getResults()) {
     Type resultType = result.getType();
-    emitc::OpaqueAttr noInit = emitc::OpaqueAttr::get(context, "");
-    emitc::VariableOp var =
-        rewriter.create<emitc::VariableOp>(loc, resultType, noInit);
+    SmallVector<OpFoldResult> dimensions = {rewriter.getIndexAttr(1)};
+    memref::AllocaOp var =
+        rewriter.create<memref::AllocaOp>(loc, dimensions, resultType);
     resultVariables.push_back(var);
   }
 
   return resultVariables;
 }
 
-// Create a series of assign ops assigning given values to given variables at
+// Create a series of load ops reading the values of given variables at
+// the current insertion point of given rewriter.
+static SmallVector<Value> readValues(SmallVector<Value> &variables,
+                                     PatternRewriter &rewriter, Location loc) {
+  Value zero;
+  if (variables.size() > 0)
+    zero = rewriter.create<arith::ConstantOp>(loc, rewriter.getIndexAttr(0));
+  SmallVector<Value> values;
+  SmallVector<Value> indices = {zero};
+  for (Value var : variables)
+    values.push_back(
+        rewriter.create<memref::LoadOp>(loc, var, indices).getResult());
+  return values;
+}
+
+// Create a series of store ops assigning given values to given variables at
 // the current insertion point of given rewriter.
 static void assignValues(ValueRange values, SmallVector<Value> &variables,
                          PatternRewriter &rewriter, Location loc) {
-  for (auto [value, var] : llvm::zip(values, variables))
-    rewriter.create<emitc::AssignOp>(loc, var, value);
+  Value zero;
+  if (variables.size() > 0)
+    zero = rewriter.create<arith::ConstantOp>(loc, rewriter.getIndexAttr(0));
+  for (auto [value, var] : llvm::zip(values, variables)) {
+    SmallVector<Value> indices = {zero};
+    rewriter.create<memref::StoreOp>(loc, value, var, indices);
+  }
 }
 
 static void lowerYield(SmallVector<Value> &resultVariables,
@@ -100,8 +122,6 @@ LogicalResult ForLowering::matchAndRewrite(ForOp forOp,
 
   // Create an emitc::variable op for each result. These variables will be
   // assigned to by emitc::assign ops within the loop body.
-  SmallVector<Value> resultVariables =
-      createVariablesForResults(forOp, rewriter);
   SmallVector<Value> iterArgsVariables =
       createVariablesForResults(forOp, rewriter);
 
@@ -115,18 +135,25 @@ LogicalResult ForLowering::matchAndRewrite(ForOp forOp,
   // Erase the auto-generated terminator for the lowered for op.
   rewriter.eraseOp(loweredBody->getTerminator());
 
+  IRRewriter::InsertPoint ip = rewriter.saveInsertionPoint();
+  rewriter.setInsertionPointToEnd(loweredBody);
+  SmallVector<Value> iterArgsValues =
+      readValues(iterArgsVariables, rewriter, loc);
+  rewriter.restoreInsertionPoint(ip);
+
   SmallVector<Value> replacingValues;
   replacingValues.push_back(loweredFor.getInductionVar());
-  replacingValues.append(iterArgsVariables.begin(), iterArgsVariables.end());
+  replacingValues.append(iterArgsValues.begin(), iterArgsValues.end());
 
   rewriter.mergeBlocks(forOp.getBody(), loweredBody, replacingValues);
   lowerYield(iterArgsVariables, rewriter,
              cast<scf::YieldOp>(loweredBody->getTerminator()));
 
   // Copy iterArgs into results after the for loop.
-  assignValues(iterArgsVariables, resultVariables, rewriter, loc);
+  SmallVector<Value> resultValues =
+      readValues(iterArgsVariables, rewriter, loc);
 
-  rewriter.replaceOp(forOp, resultVariables);
+  rewriter.replaceOp(forOp, resultValues);
   return success();
 }
 
@@ -169,6 +196,7 @@ LogicalResult IfLowering::matchAndRewrite(IfOp ifOp,
 
   auto loweredIf =
       rewriter.create<emitc::IfOp>(loc, ifOp.getCondition(), false, false);
+  SmallVector<Value> resultValues = readValues(resultVariables, rewriter, loc);
 
   Region &loweredThenRegion = loweredIf.getThenRegion();
   lowerRegion(thenRegion, loweredThenRegion);
@@ -178,7 +206,7 @@ LogicalResult IfLowering::matchAndRewrite(IfOp ifOp,
     lowerRegion(elseRegion, loweredElseRegion);
   }
 
-  rewriter.replaceOp(ifOp, resultVariables);
+  rewriter.replaceOp(ifOp, resultValues);
   return success();
 }
 
diff --git a/mlir/test/Conversion/SCFToEmitC/for.mlir b/mlir/test/Conversion/SCFToEmitC/for.mlir
index 7f90310af2189..04d3b75e97a65 100644
--- a/mlir/test/Conversion/SCFToEmitC/for.mlir
+++ b/mlir/test/Conversion/SCFToEmitC/for.mlir
@@ -47,20 +47,24 @@ func.func @for_yield(%arg0 : index, %arg1 : index, %arg2 : index) -> (f32, f32)
 // CHECK-SAME:      %[[VAL_0:.*]]: index, %[[VAL_1:.*]]: index, %[[VAL_2:.*]]: index) -> (f32, f32) {
 // CHECK-NEXT:    %[[VAL_3:.*]] = arith.constant 0.000000e+00 : f32
 // CHECK-NEXT:    %[[VAL_4:.*]] = arith.constant 1.000000e+00 : f32
-// CHECK-NEXT:    %[[VAL_5:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-// CHECK-NEXT:    %[[VAL_6:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-// CHECK-NEXT:    %[[VAL_7:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-// CHECK-NEXT:    %[[VAL_8:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-// CHECK-NEXT:    emitc.assign %[[VAL_3]] : f32 to %[[VAL_7]] : f32
-// CHECK-NEXT:    emitc.assign %[[VAL_4]] : f32 to %[[VAL_8]] : f32
-// CHECK-NEXT:    emitc.for %[[VAL_9:.*]] = %[[VAL_0]] to %[[VAL_1]] step %[[VAL_2]] {
-// CHECK-NEXT:      %[[VAL_10:.*]] = arith.addf %[[VAL_7]], %[[VAL_8]] : f32
-// CHECK-NEXT:      emitc.assign %[[VAL_10]] : f32 to %[[VAL_7]] : f32
-// CHECK-NEXT:      emitc.assign %[[VAL_10]] : f32 to %[[VAL_8]] : f32
+// CHECK-NEXT:    %[[VAL_5:.*]] = memref.alloca() : memref<1xf32>
+// CHECK-NEXT:    %[[VAL_6:.*]] = memref.alloca() : memref<1xf32>
+// CHECK-NEXT:    %[[VAL_7:.*]] = arith.constant 0 : index
+// CHECK-NEXT:    memref.store %[[VAL_3]], %[[VAL_5]]{{\[}}%[[VAL_7]]] : memref<1xf32>
+// CHECK-NEXT:    memref.store %[[VAL_4]], %[[VAL_6]]{{\[}}%[[VAL_7]]] : memref<1xf32>
+// CHECK-NEXT:    emitc.for %[[VAL_8:.*]] = %[[VAL_0]] to %[[VAL_1]] step %[[VAL_2]] {
+// CHECK-NEXT:      %[[VAL_9:.*]] = arith.constant 0 : index
+// CHECK-NEXT:      %[[VAL_10:.*]] = memref.load %[[VAL_5]]{{\[}}%[[VAL_9]]] : memref<1xf32>
+// CHECK-NEXT:      %[[VAL_11:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_9]]] : memref<1xf32>
+// CHECK-NEXT:      %[[VAL_12:.*]] = arith.addf %[[VAL_10]], %[[VAL_11]] : f32
+// CHECK-NEXT:      %[[VAL_13:.*]] = arith.constant 0 : index
+// CHECK-NEXT:      memref.store %[[VAL_12]], %[[VAL_5]]{{\[}}%[[VAL_13]]] : memref<1xf32>
+// CHECK-NEXT:      memref.store %[[VAL_12]], %[[VAL_6]]{{\[}}%[[VAL_13]]] : memref<1xf32>
 // CHECK-NEXT:    }
-// CHECK-NEXT:    emitc.assign %[[VAL_7]] : f32 to %[[VAL_5]] : f32
-// CHECK-NEXT:    emitc.assign %[[VAL_8]] : f32 to %[[VAL_6]] : f32
-// CHECK-NEXT:    return %[[VAL_5]], %[[VAL_6]] : f32, f32
+// CHECK-NEXT:    %[[VAL_14:.*]] = arith.constant 0 : index
+// CHECK-NEXT:    %[[VAL_15:.*]] = memref.load %[[VAL_5]]{{\[}}%[[VAL_14]]] : memref<1xf32>
+// CHECK-NEXT:    %[[VAL_16:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_14]]] : memref<1xf32>
+// CHECK-NEXT:    return %[[VAL_15]], %[[VAL_16]] : f32, f32
 // CHECK-NEXT:  }
 
 func.func @nested_for_yield(%arg0 : index, %arg1 : index, %arg2 : index) -> f32 {
@@ -77,20 +81,28 @@ func.func @nested_for_yield(%arg0 : index, %arg1 : index, %arg2 : index) -> f32
 // CHECK-LABEL: func.func @nested_for_yield(
 // CHECK-SAME:      %[[VAL_0:.*]]: index, %[[VAL_1:.*]]: index, %[[VAL_2:.*]]: index) -> f32 {
 // CHECK-NEXT:    %[[VAL_3:.*]] = arith.constant 1.000000e+00 : f32
-// CHECK-NEXT:    %[[VAL_4:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-// CHECK-NEXT:    %[[VAL_5:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-// CHECK-NEXT:    emitc.assign %[[VAL_3]] : f32 to %[[VAL_5]] : f32
+// CHECK-NEXT:    %[[VAL_4:.*]] = memref.alloca() : memref<1xf32>
+// CHECK-NEXT:    %[[VAL_5:.*]] = arith.constant 0 : index
+// CHECK-NEXT:    memref.store %[[VAL_3]], %[[VAL_4]]{{\[}}%[[VAL_5]]] : memref<1xf32>
 // CHECK-NEXT:    emitc.for %[[VAL_6:.*]] = %[[VAL_0]] to %[[VAL_1]] step %[[VAL_2]] {
-// CHECK-NEXT:      %[[VAL_7:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-// CHECK-NEXT:      %[[VAL_8:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-// CHECK-NEXT:      emitc.assign %[[VAL_5]] : f32 to %[[VAL_8]] : f32
-// CHECK-NEXT:      emitc.for %[[VAL_9:.*]] = %[[VAL_0]] to %[[VAL_1]] step %[[VAL_2]] {
-// CHECK-NEXT:        %[[VAL_10:.*]] = arith.addf %[[VAL_8]], %[[VAL_8]] : f32
-// CHECK-NEXT:        emitc.assign %[[VAL_10]] : f32 to %[[VAL_8]] : f32
+// CHECK-NEXT:      %[[VAL_7:.*]] = arith.constant 0 : index
+// CHECK-NEXT:      %[[VAL_8:.*]] = memref.load %[[VAL_4]]{{\[}}%[[VAL_7]]] : memref<1xf32>
+// CHECK-NEXT:      %[[VAL_9:.*]] = memref.alloca() : memref<1xf32>
+// CHECK-NEXT:      %[[VAL_10:.*]] = arith.constant 0 : index
+// CHECK-NEXT:      memref.store %[[VAL_8]], %[[VAL_9]]{{\[}}%[[VAL_10]]] : memref<1xf32>
+// CHECK-NEXT:      emitc.for %[[VAL_11:.*]] = %[[VAL_0]] to %[[VAL_1]] step %[[VAL_2]] {
+// CHECK-NEXT:        %[[VAL_12:.*]] = arith.constant 0 : index
+// CHECK-NEXT:        %[[VAL_13:.*]] = memref.load %[[VAL_9]]{{\[}}%[[VAL_12]]] : memref<1xf32>
+// CHECK-NEXT:        %[[VAL_14:.*]] = arith.addf %[[VAL_13]], %[[VAL_13]] : f32
+// CHECK-NEXT:        %[[VAL_15:.*]] = arith.constant 0 : index
+// CHECK-NEXT:        memref.store %[[VAL_14]], %[[VAL_9]]{{\[}}%[[VAL_15]]] : memref<1xf32>
 // CHECK-NEXT:      }
-// CHECK-NEXT:      emitc.assign %[[VAL_8]] : f32 to %[[VAL_7]] : f32
-// CHECK-NEXT:      emitc.assign %[[VAL_7]] : f32 to %[[VAL_5]] : f32
+// CHECK-NEXT:      %[[VAL_16:.*]] = arith.constant 0 : index
+// CHECK-NEXT:      %[[VAL_17:.*]] = memref.load %[[VAL_9]]{{\[}}%[[VAL_16]]] : memref<1xf32>
+// CHECK-NEXT:      %[[VAL_18:.*]] = arith.constant 0 : index
+// CHECK-NEXT:      memref.store %[[VAL_17]], %[[VAL_4]]{{\[}}%[[VAL_18]]] : memref<1xf32>
 // CHECK-NEXT:    }
-// CHECK-NEXT:    emitc.assign %[[VAL_5]] : f32 to %[[VAL_4]] : f32
-// CHECK-NEXT:    return %[[VAL_4]] : f32
+// CHECK-NEXT:    %[[VAL_19:.*]] = arith.constant 0 : index
+// CHECK-NEXT:    %[[VAL_20:.*]] = memref.load %[[VAL_4]]{{\[}}%[[VAL_19]]] : memref<1xf32>
+// CHECK-NEXT:    return %[[VAL_20]] : f32
 // CHECK-NEXT:  }
diff --git a/mlir/test/Conversion/SCFToEmitC/if.mlir b/mlir/test/Conversion/SCFToEmitC/if.mlir
index afc9abc761eb4..3d394f06f541f 100644
--- a/mlir/test/Conversion/SCFToEmitC/if.mlir
+++ b/mlir/test/Conversion/SCFToEmitC/if.mlir
@@ -53,18 +53,23 @@ func.func @test_if_yield(%arg0: i1, %arg1: f32) {
 // CHECK-SAME:                           %[[VAL_0:.*]]: i1,
 // CHECK-SAME:                           %[[VAL_1:.*]]: f32) {
 // CHECK-NEXT:    %[[VAL_2:.*]] = arith.constant 0 : i8
-// CHECK-NEXT:    %[[VAL_3:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> i32
-// CHECK-NEXT:    %[[VAL_4:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f64
+// CHECK-NEXT:    %[[VAL_3:.*]] = memref.alloca() : memref<1xi32>
+// CHECK-NEXT:    %[[VAL_4:.*]] = memref.alloca() : memref<1xf64>
 // CHECK-NEXT:    emitc.if %[[VAL_0]] {
 // CHECK-NEXT:      %[[VAL_5:.*]] = emitc.call_opaque "func_true_1"(%[[VAL_1]]) : (f32) -> i32
 // CHECK-NEXT:      %[[VAL_6:.*]] = emitc.call_opaque "func_true_2"(%[[VAL_1]]) : (f32) -> f64
-// CHECK-NEXT:      emitc.assign %[[VAL_5]] : i32 to %[[VAL_3]] : i32
-// CHECK-NEXT:      emitc.assign %[[VAL_6]] : f64 to %[[VAL_4]] : f64
+// CHECK-NEXT:      %[[VAL_7:.*]] = arith.constant 0 : index
+// CHECK-NEXT:      memref.store %[[VAL_5]], %[[VAL_3]]{{\[}}%[[VAL_7]]] : memref<1xi32>
+// CHECK-NEXT:      memref.store %[[VAL_6]], %[[VAL_4]]{{\[}}%[[VAL_7]]] : memref<1xf64>
 // CHECK-NEXT:    } else {
-// CHECK-NEXT:      %[[VAL_7:.*]] = emitc.call_opaque "func_false_1"(%[[VAL_1]]) : (f32) -> i32
-// CHECK-NEXT:      %[[VAL_8:.*]] = emitc.call_opaque "func_false_2"(%[[VAL_1]]) : (f32) -> f64
-// CHECK-NEXT:      emitc.assign %[[VAL_7]] : i32 to %[[VAL_3]] : i32
-// CHECK-NEXT:      emitc.assign %[[VAL_8]] : f64 to %[[VAL_4]] : f64
+// CHECK-NEXT:      %[[VAL_8:.*]] = emitc.call_opaque "func_false_1"(%[[VAL_1]]) : (f32) -> i32
+// CHECK-NEXT:      %[[VAL_9:.*]] = emitc.call_opaque "func_false_2"(%[[VAL_1]]) : (f32) -> f64
+// CHECK-NEXT:      %[[VAL_10:.*]] = arith.constant 0 : index
+// CHECK-NEXT:      memref.store %[[VAL_8]], %[[VAL_3]]{{\[}}%[[VAL_10]]] : memref<1xi32>
+// CHECK-NEXT:      memref.store %[[VAL_9]], %[[VAL_4]]{{\[}}%[[VAL_10]]] : memref<1xf64>
 // CHECK-NEXT:    }
+// CHECK-NEXT:    %[[VAL_11:.*]] = arith.constant 0 : index
+// CHECK-NEXT:    %[[VAL_12:.*]] = memref.load %[[VAL_3]]{{\[}}%[[VAL_11]]] : memref<1xi32>
+// CHECK-NEXT:    %[[VAL_13:.*]] = memref.load %[[VAL_4]]{{\[}}%[[VAL_11]]] : memref<1xf64>
 // CHECK-NEXT:    return
 // CHECK-NEXT:  }
diff --git a/mlir/test/Target/Cpp/for.mlir b/mlir/test/Target/Cpp/for.mlir
index 60988bcb46556..dc45853a0be04 100644
--- a/mlir/test/Target/Cpp/for.mlir
+++ b/mlir/test/Target/Cpp/for.mlir
@@ -32,85 +32,103 @@ func.func @test_for(%arg0 : index, %arg1 : index, %arg2 : index) {
 // CPP-DECLTOP-NEXT: }
 // CPP-DECLTOP-NEXT: return;
 
-func.func @test_for_yield() {
-  %start = "emitc.constant"() <{value = 0 : index}> : () -> index
-  %stop = "emitc.constant"() <{value = 10 : index}> : () -> index
-  %step = "emitc.constant"() <{value = 1 : index}> : () -> index
-
-  %s0 = "emitc.constant"() <{value = 0 : i32}> : () -> i32
-  %p0 = "emitc.constant"() <{value = 1.0 : f32}> : () -> f32
-
-  %0 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> i32
-  %1 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-  %2 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> i32
-  %3 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
-  emitc.assign %s0 : i32 to %2 : i32
-  emitc.assign %p0 : f32 to %3 : f32
-  emitc.for %iter = %start to %stop step %step {
-    %sn = emitc.call_opaque "add"(%2, %iter) : (i32, index) -> i32
-    %pn = emitc.call_opaque "mul"(%3, %iter) : (f32, index) -> f32
-    emitc.assign %sn : i32 to %2 : i32
-    emitc.assign %pn : f32 to %3 : f32
-    emitc.yield
+func.func @test_for_yield(%arg0: index, %arg1: index, %arg2: index) {
+  %0 = "emitc.constant"() <{value = 0.000000e+00 : f32}> : () -> f32
+  %1 = "emitc.constant"() <{value = 1.000000e+00 : f32}> : () -> f32
+  %2 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> !emitc.array<1xf32>
+  %3 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> !emitc.array<1xf32>
+  %4 = "emitc.constant"() <{value = 0 : index}> : () -> index
+  %5 = emitc.subscript %2[%4] : (!emitc.array<1xf32>, index) -> f32
+  emitc.assign %0 : f32 to %5 : f32
+  %6 = emitc.subscript %3[%4] : (!emitc.array<1xf32>, index) -> f32
+  emitc.assign %1 : f32 to %6 : f32
+  emitc.for %arg3 = %arg0 to %arg1 step %arg2 {
+    %12 = "emitc.constant"() <{value = 0 : index}> : () -> index
+    %13 = emitc.subscript %2[%12] : (!emitc.array<1xf32>, index) -> f32
+    %14 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
+    emitc.assign %13 : f32 to %14 : f32
+    %15 = emitc.subscript %3[%12] : (!emitc.array<1xf32>, index) -> f32
+    %16 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
+    emitc.assign %15 : f32 to %16 : f32
+    %17 = emitc.add %14, %16 : (f32, f32) -> f32
+    %18 = "emitc.constant"() <{value = 0 : index}> : () -> index
+    %19 = emitc.subscript %2[%18] : (!emitc.array<1xf32>, index) -> f32
+    emitc.assign %17 : f32 to %19 : f32
+    %20 = emitc.subscript %3[%18] : (!emitc.array<1xf32>, index) -> f32
+    emitc.assign %17 : f32 to %20 : f32
   }
-  emitc.assign %2 : i32 to %0 : i32
-  emitc.assign %3 : f32 to %1 : f32
-
+  %7 = "emitc.constant"() <{value = 0 : index}> : () -> index
+  %8 = emitc.subscript %2[%7] : (!emitc.array<1xf32>, index) -> f32
+  %9 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
+  emitc.assign %8 : f32 to %9 : f32
+  %10 = emitc.subscript %3[%7] : (!emitc.array<1xf32>, index) -> f32
+  %11 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> f32
+  emitc.assign %10 : f32 to %11 : f32
   return
 }
-// CPP-DEFAULT: void test_for_yield() {
-// CPP-DEFAULT-NEXT: size_t [[START:[^ ]*]] = 0;
-// CPP-DEFAULT-NEXT: size_t [[STOP:[^ ]*]] = 10;
-// CPP-DEFAULT-NEXT: size_t [[STEP:[^ ]*]] = 1;
-// CPP-DEFAULT-NEXT: int32_t [[S0:[^ ]*]] = 0;
-// CPP-DEFAULT-NEXT: float [[P0:[^ ]*]] = 1.000000000e+00f;
-// CPP-DEFAULT-NEXT: int32_t [[SE:[^ ]*]];
-// CPP-DEFAULT-NEXT: float [[PE:[^ ]*]];
-// CPP-DEFAULT-NEXT: int32_t [[SI:[^ ]*]];
-// CPP-DEFAULT-NEXT: float [[PI:[^ ]*]];
-// CPP-DEFAULT-NEXT: [[SI:[^ ]*]] = [[S0]];
-// CPP-DEFAULT-NEXT: [[PI:[^ ]*]] = [[P0]];
-// CPP-DEFAULT-NEXT: for (size_t [[ITER:[^ ]*]] = [[START]]; [[ITER]] < [[STOP]]; [[ITER]] += [[STEP]]) {
-// CPP-DEFAULT-NEXT: int32_t [[SN:[^ ]*]] = add([[SI]], [[ITER]]);
-// CPP-DEFAULT-NEXT: float [[PN:[^ ]*]] = mul([[PI]], [[ITER]]);
-// CPP-DEFAULT-NEXT: [[SI]] = [[SN]];
-// CPP-DEFAULT-NEXT: [[PI]] = [[PN]];
+// CPP-DEFAULT: void test_for_yield(size_t [[V1:[^ ]*]], size_t [[V2:[^ ]*]], size_t [[V3:[^ ]*]]) {
+// CPP-DEFAULT-NEXT: float [[V4:[^ ]*]] = 0.0e+00f;
+// CPP-DEFAULT-NEXT: float [[V5:[^ ]*]] = 1.000000000e+00f;
+// CPP-DEFAULT-NEXT: float [[V6:[^ ]*]][1];
+// CPP-DEFAULT-NEXT: float [[V7:[^ ]*]][1];
+// CPP-DEFAULT-NEXT: size_t [[V8:[^ ]*]] = 0;
+// CPP-DEFAULT-NEXT: [[V6]][[[V8]]] = [[V4]];
+// CPP-DEFAULT-NEXT: [[V7]][[[V8]]] = [[V5]];
+// CPP-DEFAULT-NEXT: for (size_t [[V9:[^ ]*]] = [[V1]]; [[V9]] < [[V2]]; [[V9]] += [[V3]]) {
+// CPP-DEFAULT-NEXT:   size_t [[V10:[^ ]*]] = 0;
+// CPP-DEFAULT-NEXT:   float [[V11:[^ ]*]];
+// CPP-DEFAULT-NEXT:   [[V11]] = [[V6]][[[V10]]];
+// CPP-DEFAULT-NEXT:   float [[V12:[^ ]*]];
+// CPP-DEFAULT-NEXT:   [[V12]] = [[V7]][[[V10]]];
+// CPP-DEFAULT-NEXT:   float [[V13:[^ ]*]] = [[V11]] + [[V12]];
+// CPP-DEFAULT-NEXT:   size_t [[V14:[^ ]*]] = 0;
+// CPP-DEFAULT-NEXT:   [[V6]][[[V14]]] = [[V13]];
+// CPP-DEFAULT-NEXT:   [[V7]][[[V14]]] = [[V13]];
 // CPP-DEFAULT-NEXT: }
-// CPP-DEFAULT-NEXT: [[SE]] = [[SI]];
-// CPP-DEFAULT-NEXT: [[PE]] = [[PI]];
+// CPP-DEFAULT-NEXT: size_t [[V15:[^ ]*]] = 0;
+// CPP-DEFAULT-NEXT: float [[V16:[^ ]*]];
+// CPP-DEFAULT-NEXT: [[V16]] = [[V6]][[[V15]]];
+// CPP-DEFAULT-NEXT: float [[V17:[^ ]*]];
+// CPP-DEFAULT-NEXT: [[V17]] = [[V7]][[[V15]]];
 // CPP-DEFAULT-NEXT: return;
 
-// CPP-DECLTOP: void test_for_yield() {
-// CPP-DECLTOP-NEXT: size_t [[START:[^ ]*]];
-// CPP-DECLTOP-NEXT: size_t [[STOP:[^ ]*]];
-// CPP-DECLTOP-NEXT: size_t [[STEP:[^ ]*]];
-// CPP-DECLTOP-NEXT: int32_t [[S0:[^ ]*]];
-// CPP-DECLTOP-NEXT: float [[P0:[^ ]*]];
-// CPP-DECLTOP-NEXT: int32_t [[SE:[^ ]*]];
-// CPP-DECLTOP-NEXT: float [[PE:[^ ]*]];
-// CPP-DECLTOP-NEXT: int32_t [[SI:[^ ]*]];
-// CPP-DECLTOP-NEXT: float [[PI:[^ ]*]];
-// CPP-DECLTOP-NEXT: int32_t [[SN:[^ ]*]];
-// CPP-DECLTOP-NEXT: float [[PN:[^ ]*]];
-// CPP-DECLTOP-NEXT: [[START]] = 0;
-// CPP-DECLTOP-NEXT: [[STOP]] = 10;
-// CPP-D...
[truncated]

@mgehre-amd
Contributor

I'm not so clear about the motivation of this change. From what I understand, SCF lowered correctly to emitc before.
The generated C++ code looks different in that we now use arrays with one element instead of scalars to model loop carried values. Why is that better?

@marbre
Member

marbre commented May 27, 2024

I'm not so clear about the motivation of this change. From what I understand, SCF lowered correctly to emitc before. The generated C++ code looks different in that we now use arrays with one element instead of scalars to model loop carried values. Why is that better?

I was about to ask the same question. Are there any benefits?

@aniragil
Contributor Author

I'm not so clear about the motivation of this change. From what I understand, SCF lowered correctly to emitc before.

Correct, this change is not functional. It's a refactoring trying to separate the concerns of (1) lowering value-carrying control-flow ops to emitc through memory and (2) lowering memory ops to emitc. In fact, we can do (1) within the scf dialect and have scf-to-emitc simply reject scf.{if, for} with return values.
The scf-to-emitc pass was written to generate emitc.variable ops directly since we didn't have memref-to-emitc lowering. Now that we do, I think it's better to have all memory modeling in emitc handled in a single pass and have other passes generate memref ops as needed.
Concrete motivation for this PR is to simplify @simon-camp's lvalue PR, which currently modifies scf-to-emitc even though its reg2mem logic remains the same. This PR replaces that code dependency with a pass order dependency, which seems cleaner.

The generated C++ code looks different in that we now use arrays with one element instead of scalars to model loop carried values. Why is that better?

This PR promotes scalars to 1-element arrays, which indeed leads to less natural C/C++ code. Getting back to natural C/C++ code can be achieved by either:
(a) Having scf-to-emitc generate rank-0 memrefs and memref-to-emitc lower them to emitc.variable ops, as suggested here.
(b) Having memref-to-emitc lower rank-1 memrefs to emitc.variable ops where possible. This seems less desired as such generic memory transformations (SROA) belong in the memref dialect (but that leads back to option (a)).
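
For reference, a sketch of the two memory shapes under discussion; the rank-0 form is what option (a) would have scf-to-emitc generate, the rank-1 form is what this PR generates (value names are illustrative):

%x = arith.constant 1.0 : f32

// Option (a): rank-0 memref, for memref-to-emitc to lower to a plain
// scalar emitc.variable.
%v0 = memref.alloca() : memref<f32>
memref.store %x, %v0[] : memref<f32>
%r0 = memref.load %v0[] : memref<f32>

// This PR: rank-1 memref, which memref-to-emitc lowers to a C array.
%c0 = arith.constant 0 : index
%v1 = memref.alloca() : memref<1xf32>
memref.store %x, %v1[%c0] : memref<1xf32>
%r1 = memref.load %v1[%c0] : memref<1xf32>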

@mgehre-amd
Contributor

I'm not so clear about the motivation of this change. From what I understand, SCF lowered correctly to emitc before.

Correct, this change is not functional. It's a refactoring trying to separate the concerns of (1) lowering value-carrying control-flow ops to emitc through memory and (2) lowering memory ops to emitc. In fact, we can do (1) within the scf dialect and have scf-to-emitc simply reject scf.{if, for} with return values. The scf-to-emitc pass was written to generate emitc.variable ops directly since we didn't have memref-to-emitc lowering. Now that we do, I think it's better to have all memory modeling in emitc handled in a single pass and have other passes generate memref ops as needed. Concrete motivation for this PR is to simplify @simon-camp's lvalue PR, which currently modifies scf-to-emitc even though its reg2mem logic remains the same. This PR replaces that code dependency with a pass order dependency, which seems cleaner.

The generated C++ code looks different in that we now use arrays with one element instead of scalars to model loop carried values. Why is that better?

This PR promotes scalars to 1-element arrays, which indeed leads to less natural C/C++ code. Getting back to natural C/C++ code can be achieved by either: (a) Having scf-to-emitc generate rank-0 memrefs and memref-to-emitc lower them to emitc.variable ops, as suggested here. (b) Having memref-to-emitc lower rank-1 memrefs to emitc.variable ops where possible. This seems less desired as such generic memory transformations (SROA) belong in the memref dialect (but that leads back to option (a)).

Ok, now I see the whole picture better. Let me try to summarize my current understanding:

  1. the current scf-to-emitc conversion needs to use memory semantics to transport values across scopes.
  2. it uses an ad-hoc lowering to reference (aka lvalue) semantics. Through that, it obtains natural C/C++ code.
  3. you would like to unify the handling of memory semantics into a single place
  4. single place here means single dialect
  5. the only dialect before emitc that has memory semantics seems to be memref. There is no other dialect before emitc that models pointer or reference semantics.
  6. that's why this PR turns the memory semantics required for scf into memref
  7. and later the memref-to-emitc turns that into arrays.

As an alternative to 7), you have shown in #92684 how we could do mem2reg as part of the memref-to-emitc lowering to turn memrefs into scalars in emitc. That gives natural C/C++ code but fails to generally convert all rank-0 memrefs, because turning rank-0 memref function arguments into scalars loses their reference semantics.

Two things occur to me:
a) We could fix the function argument issue in #92684 by recognizing that the memref<f32> should not be turned into f32 (like mem2reg) but instead into lvalue<f32> (preserving reference semantics). This would essentially keep all lowering code there currently intact (because it turns memref.alloca memref<f32> into emitc.variable f32, which will have lvalue<f32> type). The key difference is that function arguments of type memref<f32> need to also become lvalue<f32>.

And that the emitter needs to understand that lvalue<f32> %a as a function argument means emitting it as float& a (i.e. a reference). I think this is the correct thing to do for lvalue function arguments in #91475, independent of whether we decide to lower rank-0 memrefs into lvalues. @simon-camp, what do you think?
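
A hypothetical sketch of (a); the !emitc.lvalue type is only proposed in #91475 and does not exist yet, so both the type syntax and its printing below are assumptions:

// Hypothetical: a rank-0 memref argument keeps reference semantics by
// becoming an lvalue instead of a scalar, i.e.
//   func.func @f(%arg0 : memref<f32>)
// would lower to
emitc.func @f(%arg0 : !emitc.lvalue<f32>) {
  emitc.return
}
// which the emitter would print as: void f(float& arg0) { return; }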

b) Looking more closely at this PR, I don't see the lowering code benefit from unifying handling of memory semantics into the memref dialect. The code complexity is essentially the same, trading op names like emitc.variable for memref.alloca.
This PR brings a new memref.load op into the mix, but I expect #91475 to have a emitc.lvalue_to_rvalue at the same spot.

@simon-camp
Contributor

Two things occur to me: a) We could fix the function argument issue in #92684 by recognizing that the memref<f32> should not be turned into f32 (like mem2reg) but instead into lvalue<f32> (preserving reference semantics). This would essentially keep all lowering code there currently intact (because it turns memref.alloca memref<f32> into emitc.variable f32, which will have lvalue<f32> type). The key difference is that function arguments of type memref<f32> need to also become lvalue<f32>.

And that the emitter needs to understand that lvalue<f32> %a as a function argument means emitting it as float& a (i.e. a reference). I think this is the correct thing to do for lvalue function arguments in #91475, independent of whether we decide to lower rank-0 memrefs into lvalues. @simon-camp, what do you think?

I'm not sure about the implications of that change, as function arguments are lvalues by design. (In the sense that you are allowed to take the address of a function argument). So I don't know if we need to distinguish lvalues and lvalue references.

b) Looking more closely at this PR, I don't see the lowering code benefit from unifying handling of memory semantics into the memref dialect. The code complexity is essentially the same, trading op names like emitc.variable for memref.alloca.
This PR brings a new memref.load op into the mix, but I expect #91475 to have a emitc.lvalue_to_rvalue at the same spot.

I would expect that the changes to lvalue variables look about equivalent to the changes here (after adopting the semantics to what we agreed on in #91475).

Side note: If I'm reading the code correctly this also unifies iter_args and results to one memref each instead of two. We could apply this change to the current lowering as well, right?

@aniragil
Contributor Author

Ok, now I see the whole picture better. Let me try to summarize my current understanding ...

Right.

As an alternative to 7), you have shown in #92684 how we could do mem2reg as part of the memref-to-emitc lowering to turn memrefs into scalars in emitc. That gives natural C/C++ code but fails to generally convert all rank-0 memrefs, because turning rank-0 memref function arguments into scalars loses their reference semantics.

Not exactly: #92684 aims to map rank-0 memrefs (mlir's representation for scalar variables) to emitc.variables that emitc treats as memory locations by allowing assignment to them, thus retaining memory semantics. The patch indeed doesn't take care of function arguments.
My mem2reg/SROA comment referred to the possibility of eliminating memory semantics where possible, replacing memrefs with SSA values. This can be done as a generic memref-to-memref transformation, but any memref surviving it would reach emitc.
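
A sketch of what such a generic memref-level transformation would do (illustrative only, not part of this PR; "some.use" stands in for an arbitrary consumer op):

// Before: a value round-trips through a rank-0 memref.
%m = memref.alloca() : memref<f32>
memref.store %x, %m[] : memref<f32>
%y = memref.load %m[] : memref<f32>
"some.use"(%y) : (f32) -> ()

// After mem2reg: the memref is gone and the stored SSA value is used
// directly. Any memref this cannot eliminate still reaches emitc.
"some.use"(%x) : (f32) -> ()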

a) We could fix the function argument issue in #92684 by recognizing that the memref<f32> should not be turned into f32 (like mem2reg) but instead into lvalue<f32> (preserving reference semantics). This would essentially keep all lowering code there currently intact (because it turns memref.alloca memref<f32> into emitc.variable f32, which will have lvalue<f32> type). The key difference is that function arguments of type memref<f32> need to also become lvalue<f32>.

And that the emitter needs to understand that lvalue<f32> %a as a function argument means emitting it as float& a (i.e. a reference). I think this is the correct thing to do for lvalue function arguments in #91475, independent of whether we decide to lower rank-0 memrefs into lvalues. @simon-camp, what do you think?

This sounds like correct lowering when emitting C++. For C, which doesn't support references, the lowering could be to an !emitc.ptr: users of such arguments would dereference them, and callers would pass the address of the argument (the same thing C programmers do when they need to pass variables by reference).
I may be missing something here: what else could generate lvalue arguments other than rank-0 memrefs? (assuming other memrefs lower into !emitc.array)
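
A sketch of that C-targeting variant, assuming arguments are lowered to !emitc.ptr and reads go through emitc.apply's dereference operator (names are illustrative):

// Hypothetical C lowering of a by-reference scalar argument.
emitc.func @scale(%arg0 : !emitc.ptr<f32>) {
  %0 = emitc.apply "*"(%arg0) : (!emitc.ptr<f32>) -> f32  // read *arg0
  emitc.return
}
// Intended C emission, with the caller passing &x for some float x:
//   void scale(float* arg0) { float v0 = *arg0; return; }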

I'm not sure about the implications of that change, as function arguments are lvalues by design. (In the sense that you are allowed to take the address of a function argument). So I don't know if we need to distinguish lvalues and lvalue references.

Not sure I follow. Like any values in mlir, func.func's arguments and return values are SSA values, not memory locations, e.g.

func.func @for_yield(%arg0 : index, %arg1 : index, %arg2 : index) -> (f32, f32) {
  %s0 = arith.constant 0.0 : f32
  %s1 = arith.constant 1.0 : f32
  %result:2 = scf.for %i0 = %arg0 to %arg1 step %arg2 iter_args(%si = %s0, %sj = %s1) -> (f32, f32) {
    %sn = arith.addf %si, %sj : f32
    scf.yield %sn, %sn : f32, f32
  }
  return %result#0, %result#1 : f32, f32
}

When lowered to C/C++, these arguments should behave as in-function SSA values in emitc, i.e. be lowered to (const) C arguments whose address cannot be taken. Am I missing something?

b) Looking more closely at this PR, I don't see the lowering code benefit from unifying handling of memory semantics into the memref dialect. The code complexity is essentially the same, trading op names like emitc.variable for memref.alloca.
This PR brings a new memref.load op into the mix, but I expect #91475 to have a emitc.lvalue_to_rvalue at the same spot.

As you state in (5), emitc's memory ops will usually be the result of lowering memref ops (the standard core dialect for modeling memory), with memref-to-emitc supporting some subset of the dialect's ops and types. Specifically, it may support scalars (e.g. by lowering to emitc.variable/!emitc.array) or expect some previous pass to promote them to rank-1 memrefs.
Whatever memref-to-emitc does, if scf-to-emitc models memory using emitc ops and types it will either
(a) duplicate work already done in memref-to-emitc, or
(b) model memory inconsistently with the rest of the emitc program.

If we're at (a), changes in emitc's memory modeling must now affect both memref-to-emitc and scf-to-emitc, or we risk falling into (b) unintentionally (#91475 is an example of such a change). The motivation for funneling all emitc memory lowering through the memref dialect is to avoid both (a) and (b), in the spirit of mlir's gradual lowering philosophy.
Another way to achieve that might be to have common code in emitc to handle variable/array generation, loading, storing etc., to be used by memref-to-emitc, scf-to-emitc and potentially future/out-of-tree lowering passes which have memory aspects to model.

Side note: If I'm reading the code correctly this also unifies iter_args and results to one memref each instead of two. We could apply this change to the current lowering as well, right?

I think so, yes. It seems redundant to copy the variables again.
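
A sketch of the unified form, distilled from this PR's for.mlir test (names are illustrative): the iter-arg buffer itself is read after the loop, so there is no second set of result variables and no final copy:

func.func @sum(%lb : index, %ub : index, %step : index, %init : f32) -> f32 {
  %c0 = arith.constant 0 : index
  // A single buffer serves as both the iter_arg and the loop result.
  %acc = memref.alloca() : memref<1xf32>
  memref.store %init, %acc[%c0] : memref<1xf32>
  emitc.for %i = %lb to %ub step %step {
    %cur = memref.load %acc[%c0] : memref<1xf32>
    %next = arith.addf %cur, %cur : f32
    memref.store %next, %acc[%c0] : memref<1xf32>
  }
  // Read the result straight out of the iter_arg buffer.
  %res = memref.load %acc[%c0] : memref<1xf32>
  return %res : f32
}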

@aniragil
Contributor Author

aniragil commented Jun 9, 2024

Side note: If I'm reading the code correctly this also unifies iter_args and results to one memref each instead of two. We could apply this change to the current lowering as well, right?

I think so, yes. It seems redundant to copy the variables again.

Up for review as #94898
