Conversation

maerhart

Add a method to the BufferDeallocationOpInterface that allows operations to
implement the interface and provide custom logic to compute the ownership
indicators of values it defines. As a demonstrating example, this new method is
implemented by the arith.select operation.

Already reviewed in https://reviews.llvm.org/D158828

Depends on #66349

@llvmbot added the mlir:core, mlir, mlir:bufferization, mlir:arith, and mlir:cf labels on Sep 14, 2023

llvmbot commented Sep 14, 2023

@llvm/pr-subscribers-mlir-core
@llvm/pr-subscribers-mlir-cf
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-arith

@llvm/pr-subscribers-mlir-bufferization

Changes

Add a method to the BufferDeallocationOpInterface that allows operations to implement the interface and provide custom logic to compute the ownership indicators of values it defines. As a demonstrating example, this new method is implemented by the `arith.select` operation.

Already reviewed in https://reviews.llvm.org/D158828

Depends on #66349

Patch is 213.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66350.diff

26 Files Affected:

  • (modified) mlir/docs/Bufferization.md (+604)
  • (added) mlir/include/mlir/Dialect/Arith/Transforms/BufferDeallocationOpInterfaceImpl.h (+22)
  • (added) mlir/include/mlir/Dialect/Bufferization/IR/BufferDeallocationOpInterface.h (+217)
  • (added) mlir/include/mlir/Dialect/Bufferization/IR/BufferDeallocationOpInterface.td (+73)
  • (modified) mlir/include/mlir/Dialect/Bufferization/IR/CMakeLists.txt (+1)
  • (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h (+9)
  • (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td (+144)
  • (added) mlir/include/mlir/Dialect/ControlFlow/Transforms/BufferDeallocationOpInterfaceImpl.h (+22)
  • (modified) mlir/include/mlir/InitAllDialects.h (+4)
  • (added) mlir/lib/Dialect/Arith/Transforms/BufferDeallocationOpInterfaceImpl.cpp (+85)
  • (modified) mlir/lib/Dialect/Arith/Transforms/CMakeLists.txt (+1)
  • (added) mlir/lib/Dialect/Bufferization/IR/BufferDeallocationOpInterface.cpp (+274)
  • (modified) mlir/lib/Dialect/Bufferization/IR/CMakeLists.txt (+1)
  • (modified) mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt (+1)
  • (added) mlir/lib/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation.cpp (+1030)
  • (added) mlir/lib/Dialect/ControlFlow/Transforms/BufferDeallocationOpInterfaceImpl.cpp (+163)
  • (modified) mlir/lib/Dialect/ControlFlow/Transforms/CMakeLists.txt (+2-1)
  • (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-branchop-interface.mlir (+589)
  • (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-callop-interface.mlir (+113)
  • (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-existing-deallocs.mlir (+43)
  • (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-function-boundaries.mlir (+131)
  • (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-memoryeffect-interface.mlir (+124)
  • (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-region-branchop-interface.mlir (+695)
  • (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-subviews.mlir (+21)
  • (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/invalid-buffer-deallocation.mlir (+93)
  • (modified) utils/bazel/llvm-project-overlay/mlir/BUILD.bazel (+4)
diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index f03d7bb877c9c74..f64e94758c8eb28 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -224,6 +224,9 @@ dialect conversion-based bufferization.
 
 ## Buffer Deallocation
 
+**Important: this pass is deprecated, please use the ownership based buffer**
+**deallocation pass instead**
+
 One-Shot Bufferize deallocates all buffers that it allocates. This is in
 contrast to the dialect conversion-based bufferization that delegates this job
 to the
@@ -300,6 +303,607 @@ One-Shot Bufferize can be configured to leak all memory and not generate any
 buffer deallocations with `create-deallocs=0`. This can be useful for
 compatibility with legacy code that has its own method of deallocating buffers.
 
+## Ownership-based Buffer Deallocation
+
+Recommended compilation pipeline:
+```
+one-shot-bufferize
+       |          it's recommended to perform all bufferization here at latest,
+       |       <- any allocations inserted after this point have to be handled
+       V          manually
+expand-realloc
+       V
+buffer-deallocation
+       V
+  canonicalize <- mostly for scf.if simplifications
+       V
+buffer-deallocation-simplification
+       V       <- from this point onwards no tensor values are allowed
+lower-deallocations
+       V
+      CSE
+       V
+  canonicalize
+```
+
+One-Shot Bufferize does not deallocate any buffers that it allocates. This job
+is delegated to the
+[`-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-buffer-deallocation-adds-all-required-dealloc-operations-for-all-allocations-in-the-input-program)
+pass, i.e., after running One-Shot Bufferize, the result IR may have a number of
+`memref.alloc` ops, but no `memref.dealloc` ops. This pass processes operations
+implementing `FunctionOpInterface` one-by-one without analysing the call-graph.
+This means that there have to be [some rules](#function-boundary-abi) on how
+MemRefs are handled when being passed from one function to another. The rest of
+the pass revolves heavily around the `bufferization.dealloc` operation which is
+inserted at the end of each basic block with appropriate operands and should be
+optimized using the Buffer Deallocation Simplification pass
+(`--buffer-deallocation-simplification`) and the regular canonicalizer
+(`--canonicalize`). Lowering the result of the `-buffer-deallocation` pass
+directly using `--convert-bufferization-to-memref` without prior
+optimization is not recommended as it will lead to very inefficient code (the
+runtime-cost of `bufferization.dealloc` is
+`O(|memrefs|^2+|memrefs|*|retained|)`).
+
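The quadratic cost quoted above can be made concrete with a small standalone Python model. This is only a sketch and not code from this patch: buffers are modelled as integer base addresses, aliasing is plain equality, and the function name `simulate_dealloc` and its free-once rule are assumptions chosen for illustration rather than the actual lowering.

```python
def simulate_dealloc(memrefs, conditions, retained):
    """Toy model: return (freed base addresses, number of alias checks)."""
    checks = 0
    freed = []
    for i, (base, cond) in enumerate(zip(memrefs, conditions)):
        # Compare against every retained value: |memrefs| * |retained| checks.
        aliases_retained = False
        for kept in retained:
            checks += 1
            aliases_retained |= (base == kept)
        # Compare against every earlier memref so that a buffer is freed at
        # most once: on the order of |memrefs|^2 checks.
        already_freed = False
        for j in range(i):
            checks += 1
            already_freed |= (memrefs[j] == base and base in freed)
        if cond and not aliases_retained and not already_freed:
            freed.append(base)
    return freed, checks


freed, checks = simulate_dealloc([100, 200, 100], [True, True, True], [200])
print(freed, checks)
```

In this model the number of checks depends only on the list lengths, which is why folding away operands statically (as the simplification pass and the canonicalizer aim to do) matters before lowering.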
+### Function boundary ABI
+
+The Buffer Deallocation pass operates on the level of operations implementing
+the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
+also return them. To ensure compatibility among all functions (including
+external ones), some rules have to be enforced:
+*   When a MemRef is passed as a function argument, ownership is never acquired.
+    It is always the caller's responsibility to deallocate such MemRefs.
+*   Returning a MemRef from a function always passes ownership to the caller,
+    i.e., it is also the caller's responsibility to deallocate memrefs returned
+    from a called function.
+*   A function must not return a MemRef with the same allocated base buffer as
+    one of its arguments (in this case a copy has to be created). Note that in
+    this context two subviews of the same buffer that don't overlap are also
+    considered to alias.
+
+For external functions (e.g., library functions written externally in C), the
+externally provided implementation has to adhere to these rules and they are
+just assumed by the buffer deallocation pass. Functions on which the
+deallocation pass is applied and the implementation is accessible are modified
+by the pass such that the ABI is respected (i.e., buffer copies are inserted as
+necessary).
+
+### Inserting `bufferization.dealloc` operations
+
+`bufferization.dealloc` operations are unconditionally inserted at the end of
+each basic block (just before the terminator). The majority of the pass is about
+finding the correct operands for this operation. There are three variadic
+operand lists to be populated, the first contains all MemRef values that may
+need to be deallocated, the second list contains their associated ownership
+values (of `i1` type), and the third list contains MemRef values that are still
+needed at a later point and should thus not be deallocated. This operation
+allows us to deal with any kind of aliasing behavior: it lowers to runtime
+aliasing checks when not enough information can be collected statically. When
+enough aliasing information is statically available, operands or the entire op
+may fold away.
+
+**Ownerships**
+
+To do so, we use a concept of ownership indicators of memrefs which materialize
+as an `i1` value for any SSA value of `memref` type, indicating whether the
+basic block in which it was materialized has ownership of this MemRef. Ideally,
+this is a constant `true` or `false`, but might also be a non-constant SSA
+value. To keep track of those ownership values without immediately materializing
+them (which might require insertion of `bufferization.clone` operations or
+operations checking for aliasing at runtime at positions where we don't actually
+need a materialized value), we use the `Ownership` class. This class represents
+the ownership in three states forming a lattice on a partial order:
+```
+forall X in SSA values. uninitialized < unique(X) < unknown
+forall X, Y in SSA values.
+  unique(X) == unique(Y) iff X and Y always evaluate to the same value
+  unique(X) != unique(Y) otherwise
+```
+Intuitively, the states have the following meaning:
+*   Uninitialized: the ownership is not initialized yet, this is the default
+    state; once an operation is finished processing the ownership of all
+    operation results with MemRef type should not be uninitialized anymore.
+*   Unique: there is a specific SSA value that can be queried to check ownership
+    without materializing any additional IR
+*   Unknown: no specific SSA value is available without materializing additional
+    IR, typically this is because two ownerships in 'Unique' state would have to
+    be merged manually (e.g., the result of an `arith.select` either has the
+    ownership of the then or else case depending on the condition value,
+    inserting another `arith.select` for the ownership values can perform the
+    merge and provide a 'Unique' ownership for the result), however, in the
+    general case this 'Unknown' state has to be assigned.
+
+Implied by the above partial order, the pass combines two ownerships in the
+following way:
+
+| Ownership 1   | Ownership 2   | Combined Ownership |
+|:--------------|:--------------|:-------------------|
+| uninitialized | uninitialized | uninitialized      |
+| unique(X)     | uninitialized | unique(X)          |
+| unique(X)     | unique(X)     | unique(X)          |
+| unique(X)     | unique(Y)     | unknown            |
+| unknown       | unique        | unknown            |
+| unknown       | uninitialized | unknown            |
+| <td colspan=3> + symmetric cases                   |
+
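The combination table above can be sketched as a join function in standalone Python. This is a toy model, not the `Ownership` class added by the patch: the names `Ownership`, `unique`, and `combine` are illustrative, and two 'Unique' states are compared by the name of their indicator value, which only approximates "always evaluate to the same value".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ownership:
    """Toy three-state ownership: uninitialized < unique(X) < unknown."""
    state: str                       # "uninitialized", "unique", or "unknown"
    indicator: Optional[str] = None  # name of the i1 value, only for "unique"


UNINITIALIZED = Ownership("uninitialized")
UNKNOWN = Ownership("unknown")


def unique(indicator: str) -> Ownership:
    return Ownership("unique", indicator)


def combine(a: Ownership, b: Ownership) -> Ownership:
    """Join of two ownerships, following the table above."""
    if a.state == "uninitialized":
        return b
    if b.state == "uninitialized":
        return a
    if a.state == "unknown" or b.state == "unknown":
        return UNKNOWN
    # Both are unique: they stay unique only if the indicator is the same.
    return a if a.indicator == b.indicator else UNKNOWN


print(combine(unique("%true"), unique("%false")).state)
```

The last line mirrors the `arith.select` case: joining the ownerships of an allocated and a stack-allocated buffer gives 'Unknown' unless a custom handler supplies a merged indicator.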
+**Collecting the list of MemRefs that potentially need to be deallocated**
+
+For a given block, the list of MemRefs that potentially need to be deallocated
+at the end of that block is computed by keeping track of all values for which
+the block potentially takes over ownership. This includes MemRefs provided as
+basic block arguments, interface handlers for operations like `memref.alloc` and
+`func.call`, but also liveness information in regions with multiple basic
+blocks.  More concretely, it is computed by taking the MemRefs in the 'in' set
+of the liveness analysis of the current basic block B, appended by the MemRef
+block arguments and by the set of MemRefs allocated in B itself (determined by
+the interface handlers), then subtracted (also determined by the interface
+handlers) by the set of MemRefs deallocated in B.
+
+Note that we don't have to take the intersection of the liveness 'in' set with
+the 'out' set of the predecessor block because a value that is in the 'in' set
+must be defined in an ancestor block that dominates all direct predecessors and
+thus the 'in' set of this block is a subset of the 'out' sets of each
+predecessor.
+
+```
+memrefs = filter((liveIn(block) U
+  allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
+```
+
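As a rough illustration, the formula above maps directly onto set operations. The following standalone Python sketch is not code from this patch. The helper name `memrefs_to_deallocate`, the `TYPES` table, and the choice to count both `%alloc` and `%alloca` as allocated in the block are assumptions, made so that the result lines up with the operand list in the example later in this document.

```python
# Hypothetical value-to-type table for the entry block of the example function.
TYPES = {
    "%memref": "memref<?xi8>",
    "%select_cond": "i1",
    "%br_cond": "i1",
    "%alloc": "memref<?xi8>",
    "%alloca": "memref<?xi8>",
}


def is_memref(value):
    return TYPES[value].startswith("memref")


def memrefs_to_deallocate(live_in, allocated, arguments, deallocated):
    """(liveIn U allocated U arguments) \\ deallocated, filtered to MemRefs."""
    candidates = (set(live_in) | set(allocated) | set(arguments)) - set(deallocated)
    # Sort only to make the result deterministic for printing.
    return sorted(v for v in candidates if is_memref(v))


print(memrefs_to_deallocate(
    live_in=[],
    allocated=["%alloc", "%alloca"],
    arguments=["%memref", "%select_cond", "%br_cond"],
    deallocated=[],
))
```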
+The list of conditions for the second variadic operands list of
+`bufferization.dealloc` is computed by querying the stored ownership value for
+each of the MemRefs collected as described above. The ownership state is updated
+by the interface handlers while processing the basic block.
+
+**Collecting the list of MemRefs to retain**
+
+Given a basic block B, the list of MemRefs that have to be retained can be
+different for each successor block S.  For the two basic blocks B and S and the
+values passed via block arguments to the destination block S, we compute the
+list of MemRefs that have to be retained in B by taking the MemRefs in the
+successor operand list of the terminator and the MemRefs in the 'out' set of the
+liveness analysis for B intersected with the 'in' set of the destination block
+S.
+
+This list of retained values makes sure that we cannot run into use-after-free
+situations even if no aliasing information is present at compile-time.
+
+```
+toRetain = filter(successorOperands + (liveOut(fromBlock) intersect
+  liveIn(toBlock)), isMemRef)
+```
+
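The retain-list formula above can be sketched the same way in standalone Python. This is an illustrative model and not code from this patch: the helper `memrefs_to_retain`, the `TYPES` table, and the liveness sets passed in are assumptions chosen to match the two edges of the `cf.cond_br` in the example later in this document.

```python
# Hypothetical value-to-type table for the values involved in the example.
TYPES = {
    "%memref": "memref<?xi8>",
    "%alloc": "memref<?xi8>",
    "%select": "memref<?xi8>",
    "%br_cond": "i1",
}


def is_memref(value):
    return TYPES[value].startswith("memref")


def memrefs_to_retain(successor_operands, live_out_from, live_in_to):
    """successorOperands + (liveOut(from) intersect liveIn(to)), MemRefs only."""
    live_across = set(live_out_from) & set(live_in_to)
    retained = list(successor_operands)
    # Append values that stay live across the edge and are not already listed.
    retained += sorted(live_across - set(successor_operands))
    return [v for v in retained if is_memref(v)]


# One retain list per successor edge of the branch.
print(memrefs_to_retain(["%alloc"], {"%select"}, {"%select"}))
print(memrefs_to_retain(["%memref"], {"%select"}, {"%select"}))
```

Computing the list per edge rather than per block is what lets the two retain lists differ in the forwarded operand while sharing `%select`.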
+### Supported interfaces
+
+The pass uses liveness analysis and a few interfaces:
+*   `FunctionOpInterface`
+*   `CallOpInterface`
+*   `MemoryEffectOpInterface`
+*   `RegionBranchOpInterface`
+*   `RegionBranchTerminatorOpInterface`
+
+Due to insufficient information provided by the interface, it also special-cases
+on the `cf.cond_br` operation and makes some assumptions about operations
+implementing the `RegionBranchOpInterface` at the moment, but improving the
+interfaces would allow us to remove those dependencies in the future.
+
+### Limitations
+
+The Buffer Deallocation pass has some requirements and limitations on the input
+IR. These are checked at the beginning of the pass and errors are emitted
+accordingly:
+*   The set of interfaces the pass operates on must be implemented (correctly).
+    E.g., if there is an operation present with a nested region, but does not
+    implement the `RegionBranchOpInterface`, an error is emitted because the
+    pass cannot know the semantics of the nested region (and does not make any
+    default assumptions on it).
+*   No explicit control-flow loops are present. Currently, only loops using
+    structural-control-flow are supported.  However, this limitation could be
+    lifted in the future.
+*   Deallocation operations should not be present already. The pass should
+    handle them correctly already (at least in most cases), but it's not
+    supported yet due to insufficient testing.
+*   Terminators must implement either `RegionBranchTerminatorOpInterface` or
+    `BranchOpInterface`, but not both. Terminators with more than one successor
+    are not supported (except `cf.cond_br`). This is not a fundamental
+    limitation, but there is no use-case justifying the more complex
+    implementation at the moment.
+
+### Example
+
+The following example contains a few interesting cases:
+*   Basic block arguments are modified to also pass along the ownership
+    indicator, but not for entry blocks of non-private functions (assuming the
+    `private-function-dynamic-ownership` pass option is disabled) where the
+    function boundary ABI is applied instead. "Private" in this context refers
+    to functions that cannot be called externally.
+*   The result of `arith.select` initially has 'Unknown' assigned as ownership,
+    but once the `bufferization.dealloc` operation is inserted it is put in the
+    'retained' list (since it has uses in a later basic block) and thus the
+    'Unknown' ownership can be replaced with a 'Unique' ownership using the
+    corresponding result of the dealloc operation.
+*   The `cf.cond_br` operation has more than one successor and thus has to
+    insert two `bufferization.dealloc` operations (one for each successor).
+    While they have the same list of MemRefs to deallocate (because they perform
+    the deallocations for the same block), it must be taken into account that
+    some MemRefs remain *live* for one branch but not the other (thus set
+    intersection is performed on the *live-out* of the current block and the
+    *live-in* of the target block). Also, `cf.cond_br` supports separate
+    forwarding operands for each successor. To make sure that no MemRef is
+    deallocated twice (because there are two `bufferization.dealloc` operations
+    with the same MemRefs to deallocate), the condition operands are adjusted to
+    take the branch condition into account. While a generic lowering for such
+    terminator operations could be implemented, a specialized implementation can
+    take all the semantics of this particular operation into account and thus
+    generate a more efficient lowering.
+
+```mlir
+func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
+  %alloc = memref.alloc() : memref<?xi8>
+  %alloca = memref.alloca() : memref<?xi8>
+  %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
+  cf.cond_br %br_cond, ^bb1(%alloc : memref<?xi8>), ^bb1(%memref : memref<?xi8>)
+^bb1(%bbarg: memref<?xi8>):
+  test.copy(%bbarg, %select) : (memref<?xi8>, memref<?xi8>)
+  return
+}
+```
+
+After running `--buffer-deallocation`, it looks as follows:
+
+```mlir
+// Since this is not a private function, the signature will not be modified even
+// when private-function-dynamic-ownership is enabled. Instead the function
+// boundary ABI has to be applied which means that ownership of `%memref` will
+// never be acquired.
+func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
+  %false = arith.constant false
+  %true = arith.constant true
+
+  // The ownership of a MemRef defined by the `memref.alloc` operation is always
+  // assigned to be 'true'.
+  %alloc = memref.alloc() : memref<?xi8>
+
+  // The ownership of a MemRef defined by the `memref.alloca` operation is
+  // always assigned to be 'false'.
+  %alloca = memref.alloca() : memref<?xi8>
+
+  // The ownership of %select will be the join of the ownership of %alloc and
+  // the ownership of %alloca, i.e., of %true and %false. Because the pass does
+  // not know about the semantics of the `arith.select` operation (unless a
+  // custom handler is implemented), the ownership join will be 'Unknown'. If
+  // the materialized ownership indicator of %select is needed, either a clone
+  // has to be created for which %true is assigned as ownership or the result
+  // of a `bufferization.dealloc` where %select is in the retain list has to be
+  // used.
+  %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
+
+  // We use `memref.extract_strided_metadata` to get the base memref since it is
+  // not allowed to pass arbitrary memrefs to `memref.dealloc`. This property is
+  // already enforced for `bufferization.dealloc`
+  %base_buffer_memref, ... = memref.extract_strided_metadata %memref
+    : memref<?xi8> -> memref<i8>, index, index, index
+  %base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
+    : memref<?xi8> -> memref<i8>, index, index, index
+  %base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
+    : memref<?xi8> -> memref<i8>, index, index, index
+
+  // The deallocation conditions need to be adjusted to incorporate the branch
+  // condition. In this example, this requires only a single negation, but might
+  // also require multiple arith.andi operations.
+  %not_br_cond = arith.xori %true, %br_cond : i1
+
+  // There are two dealloc operations inserted in this basic block, one per
+  // successor. Both have the same list of MemRefs to deallocate and the
+  // conditions only differ by the branch condition conjunct.
+  // Note, however, that the retained list differs. Here, both contain the
+  // %select value because it is used in both successors (since it's the same
+  // block), but the value passed via block argument differs (%memref vs.
+  // %alloc).
+  %10:2 = bufferization.dealloc
+           (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
+             : memref<i8>, memref<i8>, memref<i8>)
+        if (%false, %br_cond, %false)
+    retain (%alloc, %select : memref<?xi8>, memref<?xi8>)
+
+  %11:2 = bufferization.dealloc
+           (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
+             : memref<i8>, memref<i8>, memref<i8>)
+        if (%false, %not_br_cond, %false)
+    retain (%memref, %select : memref<?xi8>, memref<?xi8>)
+  
+  // Because %select is used in ^bb1 without pa...

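The way the two `bufferization.dealloc` operations above split their conditions by branch direction can be checked with a small standalone Python sketch. It is a model under stated assumptions, not code from this patch: ownership indicators are plain booleans, and the helper name `conditions_per_successor` is hypothetical.

```python
def conditions_per_successor(ownerships, br_cond):
    """Return the dealloc conditions for the true edge and the false edge."""
    # True edge: deallocate only if owned and the branch is taken.
    on_true = [owned and br_cond for owned in ownerships]
    # False edge: deallocate only if owned and the branch is not taken.
    on_false = [owned and not br_cond for owned in ownerships]
    return on_true, on_false


# Ownerships of (%memref, %alloc, %alloca) in the example: only %alloc is owned.
ownerships = [False, True, False]
for br_cond in (True, False):
    on_true, on_false = conditions_per_successor(ownerships, br_cond)
    # At most one of the two dealloc operations may fire for any MemRef.
    assert not any(t and f for t, f in zip(on_true, on_false))
    print(br_cond, on_true, on_false)
```

For either branch direction, exactly one of the two operations sees the original ownership and the other sees all-false conditions, so no MemRef can be deallocated twice.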
@maerhart maerhart force-pushed the buffer_deallocation_interface_custom_ownership_update branch from 4d988d7 to 2d43cfe Compare September 14, 2023 12:01
…wnership update logic

Add a method to the BufferDeallocationOpInterface that allows operations to
implement the interface and provide custom logic to compute the ownership
indicators of values it defines. As a demonstrating example, this new method is
implemented by the `arith.select` operation.
@maerhart maerhart force-pushed the buffer_deallocation_interface_custom_ownership_update branch from 2d43cfe to 6c1b5d2 Compare September 14, 2023 12:03
@maerhart maerhart merged commit 942ce31 into llvm:main Sep 14, 2023
@maerhart maerhart deleted the buffer_deallocation_interface_custom_ownership_update branch September 14, 2023 12:34
kstoimenov pushed a commit to kstoimenov/llvm-project that referenced this pull request Sep 14, 2023
…wnership update logic (llvm#66350)

ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
…wnership update logic (llvm#66350)
