Skip to content

[mlir][Transforms] Fix build after #116524 (part 2) #121662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 4, 2025

Conversation

matthias-springer
Copy link
Member

Since #116524, an integration test started to become flaky (failure rate ~15%).

bin/mlir-opt mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir --sparsifier="enable-arm-sve=true enable-runtime-library=false vl=2 reassociate-fp-reductions=true enable-index-optimizations=true" | mlir-cpu-runner --march=aarch64 --mattr="+sve" -e main -entry-point-result=void -shared-libs=./lib/libmlir_runner_utils.so,./lib/libmlir_c_runner_utils.so | bin/FileCheck mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir
# executed command: bin/mlir-opt mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir '--sparsifier=enable-arm-sve=true enable-runtime-library=false vl=2 reassociate-fp-reductions=true enable-index-optimizations=true'
# .---command stderr------------
# | mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir:71:10: error: null operand found
# |     %0 = linalg.generic #trait_mul
# |          ^
# | mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir:71:10: note: see current operation: %70 = "arith.mulf"(<<NULL VALUE>>, %69) <{fastmath = #arith.fastmath<none>}> : (<<NULL TYPE>>, vector<[2]xf64>) -> vector<[2]xf64>
# `-----------------------------
# error: command failed with exit status: 1

I traced the issue back to the DenseMap<ValueVector, ValueVector, ValueVectorMapInfo> mapping; data structure: previously, some mapping.erase(foo) calls were unsuccessful (returning false), even though the mapping contains foo as a key.

@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels Jan 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 4, 2025

@llvm/pr-subscribers-mlir

Author: Matthias Springer (matthias-springer)

Changes

Since #116524, an integration test started to become flaky (failure rate ~15%).

bin/mlir-opt mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir --sparsifier="enable-arm-sve=true enable-runtime-library=false vl=2 reassociate-fp-reductions=true enable-index-optimizations=true" | mlir-cpu-runner --march=aarch64 --mattr="+sve" -e main -entry-point-result=void -shared-libs=./lib/libmlir_runner_utils.so,./lib/libmlir_c_runner_utils.so | bin/FileCheck mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir
# executed command: bin/mlir-opt mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir '--sparsifier=enable-arm-sve=true enable-runtime-library=false vl=2 reassociate-fp-reductions=true enable-index-optimizations=true'
# .---command stderr------------
# | mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir:71:10: error: null operand found
# |     %0 = linalg.generic #trait_mul
# |          ^
# | mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir:71:10: note: see current operation: %70 = "arith.mulf"(&lt;&lt;NULL VALUE&gt;&gt;, %69) &lt;{fastmath = #arith.fastmath&lt;none&gt;}&gt; : (&lt;&lt;NULL TYPE&gt;&gt;, vector&lt;[2]xf64&gt;) -&gt; vector&lt;[2]xf64&gt;
# `-----------------------------
# error: command failed with exit status: 1

I traced the issue back to the DenseMap&lt;ValueVector, ValueVector, ValueVectorMapInfo&gt; mapping; data structure: previously, some mapping.erase(foo) calls were unsuccessful (returning false), even though the mapping contains foo as a key.


Full diff: https://github.com/llvm/llvm-project/pull/121662.diff

1 Files Affected:

  • (modified) mlir/lib/Transforms/Utils/DialectConversion.cpp (+2-2)
diff --git a/mlir/lib/Transforms/Utils/DialectConversion.cpp b/mlir/lib/Transforms/Utils/DialectConversion.cpp
index 0e577d2d39de3d..48b8c727a78285 100644
--- a/mlir/lib/Transforms/Utils/DialectConversion.cpp
+++ b/mlir/lib/Transforms/Utils/DialectConversion.cpp
@@ -103,8 +103,8 @@ namespace {
 
 /// Helper class to make it possible to use `ValueVector` as a key in DenseMap.
 struct ValueVectorMapInfo {
-  static ValueVector getEmptyKey() { return ValueVector{}; }
-  static ValueVector getTombstoneKey() { return ValueVector{}; }
+  static ValueVector getEmptyKey() { return ValueVector{Value()}; }
+  static ValueVector getTombstoneKey() { return ValueVector{Value(), Value()}; }
   static ::llvm::hash_code getHashValue(const ValueVector &val) {
     return ::llvm::hash_combine_range(val.begin(), val.end());
   }

@llvmbot
Copy link
Member

llvmbot commented Jan 4, 2025

@llvm/pr-subscribers-mlir-core

Author: Matthias Springer (matthias-springer)

Changes

Since #116524, an integration test started to become flaky (failure rate ~15%).

bin/mlir-opt mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir --sparsifier="enable-arm-sve=true enable-runtime-library=false vl=2 reassociate-fp-reductions=true enable-index-optimizations=true" | mlir-cpu-runner --march=aarch64 --mattr="+sve" -e main -entry-point-result=void -shared-libs=./lib/libmlir_runner_utils.so,./lib/libmlir_c_runner_utils.so | bin/FileCheck mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir
# executed command: bin/mlir-opt mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir '--sparsifier=enable-arm-sve=true enable-runtime-library=false vl=2 reassociate-fp-reductions=true enable-index-optimizations=true'
# .---command stderr------------
# | mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir:71:10: error: null operand found
# |     %0 = linalg.generic #trait_mul
# |          ^
# | mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir:71:10: note: see current operation: %70 = "arith.mulf"(&lt;&lt;NULL VALUE&gt;&gt;, %69) &lt;{fastmath = #arith.fastmath&lt;none&gt;}&gt; : (&lt;&lt;NULL TYPE&gt;&gt;, vector&lt;[2]xf64&gt;) -&gt; vector&lt;[2]xf64&gt;
# `-----------------------------
# error: command failed with exit status: 1

I traced the issue back to the DenseMap&lt;ValueVector, ValueVector, ValueVectorMapInfo&gt; mapping; data structure: previously, some mapping.erase(foo) calls were unsuccessful (returning false), even though the mapping contains foo as a key.


Full diff: https://github.com/llvm/llvm-project/pull/121662.diff

1 Files Affected:

  • (modified) mlir/lib/Transforms/Utils/DialectConversion.cpp (+2-2)
diff --git a/mlir/lib/Transforms/Utils/DialectConversion.cpp b/mlir/lib/Transforms/Utils/DialectConversion.cpp
index 0e577d2d39de3d..48b8c727a78285 100644
--- a/mlir/lib/Transforms/Utils/DialectConversion.cpp
+++ b/mlir/lib/Transforms/Utils/DialectConversion.cpp
@@ -103,8 +103,8 @@ namespace {
 
 /// Helper class to make it possible to use `ValueVector` as a key in DenseMap.
 struct ValueVectorMapInfo {
-  static ValueVector getEmptyKey() { return ValueVector{}; }
-  static ValueVector getTombstoneKey() { return ValueVector{}; }
+  static ValueVector getEmptyKey() { return ValueVector{Value()}; }
+  static ValueVector getTombstoneKey() { return ValueVector{Value(), Value()}; }
   static ::llvm::hash_code getHashValue(const ValueVector &val) {
     return ::llvm::hash_combine_range(val.begin(), val.end());
   }

@matthias-springer matthias-springer merged commit afef716 into main Jan 4, 2025
9 of 11 checks passed
@matthias-springer matthias-springer deleted the users/matthias-springer/fix_build_1_4 branch January 4, 2025 20:29
@matthias-springer
Copy link
Member Author

I merged this PR immediately to fix the build.

But it is unclear to me why this fix was necessary. @zero9178, you commented about ValueVectorMapInfo in #116524 (comment). I found this fix by accident after many hours of debugging. Do you have any idea what could be going on here?

@matthias-springer
Copy link
Member Author

This is how I noticed that something was wrong:

  for (auto it : rewriterImpl.mapping.mapping) {
    const auto &s = it.second;
    const auto &f = it.first;
    llvm::errs() << "A : " << rewriterImpl.mapping.mapping.contains(s) << "\n";
    llvm::errs() << "B : " << rewriterImpl.mapping.mapping.contains(f) << "\n";
  }

Without this PR, the above code sometimes prints:

A : 0
B : 0

@zero9178
Copy link
Member

zero9178 commented Jan 4, 2025

I merged this PR immediately to fix the build.

But it is unclear to me why this fix was necessary. @zero9178, you commented about ValueVectorMapInfo in #116524 (comment). I found this fix by accident after many hours of debugging. Do you have any idea what could be going on here?

Mostly speculating, but based on the syndromes and the fix I think it is probably the following: Usually erase replaces a bucket item with the tombstone value to make sure that later lookups of an item in the map that require probing past the erased item still succeed:

BucketT *Bucket = BucketsPtr + BucketNo;
if (LLVM_LIKELY(KeyInfoT::isEqual(Val, Bucket->getFirst())))
return Bucket;
if (LLVM_LIKELY(KeyInfoT::isEqual(Bucket->getFirst(), EmptyKey)))
return nullptr;
// Otherwise, it's a hash collision or a tombstone, continue quadratic
// probing.
BucketNo += ProbeAmt++;
BucketNo &= NumBuckets - 1;

void erase(iterator I) {
BucketT *TheBucket = &*I;
TheBucket->getSecond().~ValueT();
TheBucket->getFirst() = getTombstoneKey();

Meaning that if:

  • We have two values in the hashmap that with the current bucket size lead to the same hash and therefore create a probing chain of "v1" followed by "v2" and
  • "v1" is erased, leading to it being replaced by a tombstone then
  • "v2" can no longer be found as the implementation believes the "v1" slot to be an empty slot instead. erase and find will fail.

DenseMapInfo<Value>::getEmptyKey() and DenseMapInfo<Value>::getTombstoneKey() return special values of Value that are never ever used ((void*)0x1 and (void*)0x2) hence safe to use.

@matthias-springer
Copy link
Member Author

matthias-springer commented Jan 4, 2025

That makes sense! So the lesson learned here is: getEmptyKey and getTombstoneKey should never return the same thing. I finally understood why we have these two functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mlir:core MLIR Core Infrastructure mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants