[mlir] Optimize ThreadLocalCache by removing atomic bottleneck #93270

Merged: 1 commit into llvm:main on May 24, 2024

Conversation


@Mogball Mogball commented May 24, 2024

The ThreadLocalCache implementation is used by the MLIRContext (among other things) to try to manage thread contention in the StorageUniquers. There is a bunch of fancy shared pointer/weak pointer setup that keeps everything alive across threads at the right time, but a huge bottleneck is the `weak_ptr::lock` call inside the `::get` method.

This is because the `lock` method has to hit the atomic refcount several times, and this bottlenecks performance across many threads. However, all it is doing is checking whether the storage is initialized. We know the weak pointer cannot be expired, because the thread-local cache object we're calling into owns the memory and must still be alive for the method call to be valid. Thus, we can store an extra `ValueT *` inside the thread-local cache for speedy retrieval when the cache is already initialized for the thread, which is the common case.

This also tightens the critical section in the same method by scoping the mutex to just the mutation of `perInstanceState`.

Before:

<img width="560" alt="image" src="https://github.com/llvm/llvm-project/assets/15016832/f4ea3f32-6649-4c10-88c4-b7522031e8c9">

After:

<img width="344" alt="image" src="https://github.com/llvm/llvm-project/assets/15016832/1216db25-3dc1-4b0f-be89-caeff622dd35">
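The fast-path idea can be illustrated with a minimal, self-contained sketch (this is not the MLIR code; `Owner` and `getCached` are hypothetical stand-ins): the per-thread map pairs the `weak_ptr` (kept for lifetime tracking and cleanup) with a raw pointer that serves as a cheap "already initialized" flag on the hot path.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical owner of the per-thread instances, standing in for
// PerInstanceState in the real code.
struct Owner {
  std::vector<std::unique_ptr<int>> instances;
};

int &getCached(std::map<Owner *, std::pair<std::weak_ptr<int>, int *>> &cache,
               std::shared_ptr<Owner> &owner) {
  auto &entry = cache[owner.get()];
  // Fast path: a plain pointer load, no atomic refcount traffic at all.
  if (int *value = entry.second)
    return *value;
  // Slow path: create the instance and publish the raw pointer.
  owner->instances.push_back(std::make_unique<int>(42));
  entry.second = owner->instances.back().get();
  // Aliasing constructor: shares ownership with `owner` but points at the
  // instance, so expiry of the weak_ptr tracks the owner's lifetime.
  entry.first = std::shared_ptr<int>(owner, entry.second);
  return *entry.second;
}
```

On every call after the first, only `entry.second` is read; `weak_ptr::lock` (and its atomic refcount round trips) is never touched on the hot path.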

@Mogball Mogball requested review from jpienaar and joker-eph May 24, 2024 03:36
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels May 24, 2024

llvmbot commented May 24, 2024

@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir

Author: Jeff Niu (Mogball)


Full diff: https://github.com/llvm/llvm-project/pull/93270.diff

1 file affected:

  • (modified) mlir/include/mlir/Support/ThreadLocalCache.h (+17-11)
diff --git a/mlir/include/mlir/Support/ThreadLocalCache.h b/mlir/include/mlir/Support/ThreadLocalCache.h
index 1be94ca14bcfa..d19257bf6e25e 100644
--- a/mlir/include/mlir/Support/ThreadLocalCache.h
+++ b/mlir/include/mlir/Support/ThreadLocalCache.h
@@ -58,11 +58,12 @@ class ThreadLocalCache {
   /// ValueT. We use a weak reference here so that the object can be destroyed
   /// without needing to lock access to the cache itself.
   struct CacheType
-      : public llvm::SmallDenseMap<PerInstanceState *, std::weak_ptr<ValueT>> {
+      : public llvm::SmallDenseMap<PerInstanceState *,
+                                   std::pair<std::weak_ptr<ValueT>, ValueT *>> {
     ~CacheType() {
       // Remove the values of this cache that haven't already expired.
       for (auto &it : *this)
-        if (std::shared_ptr<ValueT> value = it.second.lock())
+        if (std::shared_ptr<ValueT> value = it.second.first.lock())
           it.first->remove(value.get());
     }
 
@@ -71,7 +72,7 @@ class ThreadLocalCache {
     void clearExpiredEntries() {
       for (auto it = this->begin(), e = this->end(); it != e;) {
         auto curIt = it++;
-        if (curIt->second.expired())
+        if (curIt->second.first.expired())
           this->erase(curIt);
       }
     }
@@ -88,22 +89,27 @@ class ThreadLocalCache {
   ValueT &get() {
     // Check for an already existing instance for this thread.
     CacheType &staticCache = getStaticCache();
-    std::weak_ptr<ValueT> &threadInstance = staticCache[perInstanceState.get()];
-    if (std::shared_ptr<ValueT> value = threadInstance.lock())
+    std::pair<std::weak_ptr<ValueT>, ValueT *> &threadInstance =
+        staticCache[perInstanceState.get()];
+    if (ValueT *value = threadInstance.second)
       return *value;
 
     // Otherwise, create a new instance for this thread.
-    llvm::sys::SmartScopedLock<true> threadInstanceLock(
-        perInstanceState->instanceMutex);
-    perInstanceState->instances.push_back(std::make_unique<ValueT>());
-    ValueT *instance = perInstanceState->instances.back().get();
-    threadInstance = std::shared_ptr<ValueT>(perInstanceState, instance);
+    {
+      llvm::sys::SmartScopedLock<true> threadInstanceLock(
+          perInstanceState->instanceMutex);
+      threadInstance.second =
+          perInstanceState->instances.emplace_back(std::make_unique<ValueT>())
+              .get();
+    }
+    threadInstance.first =
+        std::shared_ptr<ValueT>(perInstanceState, threadInstance.second);
 
     // Before returning the new instance, take the chance to clear out any used
     // entries in the static map. The cache is only cleared within the same
     // thread to remove the need to lock the cache itself.
     staticCache.clearExpiredEntries();
-    return *instance;
+    return *threadInstance.second;
   }
   ValueT &operator*() { return get(); }
   ValueT *operator->() { return &get(); }
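The diff relies on `std::shared_ptr`'s aliasing constructor (`std::shared_ptr<ValueT>(perInstanceState, instance)`), which can be surprising on first read. A small illustrative sketch (the `State` type here is hypothetical): the returned pointer refers to a member, but the control block and refcount belong to the owning object, so the member stays alive as long as any alias does.

```cpp
#include <cassert>
#include <memory>

struct State {
  int value = 7;
};

// Aliasing constructor: the result shares ownership with `state` but
// points at `state->value` instead of the State object itself.
std::shared_ptr<int> aliasMember(const std::shared_ptr<State> &state) {
  return std::shared_ptr<int>(state, &state->value);
}
```

This is why the cache's `weak_ptr<ValueT>` expires exactly when the owning `PerInstanceState` goes away, even though it points at an individual value.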


@jpienaar jpienaar left a comment

Nice :)

@Mogball Mogball merged commit 1b803fe into llvm:main May 24, 2024
8 of 10 checks passed
jpienaar added a commit that referenced this pull request May 24, 2024
joker-eph pushed a commit that referenced this pull request May 24, 2024
…k" (#93306)

Reverts #93270

This was found to have a race and the forward fix was reverted; reverting this until it can be forward fixed.