Skip to content

Convert the StackFrameList mutex to a shared mutex. #117252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 12, 2024

Conversation

jimingham
Copy link
Collaborator

@jimingham jimingham commented Nov 21, 2024

In fact, there's only one public API in StackFrameList that changes
the list explicitly. The rest only change the list if you happen to
ask for more frames than lldb has currently fetched and that
always adds frames "behind the user's back". So we were
much more prone to deadlocking than we needed to be.

This patch uses a shared_mutex instead, and when we have to add more
frames (in GetFramesUpTo) we switches to exclusive long enough to add
the frames, then goes back to shared.

Most of the work here was actually getting the stack frame list locking to not
require a recursive mutex (shared mutexes aren't recursive).

I also added a test that has 5 threads progressively asking for more
frames simultaneously to make sure we get back valid frames and don't
deadlock.

In fact, there's only one public API in StackFrameList that changes
the list explicitly.  The rest only change the list if you happen to
ask for more frames than lldb has currently fetched.  So we were
much more prone to deadlocking than we needed to be.

This patch uses a shared_mutex instead, and when we have to add more
frames (in GetFramesUpTo) we switches to exclusive long enough to add
the frames, then goes back to shared.

I also added a test that has 5 threads progressively asking for more
frames simultaneously to make sure we get back valid frames and don't
deadlock.
@llvmbot llvmbot added the lldb label Nov 21, 2024
@jimingham jimingham requested a review from labath November 21, 2024 22:39
@llvmbot
Copy link
Member

llvmbot commented Nov 21, 2024

@llvm/pr-subscribers-lldb

Author: None (jimingham)

Changes

In fact, there's only one public API in StackFrameList that changes
the list explicitly. The rest only change the list if you happen to
ask for more frames than lldb has currently fetched. So we were
much more prone to deadlocking than we needed to be.

This patch uses a shared_mutex instead, and when we have to add more
frames (in GetFramesUpTo) we switches to exclusive long enough to add
the frames, then goes back to shared.

Most of the work here was actually getting the stack frame list locking to not
require a recursive mutex (shared mutexes aren't recursive).

I also added a test that has 5 threads progressively asking for more
frames simultaneously to make sure we get back valid frames and don't
deadlock.


Patch is 35.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117252.diff

5 Files Affected:

  • (modified) lldb/include/lldb/Target/StackFrameList.h (+28-7)
  • (modified) lldb/source/Target/StackFrameList.cpp (+258-206)
  • (modified) lldb/test/API/api/multithreaded/TestMultithreaded.py (+12-3)
  • (added) lldb/test/API/api/multithreaded/deep_stack.cpp (+17)
  • (added) lldb/test/API/api/multithreaded/test_concurrent_unwind.cpp.template (+91)
diff --git a/lldb/include/lldb/Target/StackFrameList.h b/lldb/include/lldb/Target/StackFrameList.h
index 7d0e7a5b9a71b2..8f6f6500f1ef96 100644
--- a/lldb/include/lldb/Target/StackFrameList.h
+++ b/lldb/include/lldb/Target/StackFrameList.h
@@ -11,6 +11,7 @@
 
 #include <memory>
 #include <mutex>
+#include <shared_mutex>
 #include <vector>
 
 #include "lldb/Target/StackFrame.h"
@@ -103,11 +104,19 @@ class StackFrameList {
   /// Realizes frames up to (and including) end_idx (which can be greater than  
   /// the actual number of frames.)  
   /// Returns true if the function was interrupted, false otherwise.
+  /// Must be called with a shared_locked mutex.  This might unlock the
+  /// shared side if it needs to fetch more frames, but will reacquire the 
+  /// shared lock on the way out.
   bool GetFramesUpTo(uint32_t end_idx, 
-      InterruptionControl allow_interrupt = AllowInterruption);
+      InterruptionControl allow_interrupt,
+      std::shared_lock<std::shared_mutex> &guard);
 
-  void GetOnlyConcreteFramesUpTo(uint32_t end_idx, Unwind &unwinder);
+  /// Does not hold the StackFrameList mutex.  Same comment as GetFramesUpTo.
+  void GetOnlyConcreteFramesUpTo(uint32_t end_idx, Unwind &unwinder,
+      std::shared_lock<std::shared_mutex> &guard);
 
+  // This gets called without the StackFrameList lock held, callers should 
+  // hold the lock.
   void SynthesizeTailCallFrames(StackFrame &next_frame);
 
   bool GetAllFramesFetched() { return m_concrete_frames_fetched == UINT32_MAX; }
@@ -122,6 +131,9 @@ class StackFrameList {
 
   void SetCurrentInlinedDepth(uint32_t new_depth);
 
+  /// Calls into the stack frame recognizers and stop info to set the most
+  /// relevant frame.  This can call out to arbitrary user code so it can't
+  /// hold the StackFrameList mutex.
   void SelectMostRelevantFrame();
 
   typedef std::vector<lldb::StackFrameSP> collection;
@@ -138,11 +150,16 @@ class StackFrameList {
   // source of information.
   lldb::StackFrameListSP m_prev_frames_sp;
 
-  /// A mutex for this frame list.
-  // TODO: This mutex may not always be held when required. In particular, uses
-  // of the StackFrameList APIs in lldb_private::Thread look suspect. Consider
-  // passing around a lock_guard reference to enforce proper locking.
-  mutable std::recursive_mutex m_mutex;
+  /// A mutex for this frame list.  The only public API that requires the
+  /// unique lock is Clear.  All other clients take the shared lock, though
+  /// if we need more frames we may swap shared for unique to fulfill that
+  /// requirement.
+  mutable std::shared_mutex m_list_mutex;
+  
+  // Setting the inlined depth should be protected against other attempts to
+  // change it, but since it doesn't mutate the list itself, we can limit the
+  // critical regions it produces by having a separate mutex.
+  mutable std::mutex m_inlined_depth_mutex;
 
   /// A cache of frames. This may need to be updated when the program counter
   /// changes.
@@ -171,6 +188,10 @@ class StackFrameList {
   const bool m_show_inlined_frames;
 
 private:
+  uint32_t SetSelectedFrameNoLock(lldb_private::StackFrame *frame);
+  lldb::StackFrameSP GetFrameAtIndexNoLock(uint32_t idx,
+      std::shared_lock<std::shared_mutex> &guard);
+
   StackFrameList(const StackFrameList &) = delete;
   const StackFrameList &operator=(const StackFrameList &) = delete;
 };
diff --git a/lldb/source/Target/StackFrameList.cpp b/lldb/source/Target/StackFrameList.cpp
index 94a381edd5e202..0dc4fa7797c58b 100644
--- a/lldb/source/Target/StackFrameList.cpp
+++ b/lldb/source/Target/StackFrameList.cpp
@@ -25,6 +25,7 @@
 #include "lldb/Target/Unwind.h"
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
+#include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/SmallPtrSet.h"
 
 #include <memory>
@@ -38,7 +39,7 @@ using namespace lldb_private;
 StackFrameList::StackFrameList(Thread &thread,
                                const lldb::StackFrameListSP &prev_frames_sp,
                                bool show_inline_frames)
-    : m_thread(thread), m_prev_frames_sp(prev_frames_sp), m_mutex(), m_frames(),
+    : m_thread(thread), m_prev_frames_sp(prev_frames_sp), m_frames(),
       m_selected_frame_idx(), m_concrete_frames_fetched(0),
       m_current_inlined_depth(UINT32_MAX),
       m_current_inlined_pc(LLDB_INVALID_ADDRESS),
@@ -63,6 +64,7 @@ void StackFrameList::CalculateCurrentInlinedDepth() {
 }
 
 uint32_t StackFrameList::GetCurrentInlinedDepth() {
+  std::lock_guard<std::mutex> guard(m_inlined_depth_mutex);
   if (m_show_inlined_frames && m_current_inlined_pc != LLDB_INVALID_ADDRESS) {
     lldb::addr_t cur_pc = m_thread.GetRegisterContext()->GetPC();
     if (cur_pc != m_current_inlined_pc) {
@@ -84,11 +86,6 @@ void StackFrameList::ResetCurrentInlinedDepth() {
   if (!m_show_inlined_frames)
     return;
 
-  std::lock_guard<std::recursive_mutex> guard(m_mutex);
-
-  m_current_inlined_pc = LLDB_INVALID_ADDRESS;
-  m_current_inlined_depth = UINT32_MAX;
-
   StopInfoSP stop_info_sp = m_thread.GetStopInfo();
   if (!stop_info_sp)
     return;
@@ -98,6 +95,7 @@ void StackFrameList::ResetCurrentInlinedDepth() {
   // We're only adjusting the inlined stack here.
   Log *log = GetLog(LLDBLog::Step);
   if (inline_depth) {
+    std::lock_guard<std::mutex> guard(m_inlined_depth_mutex);
     m_current_inlined_depth = *inline_depth;
     m_current_inlined_pc = m_thread.GetRegisterContext()->GetPC();
 
@@ -107,6 +105,9 @@ void StackFrameList::ResetCurrentInlinedDepth() {
                 "depth: %d 0x%" PRIx64 ".\n",
                 m_current_inlined_depth, m_current_inlined_pc);
   } else {
+    std::lock_guard<std::mutex> guard(m_inlined_depth_mutex);
+    m_current_inlined_pc = LLDB_INVALID_ADDRESS;
+    m_current_inlined_depth = UINT32_MAX;
     if (log && log->GetVerbose())
       LLDB_LOGF(
           log,
@@ -119,6 +120,7 @@ bool StackFrameList::DecrementCurrentInlinedDepth() {
     uint32_t current_inlined_depth = GetCurrentInlinedDepth();
     if (current_inlined_depth != UINT32_MAX) {
       if (current_inlined_depth > 0) {
+        std::lock_guard<std::mutex> guard(m_inlined_depth_mutex);
         m_current_inlined_depth--;
         return true;
       }
@@ -128,6 +130,7 @@ bool StackFrameList::DecrementCurrentInlinedDepth() {
 }
 
 void StackFrameList::SetCurrentInlinedDepth(uint32_t new_depth) {
+  std::lock_guard<std::mutex> guard(m_inlined_depth_mutex);
   m_current_inlined_depth = new_depth;
   if (new_depth == UINT32_MAX)
     m_current_inlined_pc = LLDB_INVALID_ADDRESS;
@@ -136,22 +139,33 @@ void StackFrameList::SetCurrentInlinedDepth(uint32_t new_depth) {
 }
 
 void StackFrameList::GetOnlyConcreteFramesUpTo(uint32_t end_idx,
-                                               Unwind &unwinder) {
+                                               Unwind &unwinder,
+      std::shared_lock<std::shared_mutex> &guard) {
   assert(m_thread.IsValid() && "Expected valid thread");
   assert(m_frames.size() <= end_idx && "Expected there to be frames to fill");
 
   if (end_idx < m_concrete_frames_fetched)
     return;
+  { // Scope for swapping reader and writer locks
+    m_list_mutex.lock();
+    auto on_exit = llvm::make_scope_exit(
+      [&]() { 
+        m_list_mutex.unlock();
+        guard.lock();
+      });
+    if (end_idx < m_concrete_frames_fetched)
+      return;
 
-  uint32_t num_frames = unwinder.GetFramesUpTo(end_idx);
-  if (num_frames <= end_idx + 1) {
-    // Done unwinding.
-    m_concrete_frames_fetched = UINT32_MAX;
-  }
+    uint32_t num_frames = unwinder.GetFramesUpTo(end_idx);
+    if (num_frames <= end_idx + 1) {
+      // Done unwinding.
+      m_concrete_frames_fetched = UINT32_MAX;
+    }
 
-  // Don't create the frames eagerly. Defer this work to GetFrameAtIndex,
-  // which can lazily query the unwinder to create frames.
-  m_frames.resize(num_frames);
+    // Don't create the frames eagerly. Defer this work to GetFrameAtIndex,
+    // which can lazily query the unwinder to create frames.
+    m_frames.resize(num_frames);
+  }
 }
 
 /// A sequence of calls that comprise some portion of a backtrace. Each frame
@@ -167,6 +181,8 @@ using CallSequence = std::vector<CallDescriptor>;
 /// Find the unique path through the call graph from \p begin (with return PC
 /// \p return_pc) to \p end. On success this path is stored into \p path, and
 /// on failure \p path is unchanged.
+/// This function doesn't currently access StackFrameLists at all, it only looks 
+/// at the frame set in the ExecutionContext it passes around.
 static void FindInterveningFrames(Function &begin, Function &end,
                                   ExecutionContext &exe_ctx, Target &target,
                                   addr_t return_pc, CallSequence &path,
@@ -349,7 +365,8 @@ void StackFrameList::SynthesizeTailCallFrames(StackFrame &next_frame) {
 }
 
 bool StackFrameList::GetFramesUpTo(uint32_t end_idx,
-                                   InterruptionControl allow_interrupt) {
+                                   InterruptionControl allow_interrupt,
+      std::shared_lock<std::shared_mutex> &guard) {
   // Do not fetch frames for an invalid thread.
   bool was_interrupted = false;
   if (!m_thread.IsValid())
@@ -363,177 +380,193 @@ bool StackFrameList::GetFramesUpTo(uint32_t end_idx,
   Unwind &unwinder = m_thread.GetUnwinder();
 
   if (!m_show_inlined_frames) {
-    GetOnlyConcreteFramesUpTo(end_idx, unwinder);
+    GetOnlyConcreteFramesUpTo(end_idx, unwinder, guard);
     return false;
   }
-
-#if defined(DEBUG_STACK_FRAMES)
-  StreamFile s(stdout, false);
-#endif
-  // If we are hiding some frames from the outside world, we need to add
-  // those onto the total count of frames to fetch.  However, we don't need
-  // to do that if end_idx is 0 since in that case we always get the first
-  // concrete frame and all the inlined frames below it...  And of course, if
-  // end_idx is UINT32_MAX that means get all, so just do that...
-
-  uint32_t inlined_depth = 0;
-  if (end_idx > 0 && end_idx != UINT32_MAX) {
-    inlined_depth = GetCurrentInlinedDepth();
-    if (inlined_depth != UINT32_MAX) {
-      if (end_idx > 0)
-        end_idx += inlined_depth;
+  
+  // We're going to have to add frames, so get the writer side of the lock,
+  // and then when we're done, relock the reader side.
+  guard.unlock();
+  { // Scope for switching the writer -> reader and back
+    m_list_mutex.lock();
+    auto on_exit = llvm::make_scope_exit(
+      [&]() { 
+        m_list_mutex.unlock();
+        guard.lock();
+      });
+
+    if (m_frames.size() > end_idx || GetAllFramesFetched()) {
+      return false;
+    }
+    
+  #if defined(DEBUG_STACK_FRAMES)
+    StreamFile s(stdout, false);
+  #endif
+    // If we are hiding some frames from the outside world, we need to add
+    // those onto the total count of frames to fetch.  However, we don't need
+    // to do that if end_idx is 0 since in that case we always get the first
+    // concrete frame and all the inlined frames below it...  And of course, if
+    // end_idx is UINT32_MAX that means get all, so just do that...
+
+    uint32_t inlined_depth = 0;
+    if (end_idx > 0 && end_idx != UINT32_MAX) {
+      inlined_depth = GetCurrentInlinedDepth();
+      if (inlined_depth != UINT32_MAX) {
+        if (end_idx > 0)
+          end_idx += inlined_depth;
+      }
     }
-  }
 
-  StackFrameSP unwind_frame_sp;
-  Debugger &dbg = m_thread.GetProcess()->GetTarget().GetDebugger();
-  do {
-    uint32_t idx = m_concrete_frames_fetched++;
-    lldb::addr_t pc = LLDB_INVALID_ADDRESS;
-    lldb::addr_t cfa = LLDB_INVALID_ADDRESS;
-    bool behaves_like_zeroth_frame = (idx == 0);
-    if (idx == 0) {
-      // We might have already created frame zero, only create it if we need
-      // to.
-      if (m_frames.empty()) {
-        RegisterContextSP reg_ctx_sp(m_thread.GetRegisterContext());
-
-        if (reg_ctx_sp) {
-          const bool success = unwinder.GetFrameInfoAtIndex(
-              idx, cfa, pc, behaves_like_zeroth_frame);
-          // There shouldn't be any way not to get the frame info for frame
-          // 0. But if the unwinder can't make one, lets make one by hand
-          // with the SP as the CFA and see if that gets any further.
-          if (!success) {
-            cfa = reg_ctx_sp->GetSP();
-            pc = reg_ctx_sp->GetPC();
+    StackFrameSP unwind_frame_sp;
+    Debugger &dbg = m_thread.GetProcess()->GetTarget().GetDebugger();
+    do {
+      uint32_t idx = m_concrete_frames_fetched++;
+      lldb::addr_t pc = LLDB_INVALID_ADDRESS;
+      lldb::addr_t cfa = LLDB_INVALID_ADDRESS;
+      bool behaves_like_zeroth_frame = (idx == 0);
+      if (idx == 0) {
+        // We might have already created frame zero, only create it if we need
+        // to.
+        if (m_frames.empty()) {
+          RegisterContextSP reg_ctx_sp(m_thread.GetRegisterContext());
+
+          if (reg_ctx_sp) {
+            const bool success = unwinder.GetFrameInfoAtIndex(
+                idx, cfa, pc, behaves_like_zeroth_frame);
+            // There shouldn't be any way not to get the frame info for frame
+            // 0. But if the unwinder can't make one, lets make one by hand
+            // with the SP as the CFA and see if that gets any further.
+            if (!success) {
+              cfa = reg_ctx_sp->GetSP();
+              pc = reg_ctx_sp->GetPC();
+            }
+
+            unwind_frame_sp = std::make_shared<StackFrame>(
+                m_thread.shared_from_this(), m_frames.size(), idx, reg_ctx_sp,
+                cfa, pc, behaves_like_zeroth_frame, nullptr);
+            m_frames.push_back(unwind_frame_sp);
           }
-
-          unwind_frame_sp = std::make_shared<StackFrame>(
-              m_thread.shared_from_this(), m_frames.size(), idx, reg_ctx_sp,
-              cfa, pc, behaves_like_zeroth_frame, nullptr);
-          m_frames.push_back(unwind_frame_sp);
+        } else {
+          unwind_frame_sp = m_frames.front();
+          cfa = unwind_frame_sp->m_id.GetCallFrameAddress();
         }
       } else {
-        unwind_frame_sp = m_frames.front();
-        cfa = unwind_frame_sp->m_id.GetCallFrameAddress();
-      }
-    } else {
-      // Check for interruption when building the frames.
-      // Do the check in idx > 0 so that we'll always create a 0th frame.
-      if (allow_interrupt 
-          && INTERRUPT_REQUESTED(dbg, "Interrupted having fetched {0} frames",
-                                 m_frames.size())) {
-          was_interrupted = true;
-          break;
-      }
-
-      const bool success =
-          unwinder.GetFrameInfoAtIndex(idx, cfa, pc, behaves_like_zeroth_frame);
-      if (!success) {
-        // We've gotten to the end of the stack.
-        SetAllFramesFetched();
-        break;
-      }
-      const bool cfa_is_valid = true;
-      unwind_frame_sp = std::make_shared<StackFrame>(
-          m_thread.shared_from_this(), m_frames.size(), idx, cfa, cfa_is_valid,
-          pc, StackFrame::Kind::Regular, behaves_like_zeroth_frame, nullptr);
+        // Check for interruption when building the frames.
+        // Do the check in idx > 0 so that we'll always create a 0th frame.
+        if (allow_interrupt 
+            && INTERRUPT_REQUESTED(dbg, "Interrupted having fetched {0} frames",
+                                   m_frames.size())) {
+            was_interrupted = true;
+            break;
+        }
 
-      // Create synthetic tail call frames between the previous frame and the
-      // newly-found frame. The new frame's index may change after this call,
-      // although its concrete index will stay the same.
-      SynthesizeTailCallFrames(*unwind_frame_sp.get());
+        const bool success =
+            unwinder.GetFrameInfoAtIndex(idx, cfa, pc, behaves_like_zeroth_frame);
+        if (!success) {
+          // We've gotten to the end of the stack.
+          SetAllFramesFetched();
+          break;
+        }
+        const bool cfa_is_valid = true;
+        unwind_frame_sp = std::make_shared<StackFrame>(
+            m_thread.shared_from_this(), m_frames.size(), idx, cfa, cfa_is_valid,
+            pc, StackFrame::Kind::Regular, behaves_like_zeroth_frame, nullptr);
 
-      m_frames.push_back(unwind_frame_sp);
-    }
+        // Create synthetic tail call frames between the previous frame and the
+        // newly-found frame. The new frame's index may change after this call,
+        // although its concrete index will stay the same.
+        SynthesizeTailCallFrames(*unwind_frame_sp.get());
 
-    assert(unwind_frame_sp);
-    SymbolContext unwind_sc = unwind_frame_sp->GetSymbolContext(
-        eSymbolContextBlock | eSymbolContextFunction);
-    Block *unwind_block = unwind_sc.block;
-    TargetSP target_sp = m_thread.CalculateTarget();
-    if (unwind_block) {
-      Address curr_frame_address(
-          unwind_frame_sp->GetFrameCodeAddressForSymbolication());
-
-      SymbolContext next_frame_sc;
-      Address next_frame_address;
-
-      while (unwind_sc.GetParentOfInlinedScope(
-          curr_frame_address, next_frame_sc, next_frame_address)) {
-        next_frame_sc.line_entry.ApplyFileMappings(target_sp);
-        behaves_like_zeroth_frame = false;
-        StackFrameSP frame_sp(new StackFrame(
-            m_thread.shared_from_this(), m_frames.size(), idx,
-            unwind_frame_sp->GetRegisterContextSP(), cfa, next_frame_address,
-            behaves_like_zeroth_frame, &next_frame_sc));
-
-        m_frames.push_back(frame_sp);
-        unwind_sc = next_frame_sc;
-        curr_frame_address = next_frame_address;
+        m_frames.push_back(unwind_frame_sp);
       }
-    }
-  } while (m_frames.size() - 1 < end_idx);
-
-  // Don't try to merge till you've calculated all the frames in this stack.
-  if (GetAllFramesFetched() && m_prev_frames_sp) {
-    StackFrameList *prev_frames = m_prev_frames_sp.get();
-    StackFrameList *curr_frames = this;
-
-#if defined(DEBUG_STACK_FRAMES)
-    s.PutCString("\nprev_frames:\n");
-    prev_frames->Dump(&s);
-    s.PutCString("\ncurr_frames:\n");
-    curr_frames->Dump(&s);
-    s.EOL();
-#endif
-    size_t curr_frame_num, prev_frame_num;
-
-    for (curr_frame_num = curr_frames->m_frames.size(),
-        prev_frame_num = prev_frames->m_frames.size();
-         curr_frame_num > 0 && prev_frame_num > 0;
-         --curr_frame_num, --prev_frame_num) {
-      const size_t curr_frame_idx = curr_frame_num - 1;
-      const size_t prev_frame_idx = prev_frame_num - 1;
-      StackFrameSP curr_frame_sp(curr_frames->m_frames[curr_frame_idx]);
-      StackFrameSP prev_frame_sp(prev_frames->m_frames[prev_frame_idx]);
 
-#if defined(DEBUG_STACK_FRAMES)
-      s.Printf("\n\nCurr frame #%u ", curr_frame_idx);
-      if (curr_frame_sp)
-        curr_frame_sp->Dump(&s, true, false);
-      else
-        s.PutCString("NULL");
-      s.Printf("\nPrev frame #%u ", prev_frame_idx);
-      if (prev_frame_sp)
-        prev_frame_sp->Dump(&s, true, false);
-      else
-        s.PutCString("NULL");
-#endif
+      assert(unwind_frame_sp);
+      SymbolContext unwind_sc = unwind_frame_sp->GetSymbolContext(
+          eSymbolContextBlock | eSymbolContextFunction);
+      Block *unwind_block = unwind_sc.block;
+      TargetSP target_sp = m_thread.CalculateTarget();
+      if (unwind_block) {
+        Address curr_frame_address(
+            unwind_frame_sp->GetFrameCodeAddressForSymbolication());
+
+        SymbolContext next_frame_sc;
+        Address next_frame_address;
+
+        while (unwind_sc.GetParentOfInlinedScope(
+            curr_frame_address, next_frame_sc, next_frame_address)) {
+          next_frame_sc.line_entry.ApplyFileMappings(target_sp);
+          behaves_like_zeroth_frame = false;
+          StackFrameSP frame_sp(new StackFrame(
+              m_thread.shared_from_this(), m_frames.size(), idx,
+              unwind_frame_sp->GetRegisterContextSP(), cfa, next_frame_address,
+              behaves_like_zeroth_frame, &next_frame_sc));
+
+          m_frames.push_back(frame_sp);
+          unwind_sc = next_frame_sc;
+          curr_frame_address = next_frame_address;
+        }
+      }
+    } while (m_frames.size() - 1 < end_idx);
+
+    // Don't try to merge till you've calculated all the frames in this stack.
+    if (GetAllFramesFetched() && m_prev_frames_sp) {
+      StackFrameList *prev_frames = m_prev_frames_sp.get();
+      StackFrameList *curr_frames = this;
+
+  #if def...
[truncated]

Copy link

github-actions bot commented Nov 21, 2024

✅ With the latest revision this PR passed the Python code formatter.

Copy link

github-actions bot commented Nov 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 833 to 834
std::shared_lock<std::shared_mutex> guard(m_list_mutex);
result = SetSelectedFrameNoLock(frame);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A shared lock around a function called "Set" is very suspicious. What is this trying to protect?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list mutex protects the stack frame list, not metadata like the selected frame pointer. So this function is a reader of the stack frame list, though it will update the metadata.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switching of lock types is making me very nervous. Every switch is an opportunity to get preempted, which means you have to recheck any invariants before you do that -- you can't just assume that conditions you tested earlier in the function still hold.

That essentially forces you to treat all work being done under one lock type as a single unit -- with well defined inputs, outputs, pre- and post-conditions (okay, maybe it doesn't, but I'd argue that if you can't say what is the outcome of a block of code like this, then its very hard to guarantee or check that its being used correctly).

Now, if we have a piece of code with inputs and outputs and pre- and post-conditions, then the best thing to do is to put that in a function. For something like stack unwinding, my ideal setup would be to have one function for the "shared" work, one for the "exclusive" work and then one to orchestrate everything (each one can be further subdivided as needed).

GetFramesUpTo(idx) {
  frame = GetCachedFrame(idx); // uses read/shared lock internally
  if (!frame) {
    // uses write/exclusive lock internally, cannot assume that frame[idx] does not exist as it might have been created in the meantime.
    frame = GetOrCreateFrame(idx); 
  }
  // Do something with the frame (with or without a lock).
}

So, my question is, why can't the code be structured to look something like this (I know my example is too simple to work verbatim, but it's just an illustration)? If not, then why, and how is one supposed to reason about code like this (these functions are sufficiently complex without needing to worry about lock types and preemption)? I think it's totally reasonable and desirable (and expected?) to precede a patch like this with some refactor which makes the code more amenable to a different locking strategy.

Unwind &unwinder) {
void StackFrameList::GetOnlyConcreteFramesUpTo(
uint32_t end_idx, Unwind &unwinder,
std::shared_lock<std::shared_mutex> &guard) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the work of this function is done under the exclusive mutex. Why does it need the lock dance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. In the patch as posted, GetOnly ConcreteFramesUpTo gets called before the shared lock is dropped and the exclusive mutex acquired (since it returned right away in its only use, it seemed nicer to put that dance in the function. That should be clear by moving the equivalent "get concrete and inlined frames" code into its own function.

exclusive mutex) into a separate function for clarity.
@jimingham
Copy link
Collaborator Author

jimingham commented Dec 7, 2024

Does this look better?

The situation is that there are really only three "writer" API's: GetOnlyConcreteFramesUpTo, Clear and the second half of GetFramesUpTo, and only Clear is called with the intention of changing the list contents. And of these only Clear was a client API. All the other client API's were formally readers - they shouldn't have to know that we update frames lazily. So the responsibilities were already pretty strictly limited.

But clearer is for sure better, so I renamed GetOnlyConcreteFramesUpTo to FetchOnlyConcreteFramesUpTo and moved the second half of GetFramesUpTo into FetchFramesUpTo. Those and Clear are the only places where the list is actually mutated. Clear doesn't need to do any dance, since it's a client API and intends to change the list. Other than that, clients who want to read the list need to acquire the shared mutex, if the Fetch functions need to add frames to fulfill that request, that should happen silently when some client asks for more frames than have been fetched.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somewhat better, but it doesn't address my main concern, which is this mid-scope switching of lock types. It's an unusual setup, and that's usually not good thing when it comes to locking and threads.

I don't think that the fact that these are all reader APIs (for clients of StackFrameList) has anything to do with this, as the locking scheme is an implementation detail of the StackFrameList class (evidenced by the fact that you can change it without modifying any of the clients). The interface we're concerned with here is the one between m_frames member (and anything else being protected) and the code accessing it. And there are definitely writers there.

I still believe that for a code like this to be correct, each piece of code covered by a single type of a lock needs to have a clearly defined purpose. And if it has a purpose, then it can be written as a scope (block or a function).

Let's take a look at FetchOnlyConcreteFramesUpTo and FetchFramesUpTo. Their only caller is GetFramesUpTo and their called close together, which means scoping them should be fairly easy.

Now, let's see what happens before this "write" block. Inside GetNumFrames, we basically only have a "have the frames been computed already" check, which is amenable to be put into a separate function block. GetFramesUpTo is called from two places. One of them is GetNumFrames which does nothing besides locking the mutex and them calling GetFramesUpTo. The other is GetFrameAtIndexNoLock which does another "has the frame been computed check" (is there any difference between the two, and can they be merged?) and then calls GetFramesUpTo. It doesn't lock the mutex though, so lets look at where that happens.

This happens either in GetFrameAtIndex, where the pattern is again "lock the mutex and call the next function"; in GetFrameWithConcreteFrameIndex, which AFAICT is not doing anything that needs protecting, as its only working with the frames returned from GetFrameAtIndex; or in GetFrameWithStackID, which is doing a slightly different form of "has this frame been computed" check (via a binary search), and then proceeds to generate new frames (it starts calling GetFrameAtIndexNoLock with frame zero, but I think it could start with m_frames.size(), since it has checked the previous frames already).

So it seems to me that at least everything that happens before taking the write lock could be expressed as some sort of a function or a block. I haven't looked that closely at what happens after releasing the write lock, but I believe the same could be done there as well (with the exception of one block, which I think actually needs to be protected by the write lock as well -- see inline comment). I think the code would look better if we did that.

@@ -594,7 +643,7 @@ StackFrameSP StackFrameList::GetFrameAtIndex(uint32_t idx) {
// GetFramesUpTo will fill m_frames with as many frames as you asked for, if
// there are that many. If there weren't then you asked for too many frames.
// GetFramesUpTo returns true if interrupted:
if (GetFramesUpTo(idx)) {
if (GetFramesUpTo(idx, AllowInterruption, guard)) {
Log *log = GetLog(LLDBLog::Thread);
LLDB_LOG(log, "GetFrameAtIndex was interrupted");
return {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is a comment about lines from 652 to ~682, but github won't let me place a comment there)

Are you sure that code can be run with only a read/shared lock? It modifies (through the SetFrameAtIndex call) the stack frame list, which sounds like it could be bad..

Copy link
Collaborator Author

@jimingham jimingham Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have sharp eyes, thanks! I missed that code. It's based on a false premise (GetFramesUpTo fills the stack frames up to the depth requested in both the "concrete frames only" case and the "inlined frames". So this code is redundant (and will always find the required frames filled by the time you get to it.)
However, that issue is a bit moot, as it turns out. You can only make a "concrete frames only" StackFrameList at construction time by passing true to the show_inlined_frames argument. Nowhere in lldb is that passed as false, and I can't see evidence that we ever made that possible. It is also entirely untested.
I've deleted the bit of code that was setting frames here in the patch as that was never the right thing to do even if we want the "concrete frames only" feature.
But we should probably also discuss whether we need to preserve this "OnlyConcreteFrames" stack feature given that we haven't used it pretty much forever so far as I can see.
However, if we decide to get rid of it, that should be a separate patch, IMO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks.

That said, I think the code below still needs a read lock as it's accessing m_frames. I think that might be avoidable if GetFramesUpTo returned the frame it was asked for (see other suggestions).

That code was wrong, it was prefaced by the comment that only when
we get inlined frames does GetFramesUpTo fetch the needed frames, but
that's not true.
comments explaining how this is to be used.
@jimingham
Copy link
Collaborator Author

I simplified this as much as I could. I had started by assuming that the places where we were acquiring m_list_mutex were all required, and the problem was that some were reader and some writer uses but the single mutex type treated them all as writers. So I divided them by their actual usages and then dealt with the two usages as appropriate. But in fact in a number of places we were just grabbing the mutex because we might want it and since it was recursive it didn't hurt.

So I took out all the mutex uses that didn't surround code that either directly accesses the m_frames or m_concrete_frames_fetched. That reduced some of the noise. Then since switching roles seemed to concern you, I reworked it so that we only did writer tasks in whole functions now documented to have that role - and had to be called with no lock taken. The callers that want to do "check whether I have to do work" then "do work" have to acquire the shared lock, do the checking, drop it and call one of the writer functions.

Hopefully this will seem clearer.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Despite the number of comments, I think this looks much better. I've made some suggestions that I think could reduce the number of times the mutex is taken (not so much for the performance, but for the readability benefit). I've marked them as optional, but I think they'd could improve things, and shouldn't be too hard to implement.

Comment on lines 583 to 584
// Formally we should acquire the shared lock here, but since we've forced
// fetching all the frames, it can't change anymore so that's not required.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only true if can_create==true. I think you'll still need the read lock here.

SetDefaultFileAndLineToSelectedFrame();
return *m_selected_frame_idx;
result = *m_selected_frame_idx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with the result variable? C++ guarantees that the result will be computed before the destructors are run precisely so that code like return protected_object->some_operation() is safe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was left over from a previous refactoring.

@@ -1449,7 +1449,7 @@ void Thread::ClearStackFrames() {
// frames:
// FIXME: At some point we can try to splice in the frames we have fetched
// into the new frame as we make it, but let's not try that now.
if (m_curr_frames_sp && m_curr_frames_sp->GetAllFramesFetched())
if (m_curr_frames_sp && m_curr_frames_sp->WereAllFramesFetched())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the past tense touch

Comment on lines 691 to 699
collection::const_iterator begin = m_frames.begin();
collection::const_iterator end = m_frames.end();
if (begin != end) {
collection::const_iterator pos =
std::lower_bound(begin, end, stack_id, CompareStackID);
if (pos != end) {
if ((*pos)->GetStackID() == stack_id)
return *pos;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
collection::const_iterator begin = m_frames.begin();
collection::const_iterator end = m_frames.end();
if (begin != end) {
collection::const_iterator pos =
std::lower_bound(begin, end, stack_id, CompareStackID);
if (pos != end) {
if ((*pos)->GetStackID() == stack_id)
return *pos;
}
collection::const_iterator pos =
llvm::lower_bound(m_frames, CompareStackID);
if (pos != m_frames.end() && (*pos)->GetStackID() == stack_id)
return *pos;

Optional, but would be a nice way to compensate for the increase in the nesting level.

@@ -25,6 +25,7 @@
#include "lldb/Target/Unwind.h"
#include "lldb/Utility/LLDBLog.h"
#include "lldb/Utility/Log.h"
#include "llvm/ADT/ScopeExit.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, that.

@@ -119,6 +120,7 @@ bool StackFrameList::DecrementCurrentInlinedDepth() {
uint32_t current_inlined_depth = GetCurrentInlinedDepth();
if (current_inlined_depth != UINT32_MAX) {
if (current_inlined_depth > 0) {
std::lock_guard<std::mutex> guard(m_inlined_depth_mutex);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also very suspicious of code like this. It may be enough to make thread sanitizer happy (as there to longer a data race on the variable), but I doubt that it makes the code surrounding this this function race-free. Like, if there are multiple callers of this function they could both conclude that current_inlined_depth is greater than zero, and then both decrement (and overflow) it. And if there can't be multiple parallel callers, then what's the mutex for?

At the very least, I think the mutex should cover the whole function, but maybe this even needs synchronization at a higher level. Also, I know this is somehow related to the mutex switch in ResetCurrentInlinedDepth, but is there any way this could be a separate patch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That switch was necessitated by the change of m_list_mutex from a recursive to a non-recursive mutex. This code often gets executed under some frame that has the list mutex. So you can't just leave it as is.

The control of this feature isn't really well modeled by mutexes. I think they were just there because recursive mutexes make it easy to reflexively put them around anything...

But the inlined depth of the current stack frame has to be set before we present the StackFrameList - it doesn't really make sense to do it otherwise. BTW, that doesn't violate the "lazy stack frame building" aim because we only compute this for the inlined frames below the first concrete frame, which we always fetch.

So by the time any clients see the StackFrameList, the inlined depth should have been set. It's not as simple as "has to be done in the constructor" because we also need to reset it at the point where we decide that we can implement a "step in" by just changing the inlined stack counter. We do that without rebuilding the stack frame.

Currently this isn't a problem because we wrote the code so that it sets the inlined depth in the right place. So perhaps we should just drop this mutex altogether and rely on this getting set and accessed in the right order, as currently happens? But I'd rather do that as a separate patch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do that. Someone from the future is going to thank you.

Comment on lines +117 to +118
/// Callers should first check (under the shared mutex) whether we need to
/// fetch frames or not.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) I'd consider putting the read-locked check into the function itself. I think then you wouldn't need the WereAllFramesFetched check in GetNumFrames, and I don't think it should hurt the other callers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two uses of GetFramesUpTo figure out whether they can get what they need w/o adding new frames in different ways. In general, I like that the immediate callers can figure out by their own lights whether they have the frames they need, and then only call GetFramesUpTo if you need to fetch more frames. That seems clearer to me.

Comment on lines 622 to 626
if (idx < m_frames.size())
frame_sp = m_frames[idx];

if (frame_sp)
return frame_sp;
if (frame_sp)
return frame_sp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional, idea) If GetFramesUpTo somehow returned the frame at the given index then there wouldn't be a need for this block here. I'm thinking about this in conjunction with my other suggestion to have a read-locked fastpath in GetFramesUpTo.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that helps much. Of the current two uses, GetNumFrames doesn't return the frame of the index passed in, and GetFrameAtIndex has a fallback that would still require the shared lock.

@@ -594,7 +643,7 @@ StackFrameSP StackFrameList::GetFrameAtIndex(uint32_t idx) {
// GetFramesUpTo will fill m_frames with as many frames as you asked for, if
// there are that many. If there weren't then you asked for too many frames.
// GetFramesUpTo returns true if interrupted:
if (GetFramesUpTo(idx)) {
if (GetFramesUpTo(idx, AllowInterruption, guard)) {
Log *log = GetLog(LLDBLog::Thread);
LLDB_LOG(log, "GetFrameAtIndex was interrupted");
return {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks.

That said, I think the code below still needs a read lock as it's accessing m_frames. I think that might be avoidable if GetFramesUpTo returned the frame it was asked for (see other suggestions).

@jimingham
Copy link
Collaborator Author

I pushed the last set of changes implementing some of your suggestions and adding the missing shared lock.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience.

@@ -119,6 +120,7 @@ bool StackFrameList::DecrementCurrentInlinedDepth() {
uint32_t current_inlined_depth = GetCurrentInlinedDepth();
if (current_inlined_depth != UINT32_MAX) {
if (current_inlined_depth > 0) {
std::lock_guard<std::mutex> guard(m_inlined_depth_mutex);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do that. Someone from the future is going to thank you.

frame_sp->GetSymbolContext(eSymbolContextSymbol).symbol);
}
SetFrameAtIndex(idx, frame_sp);
{ // Now we're accessing m_frames as a reader, so acquire the reader lock.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scope shouldn't be necessary. If I'm reading this correctly, we're returning directly afterwards.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like making the scope explicit so that if someone comes along later on and needs to add code at the end of this function, they will have to think about whether their addition should be protected by the mutex or not.

@jimingham jimingham merged commit 186fac3 into llvm:main Dec 12, 2024
7 checks passed
@jimingham jimingham deleted the stack-frame-list-mutex branch December 12, 2024 20:49
@mstorsjo
Copy link
Member

FYI, this had a small impact on deployability on macOS; I used to build LLVM with 10.9 as a deployment target, but now get these errors:

/Users/runner/work/llvm-mingw/llvm-mingw/llvm-project/lldb/include/lldb/Target/StackFrameList.h:161:16: error: 'shared_mutex' is unavailable: introduced in macOS 10.12
  mutable std::shared_mutex m_list_mutex;
               ^

Raising the deployment target to 10.12 instead of 10.9 is not an issue for me, it's more than plenty old for where anybody might want to run my toolchain. But IIRC there are some downstream distributions that do build recent LLVM for very old versions of macOS too (but perhaps they provide a newer libc++ to go with it, I don't know).

So - not an issue (yet AFAIK), but just a note that this does change the potential minimum baseline.

@jimingham
Copy link
Collaborator Author

jimingham commented Dec 13, 2024

When we decided to allow C++11 usage in lldb I think we opted into requiring platforms that have relatively complete support for the C++11 additions. I don't think it makes sense to try to limit the use of C++11 to some "minimal supported subset".

@mstorsjo
Copy link
Member

When we decided to allow C++11 usage in lldb I think we opted into requiring platforms that have relatively complete support for the C++11 additions. I don't think it makes sense to try to limit the use of C++11 to some "minimal supported subset".

Yeah, I don’t disagree with that - and we’re at C++17 now already. Anyway, requiring 10.12 from 2016 seems reasonable to me.

mstorsjo added a commit to mstorsjo/llvm-mingw that referenced this pull request Dec 13, 2024
Since llvm/llvm-project#117252 and
llvm/llvm-project@186fac3,
part of the latest git main branch, LLDB uses std::shared_mutex.

On macOS, std::shared_mutex is only available since macOS 10.12.

This change raises the deployment target from 10.9 to 10.12.
@adrian-prantl
Copy link
Collaborator

Yeah, I don’t disagree with that - and we’re at C++17 now already. Anyway, requiring 10.12 from 2016 seems reasonable to me.

Especially since nobody needs to run LLDB locally in order to do debugging. If you really want to debug with the very latest LLDB on a 10 year old system, you can debug remotely.

jimingham added a commit to jimingham/from-apple-llvm-project that referenced this pull request Dec 14, 2024
In fact, there's only one public API in StackFrameList that changes
 the list explicitly.  The rest only change the list if you happen to
ask for more frames than lldb has currently fetched and that
always adds frames "behind the user's back".  So we were
much more prone to deadlocking than we needed to be.

This patch uses a shared_mutex instead, and when we have to add more
frames (in GetFramesUpTo) we switches to exclusive long enough to add
the frames, then goes back to shared.

Most of the work here was actually getting the stack frame list locking
to not
require a recursive mutex (shared mutexes aren't recursive).

I also added a test that has 5 threads progressively asking for more
frames simultaneously to make sure we get back valid frames and don't
deadlock.

(cherry picked from commit 186fac3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants