Skip to content

[lldb] make PlatformAndroid/AdbClient::GetSyncService threadsafe #145382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cs01
Copy link

@cs01 cs01 commented Jun 23, 2025

Problem

PlatformAndroid::GetSyncService, which pulls files from android devices in methods like PlatformAndroid::GetFile, is not threadsafe. Until recently, this was not an issue because it was apparently not used in a threaded environment. However when the new setting

set target.parallel-module-load true

was added, it began fetching modules from the devices from multiple threads simultaneously. This caused crashes of lldb.

The top of the stack in the crash look something like this:

#0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe)
 #1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99)
 #2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda)
 #3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0
 #4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707)
 #5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41)
 #6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e)
 #7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff)
 #8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc)

Our workaround was to set set target.parallel-module-load to false to avoid the crash. This PR fixes the root cause by creating a new connection for each Adb SyncService.

Background

Android debugging with lldb relies on communication with devices via adb. lldb has an AdbClient class that is called from the PlatformAndroid code. Any time the Android platform uses AdbClient, it reuses the same connection.

Root Cause

The root cause of the crash is the single SyncService instance used and reused in AdbClient. SyncService is used to push, pull, and stat files on device using the adb protocol directly, and maintains a ConnectionFileDescriptor. When multiple threads attempt to do such operations on the same connection, the threads can interfere with eachother, often resulting in crashes since the connection may be destroyed in one thread but still be used in another.

Fix

The fix implemented in this PR is to create a new SyncService each time one is requested by the Android platform.

A new unit test was added which induces this crash. This diff fixes the crash, but if the unit test is run against existing code the crash will occur (see test plan).

Test plan

Before

./tools/lldb/unittests/Platform/Android/AdbClientTests 
[==========] Running 11 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 4 tests from AdbClientTest
[ RUN      ] AdbClientTest.CreateByDeviceId
[       OK ] AdbClientTest.CreateByDeviceId (0 ms)
[ RUN      ] AdbClientTest.CreateByDeviceId_ByEnvVar
[       OK ] AdbClientTest.CreateByDeviceId_ByEnvVar (0 ms)
[ RUN      ] AdbClientTest.GetSyncServiceThreadSafe
Segmentation fault (core dumped)

After

Unit tests now pass instead of crash

> ./tools/lldb/unittests/Platform/Android/AdbClientTests 
[==========] Running 11 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 4 tests from AdbClientTest
[ RUN      ] AdbClientTest.CreateByDeviceId
[       OK ] AdbClientTest.CreateByDeviceId (0 ms)
[ RUN      ] AdbClientTest.CreateByDeviceId_ByEnvVar
[       OK ] AdbClientTest.CreateByDeviceId_ByEnvVar (0 ms)
[ RUN      ] AdbClientTest.GetSyncServiceThreadSafe
[       OK ] AdbClientTest.GetSyncServiceThreadSafe (46 ms)
[ RUN      ] AdbClientTest.ConnectionMoveRaceCondition
[       OK ] AdbClientTest.ConnectionMoveRaceCondition (4 ms)
[----------] 4 tests from AdbClientTest (51 ms total)

[----------] 7 tests from PlatformAndroidTest
[ RUN      ] PlatformAndroidTest.DownloadModuleSliceWithAdbClientError
[       OK ] PlatformAndroidTest.DownloadModuleSliceWithAdbClientError (0 ms)
[ RUN      ] PlatformAndroidTest.DownloadModuleSliceWithNormalFile
[       OK ] PlatformAndroidTest.DownloadModuleSliceWithNormalFile (0 ms)
[ RUN      ] PlatformAndroidTest.DownloadModuleSliceWithZipFile

GMOCK WARNING:
Uninteresting mock function call - returning default value.
    Function call: GetPropertyPackageName()
          Returns: ""
NOTE: You can safely ignore the above warning unless this call should not happen.  Do not suppress it by blindly adding an EXPECT_CALL() if you don't mean to enforce the call.  See https://github.com/google/googletest/blob/main/docs/gmock_cook_book.md#knowing-when-to-expect-useoncall for details.
[       OK ] PlatformAndroidTest.DownloadModuleSliceWithZipFile (0 ms)
[ RUN      ] PlatformAndroidTest.DownloadModuleSliceWithZipFileAndRunAs
[       OK ] PlatformAndroidTest.DownloadModuleSliceWithZipFileAndRunAs (0 ms)
[ RUN      ] PlatformAndroidTest.GetFileWithNormalFile
[       OK ] PlatformAndroidTest.GetFileWithNormalFile (0 ms)
[ RUN      ] PlatformAndroidTest.GetFileWithCatFallback

GMOCK WARNING:
Uninteresting mock function call - returning default value.
    Function call: GetPropertyPackageName()
          Returns: ""
NOTE: You can safely ignore the above warning unless this call should not happen.  Do not suppress it by blindly adding an EXPECT_CALL() if you don't mean to enforce the call.  See https://github.com/google/googletest/blob/main/docs/gmock_cook_book.md#knowing-when-to-expect-useoncall for details.
[       OK ] PlatformAndroidTest.GetFileWithCatFallback (0 ms)
[ RUN      ] PlatformAndroidTest.GetFileWithCatFallbackAndRunAs
[       OK ] PlatformAndroidTest.GetFileWithCatFallbackAndRunAs (0 ms)
[----------] 7 tests from PlatformAndroidTest (2 ms total)

[----------] Global test environment tear-down
[==========] 11 tests from 2 test suites ran. (54 ms total)
[  PASSED  ] 11 tests.

@cs01 cs01 requested a review from JDevlieghere as a code owner June 23, 2025 18:42
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label Jun 23, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 23, 2025

@llvm/pr-subscribers-lldb

Author: Chad Smith (cs01)

Changes

AdbClient and PlatformAndroid GetSyncService is not threadsafe. This was not an issue because it was (apparently) not used in a threaded environment. However when the new setting

set target.parallel-module-load true

was added, it began being called from multiple threads to load modules simultaneously.

Setting it to True triggers crashes, False stops it from crashing.

The top of the stack in the crash looked something like this:

lldb_private::platform_android::AdbClient::SyncService::SendSyncRequest(char const*, unsigned int, void const*) at /usr/lib/liblldb.so.15.0.0:0
lldb_private::platform_android::AdbClient::SyncService::internalStat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) at /usr/lib/liblldb.so.15.0.0:0
lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) at /usr/lib/liblldb.so.15.0.0:0

Our workaround was to set it to False, but the root cause was not fixed. This PR fixes the root cause by creating a new connection for each Adb SyncService.


Full diff: https://github.com/llvm/llvm-project/pull/145382.diff

5 Files Affected:

  • (modified) lldb/source/Plugins/Platform/Android/AdbClient.cpp (+24-5)
  • (modified) lldb/source/Plugins/Platform/Android/AdbClient.h (+2)
  • (modified) lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp (+4-6)
  • (modified) lldb/source/Plugins/Platform/Android/PlatformAndroid.h (+1-3)
  • (modified) lldb/unittests/Platform/Android/AdbClientTest.cpp (+118-3)
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.cpp b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
index a179260ca15f6..4c5342f7af60b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.cpp
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
@@ -417,13 +417,30 @@ Status AdbClient::ShellToFile(const char *command, milliseconds timeout,
   return Status();
 }
 
+// Returns a sync service for file operations.
+// This operation is thread-safe - each call creates an isolated sync service
+// with its own connection to avoid race conditions.
 std::unique_ptr<AdbClient::SyncService>
 AdbClient::GetSyncService(Status &error) {
-  std::unique_ptr<SyncService> sync_service;
-  error = StartSync();
-  if (error.Success())
-    sync_service.reset(new SyncService(std::move(m_conn)));
+  std::lock_guard<std::mutex> lock(m_sync_mutex);
+
+  // Create a temporary AdbClient with its own connection for this sync service
+  // This avoids the race condition of multiple threads accessing the same
+  // connection
+  AdbClient temp_client(m_device_id);
+
+  // Connect and start sync on the temporary client
+  error = temp_client.Connect();
+  if (error.Fail())
+    return nullptr;
 
+  error = temp_client.StartSync();
+  if (error.Fail())
+    return nullptr;
+
+  // Move the connection from the temporary client to the sync service
+  std::unique_ptr<SyncService> sync_service;
+  sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
   return sync_service;
 }
 
@@ -487,7 +504,9 @@ Status AdbClient::SyncService::internalPushFile(const FileSpec &local_file,
                                                error.AsCString());
   }
   error = SendSyncRequest(
-      kDONE, llvm::sys::toTimeT(FileSystem::Instance().GetModificationTime(local_file)),
+      kDONE,
+      llvm::sys::toTimeT(
+          FileSystem::Instance().GetModificationTime(local_file)),
       nullptr);
   if (error.Fail())
     return error;
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.h b/lldb/source/Plugins/Platform/Android/AdbClient.h
index 851c09957bd4a..9ef780bc6202b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.h
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.h
@@ -14,6 +14,7 @@
 #include <functional>
 #include <list>
 #include <memory>
+#include <mutex>
 #include <string>
 #include <vector>
 
@@ -135,6 +136,7 @@ class AdbClient {
 
   std::string m_device_id;
   std::unique_ptr<Connection> m_conn;
+  mutable std::mutex m_sync_mutex;
 };
 
 } // namespace platform_android
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
index 5bc9cc133fbd3..68e4886cb1c7a 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
@@ -474,13 +474,11 @@ std::string PlatformAndroid::GetRunAs() {
   return run_as.str();
 }
 
-AdbClient::SyncService *PlatformAndroid::GetSyncService(Status &error) {
-  if (m_adb_sync_svc && m_adb_sync_svc->IsConnected())
-    return m_adb_sync_svc.get();
-
+std::unique_ptr<AdbClient::SyncService>
+PlatformAndroid::GetSyncService(Status &error) {
   AdbClientUP adb(GetAdbClient(error));
   if (error.Fail())
     return nullptr;
-  m_adb_sync_svc = adb->GetSyncService(error);
-  return (error.Success()) ? m_adb_sync_svc.get() : nullptr;
+
+  return adb->GetSyncService(error);
 }
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
index 5602edf73c1d3..3140acb573416 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
@@ -80,9 +80,7 @@ class PlatformAndroid : public platform_linux::PlatformLinux {
   std::string GetRunAs();
 
 private:
-  AdbClient::SyncService *GetSyncService(Status &error);
-
-  std::unique_ptr<AdbClient::SyncService> m_adb_sync_svc;
+  std::unique_ptr<AdbClient::SyncService> GetSyncService(Status &error);
   std::string m_device_id;
   uint32_t m_sdk_version;
 };
diff --git a/lldb/unittests/Platform/Android/AdbClientTest.cpp b/lldb/unittests/Platform/Android/AdbClientTest.cpp
index 0808b96f69fc8..c2f658b9d1bc1 100644
--- a/lldb/unittests/Platform/Android/AdbClientTest.cpp
+++ b/lldb/unittests/Platform/Android/AdbClientTest.cpp
@@ -6,9 +6,13 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "gtest/gtest.h"
 #include "Plugins/Platform/Android/AdbClient.h"
+#include "gtest/gtest.h"
+#include <atomic>
 #include <cstdlib>
+#include <future>
+#include <thread>
+#include <vector>
 
 static void set_env(const char *var, const char *value) {
 #ifdef _WIN32
@@ -31,14 +35,14 @@ class AdbClientTest : public ::testing::Test {
   void TearDown() override { set_env("ANDROID_SERIAL", ""); }
 };
 
-TEST(AdbClientTest, CreateByDeviceId) {
+TEST_F(AdbClientTest, CreateByDeviceId) {
   AdbClient adb;
   Status error = AdbClient::CreateByDeviceID("device1", adb);
   EXPECT_TRUE(error.Success());
   EXPECT_EQ("device1", adb.GetDeviceID());
 }
 
-TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
+TEST_F(AdbClientTest, CreateByDeviceId_ByEnvVar) {
   set_env("ANDROID_SERIAL", "device2");
 
   AdbClient adb;
@@ -47,5 +51,116 @@ TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
   EXPECT_EQ("device2", adb.GetDeviceID());
 }
 
+TEST_F(AdbClientTest, GetSyncServiceThreadSafe) {
+  // Test high-volume concurrent access to GetSyncService
+  // This test verifies thread safety under sustained load with many rapid calls
+  // Catches race conditions that emerge when multiple threads make repeated
+  // calls to GetSyncService on the same AdbClient instance
+
+  AdbClient shared_adb_client("test_device");
+
+  const int num_threads = 8;
+  std::vector<std::future<bool>> futures;
+  std::atomic<int> success_count{0};
+  std::atomic<int> null_count{0};
+
+  // Launch multiple threads that all call GetSyncService on the SAME AdbClient
+  for (int i = 0; i < num_threads; ++i) {
+    futures.push_back(std::async(std::launch::async, [&]() {
+      // Multiple rapid calls to trigger the race condition
+      for (int j = 0; j < 20; ++j) {
+        Status error;
+
+        auto sync_service = shared_adb_client.GetSyncService(error);
+
+        if (sync_service != nullptr) {
+          success_count++;
+        } else {
+          null_count++;
+        }
+
+        // Small delay to increase chance of hitting the race condition
+        std::this_thread::sleep_for(std::chrono::microseconds(1));
+      }
+      return true;
+    }));
+  }
+
+  // Wait for all threads to complete
+  bool all_completed = true;
+  for (auto &future : futures) {
+    bool thread_result = future.get();
+    if (!thread_result) {
+      all_completed = false;
+    }
+  }
+
+  // This should pass (though sync services may fail
+  // to connect)
+  EXPECT_TRUE(all_completed) << "Parallel GetSyncService calls should not "
+                                "crash due to race conditions. "
+                             << "Successes: " << success_count.load()
+                             << ", Nulls: " << null_count.load();
+
+  // The key test: we should complete all operations without crashing
+  int total_operations = num_threads * 20;
+  int completed_operations = success_count.load() + null_count.load();
+  EXPECT_EQ(total_operations, completed_operations)
+      << "All operations should complete without crashing";
+}
+
+TEST_F(AdbClientTest, ConnectionMoveRaceCondition) {
+  // Test simultaneous access timing to GetSyncService
+  // This test verifies thread safety when multiple threads start at exactly
+  // the same time, maximizing the chance of hitting precise timing conflicts
+  // Catches race conditions that occur with synchronized simultaneous access
+
+  AdbClient adb_client("test_device");
+
+  // Try to trigger the exact race condition by having multiple threads
+  // simultaneously call GetSyncService
+
+  std::atomic<bool> start_flag{false};
+  std::vector<std::thread> threads;
+  std::atomic<int> null_service_count{0};
+  std::atomic<int> valid_service_count{0};
+
+  const int num_threads = 10;
+
+  // Create threads that will all start simultaneously
+  for (int i = 0; i < num_threads; ++i) {
+    threads.emplace_back([&]() {
+      // Wait for all threads to be ready
+      while (!start_flag.load()) {
+        std::this_thread::yield();
+      }
+
+      Status error;
+      auto sync_service = adb_client.GetSyncService(error);
+
+      if (sync_service == nullptr) {
+        null_service_count++;
+      } else {
+        valid_service_count++;
+      }
+    });
+  }
+
+  // Start all threads simultaneously to maximize chance of race condition
+  start_flag.store(true);
+
+  // Wait for all threads to complete
+  for (auto &thread : threads) {
+    thread.join();
+  }
+
+  // The test passes if we don't crash
+  int total_results = null_service_count.load() + valid_service_count.load();
+  EXPECT_EQ(num_threads, total_results)
+      << "All threads should complete without crashing. "
+      << "Null services: " << null_service_count.load()
+      << ", Valid services: " << valid_service_count.load();
+}
+
 } // end namespace platform_android
 } // end namespace lldb_private

@cs01 cs01 marked this pull request as draft June 23, 2025 18:43
@JDevlieghere JDevlieghere requested review from labath and splhack June 23, 2025 22:44
@royitaqi royitaqi requested a review from clayborg June 24, 2025 01:20
@cs01 cs01 marked this pull request as ready for review June 24, 2025 01:42
Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's one way to fix the problem. Another (maybe even more obvious) would be to use a single SyncService object but synchronize all access to it. I think your approach makes sense -- it unlocks more parallelism, though that parallelism may be a mirage if the device connection is going to be the bottleneck. And all the extra connections may add some overhead. Have you looked into how the two approaches compare?

As for the patch itself, the code makes sense, but the result looks very path-dependent (meaning: you'd never write code like this -- creating a temporary AdbClient object -- if you were writing this from scratch). It's also not optimal, as you're creating a temporary AdbClient in PlatformAndroid::GetSyncService and then creating another temporary object inside AdbClient::GetSyncService.

I think this would look better if PlatformAndroid::GetSyncService constructed a (new) SyncService directly. And the SyncService could create a temporary AdbClient object as an implementation detail (or not -- maybe it could handle all of the connection setup internally)

error = StartSync();
if (error.Success())
sync_service.reset(new SyncService(std::move(m_conn)));
std::lock_guard<std::mutex> lock(m_sync_mutex);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this actually protecting?

Comment on lines +442 to 444
std::unique_ptr<SyncService> sync_service;
sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
return sync_service;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unique_ptr<SyncService> sync_service;
sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
return sync_service;
return std::make_unique<SyncService>(std::move(temp_client.m_conn));

@dmpots dmpots requested a review from jeffreytan81 June 26, 2025 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants