-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb] make PlatformAndroid/AdbClient::GetSyncService threadsafe #145382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-lldb Author: Chad Smith (cs01) ChangesAdbClient and PlatformAndroid GetSyncService is not threadsafe. This was not an issue because it was (apparently) not used in a threaded environment. However when the new setting
was added, it began being called from multiple threads to load modules simultaneously. Setting it to True triggers crashes, False stops it from crashing. The top of the stack in the crash looked something like this:
Our workaround was to set it to False, but the root cause was not fixed. This PR fixes the root cause by creating a new connection for each Adb SyncService. Full diff: https://github.com/llvm/llvm-project/pull/145382.diff 5 Files Affected:
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.cpp b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
index a179260ca15f6..4c5342f7af60b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.cpp
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
@@ -417,13 +417,30 @@ Status AdbClient::ShellToFile(const char *command, milliseconds timeout,
return Status();
}
+// Returns a sync service for file operations.
+// This operation is thread-safe - each call creates an isolated sync service
+// with its own connection to avoid race conditions.
std::unique_ptr<AdbClient::SyncService>
AdbClient::GetSyncService(Status &error) {
- std::unique_ptr<SyncService> sync_service;
- error = StartSync();
- if (error.Success())
- sync_service.reset(new SyncService(std::move(m_conn)));
+ std::lock_guard<std::mutex> lock(m_sync_mutex);
+
+ // Create a temporary AdbClient with its own connection for this sync service
+ // This avoids the race condition of multiple threads accessing the same
+ // connection
+ AdbClient temp_client(m_device_id);
+
+ // Connect and start sync on the temporary client
+ error = temp_client.Connect();
+ if (error.Fail())
+ return nullptr;
+ error = temp_client.StartSync();
+ if (error.Fail())
+ return nullptr;
+
+ // Move the connection from the temporary client to the sync service
+ std::unique_ptr<SyncService> sync_service;
+ sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
return sync_service;
}
@@ -487,7 +504,9 @@ Status AdbClient::SyncService::internalPushFile(const FileSpec &local_file,
error.AsCString());
}
error = SendSyncRequest(
- kDONE, llvm::sys::toTimeT(FileSystem::Instance().GetModificationTime(local_file)),
+ kDONE,
+ llvm::sys::toTimeT(
+ FileSystem::Instance().GetModificationTime(local_file)),
nullptr);
if (error.Fail())
return error;
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.h b/lldb/source/Plugins/Platform/Android/AdbClient.h
index 851c09957bd4a..9ef780bc6202b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.h
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.h
@@ -14,6 +14,7 @@
#include <functional>
#include <list>
#include <memory>
+#include <mutex>
#include <string>
#include <vector>
@@ -135,6 +136,7 @@ class AdbClient {
std::string m_device_id;
std::unique_ptr<Connection> m_conn;
+ mutable std::mutex m_sync_mutex;
};
} // namespace platform_android
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
index 5bc9cc133fbd3..68e4886cb1c7a 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
@@ -474,13 +474,11 @@ std::string PlatformAndroid::GetRunAs() {
return run_as.str();
}
-AdbClient::SyncService *PlatformAndroid::GetSyncService(Status &error) {
- if (m_adb_sync_svc && m_adb_sync_svc->IsConnected())
- return m_adb_sync_svc.get();
-
+std::unique_ptr<AdbClient::SyncService>
+PlatformAndroid::GetSyncService(Status &error) {
AdbClientUP adb(GetAdbClient(error));
if (error.Fail())
return nullptr;
- m_adb_sync_svc = adb->GetSyncService(error);
- return (error.Success()) ? m_adb_sync_svc.get() : nullptr;
+
+ return adb->GetSyncService(error);
}
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
index 5602edf73c1d3..3140acb573416 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
@@ -80,9 +80,7 @@ class PlatformAndroid : public platform_linux::PlatformLinux {
std::string GetRunAs();
private:
- AdbClient::SyncService *GetSyncService(Status &error);
-
- std::unique_ptr<AdbClient::SyncService> m_adb_sync_svc;
+ std::unique_ptr<AdbClient::SyncService> GetSyncService(Status &error);
std::string m_device_id;
uint32_t m_sdk_version;
};
diff --git a/lldb/unittests/Platform/Android/AdbClientTest.cpp b/lldb/unittests/Platform/Android/AdbClientTest.cpp
index 0808b96f69fc8..c2f658b9d1bc1 100644
--- a/lldb/unittests/Platform/Android/AdbClientTest.cpp
+++ b/lldb/unittests/Platform/Android/AdbClientTest.cpp
@@ -6,9 +6,13 @@
//
//===----------------------------------------------------------------------===//
-#include "gtest/gtest.h"
#include "Plugins/Platform/Android/AdbClient.h"
+#include "gtest/gtest.h"
+#include <atomic>
#include <cstdlib>
+#include <future>
+#include <thread>
+#include <vector>
static void set_env(const char *var, const char *value) {
#ifdef _WIN32
@@ -31,14 +35,14 @@ class AdbClientTest : public ::testing::Test {
void TearDown() override { set_env("ANDROID_SERIAL", ""); }
};
-TEST(AdbClientTest, CreateByDeviceId) {
+TEST_F(AdbClientTest, CreateByDeviceId) {
AdbClient adb;
Status error = AdbClient::CreateByDeviceID("device1", adb);
EXPECT_TRUE(error.Success());
EXPECT_EQ("device1", adb.GetDeviceID());
}
-TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
+TEST_F(AdbClientTest, CreateByDeviceId_ByEnvVar) {
set_env("ANDROID_SERIAL", "device2");
AdbClient adb;
@@ -47,5 +51,116 @@ TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
EXPECT_EQ("device2", adb.GetDeviceID());
}
+TEST_F(AdbClientTest, GetSyncServiceThreadSafe) {
+ // Test high-volume concurrent access to GetSyncService
+ // This test verifies thread safety under sustained load with many rapid calls
+ // Catches race conditions that emerge when multiple threads make repeated
+ // calls to GetSyncService on the same AdbClient instance
+
+ AdbClient shared_adb_client("test_device");
+
+ const int num_threads = 8;
+ std::vector<std::future<bool>> futures;
+ std::atomic<int> success_count{0};
+ std::atomic<int> null_count{0};
+
+ // Launch multiple threads that all call GetSyncService on the SAME AdbClient
+ for (int i = 0; i < num_threads; ++i) {
+ futures.push_back(std::async(std::launch::async, [&]() {
+ // Multiple rapid calls to trigger the race condition
+ for (int j = 0; j < 20; ++j) {
+ Status error;
+
+ auto sync_service = shared_adb_client.GetSyncService(error);
+
+ if (sync_service != nullptr) {
+ success_count++;
+ } else {
+ null_count++;
+ }
+
+ // Small delay to increase chance of hitting the race condition
+ std::this_thread::sleep_for(std::chrono::microseconds(1));
+ }
+ return true;
+ }));
+ }
+
+ // Wait for all threads to complete
+ bool all_completed = true;
+ for (auto &future : futures) {
+ bool thread_result = future.get();
+ if (!thread_result) {
+ all_completed = false;
+ }
+ }
+
+ // This should pass (though sync services may fail
+ // to connect)
+ EXPECT_TRUE(all_completed) << "Parallel GetSyncService calls should not "
+ "crash due to race conditions. "
+ << "Successes: " << success_count.load()
+ << ", Nulls: " << null_count.load();
+
+ // The key test: we should complete all operations without crashing
+ int total_operations = num_threads * 20;
+ int completed_operations = success_count.load() + null_count.load();
+ EXPECT_EQ(total_operations, completed_operations)
+ << "All operations should complete without crashing";
+}
+
+TEST_F(AdbClientTest, ConnectionMoveRaceCondition) {
+ // Test simultaneous access timing to GetSyncService
+ // This test verifies thread safety when multiple threads start at exactly
+ // the same time, maximizing the chance of hitting precise timing conflicts
+ // Catches race conditions that occur with synchronized simultaneous access
+
+ AdbClient adb_client("test_device");
+
+ // Try to trigger the exact race condition by having multiple threads
+ // simultaneously call GetSyncService
+
+ std::atomic<bool> start_flag{false};
+ std::vector<std::thread> threads;
+ std::atomic<int> null_service_count{0};
+ std::atomic<int> valid_service_count{0};
+
+ const int num_threads = 10;
+
+ // Create threads that will all start simultaneously
+ for (int i = 0; i < num_threads; ++i) {
+ threads.emplace_back([&]() {
+ // Wait for all threads to be ready
+ while (!start_flag.load()) {
+ std::this_thread::yield();
+ }
+
+ Status error;
+ auto sync_service = adb_client.GetSyncService(error);
+
+ if (sync_service == nullptr) {
+ null_service_count++;
+ } else {
+ valid_service_count++;
+ }
+ });
+ }
+
+ // Start all threads simultaneously to maximize chance of race condition
+ start_flag.store(true);
+
+ // Wait for all threads to complete
+ for (auto &thread : threads) {
+ thread.join();
+ }
+
+ // The test passes if we don't crash
+ int total_results = null_service_count.load() + valid_service_count.load();
+ EXPECT_EQ(num_threads, total_results)
+ << "All threads should complete without crashing. "
+ << "Null services: " << null_service_count.load()
+ << ", Valid services: " << valid_service_count.load();
+}
+
} // end namespace platform_android
} // end namespace lldb_private
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's one way to fix the problem. Another (maybe even more obvious) would be to use a single SyncService object but synchronize all access to it. I think your approach makes sense -- it unlocks more parallelism, though that parallelism may be a mirage if the device connection is going to be the bottleneck. And all the extra connections may add some overhead. Have you looked into how the two approaches compare?
As for the patch itself, the code makes sense, but the result looks very path-dependent (meaning: you'd never write code like this -- creating a temporary AdbClient object -- if you were writing this from scratch). It's also not optimal, as you're creating a temporary AdbClient in PlatformAndroid::GetSyncService and then creating another temporary object inside AdbClient::GetSyncService.
I think this would look better if PlatformAndroid::GetSyncService constructed a (new) SyncService directly. And the SyncService could create a temporary AdbClient object as an implementation detail (or not -- maybe it could handle all of the connection setup internally)
error = StartSync(); | ||
if (error.Success()) | ||
sync_service.reset(new SyncService(std::move(m_conn))); | ||
std::lock_guard<std::mutex> lock(m_sync_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this actually protecting?
std::unique_ptr<SyncService> sync_service; | ||
sync_service.reset(new SyncService(std::move(temp_client.m_conn))); | ||
return sync_service; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::unique_ptr<SyncService> sync_service; | |
sync_service.reset(new SyncService(std::move(temp_client.m_conn))); | |
return sync_service; | |
return std::make_unique<SyncService>(std::move(temp_client.m_conn)); |
Problem
PlatformAndroid::GetSyncService, which pulls files from android devices in methods like
PlatformAndroid::GetFile
, is not threadsafe. Until recently, this was not an issue because it was apparently not used in a threaded environment. However when the new settingwas added, it began fetching modules from the devices from multiple threads simultaneously. This caused crashes of lldb.
The top of the stack in the crash look something like this:
Our workaround was to set
set target.parallel-module-load
tofalse
to avoid the crash. This PR fixes the root cause by creating a new connection for each Adb SyncService.Background
Android debugging with lldb relies on communication with devices via adb. lldb has an AdbClient class that is called from the PlatformAndroid code. Any time the Android platform uses AdbClient, it reuses the same connection.
Root Cause
The root cause of the crash is the single SyncService instance used and reused in AdbClient. SyncService is used to push, pull, and stat files on device using the adb protocol directly, and maintains a
ConnectionFileDescriptor
. When multiple threads attempt to do such operations on the same connection, the threads can interfere with eachother, often resulting in crashes since the connection may be destroyed in one thread but still be used in another.Fix
The fix implemented in this PR is to create a new SyncService each time one is requested by the Android platform.
A new unit test was added which induces this crash. This diff fixes the crash, but if the unit test is run against existing code the crash will occur (see test plan).
Test plan
Before
After
Unit tests now pass instead of crash