Skip to content

Read and store gnu build id from loaded core file #92078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 14, 2024
Merged

Read and store gnu build id from loaded core file #92078

merged 1 commit into from
May 14, 2024

Conversation

GeorgeHuyubo
Copy link
Contributor

As we have debuginfod as symbol locator available in lldb now, we want to make full use of it.
In case of post mortem debugging, we don't always have the main executable available.
However, the .note.gnu.build-id of the main executable(some other modules too), should be available in the core file, as those binaries are loaded in memory and dumped in the core file.

We try to iterate through the NT_FILE entries, read and store the gnu build id if possible. This will be very useful as this id is the unique key which is needed for querying the debuginfod server.

Test:
Build and run lldb. Breakpoint set to https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147
Verified after this commit, module_uuid is the correct gnu build id of the main executable which caused the crash(first in the NT_FILE entry)

@llvmbot
Copy link
Member

llvmbot commented May 14, 2024

@llvm/pr-subscribers-lldb

Author: None (GeorgeHuyubo)

Changes

As we have debuginfod as symbol locator available in lldb now, we want to make full use of it.
In case of post mortem debugging, we don't always have the main executable available.
However, the .note.gnu.build-id of the main executable(some other modules too), should be available in the core file, as those binaries are loaded in memory and dumped in the core file.

We try to iterate through the NT_FILE entries, read and store the gnu build id if possible. This will be very useful as this id is the unique key which is needed for querying the debuginfod server.

Test:
Build and run lldb. Breakpoint set to https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147
Verified after this commit, module_uuid is the correct gnu build id of the main executable which caused the crash(first in the NT_FILE entry)


Full diff: https://github.com/llvm/llvm-project/pull/92078.diff

5 Files Affected:

  • (modified) lldb/include/lldb/Target/Process.h (+48)
  • (modified) lldb/source/Commands/CommandObjectMemory.cpp (+1-58)
  • (modified) lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp (+46)
  • (modified) lldb/source/Plugins/Process/elf-core/ProcessElfCore.h (+10)
  • (modified) lldb/source/Target/Process.cpp (+27)
diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index aac0cf51680a9..989434955512f 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -406,6 +406,36 @@ class Process : public std::enable_shared_from_this<Process>,
                                   lldb::StateType state);
   } Notifications;
 
+  class ProcessMemoryIterator {
+  public:
+    ProcessMemoryIterator(lldb::ProcessSP process_sp, lldb::addr_t base)
+        : m_process_sp(process_sp), m_base_addr(base) {
+      lldbassert(process_sp.get() != nullptr);
+    }
+
+    bool IsValid() { return m_is_valid; }
+
+    uint8_t operator[](lldb::addr_t offset) {
+      if (!IsValid())
+        return 0;
+
+      uint8_t retval = 0;
+      Status error;
+      if (0 ==
+          m_process_sp->ReadMemory(m_base_addr + offset, &retval, 1, error)) {
+        m_is_valid = false;
+        return 0;
+      }
+
+      return retval;
+    }
+
+  private:
+    lldb::ProcessSP m_process_sp;
+    lldb::addr_t m_base_addr;
+    bool m_is_valid = true;
+  };
+
   class ProcessEventData : public EventData {
     friend class Process;
 
@@ -1649,6 +1679,24 @@ class Process : public std::enable_shared_from_this<Process>,
 
   lldb::addr_t ReadPointerFromMemory(lldb::addr_t vm_addr, Status &error);
 
+  /// Fast search for a string within a memory region.
+  ///
+  /// This function searches for the string represented by the provided buffer
+  /// within the memory range specified by the low and high addresses. It uses
+  /// a bad character heuristic to optimize the search process.
+  ///
+  /// \param[in] low The starting address of the memory region to be searched.
+  ///
+  /// \param[in] high The ending address of the memory region to be searched.
+  ///
+  /// \param[in] buffer A pointer to the buffer containing the string to be searched.
+  ///
+  /// \param[in] buffer_size The size of the buffer in bytes.
+  ///
+  /// \return The address where the string was found or LLDB_INVALID_ADDRESS if not found.
+  lldb::addr_t FastSearch(lldb::addr_t low, lldb::addr_t high, uint8_t *buffer,
+                          size_t buffer_size);
+
   bool WritePointerToMemory(lldb::addr_t vm_addr, lldb::addr_t ptr_value,
                             Status &error);
 
diff --git a/lldb/source/Commands/CommandObjectMemory.cpp b/lldb/source/Commands/CommandObjectMemory.cpp
index b78a0492cca55..fd37465e92c46 100644
--- a/lldb/source/Commands/CommandObjectMemory.cpp
+++ b/lldb/source/Commands/CommandObjectMemory.cpp
@@ -977,35 +977,6 @@ class CommandObjectMemoryFind : public CommandObjectParsed {
   Options *GetOptions() override { return &m_option_group; }
 
 protected:
-  class ProcessMemoryIterator {
-  public:
-    ProcessMemoryIterator(ProcessSP process_sp, lldb::addr_t base)
-        : m_process_sp(process_sp), m_base_addr(base) {
-      lldbassert(process_sp.get() != nullptr);
-    }
-
-    bool IsValid() { return m_is_valid; }
-
-    uint8_t operator[](lldb::addr_t offset) {
-      if (!IsValid())
-        return 0;
-
-      uint8_t retval = 0;
-      Status error;
-      if (0 ==
-          m_process_sp->ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-        m_is_valid = false;
-        return 0;
-      }
-
-      return retval;
-    }
-
-  private:
-    ProcessSP m_process_sp;
-    lldb::addr_t m_base_addr;
-    bool m_is_valid = true;
-  };
   void DoExecute(Args &command, CommandReturnObject &result) override {
     // No need to check "process" for validity as eCommandRequiresProcess
     // ensures it is valid
@@ -1106,7 +1077,7 @@ class CommandObjectMemoryFind : public CommandObjectParsed {
     found_location = low_addr;
     bool ever_found = false;
     while (count) {
-      found_location = FastSearch(found_location, high_addr, buffer.GetBytes(),
+      found_location = process->FastSearch(found_location, high_addr, buffer.GetBytes(),
                                   buffer.GetByteSize());
       if (found_location == LLDB_INVALID_ADDRESS) {
         if (!ever_found) {
@@ -1144,34 +1115,6 @@ class CommandObjectMemoryFind : public CommandObjectParsed {
     result.SetStatus(lldb::eReturnStatusSuccessFinishResult);
   }
 
-  lldb::addr_t FastSearch(lldb::addr_t low, lldb::addr_t high, uint8_t *buffer,
-                          size_t buffer_size) {
-    const size_t region_size = high - low;
-
-    if (region_size < buffer_size)
-      return LLDB_INVALID_ADDRESS;
-
-    std::vector<size_t> bad_char_heuristic(256, buffer_size);
-    ProcessSP process_sp = m_exe_ctx.GetProcessSP();
-    ProcessMemoryIterator iterator(process_sp, low);
-
-    for (size_t idx = 0; idx < buffer_size - 1; idx++) {
-      decltype(bad_char_heuristic)::size_type bcu_idx = buffer[idx];
-      bad_char_heuristic[bcu_idx] = buffer_size - idx - 1;
-    }
-    for (size_t s = 0; s <= (region_size - buffer_size);) {
-      int64_t j = buffer_size - 1;
-      while (j >= 0 && buffer[j] == iterator[s + j])
-        j--;
-      if (j < 0)
-        return low + s;
-      else
-        s += bad_char_heuristic[iterator[s + buffer_size - 1]];
-    }
-
-    return LLDB_INVALID_ADDRESS;
-  }
-
   OptionGroupOptions m_option_group;
   OptionGroupFindMemory m_memory_options;
   OptionGroupMemoryTag m_memory_tag_options;
diff --git a/lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp b/lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp
index 36812c27a5b6d..7dd25774088ce 100644
--- a/lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp
+++ b/lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp
@@ -210,6 +210,9 @@ Status ProcessElfCore::DoLoadCore() {
     }
   }
 
+  // We need to update uuid after address range is populated.
+  UpdateBuildIdForNTFileEntries();
+
   if (!ranges_are_sorted) {
     m_core_aranges.Sort();
     m_core_range_infos.Sort();
@@ -258,6 +261,7 @@ Status ProcessElfCore::DoLoadCore() {
     if (!m_nt_file_entries.empty()) {
       ModuleSpec exe_module_spec;
       exe_module_spec.GetArchitecture() = arch;
+      exe_module_spec.GetUUID() = m_nt_file_entries[0].uuid;
       exe_module_spec.GetFileSpec().SetFile(m_nt_file_entries[0].path,
                                             FileSpec::Style::native);
       if (exe_module_spec.GetFileSpec()) {
@@ -271,6 +275,14 @@ Status ProcessElfCore::DoLoadCore() {
   return error;
 }
 
+void ProcessElfCore::UpdateBuildIdForNTFileEntries() {
+  if (!m_nt_file_entries.empty()) {
+    for (NT_FILE_Entry &entry : m_nt_file_entries) {
+      entry.uuid = FindBuildId(entry);
+    }
+  }
+}
+
 lldb_private::DynamicLoader *ProcessElfCore::GetDynamicLoader() {
   if (m_dyld_up.get() == nullptr)
     m_dyld_up.reset(DynamicLoader::FindPlugin(
@@ -983,6 +995,40 @@ llvm::Error ProcessElfCore::ParseThreadContextsFromNoteSegment(
   }
 }
 
+bool ProcessElfCore::IsElf(const NT_FILE_Entry entry) {
+  const uint8_t elf_header[4] = {0x7f, 0x45, 0x4c, 0x46}; // ELF file begin with this 4 bytes
+  uint8_t buf[4];
+  Status error;
+  size_t byte_read = ReadMemory(entry.start, buf, 4, error);
+  if (byte_read == 4) {
+    return memcmp(elf_header, buf, 4) == 0;
+  } else{
+    return false;
+  }
+}
+
+UUID ProcessElfCore::FindBuildId(const NT_FILE_Entry entry) {
+  if(!IsElf(entry)) {
+    return UUID();
+  }
+  // Build ID is stored in the ELF file as a section named ".note.gnu.build-id"
+  uint8_t gnu_build_id_bytes[8] = {0x03, 0x00, 0x00, 0x00, 0x47, 0x4e, 0x55, 0x00};
+  lldb::addr_t gnu_build_id_addr = FastSearch(entry.start, entry.end , gnu_build_id_bytes, 8);
+  if (load_addr == LLDB_INVALID_ADDRESS) {
+    return UUID();
+  }
+  uint8_t buf[36];
+  Status error;
+  size_t byte_read = ReadMemory(gnu_build_id_addr - 8, buf, 36, error);
+  // .note.gnu.build-id starts with 04 00 00 00 {id_byte_size} 00 00 00 03 00 00 00 47 4e 55 00 
+  if (byte_read == 36) {
+    if (buf[0] == 0x04) {
+      return UUID(llvm::ArrayRef<uint8_t>(buf + 16, buf[4]/*byte size*/));
+    }
+  }
+  return UUID();
+}
+
 uint32_t ProcessElfCore::GetNumThreadContexts() {
   if (!m_thread_data_valid)
     DoLoadCore();
diff --git a/lldb/source/Plugins/Process/elf-core/ProcessElfCore.h b/lldb/source/Plugins/Process/elf-core/ProcessElfCore.h
index 2cec635bbacfe..3d78991ed57c2 100644
--- a/lldb/source/Plugins/Process/elf-core/ProcessElfCore.h
+++ b/lldb/source/Plugins/Process/elf-core/ProcessElfCore.h
@@ -117,6 +117,7 @@ class ProcessElfCore : public lldb_private::PostMortemProcess {
     lldb::addr_t end;
     lldb::addr_t file_ofs;
     std::string path;
+    lldb_private::UUID uuid; // extracted from .note.gnu.build-id section from core file
   };
 
   // For ProcessElfCore only
@@ -158,6 +159,15 @@ class ProcessElfCore : public lldb_private::PostMortemProcess {
   // Returns number of thread contexts stored in the core file
   uint32_t GetNumThreadContexts();
 
+  // Populate gnu uuid for each NT_FILE entry
+  void UpdateBuildIdForNTFileEntries();
+
+  // Returns the UUID of a given NT_FILE entry
+  lldb_private::UUID FindBuildId(const NT_FILE_Entry entry);
+
+  // Returns true if the given NT_FILE entry is an ELF file
+  bool IsElf(const NT_FILE_Entry entry);
+
   // Parse a contiguous address range of the process from LOAD segment
   lldb::addr_t
   AddAddressRangeFromLoadSegment(const elf::ELFProgramHeader &header);
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index 25afade9a8275..7f8a19c7be3ed 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -3191,6 +3191,33 @@ Status Process::Halt(bool clear_thread_plans, bool use_run_lock) {
   return Status();
 }
 
+lldb::addr_t Process::FastSearch(lldb::addr_t low, lldb::addr_t high, uint8_t *buffer,
+                          size_t buffer_size) {
+  const size_t region_size = high - low;
+
+  if (region_size < buffer_size)
+    return LLDB_INVALID_ADDRESS;
+
+  std::vector<size_t> bad_char_heuristic(256, buffer_size);
+  ProcessMemoryIterator iterator(shared_from_this(), low);
+
+  for (size_t idx = 0; idx < buffer_size - 1; idx++) {
+    decltype(bad_char_heuristic)::size_type bcu_idx = buffer[idx];
+    bad_char_heuristic[bcu_idx] = buffer_size - idx - 1;
+  }
+  for (size_t s = 0; s <= (region_size - buffer_size);) {
+    int64_t j = buffer_size - 1;
+    while (j >= 0 && buffer[j] == iterator[s + j])
+      j--;
+    if (j < 0)
+      return low + s;
+    else
+      s += bad_char_heuristic[iterator[s + buffer_size - 1]];
+  }
+
+  return LLDB_INVALID_ADDRESS;
+}
+
 Status Process::StopForDestroyOrDetach(lldb::EventSP &exit_event_sp) {
   Status error;
 

Copy link

github-actions bot commented May 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@labath
Copy link
Collaborator

labath commented May 14, 2024

Can we make this less brute force? I believe searching for the Build ID Note should be a completely deterministic process, without the need for heuristics. You start with the elf header, find the program headers, iterate to find the PT_NOTE segment (there could be more of them), and iterate over the notes until you find NT_GNU_BUILD_ID. All of these things should be in the memory (=> also in the core file) and within the first 4k of the mapping. There should be no need to search through a potentially huge file mapping just to see if it contains the build id.

@JDevlieghere JDevlieghere requested a review from labath May 14, 2024 14:36
Comment on lines -995 to -998
if (0 ==
m_process_sp->ReadMemory(m_base_addr + offset, &retval, 1, error)) {
m_is_valid = false;
return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is efficient only because our process caches memory on its own... Otherwise reading memory 1 byte at a time would be more expensive. Not sure if we can get any speed up by reading more than one byte at a time and then using our own internal buffer.

@GeorgeHuyubo
Copy link
Contributor Author

Can we make this less brute force? I believe searching for the Build ID Note should be a completely deterministic process, without the need for heuristics. You start with the elf header, find the program headers, iterate to find the PT_NOTE segment (there could be more of them), and iterate over the notes until you find NT_GNU_BUILD_ID. All of these things should be in the memory (=> also in the core file) and within the first 4k of the mapping. There should be no need to search through a potentially huge file mapping just to see if it contains the build id.

Here is a real life binary:
0x7f2e7b6a5000+0x4000 b3bd5db2152ddae68640e781e1a6c039@0x7f2e7b6a7280 - - /tmp/hhvm_member_reflection_0d17d1c87f353d813d3e6b43e6db1d59e428c014_HFrPOT

so the .note.gnu.build-id is found at 0x7f2e7b6a7280 which is the file start address 0x7f2e7b6a5000 plus 0x2280(8832 bytes in decimal).
I don't know what is the precise search range limit which can guarantee us to find the NT_GNU_BUILD_ID. Open for suggestion.

@GeorgeHuyubo GeorgeHuyubo requested a review from clayborg May 14, 2024 21:33
@GeorgeHuyubo GeorgeHuyubo merged commit 536abf8 into llvm:main May 14, 2024
3 of 4 checks passed
GeorgeHuyubo added a commit that referenced this pull request May 14, 2024
@@ -983,6 +999,40 @@ llvm::Error ProcessElfCore::ParseThreadContextsFromNoteSegment(
}
}

bool ProcessElfCore::IsElf(const NT_FILE_Entry entry) {
size_t size = strlen(llvm::ELF::ElfMagic);
uint8_t buf[size];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15: error: variable length arrays in C++ are a Clang extension [-Werror,-Wvla-cxx-extension]
 1004 |   uint8_t buf[size];

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can allocate a buffer of 4 and read it.
Can ELFHeader::MagicBytesMatch be used here?

@clayborg
Copy link
Collaborator

Can we make this less brute force? I believe searching for the Build ID Note should be a completely deterministic process, without the need for heuristics. You start with the elf header, find the program headers, iterate to find the PT_NOTE segment (there could be more of them), and iterate over the notes until you find NT_GNU_BUILD_ID. All of these things should be in the memory (=> also in the core file) and within the first 4k of the mapping. There should be no need to search through a potentially huge file mapping just to see if it contains the build id.

Yes, we can parse the program headers. That is a good idea as long as all the info we need is mapped

@GeorgeHuyubo
Copy link
Contributor Author

Gonna re-open this in a new PR

@labath
Copy link
Collaborator

labath commented May 15, 2024

Can we make this less brute force? I believe searching for the Build ID Note should be a completely deterministic process, without the need for heuristics. You start with the elf header, find the program headers, iterate to find the PT_NOTE segment (there could be more of them), and iterate over the notes until you find NT_GNU_BUILD_ID. All of these things should be in the memory (=> also in the core file) and within the first 4k of the mapping. There should be no need to search through a potentially huge file mapping just to see if it contains the build id.

Yes, we can parse the program headers. That is a good idea as long as all the info we need is mapped

It should all be there. In the files I looked at, both the program headers and all the note segments are in the first page of the mapping (which is the page containing the elf header). I can't find a reference for that now but I remember reading somewhere that this was intentional (to ensure the build id survives).

BTW, we have some elf parsing code in NativeProcessELF::GetELFImageInfoAddress(), that you could draw inspiration from.

GeorgeHuyubo added a commit that referenced this pull request May 24, 2024
As we have debuginfod as symbol locator available in lldb now, we want
to make full use of it.
In case of post mortem debugging, we don't always have the main
executable available.
However, the .note.gnu.build-id of the main executable(some other
modules too), should be available in the core file, as those binaries
are loaded in memory and dumped in the core file.

We try to iterate through the NT_FILE entries, read and store the gnu
build id if possible. This will be very useful as this id is the unique
key which is needed for querying the debuginfod server.

Test:
Build and run lldb. Breakpoint set to
https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147
Verified after this commit, module_uuid is the correct gnu build id of
the main executable which caused the crash(first in the NT_FILE entry)

Previous PR: #92078 was
mistakenly merged. This PR is re-opening the commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants