Skip to content

[lldb][AIX] Added XCOFF ParseSymtab handling #141577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

DhruvSrivastavaX
Copy link
Contributor

This PR is in reference to porting LLDB on AIX.

Link to discussions on llvm discourse and github:

  1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640
  2. Extending LLDB to work on AIX #101657
    The complete changes for porting are present in this draft PR:
    Extending LLDB to work on AIX #102601

Description:
Adding ParseSymtab logic after creating sections. It is able to handle both 32 and 64 bit symbols,
without the need to add template logic.

This is an incremental PR on top of my previous couple of XCOFF support commits.

@llvmbot
Copy link
Member

llvmbot commented May 27, 2025

@llvm/pr-subscribers-lldb

Author: Dhruv Srivastava (DhruvSrivastavaX)

Changes

This PR is in reference to porting LLDB on AIX.

Link to discussions on llvm discourse and github:

  1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640
  2. Extending LLDB to work on AIX #101657
    The complete changes for porting are present in this draft PR:
    Extending LLDB to work on AIX #102601

Description:
Adding ParseSymtab logic after creating sections. It is able to handle both 32 and 64 bit symbols,
without the need to add template logic.

This is an incremental PR on top of my previous couple of XCOFF support commits.


Full diff: https://github.com/llvm/llvm-project/pull/141577.diff

1 Files Affected:

  • (modified) lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp (+65-1)
diff --git a/lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp b/lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp
index e629355cd40b9..7dfe6c362add4 100644
--- a/lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp
+++ b/lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp
@@ -188,7 +188,71 @@ AddressClass ObjectFileXCOFF::GetAddressClass(addr_t file_addr) {
   return AddressClass::eUnknown;
 }
 
-void ObjectFileXCOFF::ParseSymtab(Symtab &lldb_symtab) {}
+lldb::SymbolType MapSymbolType(llvm::object::SymbolRef::Type sym_type) {
+  if (sym_type == llvm::object::SymbolRef::ST_Function)
+    return lldb::eSymbolTypeCode;
+  else if (sym_type == llvm::object::SymbolRef::ST_Data)
+    return lldb::eSymbolTypeData;
+  else if (sym_type == llvm::object::SymbolRef::ST_File)
+    return lldb::eSymbolTypeSourceFile;
+  return lldb::eSymbolTypeInvalid;
+}
+
+void ObjectFileXCOFF::ParseSymtab(Symtab &lldb_symtab) {
+  SectionList *sectionList = GetSectionList();
+
+  for (const auto &symbol_ref : m_binary->symbols()) {
+    llvm::object::XCOFFSymbolRef xcoff_sym_ref(symbol_ref);
+    llvm::Expected<llvm::StringRef> name_or_err = xcoff_sym_ref.getName();
+    if (!name_or_err) {
+      consumeError(name_or_err.takeError());
+      continue;
+    }
+    llvm::StringRef symbolName = name_or_err.get();
+    // Remove the dot prefix for demangle
+    llvm::StringRef symbol_name =
+        symbolName.starts_with(".") ? symbolName.drop_front() : symbolName;
+    auto storageClass = xcoff_sym_ref.getStorageClass();
+    if (storageClass == XCOFF::C_HIDEXT && symbolName != "TOC") {
+      if (xcoff_sym_ref.getNumberOfAuxEntries() != 1)
+        continue;
+      auto aux_csect_or_err = xcoff_sym_ref.getXCOFFCsectAuxRef();
+      if (!aux_csect_or_err) {
+        consumeError(aux_csect_or_err.takeError());
+        continue;
+      }
+      const llvm::object::XCOFFCsectAuxRef csect_aux = aux_csect_or_err.get();
+      if (csect_aux.getStorageMappingClass() != XCOFF::XMC_PR ||
+          (m_binary->is64Bit() ? (csect_aux.getAuxType64() != XCOFF::AUX_CSECT)
+                               : false))
+        continue;
+    }
+
+    Symbol symbol;
+    symbol.GetMangled().SetValue(ConstString(symbol_name));
+
+    int16_t sectionNumber = xcoff_sym_ref.getSectionNumber();
+    size_t sectionIndex = static_cast<size_t>(sectionNumber - 1);
+    if (sectionNumber > 0 && sectionIndex < sectionList->GetSize()) {
+      lldb::SectionSP section_sp =
+          sectionList->GetSectionAtIndex(sectionNumber - 1);
+      if (!section_sp || section_sp->GetFileAddress() == LLDB_INVALID_ADDRESS)
+        continue;
+      lldb::addr_t file_addr = section_sp->GetFileAddress();
+      lldb::addr_t symbolValue = xcoff_sym_ref.getValue();
+      if (symbolValue < file_addr)
+        continue;
+      symbol.GetAddressRef() = Address(section_sp, symbolValue - file_addr);
+    }
+
+    Expected<llvm::object::SymbolRef::Type> sym_type_or_err =
+        symbol_ref.getType();
+    symbol.SetType(MapSymbolType(sym_type_or_err.get()));
+    printf("%s %d\n", symbol.GetName(), *sym_type_or_err);
+
+    lldb_symtab.AddSymbol(symbol);
+  }
+}
 
 bool ObjectFileXCOFF::IsStripped() { return false; }
 

@DhruvSrivastavaX
Copy link
Contributor Author

DhruvSrivastavaX commented May 27, 2025

Hi @labath @DavidSpickett
I have added the logic for ParseSymtab, but while trying to modify my test cases, I noticed that lldb-test is unable to emit Symbol data.
Is there any other way that i can use to add test for this code or am I missing something?
Although I did verify via prompt, that my symbols are getting added in the symtab.

@DhruvSrivastavaX DhruvSrivastavaX force-pushed the aix-xcoff-parsesymtab branch from 3b425c9 to 21f5a39 Compare May 28, 2025 12:42
Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code seems okay. I don't know anything about the format, so I don't know which parts are "obvious" and which not, but I'd encourage you to add comments where you think it might be useful.

You're right that lldb-test doesn't dump symbols. You should be able to test these changes with something like lldb %t -o "image dump symtab", but if you find that is missing some crucial piece of information, it would also be fine to extend lldb-test to dump the data you need.

lldb::SymbolType MapSymbolType(llvm::object::SymbolRef::Type sym_type) {
if (sym_type == llvm::object::SymbolRef::ST_Function)
return lldb::eSymbolTypeCode;
else if (sym_type == llvm::object::SymbolRef::ST_Data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be written as:

switch (symboltype):
case foo: return bla;
...
default: ...

Comment on lines 225 to 228
if (csect_aux.getStorageMappingClass() != XCOFF::XMC_PR ||
(m_binary->is64Bit() ? (csect_aux.getAuxType64() != XCOFF::AUX_CSECT)
: false))
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd split this into two conditions:

if (csect_aux.getStorageMappingClass() != XCOFF::XMC_PR)
  continue;
if (m_binary->is64Bit() && csect_aux.getAuxType64() != XCOFF::AUX_CSECT)
  continue;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please, my reaction was "this looks very clever but needs to be refactored" :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my effort to avoid another continue 🙂 .
But I have modified that now.

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit bothered by how many continue are here, but they seem to be here for good reasons.

Once you've addressed this round of comments I'll read it again and see if it's clearer to me.

@@ -188,7 +188,74 @@ AddressClass ObjectFileXCOFF::GetAddressClass(addr_t file_addr) {
return AddressClass::eUnknown;
}

void ObjectFileXCOFF::ParseSymtab(Symtab &lldb_symtab) {}
lldb::SymbolType MapSymbolType(llvm::object::SymbolRef::Type sym_type) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark this static, because it's only used in this file.

lldb::SymbolType MapSymbolType(llvm::object::SymbolRef::Type sym_type) {
if (sym_type == llvm::object::SymbolRef::ST_Function)
return lldb::eSymbolTypeCode;
else if (sym_type == llvm::object::SymbolRef::ST_Data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be written as:

switch (symboltype):
case foo: return bla;
...
default: ...

if (!name_or_err) {
consumeError(name_or_err.takeError());
continue;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of continue is ok but I find it harder to read when there is:

continue;
immediately more code here

Please put a blank line after each continue or block that contains the continue, like:

if (bla) {
  continue;
}

rest of code goes here

continue;
}
llvm::StringRef symbolName = name_or_err.get();
// Remove the dot prefix for demangle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dot prefix is what exactly, is it an expectation of XCOFF in general? Seems like only some symbols will have it.

You say "for demangle" but it's not clear where that happens or if you mean remove it so that some other thing later can do it. If it does happen in this function perhaps clarify:

// Remove the dot prefix so that we can demangle it.

Now the link between action and purpose is clearer.

Comment on lines 225 to 228
if (csect_aux.getStorageMappingClass() != XCOFF::XMC_PR ||
(m_binary->is64Bit() ? (csect_aux.getAuxType64() != XCOFF::AUX_CSECT)
: false))
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please, my reaction was "this looks very clever but needs to be refactored" :)

Symbol symbol;
symbol.GetMangled().SetValue(ConstString(symbol_name));

int16_t sectionNumber = xcoff_sym_ref.getSectionNumber();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is defined by some XCOFF standard to start at 1? So the line below isn't going to be doing 0 - 1, at minimum it will be 1-1.

I would add a clarifying comment here:

// Note that XCOFF section numbers start from 1.

Assuming that's correct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But later you say:

if (sectionNumber > 0 

and it's a signed int, so it could be negative.

So if it can be 0, then sectionIndex could become 0xffffffffffffffff. Now maybe it doesn't get used in that scenario but if it did, it would not do good things.

I would move the if sectionNumber > 0 up to before you calculate the section index. Then you aren't creating random sectionIndex values.

Plus, it should be clearer then what range of values we expect from sectionNumber.

size_t sectionIndex = static_cast<size_t>(sectionNumber - 1);
if (sectionNumber > 0 && sectionIndex < sectionList->GetSize()) {
lldb::SectionSP section_sp =
sectionList->GetSectionAtIndex(sectionNumber - 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sectionNumber -1 is just the sectionIndex you already made, I think the parameter type is also size_t so you can just use it here.

@DhruvSrivastavaX
Copy link
Contributor Author

@labath

The code seems okay. I don't know anything about the format, so I don't know which parts are "obvious" and which not, but I'd encourage you to add comments where you think it might be useful.

Sure, I have added some comments for that now.

You're right that lldb-test doesn't dump symbols. You should be able to test these changes with something like lldb %t -o "image dump symtab", but if you find that is missing some crucial piece of information, it would also be fine to extend lldb-test to dump the data you need

Yes, I had checked it using image dump symtab itself, my symbols do get added.
As for the lldb-test not emitting symbols, I will create an issue for it so that we can start upgrading lldb-test as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants