Skip to content

[lldb] Don't create instance of SymbolFileDWARFDebugMap for non-Mach-O files #139170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

royitaqi
Copy link
Contributor

@royitaqi royitaqi commented May 8, 2025

Change

SymbolFileDWARFDebugMap::CreateInstance() will return nullptr if the file is not a Mach-O.

Benefit

This may improve Linux debugger launch time by skipping the creation of SymbolFileDWARFDebugMap during the SymbolFile::FindPlugin() call, which loops through a list of SymbolFile plugins and tries to find the one that provides the best abilities. If the SymbolFileDWARFDebugMap is created during this loop, it will load the symbol table of the file in question and loop through all the compile units in the debug map (the OSO entries) to calculate the abilities.

Tests

Two unittests are added to verify the creation and the absence of SymbolFileDWARFDebugMap for Mach-O and non-Mach-O files, respectively.

Ran ninja check-lldb on both Darwin and Linux. No regression spotted. See results compared for mac and linux.

@royitaqi royitaqi requested a review from JDevlieghere as a code owner May 8, 2025 22:59
@llvmbot llvmbot added the lldb label May 8, 2025
@royitaqi royitaqi changed the title Don't create instance of SymbolFileDWARFDebugMap for non-Mach-O files [lldb] Don't create instance of SymbolFileDWARFDebugMap for non-Mach-O files May 8, 2025
@llvmbot
Copy link
Member

llvmbot commented May 8, 2025

@llvm/pr-subscribers-lldb

Author: None (royitaqi)

Changes

Change

SymbolFileDWARFDebugMap::CreateInstance() will return nullptr if the file is not a Mach-O.

Benefit

This may improve Linux debugger launch time by skipping the creation of SymbolFileDWARFDebugMap during the SymbolFile::FindPlugin() call, which loops through a list of SymbolFile plugins and tries to find the one that provides the best abilities. If the SymbolFileDWARFDebugMap is created during this loop, it will load the symbol table of the file in question and loop through all the compile units in the debug map (the OSO entries) to calculate the abilities.

Tests

Two tests are added to verify the creation and the absence for Mach-O and non-Mach-O files.

Ran ninja check-lldb on both Darwin and Linux. No regression spotted. See results compared for mac and linux.


Full diff: https://github.com/llvm/llvm-project/pull/139170.diff

3 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp (+3)
  • (modified) lldb/unittests/SymbolFile/DWARF/CMakeLists.txt (+1)
  • (added) lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFDebugMapTests.cpp (+142)
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
index 961c212e2e6dc..1793c23950d55 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
@@ -38,6 +38,7 @@
 #include "lldb/Target/StackFrame.h"
 
 #include "LogChannelDWARF.h"
+#include "Plugins/ObjectFile/Mach-O/ObjectFileMachO.h"
 #include "SymbolFileDWARF.h"
 #include "lldb/lldb-private-enumerations.h"
 
@@ -246,6 +247,8 @@ llvm::StringRef SymbolFileDWARFDebugMap::GetPluginDescriptionStatic() {
 }
 
 SymbolFile *SymbolFileDWARFDebugMap::CreateInstance(ObjectFileSP objfile_sp) {
+  if (objfile_sp->GetPluginName() != ObjectFileMachO::GetPluginNameStatic())
+    return nullptr;
   return new SymbolFileDWARFDebugMap(std::move(objfile_sp));
 }
 
diff --git a/lldb/unittests/SymbolFile/DWARF/CMakeLists.txt b/lldb/unittests/SymbolFile/DWARF/CMakeLists.txt
index d5b0be7ea2a28..5aacb24fc5206 100644
--- a/lldb/unittests/SymbolFile/DWARF/CMakeLists.txt
+++ b/lldb/unittests/SymbolFile/DWARF/CMakeLists.txt
@@ -4,6 +4,7 @@ add_lldb_unittest(SymbolFileDWARFTests
   DWARFDIETest.cpp
   DWARFIndexCachingTest.cpp
   DWARFUnitTest.cpp
+  SymbolFileDWARFDebugMapTests.cpp
   SymbolFileDWARFTests.cpp
   XcodeSDKModuleTests.cpp
 
diff --git a/lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFDebugMapTests.cpp b/lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFDebugMapTests.cpp
new file mode 100644
index 0000000000000..8c65fca889a40
--- /dev/null
+++ b/lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFDebugMapTests.cpp
@@ -0,0 +1,142 @@
+//===-- SymbolFileDWARFDebugMapTests.cpp ----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "Plugins/ObjectFile/ELF/ObjectFileELF.h"
+#include "Plugins/ObjectFile/Mach-O/ObjectFileMachO.h"
+#include "Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.h"
+#include "TestingSupport/SubsystemRAII.h"
+#include "TestingSupport/TestUtilities.h"
+
+#include "lldb/Core/Module.h"
+#include "llvm/Testing/Support/Error.h"
+
+#include "gtest/gtest.h"
+
+using namespace lldb;
+using namespace lldb_private;
+using namespace lldb_private::plugin::dwarf;
+
+class SymbolFileDWARFDebugMapTests : public testing::Test {
+  SubsystemRAII<ObjectFileELF, ObjectFileMachO> subsystems;
+};
+
+TEST_F(SymbolFileDWARFDebugMapTests, CreateInstanceReturnNonNullForMachOFile) {
+  // Make sure we don't crash parsing a null unit DIE.
+  const char *yamldata = R"(
+--- !mach-o
+FileHeader:
+  magic:           0xFEEDFACF
+  cputype:         0x01000007
+  cpusubtype:      0x80000003
+  filetype:        0x00000001
+  ncmds:           1
+  sizeofcmds:      152
+  flags:           0x00002000
+  reserved:        0x00000000
+LoadCommands:
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         152
+    segname:         __TEXT
+    vmaddr:          0
+    vmsize:          4
+    fileoff:         184
+    filesize:        4
+    maxprot:         7
+    initprot:        7
+    nsects:          1
+    flags:           0
+    Sections:
+      - sectname:        __text
+        segname:         __TEXT
+        addr:            0x0000000000000000
+        content:         'AABBCCDD'
+        size:            4
+        offset:          184
+        align:           0
+        reloff:          0x00000000
+        nreloc:          0
+        flags:           0x80000400
+        reserved1:       0x00000000
+        reserved2:       0x00000000
+        reserved3:       0x00000000
+...
+)";
+
+  llvm::Expected<TestFile> file = TestFile::fromYaml(yamldata);
+  EXPECT_THAT_EXPECTED(file, llvm::Succeeded());
+  auto module_sp = std::make_shared<Module>(file->moduleSpec());
+  ASSERT_NE(module_sp, nullptr);
+  auto object_file = module_sp->GetObjectFile();
+  ASSERT_NE(object_file, nullptr);
+  auto debug_map =
+      SymbolFileDWARFDebugMap::CreateInstance(object_file->shared_from_this());
+  ASSERT_NE(debug_map, nullptr);
+}
+
+TEST_F(SymbolFileDWARFDebugMapTests, CreateInstanceReturnNullForNonMachOFile) {
+  // Make sure we don't crash parsing a null unit DIE.
+  const char *yamldata = R"(
+--- !ELF
+FileHeader:
+  Class:   ELFCLASS64
+  Data:    ELFDATA2LSB
+  Type:    ET_EXEC
+  Machine: EM_386
+DWARF:
+  debug_abbrev:
+    - Table:
+        - Code:            0x00000001
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_addr_base
+              Form:            DW_FORM_sec_offset
+  debug_info:
+    - Version:         5
+      AddrSize:        4
+      UnitType:        DW_UT_compile
+      Entries:
+        - AbbrCode:        0x00000001
+          Values:
+            - Value:           0x8 # Offset of the first Address past the header
+        - AbbrCode:        0x0
+  debug_addr:
+    - Version: 5
+      AddressSize: 4
+      Entries:
+        - Address: 0x1234
+        - Address: 0x5678
+  debug_line:
+    - Length:          42
+      Version:         2
+      PrologueLength:  36
+      MinInstLength:   1
+      DefaultIsStmt:   1
+      LineBase:        251
+      LineRange:       14
+      OpcodeBase:      13
+      StandardOpcodeLengths: [ 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 ]
+      IncludeDirs:
+        - '/tmp'
+      Files:
+        - Name:            main.cpp
+          DirIdx:          1
+          ModTime:         0
+          Length:          0
+)";
+
+  llvm::Expected<TestFile> file = TestFile::fromYaml(yamldata);
+  EXPECT_THAT_EXPECTED(file, llvm::Succeeded());
+  auto module_sp = std::make_shared<Module>(file->moduleSpec());
+  ASSERT_NE(module_sp, nullptr);
+  auto object_file = module_sp->GetObjectFile();
+  ASSERT_NE(object_file, nullptr);
+  auto debug_map =
+      SymbolFileDWARFDebugMap::CreateInstance(object_file->shared_from_this());
+  ASSERT_EQ(debug_map, nullptr);
+}

@royitaqi royitaqi requested review from clayborg and jeffreytan81 May 8, 2025 23:00
@@ -246,6 +247,8 @@ llvm::StringRef SymbolFileDWARFDebugMap::GetPluginDescriptionStatic() {
}

SymbolFile *SymbolFileDWARFDebugMap::CreateInstance(ObjectFileSP objfile_sp) {
if (objfile_sp->GetPluginName() != ObjectFileMachO::GetPluginNameStatic())
return nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is calling this? I'm wondering if the check might make more sense at the call site?

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plugins shouldn't depend on each other. The correct way to do this would be to have a method in ObjectFile, something like ObjectFile::SupportsDebugMap() which returns false for all implementations except the ObjectFileMachO one.

@bulbazord
Copy link
Member

+1 to Jonas's comment. Plugins ideally don't depend on each other and they should strive to fail gracefully if they depend on a specific plugin instance that's not available. I recognize that many plugins depend on each other today, but I don't think we should be adding more direct dependencies if we can help it.

@royitaqi
Copy link
Contributor Author

royitaqi commented May 9, 2025

@JDevlieghere I see your point (some learning questions below about the principle).

However, I feel ObjectFile::SupportsDebugMap() is still closely tied to a specific plugin (not through compilation, but semantics). Imagine each plugin needs to add such a method into the ObjectFile class, then the class will be filled with plugin-specific methods like this, which I feel breaks the API scalability the plugin system wanted.

Instead, maybe it's better if we frame "DebugMap" as an ObjectFileCapability enum. Other capabilities can be added, too, e.g. does the object file imply split dwarf, etc. Then the API will be ObjectFile::GetSupportedCapabilities().

WDYT?

(Note: I am trying to think about how does this relate to the CalculateAbilities() function in those SymbolFile subclasses. I feel they are different, so it should be okay.)

-- (my learning question about the principle of "Plugins shouldn't depend on each other") --

I guess I'm missing a lot of context here. From my limited and naive view, there are natural connections between different kinds of plugins because how they usually work together in existing systems. E.g. macOS usually use Mach-O files and can use either SymbolFileDWARF and SymbolFileDWARFDebugMap, but never split-DWARF (dwp/dwo). I don't understand the design principle of "Plugins shouldn't depend on each other". It feels like it's trying to ignore the natural connection between these plugins. I'm guessing we have that principle so as to avoid having to model a complex (and dynamic) dependency problem? (E.g. "A and B" and "B and C" should always work together, but "A and C" should never work together.)

@labath
Copy link
Collaborator

labath commented May 12, 2025

FWIW, I completely agree @royitaqi. I'm not saying inter-plugin dependencies should be a free-for-all (*), but I think that some kinds of dependencies make sense and I definitely think this is one of them. The debug map format is intimately tied to the MachO and darwin ecosystems and I think there's approximately zero chance that it will be implemented elsewhere. Using the debug map requires MachO files, so why not just admit that?

This isn't the only example of that. For example, ProcessElfCore requires elf core files. ProcessMachCore requires MachO core files. And I think that's a perfectly natural state of the world and doing anything else would make things a lot more convoluted. I think something like SupportsDebugMap is a step in the wrong direction. It's true that removes the link time dependency but the moral dependency remains, and in fact this makes it worse, because now every object file plugin needs to know about the debug map. Why should ObjectFilePECOFF need to think about whether it supports a random symbol file plugin?

(*) I think that plugin dependencies should form a DAG. The fact that SymbolFileDWARFDebugMap depends on ObjectFileMachO means that ObjectFileMachO should not depend on SymbolFileDWARFDebugMap -- which I think is also perfectly natural restriction. And even if there is a reason why you might want to depend the other way, you can try to break it using the usual dependency reversal techniques, but you'd only need to touch these two plugins -- not every plugin of the two kinds.

I might also say that, for the sake of clarity, the dependency direction should be uniform across all plugins of the same type. So that SymbolFileDWARFDebugMap depends on ObjectFileMachO means that no other object file plugin can depend on any other symbol file plugin, which is, again, I think a reasonable requirement, and I would say that if we ever find a case where we have two plugin pairs which want to depend in the opposite directions, that this means that those plugin types are not designed properly (maybe they ought to be a single plugin?).

@JDevlieghere
Copy link
Member

FWIW, I completely agree @royitaqi. I'm not saying inter-plugin dependencies should be a free-for-all (*), but I think that some kinds of dependencies make sense and I definitely think this is one of them. The debug map format is intimately tied to the MachO and darwin ecosystems and I think there's approximately zero chance that it will be implemented elsewhere. Using the debug map requires MachO files, so why not just admit that?

To answer your last question: because the plugins (which is really a misnomer), as I understand it, are there to provide abstraction: the caller shouldn't have to know what concrete instance an ObjectFile is. The no-plugins-dependency enforces that at compile/link time.

I'll be the first to admit that the current approach has shortcomings (like it does here, I totally agree with that). However we should also remember that it exists for a reason and that a lot of time and effort has been spent on detangling the plugins to fix layering issues and reduce binary sizes. The current rule is easy to understand and easy to enforce which makes it appealing and ensures we don't make things worse as we're still working on this.

IMHO, the downsides being listed here don't outweigh the benefits, especially since we have a bunch of examples of those "moral dependencies". That said, if there's a way to enforce the DAG described in the asterisk, I'm totally fine with changing the rule to allow inter-plugin dependencies "where it makes sense" (which is still subjective). I made a similar argument in another PR a while ago, but I don't think we can expect reviewers to mentally verify this DAG every time someone tries to introduce a new inter-plugin dependencies.

TL;DR: My stance is that if there's a way to enforce the DAG, I'm totally fine with allowing inter-plugin dependencies where they make sense. Barring that I think the current rule, albeit overly strict, is the easiest way to enforce we don't regress inter-plugin dependencies in general.

PS: I'm actually glad we're having this discussion. Whatever the outcome, I'd like to document it on the website so contributors can know what to expect going forward. This comes up periodically and there's no reason this should be a surprise for existing or new contributors.

@labath
Copy link
Collaborator

labath commented May 13, 2025

To answer your last question: because the plugins (which is really a misnomer), as I understand it, are there to provide abstraction: the caller shouldn't have to know what concrete instance an ObjectFile is. The no-plugins-dependency enforces that at compile/link time.

This is my problem with that. You're saying the "caller shouldn't have to know" -- which is a statement about his state of mind. We can't enforce the state of mind through syntactic restrictions. We can prevent the caller from saying he expects a specific subclass, but there's not way to prevent code from expecting that implicitly (perhaps even without it knowing). Functions like SupportsDebugMap make it worse because they encourage writing code like that and that makes the plugin system (the abstractions created by it) less useful as a whole. Like, if I now wanted to create a new symbol file plugin that works only with a specific object file, do I add SupportsPsym to every object file plugin? Or we don't have to look at new plugins, we have two good examples already: SymbolFileBreakpad and SymbolFileJSON. Both of these currently depend on the their respective object file plugin. Should that be replaced by something else (and would the result be better if we did)?

Nonetheless, I agree that plugins should create abstractions, but I think it's important to think about who are those abstractions directed at. The core code definitely shouldn't need to "look behind the curtain" (ideally not even through calls like SupportsDebugMap), but I don't think that means that noone else isn't allowed to do that. This situation is really simple, so it doesn't demonstrate the full scope of the problem, which is why I want to go back to the ProcessCore->ObjectFile example.

ProcessElfCore needs to extract a lot of detailed information from ObjectFileELF. Same goes for ProcessMachCore. At some level, the information these two pairs are exchanging is the same (threads, registers, ...), so one could imagine a generic interface to pass this -- but would that help anything? I say "no", because you still want to have the two plugins to work in pair -- if you didn't, then you wouldn't need separate process core plugins. This setup would make sense if we had a generic ProcessCore class (not a plugin) which works off of information provided by the object plugins -- but that's not the situation we're in now. You can make the same case for dynamic loaders -- should MacOSX-DYLD work with ObjectFilePECOFF?

TL;DR: My stance is that if there's a way to enforce the DAG, I'm totally fine with allowing inter-plugin dependencies where they make sense. Barring that I think the current rule, albeit overly strict, is the easiest way to enforce we don't regress inter-plugin dependencies in general.

Enforcement is tricky, because "where it makes sense" is obviously a judgement call. It's also complicated by the fact that current graph is not a DAG. Linux linkers actually kind of enforce the graph implicitly, but since it's not a DAG now, we have a workaround and that prevents us catching any new bad deps. I guess it would be possible to do something similar to NO_PLUGIN_DEPENDENCIES in cmake, which only allows dependencies that we consider acceptable, though it's not foolproof: for example SymbolFileJSON doesn't list ObjectFileJSON in its cmake dependencies even though it definitely needs it.

I think the best we can do is some cmake check plus some documentation about which dependencies we consider "okay".
I don't think this can be foolproof though. Going by the examples above, I would say that we should permit dependencies on object file plugins from SymbolFiles, DynamicLoaders and Processes -- but there are still exceptions to that. For example, ProcessGdbRemote and DynamicLoaderStatic (and maybe SymbolFileDWARF) probably shouldn't depend on any specific ObjectFile plugin since they're meant to work with more than one (I guess that could be a general rule -- but it's not enforcable one).

A rule like "there shall be no inter-plugin dependencies" is easy to enforce and that makes it appealing, but I think it will be very hard for us to come into compliance with that (and I don't think we'd like the result even if we did). If we say some deps are okay, then a lot of things become automatically compliant, and we can focus on the high-value high-complexity deps like SymbolFile<->TypeSystem.

@JDevlieghere
Copy link
Member

This is my problem with that. You're saying the "caller shouldn't have to know" -- which is a statement about his state of mind. We can't enforce the state of mind through syntactic restrictions. We can prevent the caller from saying he expects a specific subclass, but there's not way to prevent code from expecting that implicitly (perhaps even without it knowing). Functions like SupportsDebugMap make it worse because they encourage writing code like that and that makes the plugin system (the abstractions created by it) less useful as a whole. Like, if I now wanted to create a new symbol file plugin that works only with a specific object file, do I add SupportsPsym to every object file plugin? Or we don't have to look at new plugins, we have two good examples already: SymbolFileBreakpad and SymbolFileJSON. Both of these currently depend on the their respective object file plugin. Should that be replaced by something else (and would the result be better if we did)?

Nonetheless, I agree that plugins should create abstractions, but I think it's important to think about who are those abstractions directed at. The core code definitely shouldn't need to "look behind the curtain" (ideally not even through calls like SupportsDebugMap), but I don't think that means that noone else isn't allowed to do that. This situation is really simple, so it doesn't demonstrate the full scope of the problem, which is why I want to go back to the ProcessCore->ObjectFile example.

ProcessElfCore needs to extract a lot of detailed information from ObjectFileELF. Same goes for ProcessMachCore. At some level, the information these two pairs are exchanging is the same (threads, registers, ...), so one could imagine a generic interface to pass this -- but would that help anything? I say "no", because you still want to have the two plugins to work in pair -- if you didn't, then you wouldn't need separate process core plugins. This setup would make sense if we had a generic ProcessCore class (not a plugin) which works off of information provided by the object plugins -- but that's not the situation we're in now. You can make the same case for dynamic loaders -- should MacOSX-DYLD work with ObjectFilePECOFF?

To be clear, I pretty much agree with everything you've said. I'm not arguing that the current approach is better, at least not on merit. My point is that barring a better alternative, those shortcoming are the price to pay to avoid the free-for-all nobody wants. I'm glad that you're offering an alternative below.

TL;DR: My stance is that if there's a way to enforce the DAG, I'm totally fine with allowing inter-plugin dependencies where they make sense. Barring that I think the current rule, albeit overly strict, is the easiest way to enforce we don't regress inter-plugin dependencies in general.

Enforcement is tricky, because "where it makes sense" is obviously a judgement call. It's also complicated by the fact that current graph is not a DAG. Linux linkers actually kind of enforce the graph implicitly, but since it's not a DAG now, we have a workaround and that prevents us catching any new bad deps. I guess it would be possible to do something similar to NO_PLUGIN_DEPENDENCIES in cmake, which only allows dependencies that we consider acceptable, though it's not foolproof: for example SymbolFileJSON doesn't list ObjectFileJSON in its cmake dependencies even though it definitely needs it.

I think the best we can do is some cmake check plus some documentation about which dependencies we consider "okay". I don't think this can be foolproof though. Going by the examples above, I would say that we should permit dependencies on object file plugins from SymbolFiles, DynamicLoaders and Processes -- but there are still exceptions to that. For example, ProcessGdbRemote and DynamicLoaderStatic (and maybe SymbolFileDWARF) probably shouldn't depend on any specific ObjectFile plugin since they're meant to work with more than one (I guess that could be a general rule -- but it's not enforcable one).

I'm fine with that. What I'm looking for is something that's (relatively) easy to understand and (relatively) easy to spot in code reviews. I'd love to have something foolproof (or work towards that) but I acknowledge that given the current state of things that's not achievable. What you describe here checks both boxes.

A rule like "there shall be no inter-plugin dependencies" is easy to enforce and that makes it appealing, but I think it will be very hard for us to come into compliance with that (and I don't think we'd like the result even if we did). If we say some deps are okay, then a lot of things become automatically compliant, and we can focus on the high-value high-complexity deps like SymbolFile<->TypeSystem.

Ack, agreed.

To make this actionable, @labath or @royitaqi, is either of you willing to sign up to add the CMakery and document the new policy?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants