Skip to content

[lld][ELF] Merge equivalent symbols found during ICF #139493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pranavk
Copy link
Contributor

@pranavk pranavk commented May 12, 2025

This reapplies #134342 which was reverted in fd3fecf

Original description:
Fixes a correctness issue for AArch64 when ADRP and LDR instructions are outlined in separate sections and sections are fed to ICF for deduplication.

See test case (based on #129122) for details. All rodata.* sections are folded into a single section with ICF. This leads to all f2_* function sections getting folded into one (as their relocation target symbols g* belong to .rodata.g* sections that have already been folded into one). Since relocations still refer original g* symbols, we end up creating duplicate GOT entry for all such symbols. This PR addresses that by tracking such folded symbols and create one GOT entry for all such symbols.

Fixes #129122

In addition to the original fix, this reland also fixes following issues:

  1. When two relocations have non-zero addends, we must prevent symbol folding.
  2. It takes care of non-globals (original PR in [lld][ICF] Prevent merging two sections when they point to non-globals #136641).

@llvmbot
Copy link
Member

llvmbot commented May 12, 2025

@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: Pranav Kant (pranavk)

Changes

Fixes scenario reported in #134342 (comment)


Full diff: https://github.com/llvm/llvm-project/pull/139493.diff

2 Files Affected:

  • (modified) lld/ELF/ICF.cpp (+15-14)
  • (added) lld/test/ELF/icf-addend.s (+33)
diff --git a/lld/ELF/ICF.cpp b/lld/ELF/ICF.cpp
index 1882116d4d5f0..797d0968da6c2 100644
--- a/lld/ELF/ICF.cpp
+++ b/lld/ELF/ICF.cpp
@@ -334,26 +334,27 @@ bool ICF<ELFT>::equalsConstant(const InputSection *a, const InputSection *b) {
              : constantEq(a, ra.relas, b, rb.relas);
 }
 
-template <class RelTy>
-static SmallVector<Symbol *> getReloc(const InputSection *sec,
+template <class ELFT, class RelTy>
+static SmallVector<std::pair<Symbol *, uint64_t>> getReloc(const InputSection *sec,
                                       Relocs<RelTy> relocs) {
-  SmallVector<Symbol *> syms;
+  SmallVector<std::pair<Symbol *, uint64_t>> syms;
   for (auto ri = relocs.begin(), re = relocs.end(); ri != re; ++ri) {
     Symbol &sym = sec->file->getRelocTargetSym(*ri);
-    syms.push_back(&sym);
+    uint64_t addend = getAddend<ELFT>(*ri);
+    syms.push_back(std::make_pair(&sym, addend));
   }
   return syms;
 }
 
 template <class ELFT>
-static SmallVector<Symbol *> getRelocTargetSyms(const InputSection *sec) {
+static SmallVector<std::pair<Symbol *, uint64_t>> getRelocTargets(const InputSection *sec) {
   const RelsOrRelas<ELFT> rel = sec->template relsOrRelas<ELFT>();
   if (rel.areRelocsCrel())
-    return getReloc(sec, rel.crels);
+    return getReloc<ELFT>(sec, rel.crels);
   if (rel.areRelocsRel())
-    return getReloc(sec, rel.rels);
+    return getReloc<ELFT>(sec, rel.rels);
 
-  return getReloc(sec, rel.relas);
+  return getReloc<ELFT>(sec, rel.relas);
 }
 
 // Compare two lists of relocations. Returns true if all pairs of
@@ -568,19 +569,19 @@ template <class ELFT> void ICF<ELFT>::run() {
     if (end - begin == 1)
       return;
     print() << "selected section " << sections[begin];
-    SmallVector<Symbol *> syms = getRelocTargetSyms<ELFT>(sections[begin]);
+    SmallVector<std::pair<Symbol *, uint64_t>> syms = getRelocTargets<ELFT>(sections[begin]);
     for (size_t i = begin + 1; i < end; ++i) {
       print() << "  removing identical section " << sections[i];
       sections[begin]->replace(sections[i]);
-      SmallVector<Symbol *> replacedSyms =
-          getRelocTargetSyms<ELFT>(sections[i]);
+      SmallVector<std::pair<Symbol *, uint64_t>> replacedSyms =
+          getRelocTargets<ELFT>(sections[i]);
       assert(syms.size() == replacedSyms.size() &&
              "Should have same number of syms!");
       for (size_t i = 0; i < syms.size(); i++) {
-        if (syms[i] == replacedSyms[i] || !syms[i]->isGlobal() ||
-            !replacedSyms[i]->isGlobal())
+        if (syms[i].first == replacedSyms[i].first || !syms[i].first->isGlobal() ||
+            !replacedSyms[i].first->isGlobal() || syms[i].second != replacedSyms[i].second)
           continue;
-        symbolEquivalence.unionSets(syms[i], replacedSyms[i]);
+        symbolEquivalence.unionSets(syms[i].first, replacedSyms[i].first);
       }
 
       // At this point we know sections merged are fully identical and hence
diff --git a/lld/test/ELF/icf-addend.s b/lld/test/ELF/icf-addend.s
new file mode 100644
index 0000000000000..0023bbf2ff421
--- /dev/null
+++ b/lld/test/ELF/icf-addend.s
@@ -0,0 +1,33 @@
+# REQUIRES: x86
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o
+# RUN: ld.lld %t.o -o /dev/null --icf=all --print-icf-sections | FileCheck %s
+
+# Check that ICF doesn't fold different symbols when functions referencing them
+# can be folded because they are pointing to the same address.
+
+# CHECK: selected section {{.*}}:(.text.f1)
+# CHECK:   removing identical section {{.*}}:(.text.f2)
+# CHECK-NOT: redirecting 'y' in symtab to 'x'
+# CHECK-NOT: redirecting 'y' to 'x'
+
+.globl x, y
+
+.section .rodata,"a",@progbits
+x:
+.long 11
+y:
+.long 12
+
+.section .text.f1,"ax",@progbits
+f1:
+movq x+4(%rip), %rax 
+
+.section .text.f2,"ax",@progbits
+f2:
+movq y(%rip), %rax
+
+.section .text._start,"ax",@progbits
+.globl _start
+_start:
+call f1
+call f2

Copy link

github-actions bot commented May 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@pranavk pranavk changed the title [DO NOT MERGE][WIP] -- [lld][ICF] Don't merge symbols with different addends [lld][ICF] Don't merge symbols with different addends May 12, 2025
@pranavk pranavk requested review from jyknight and MaskRay May 12, 2025 14:33
@jyknight
Copy link
Member

As per previous discussion, the symbol folding needs to be done in all circumstances where section folding considers the symbols as equivalent -- at least for the special relocation types.

So (at least for those kinds of relocs) I think we must forbid the fold of sections with relocations having different offsets in the first place. (Rather than just avoid folding the symbols, as this patch does).

lld/ELF/ICF.cpp Outdated
// We should not merge sections in the first place if their symbols
// cannot be merged together.
bool canMergeSymbols = addA == addB;
if (da->value + addA == db->value + addB && canMergeSymbols)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can simplify this condition too but i think this is more readable in its current form.

@pranavk
Copy link
Contributor Author

pranavk commented May 13, 2025

Original lld commit was reverted. So I have reapplied that commit in this PR and committed changes to fix the problem in a different commit. You can just look at latest commit to see those changes to make review simple.

This commit basically stops section folding when addends are different which would automatically stop symbol folding. Yes, it means that we would be folding less sections now. I am hoping we wouldn't see too much of this pattern. So it's not going to be noticeable regression. One way to fix this would be to only stop section folding when the relocation is non-trivial but as we discussed in #136641, that's most likely going to bring target-specific logic in which we are not sure we want to do. We can continue building consensus on that in that PR while we let this in?

@pranavk pranavk requested a review from smithp35 May 13, 2025 21:47
@pranavk
Copy link
Contributor Author

pranavk commented May 14, 2025

I have added separate commits for addressing concern in #136641. I am okay splitting it across multiple PRs but I feel everything is so closely related, perhaps it should be just one PR.

@MaskRay
Copy link
Member

MaskRay commented May 14, 2025

[lld][ICF] Don't merge symbols with different addends

Since the "merge symbol" PR has been reverted. This title is no longer accurate. You need to include the original description and append description about addends, potentially simplify the concatenated description.

@pranavk pranavk changed the title [lld][ICF] Don't merge symbols with different addends [lld][ELF] Merge equivalent symbols found during ICF May 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mislink with ICF and multi-instruction GOT entry references
4 participants