[1/3] ir: Speedup compute_aliases_for_elf_symbol

Message ID 20260224132624.3469488-2-christopher.burr@cern.ch
State New
Headers
Series Speedup abidiff for large symbol diffs |

Commit Message

Chris Burr Feb. 24, 2026, 1:26 p.m. UTC
  When symbols have no pre-linked alias chain (e.g. when loaded from
abixml), compute_aliases_for_elf_symbol scans the entire symbol
table to find aliases, making each call O(N) where N is the total
number of symbols.  This function is called from
get_aliases_id_string, which is invoked by the reporting path
(show_linkage_name_and_aliases) for every added or deleted
unreferenced symbol.  With ~47K added symbols, this results in
O(N^2) comparisons via elf_symbol::operator== (which itself calls
textually_equals and walks alias chains), dominating abidiff
runtime.

The symtab is an unordered_map keyed by symbol name.  Since
textually_equals requires matching names, only entries under the
same name key can possibly match.  Replace the full table scan
with a direct hash lookup of sym.get_name(), reducing the per-call
cost from O(N) to O(1).

On a comparison of two versions of libCling.so (~57K symbol
changes), this reduces abidiff wall time from ~9 minutes to ~20
seconds.

	* src/abg-ir.cc (compute_aliases_for_elf_symbol): In the
	no-alias-chain fallback, look up sym.get_name() in the
	symtab rather than iterating all entries.

Signed-off-by: Chris Burr <christopher.burr@cern.ch>
---
 src/abg-ir.cc | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
  

Comments

Dodji Seketeli Feb. 27, 2026, 2:31 p.m. UTC | #1
Hello Chris,

Chris Burr <christopher.burr@cern.ch> a écrit:

> When symbols have no pre-linked alias chain (e.g. when loaded from
> abixml), compute_aliases_for_elf_symbol scans the entire symbol
> table to find aliases, making each call O(N) where N is the total
> number of symbols.  This function is called from
> get_aliases_id_string, which is invoked by the reporting path
> (show_linkage_name_and_aliases) for every added or deleted
> unreferenced symbol.  With ~47K added symbols, this results in
> O(N^2) comparisons via elf_symbol::operator== (which itself calls
> textually_equals and walks alias chains), dominating abidiff
> runtime.
>
> The symtab is an unordered_map keyed by symbol name.  Since
> textually_equals requires matching names, only entries under the
> same name key can possibly match.  Replace the full table scan
> with a direct hash lookup of sym.get_name(), reducing the per-call
> cost from O(N) to O(1).
>
> On a comparison of two versions of libCling.so (~57K symbol
> changes), this reduces abidiff wall time from ~9 minutes to ~20
> seconds.
>
> 	* src/abg-ir.cc (compute_aliases_for_elf_symbol): In the
> 	no-alias-chain fallback, look up sym.get_name() in the
> 	symtab rather than iterating all entries.
>
> Signed-off-by: Chris Burr <christopher.burr@cern.ch>

I have applied to this to the master branch at
https://sourceware.org/cgit/libabigail/commit/?id=c7032a2b4a08d1b78b91d297e74e7c33230f7165.

Many thanks.

[...]

Cheers,
  

Patch

diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 034a57e3..8ae643fd 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -2877,9 +2877,15 @@  compute_aliases_for_elf_symbol(const elf_symbol& sym,
     for (; a && !a->is_main_symbol(); a = a->get_next_alias())
       aliases.push_back(a);
   else
-    for (string_elf_symbols_map_type::const_iterator i = symtab.begin();
-	 i != symtab.end();
-	 ++i)
+    {
+      // No pre-linked alias chain (e.g. symbols loaded from abixml).
+      // Look up by name in the symtab (O(1) hash lookup) instead of
+      // scanning the entire table (which was O(N) per call).
+      string_elf_symbols_map_type::const_iterator i =
+	symtab.find(sym.get_name());
+      if (i == symtab.end())
+	return;
+
       for (elf_symbols::const_iterator j = i->second.begin();
 	   j != i->second.end();
 	   ++j)
@@ -2896,6 +2902,7 @@  compute_aliases_for_elf_symbol(const elf_symbol& sym,
 	      if (*s == sym)
 		aliases.push_back(*j);
 	}
+    }
 }
 
 /// Test if two symbols alias.