[2/3] comparison: Speedup corpus diff computation

Message ID 20260224132624.3469488-3-christopher.burr@cern.ch
State New
Headers
Series Speedup abidiff for large symbol diffs |

Commit Message

Chris Burr Feb. 24, 2026, 1:26 p.m. UTC
  The Myers diff algorithm used to compare functions, variables, and
unreferenced symbols between corpora is O(N*D) where D is the edit
distance.  When most entries differ between large corpora, D
approaches N+M, yielding quadratic equality comparisons.  Each
comparison invokes elf_symbol::operator== which calls
textually_equals and walks alias chains via weak_ptr::lock().

The downstream consumer (ensure_lookup_tables_populated) converts
the edit script into unordered maps keyed by string identifiers,
discarding ordering.  Replace the Myers diff with direct hash-based
set difference in O(N+M).

For functions and variables, the hash-based classification is done
inside ensure_lookup_tables_populated, building maps keyed by
function/variable ID directly from the corpora vectors.  For
unreferenced symbols, a new template helper populates the result
maps in compute_diff.

Combined with the previous alias lookup fix, this reduces abidiff
runtime on a libCling.so comparison with ~57K symbol changes from
~20 seconds to ~1 second.

	* include/abg-comparison.h (corpus_diff::function_changes)
	(corpus_diff::variable_changes): Remove unused methods.
	* src/abg-comparison-priv.h
	(corpus_diff::priv::fns_edit_script_)
	(corpus_diff::priv::vars_edit_script_)
	(corpus_diff::priv::unrefed_fn_syms_edit_script_)
	(corpus_diff::priv::unrefed_var_syms_edit_script_): Remove
	unused data members.
	* src/abg-comparison.cc
	(compute_unreferenced_symbol_set_difference): New static
	template function.
	(compute_diff): Remove Myers diff calls for functions,
	variables, and unreferenced symbols.  Call
	compute_unreferenced_symbol_set_difference for unreferenced
	symbols.  Remove unused typedefs.
	(corpus_diff::priv::ensure_lookup_tables_populated): Replace
	edit script processing for functions and variables with
	hash-based set difference built from the corpora vectors
	directly.  Remove dead edit script processing for
	unreferenced symbols.
	(corpus_diff::function_changes)
	(corpus_diff::variable_changes): Remove unused methods.

Signed-off-by: Chris Burr <christopher.burr@cern.ch>
---
 include/abg-comparison.h  |   6 -
 src/abg-comparison-priv.h |   4 -
 src/abg-comparison.cc     | 430 +++++++++++++++-----------------------
 3 files changed, 163 insertions(+), 277 deletions(-)
  

Comments

Dodji Seketeli Feb. 27, 2026, 3:35 p.m. UTC | #1
Hello Chris,

Chris Burr <christopher.burr@cern.ch> a écrit:

[...]

> The Myers diff algorithm used to compare functions, variables, and
> unreferenced symbols between corpora is O(N*D) where D is the edit
> distance.  When most entries differ between large corpora, D
> approaches N+M, yielding quadratic equality comparisons.  Each
> comparison invokes elf_symbol::operator== which calls
> textually_equals and walks alias chains via weak_ptr::lock().
>
> The downstream consumer (ensure_lookup_tables_populated) converts
> the edit script into unordered maps keyed by string identifiers,
> discarding ordering.  Replace the Myers diff with direct hash-based
> set difference in O(N+M).
>
> For functions and variables, the hash-based classification is done
> inside ensure_lookup_tables_populated, building maps keyed by
> function/variable ID directly from the corpora vectors.  For
> unreferenced symbols, a new template helper populates the result
> maps in compute_diff.
>
> Combined with the previous alias lookup fix, this reduces abidiff
> runtime on a libCling.so comparison with ~57K symbol changes from
> ~20 seconds to ~1 second.

The approach sounds solid to me.  Also, even on the moderately sized
binaries of the testsuite, "make check -C build -j15" is faster with
this patch than the current state of the master branch.  So really,
great work.

I have made just one cosmetic change below:

[...]


> diff --git a/src/abg-comparison.cc b/src/abg-comparison.cc

[...]

> @@ -9906,60 +9906,46 @@ corpus_diff::priv::ensure_lookup_tables_populated()

I changed the name of this function into
corpus_diff::priv::compare_fns_vars_and_ensure_lookup_tables_populated
to the reflect the fact that it also compares functions & variables now.

[...]

Thus, here is below the patch that I am applying to the tree.

Many thanks!

Cheers,

From d5314d04d8c3f7f0bdec4d3f10e95c78807feee3 Mon Sep 17 00:00:00 2001
From: Chris Burr <christopher.burr@cern.ch>
Date: Tue, 24 Feb 2026 14:26:23 +0100
Subject: [PATCH] comparison: Speedup corpus diff computation

The Myers diff algorithm used to compare functions, variables, and
unreferenced symbols between corpora is O(N*D) where D is the edit
distance.  When most entries differ between large corpora, D
approaches N+M, yielding quadratic equality comparisons.  Each
comparison invokes elf_symbol::operator== which calls
textually_equals and walks alias chains via weak_ptr::lock().

The downstream consumer (ensure_lookup_tables_populated) converts
the edit script into unordered maps keyed by string identifiers,
discarding ordering.  Replace the Myers diff with direct hash-based
set difference in O(N+M).

For functions and variables, the hash-based classification is done
inside ensure_lookup_tables_populated, building maps keyed by
function/variable ID directly from the corpora vectors.  For
unreferenced symbols, a new template helper populates the result
maps in compute_diff.

Combined with the previous alias lookup fix, this reduces abidiff
runtime on a libCling.so comparison with ~57K symbol changes from
~20 seconds to ~1 second.

	* include/abg-comparison.h (corpus_diff::function_changes)
	(corpus_diff::variable_changes): Remove unused methods.
	* src/abg-comparison-priv.h
	(corpus_diff::priv::fns_edit_script_)
	(corpus_diff::priv::vars_edit_script_)
	(corpus_diff::priv::unrefed_fn_syms_edit_script_)
	(corpus_diff::priv::unrefed_var_syms_edit_script_): Remove unused
	data members.
	* src/abg-comparison.cc
	(compute_unreferenced_symbol_set_diff): New static template
	function.
	(compute_diff): Remove Myers diff calls for functions, variables,
	and unreferenced symbols.  Call
	compute_unreferenced_symbol_set_difference for unreferenced
	symbols.  Remove unused typedefs.
	(corpus_diff::priv::compare_fns_vars_and_ensure_lookup_tables_populated):
	Renamed corpus_diff::priv::ensure_lookup_tables_populated into
	this.  Replace edit script processing for functions and variables
	with hash-based set difference built from the corpora vectors
	directly.  Remove dead edit script processing for unreferenced
	symbols.
	(corpus_diff::{function,variable}_changes): Remove unused methods.

Signed-off-by: Chris Burr <christopher.burr@cern.ch>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 include/abg-comparison.h  |   6 -
 src/abg-comparison-priv.h |   6 +-
 src/abg-comparison.cc     | 434 +++++++++++++++-----------------------
 3 files changed, 166 insertions(+), 280 deletions(-)

diff --git a/include/abg-comparison.h b/include/abg-comparison.h
index b4ae3f5f..56f75e4f 100644
--- a/include/abg-comparison.h
+++ b/include/abg-comparison.h
@@ -2536,12 +2536,6 @@ public:
   void
   append_child_node(diff_sptr);
 
-  edit_script&
-  function_changes() const;
-
-  edit_script&
-  variable_changes() const;
-
   bool
   soname_changed() const;
 
diff --git a/src/abg-comparison-priv.h b/src/abg-comparison-priv.h
index fbd797c6..865cd9f9 100644
--- a/src/abg-comparison-priv.h
+++ b/src/abg-comparison-priv.h
@@ -1073,10 +1073,6 @@ struct corpus_diff::priv
   corpus_diff::diff_stats_sptr		diff_stats_;
   bool					sonames_equal_;
   bool					architectures_equal_;
-  edit_script				fns_edit_script_;
-  edit_script				vars_edit_script_;
-  edit_script				unrefed_fn_syms_edit_script_;
-  edit_script				unrefed_var_syms_edit_script_;
   string_function_ptr_map		deleted_fns_;
   string_function_ptr_map		suppressed_deleted_fns_;
   string_function_ptr_map		added_fns_;
@@ -1145,7 +1141,7 @@ struct corpus_diff::priv
   clear_lookup_tables();
 
   void
-  ensure_lookup_tables_populated();
+  compare_fns_vars_and_ensure_lookup_tables_populated();
 
   void
   apply_supprs_to_added_removed_fns_vars_unreachable_types();
diff --git a/src/abg-comparison.cc b/src/abg-comparison.cc
index e7897844..e7e3256e 100644
--- a/src/abg-comparison.cc
+++ b/src/abg-comparison.cc
@@ -9899,67 +9899,53 @@ corpus_diff::priv::clear_lookup_tables()
 /// If the lookup tables are not yet built, walk the differences and
 /// fill the lookup tables.
 void
-corpus_diff::priv::ensure_lookup_tables_populated()
+corpus_diff::priv::compare_fns_vars_and_ensure_lookup_tables_populated()
 {
   if (!lookup_tables_empty())
     return;
 
   diff_context_sptr ctxt = get_context();
 
+  // Use hash-based set difference to classify functions as
+  // added/deleted/changed, avoiding the quadratic Myers diff.
   {
-    edit_script& e = fns_edit_script_;
-
-    for (vector<deletion>::const_iterator it = e.deletions().begin();
-	 it != e.deletions().end();
-	 ++it)
+    // Build a map of function ID -> function_decl* for each corpus.
+    unordered_map<string, const function_decl*> first_fns_map;
+    first_fns_map.reserve(first_->get_functions().size());
+    for (const auto* fn : first_->get_functions())
       {
-	unsigned i = it->index();
-	ABG_ASSERT(i < first_->get_functions().size());
-
-	const function_decl* deleted_fn = first_->get_functions()[i];
-	string n = get_function_id_or_pretty_representation(deleted_fn);
+	string n = get_function_id_or_pretty_representation(fn);
 	ABG_ASSERT(!n.empty());
-	// The below is commented out because there can be several
-	// functions with the same ID in the corpus.  So several
-	// functions with the same ID can be deleted.
-	// ABG_ASSERT(deleted_fns_.find(n) == deleted_fns_.end());
-	deleted_fns_[n] = deleted_fn;
+	first_fns_map[n] = fn;
       }
 
-    for (vector<insertion>::const_iterator it = e.insertions().begin();
-	 it != e.insertions().end();
-	 ++it)
+    for (const auto* fn : second_->get_functions())
       {
-	for (vector<unsigned>::const_iterator iit =
-	       it->inserted_indexes().begin();
-	     iit != it->inserted_indexes().end();
-	     ++iit)
+	string n = get_function_id_or_pretty_representation(fn);
+	ABG_ASSERT(!n.empty());
+	auto j = first_fns_map.find(n);
+	if (j != first_fns_map.end())
 	  {
-	    unsigned i = *iit;
-	    const function_decl* added_fn = second_->get_functions()[i];
-	    string n = get_function_id_or_pretty_representation(added_fn);
-	    ABG_ASSERT(!n.empty());
-	    // The below is commented out because there can be several
-	    // functions with the same ID in the corpus.  So several
-	    // functions with the same ID can be added.
-	    // ABG_ASSERT(added_fns_.find(n) == added_fns_.end());
-	    string_function_ptr_map::const_iterator j =
-	      deleted_fns_.find(n);
-	    if (j != deleted_fns_.end())
+	    // Function exists in both corpora -- check if changed.
+	    if (*j->second != *fn)
 	      {
 		function_decl_sptr f(const_cast<function_decl*>(j->second),
 				     noop_deleter());
-		function_decl_sptr s(const_cast<function_decl*>(added_fn),
+		function_decl_sptr s(const_cast<function_decl*>(fn),
 				     noop_deleter());
 		function_decl_diff_sptr d = compute_diff(f, s, ctxt);
-		if (*j->second != *added_fn)
-		  changed_fns_map_[j->first] = d;
-		deleted_fns_.erase(j);
+		changed_fns_map_[j->first] = d;
 	      }
-	    else
-	      added_fns_[n] = added_fn;
+	    first_fns_map.erase(j);
 	  }
+	else
+	  added_fns_[n] = fn;
       }
+
+    // Remaining entries in first_fns_map are deleted functions.
+    for (const auto& entry : first_fns_map)
+      deleted_fns_[entry.first] = entry.second;
+
     sort_string_function_decl_diff_sptr_map(changed_fns_map_, changed_fns_);
 
     // Now walk the allegedly deleted functions; check if their
@@ -10008,83 +9994,50 @@ corpus_diff::priv::ensure_lookup_tables_populated()
       added_fns_.erase(*i);
   }
 
+  // Use hash-based set difference to classify variables as
+  // added/deleted/changed, avoiding the quadratic Myers diff.
   {
-    edit_script& e = vars_edit_script_;
-
-    for (vector<deletion>::const_iterator it = e.deletions().begin();
-	 it != e.deletions().end();
-	 ++it)
+    // Build a map of variable ID -> var_decl_sptr for the first corpus.
+    unordered_map<string, var_decl_sptr> first_vars_map;
+    first_vars_map.reserve(first_->get_variables().size());
+    for (const auto& var : first_->get_variables())
       {
-	unsigned i = it->index();
-	ABG_ASSERT(i < first_->get_variables().size());
-
-	const var_decl_sptr deleted_var = first_->get_variables()[i];
-	string n = deleted_var->get_id();
+	string n = var->get_id();
 	ABG_ASSERT(!n.empty());
-	// The below is commented out because there can be several
-	// global variables with the same ID in the corpus.  So
-	// several global variables with the same ID can be deleted.
-	//
-	// In general these are static member variables.  There can be
-	// multiple instances of these (one per translation unit) that
-	// appear to be different, but they are put into a COMDAT
-	// section (with CLOOS ELF binding in modern toolchains) in
-	// the end.
-	//
-	// In that case, let's keep track of the first global variable
-	// removed and let's ignore the subsequent ones as they should
-	// all be "merged" into one by the linker.
-	//
-	// ABG_ASSERT(deleted_vars_.find(n) == deleted_vars_.end());
-	string_var_ptr_map::const_iterator j = deleted_vars_.find(n);
-	if (j != deleted_vars_.end())
-	  {
-	    ABG_ASSERT(is_member_decl(j->second)
-		       && get_member_is_static(j->second));
-	    continue;
-	  }
+	// Keep only the first instance of duplicate IDs (static
+	// member variables from multiple translation units).
+	if (first_vars_map.find(n) == first_vars_map.end())
+	  first_vars_map[n] = var;
 	else
-	  deleted_vars_[n] = deleted_var;
+	  ABG_ASSERT(is_member_decl(var) && get_member_is_static(var));
       }
 
-    for (vector<insertion>::const_iterator it = e.insertions().begin();
-	 it != e.insertions().end();
-	 ++it)
+    for (const auto& var : second_->get_variables())
       {
-	for (vector<unsigned>::const_iterator iit =
-	       it->inserted_indexes().begin();
-	     iit != it->inserted_indexes().end();
-	     ++iit)
+	string n = var->get_id();
+	ABG_ASSERT(!n.empty());
+	auto j = first_vars_map.find(n);
+	if (j != first_vars_map.end())
 	  {
-	    unsigned i = *iit;
-	    const var_decl_sptr added_var = second_->get_variables()[i];
-	    string n = added_var->get_id();
-	    ABG_ASSERT(!n.empty());
-	    {
-	      string_var_ptr_map::const_iterator k = added_vars_.find(n);
-	      if ( k != added_vars_.end())
-		{
-		  ABG_ASSERT(is_member_decl(k->second)
-			     && get_member_is_static(k->second));
-		  continue;
-		}
-	    }
-	    string_var_ptr_map::const_iterator j =
-	      deleted_vars_.find(n);
-	    if (j != deleted_vars_.end())
+	    // Variable exists in both corpora -- check if changed.
+	    if (*j->second != *var)
 	      {
-		if (*j->second != *added_var)
-		  {
-		    var_decl_sptr f = j->second;
-		    var_decl_sptr s = added_var;
-		    changed_vars_map_[n] = compute_diff(f, s, ctxt);
-		  }
-		deleted_vars_.erase(j);
+		var_decl_sptr f = j->second;
+		var_decl_sptr s = var;
+		changed_vars_map_[n] = compute_diff(f, s, ctxt);
 	      }
-	    else
-	      added_vars_[n] = added_var;
+	    first_vars_map.erase(j);
 	  }
+	else if (added_vars_.find(n) == added_vars_.end())
+	  added_vars_[n] = var;
+	else
+	  ABG_ASSERT(is_member_decl(var) && get_member_is_static(var));
       }
+
+    // Remaining entries in first_vars_map are deleted variables.
+    for (const auto& entry : first_vars_map)
+      deleted_vars_[entry.first] = entry.second;
+
     sort_string_var_diff_sptr_map(changed_vars_map_,
 				  sorted_changed_vars_);
 
@@ -10132,128 +10085,6 @@ corpus_diff::priv::ensure_lookup_tables_populated()
       added_vars_.erase(*i);
   }
 
-  // Massage the edit script for added/removed function symbols that
-  // were not referenced by any debug info and turn them into maps of
-  // {symbol_name, symbol}.
-  {
-    edit_script& e = unrefed_fn_syms_edit_script_;
-    for (vector<deletion>::const_iterator it = e.deletions().begin();
-	 it != e.deletions().end();
-	 ++it)
-      {
-	unsigned i = it->index();
-	ABG_ASSERT(i < first_->get_unreferenced_function_symbols().size());
-	elf_symbol_sptr deleted_sym =
-	  first_->get_unreferenced_function_symbols()[i];
-	if (!second_->lookup_function_symbol(*deleted_sym))
-	  deleted_unrefed_fn_syms_[deleted_sym->get_id_string()] = deleted_sym;
-      }
-
-    for (vector<insertion>::const_iterator it = e.insertions().begin();
-	 it != e.insertions().end();
-	 ++it)
-      {
-	for (vector<unsigned>::const_iterator iit =
-	       it->inserted_indexes().begin();
-	     iit != it->inserted_indexes().end();
-	     ++iit)
-	  {
-	    unsigned i = *iit;
-	    ABG_ASSERT(i < second_->get_unreferenced_function_symbols().size());
-	    elf_symbol_sptr added_sym =
-	      second_->get_unreferenced_function_symbols()[i];
-	    if ((deleted_unrefed_fn_syms_.find(added_sym->get_id_string())
-		 == deleted_unrefed_fn_syms_.end()))
-	      {
-		if (!first_->lookup_function_symbol(*added_sym))
-		  {
-		    bool do_add = true;
-		    if (! added_sym->get_version().is_empty()
-			&& added_sym->get_version().is_default())
-		      {
-			// So added_seem has a default version.  If
-			// the former corpus had a symbol with the
-			// same name as added_sym but with *no*
-			// version, then added_sym shouldn't be
-			// considered as a newly added symbol.
-			elf_symbol::version empty_version;
-			if (first_->lookup_function_symbol(added_sym->get_name(),
-							   empty_version))
-			  do_add = false;
-		      }
-
-		    if (do_add)
-		      added_unrefed_fn_syms_[added_sym->get_id_string()] =
-			added_sym;
-		  }
-	      }
-	    else
-	      deleted_unrefed_fn_syms_.erase(added_sym->get_id_string());
-	  }
-      }
-  }
-
-  // Massage the edit script for added/removed variable symbols that
-  // were not referenced by any debug info and turn them into maps of
-  // {symbol_name, symbol}.
-  {
-    edit_script& e = unrefed_var_syms_edit_script_;
-    for (vector<deletion>::const_iterator it = e.deletions().begin();
-	 it != e.deletions().end();
-	 ++it)
-      {
-	unsigned i = it->index();
-	ABG_ASSERT(i < first_->get_unreferenced_variable_symbols().size());
-	elf_symbol_sptr deleted_sym =
-	  first_->get_unreferenced_variable_symbols()[i];
-	if (!second_->lookup_variable_symbol(*deleted_sym))
-	  deleted_unrefed_var_syms_[deleted_sym->get_id_string()] = deleted_sym;
-      }
-
-    for (vector<insertion>::const_iterator it = e.insertions().begin();
-	 it != e.insertions().end();
-	 ++it)
-      {
-	for (vector<unsigned>::const_iterator iit =
-	       it->inserted_indexes().begin();
-	     iit != it->inserted_indexes().end();
-	     ++iit)
-	  {
-	    unsigned i = *iit;
-	    ABG_ASSERT(i < second_->get_unreferenced_variable_symbols().size());
-	    elf_symbol_sptr added_sym =
-	      second_->get_unreferenced_variable_symbols()[i];
-	    if (deleted_unrefed_var_syms_.find(added_sym->get_id_string())
-		== deleted_unrefed_var_syms_.end())
-	      {
-		if (!first_->lookup_variable_symbol(*added_sym))
-		  {
-		    bool do_add = true;
-		    if (! added_sym->get_version().is_empty()
-			&& added_sym->get_version().is_default())
-		      {
-			// So added_seem has a default version.  If
-			// the former corpus had a symbol with the
-			// same name as added_sym but with *no*
-			// version, then added_sym shouldn't be
-			// considered as a newly added symbol.
-			elf_symbol::version empty_version;
-			if (first_->lookup_variable_symbol(added_sym->get_name(),
-							   empty_version))
-			  do_add = false;
-		      }
-
-		    if (do_add)
-		      added_unrefed_var_syms_[added_sym->get_id_string()] =
-			added_sym;
-		  }
-	      }
-	    else
-	      deleted_unrefed_var_syms_.erase(added_sym->get_id_string());
-	  }
-      }
-  }
-
   // Handle the unreachable_types_edit_script_
   {
     edit_script& e = unreachable_types_edit_script_;
@@ -11879,18 +11710,6 @@ corpus_diff::append_child_node(diff_sptr d)
     }
 }
 
-/// @return the bare edit script of the functions changed as recorded
-/// by the diff.
-edit_script&
-corpus_diff::function_changes() const
-{return priv_->fns_edit_script_;}
-
-/// @return the bare edit script of the variables changed as recorded
-/// by the diff.
-edit_script&
-corpus_diff::variable_changes() const
-{return priv_->vars_edit_script_;}
-
 /// Test if the soname of the underlying corpus has changed.
 ///
 /// @return true iff the soname has changed.
@@ -12560,6 +12379,87 @@ corpus_diff::traverse(diff_node_visitor& v)
   return true;
 }
 
+/// Use a hash-based set difference to compute added/deleted
+/// unreferenced symbols between two corpora.  This replaces the
+/// previous Myers diff approach which was O(N*D) and became quadratic
+/// when most symbols differ.  The downstream consumer only needs
+/// unordered maps keyed by symbol id_string, so ordering is not
+/// required.
+///
+/// @param first the first corpus.
+///
+/// @param second the second corpus.
+///
+/// @param first_syms unreferenced symbols from the first corpus.
+///
+/// @param second_syms unreferenced symbols from the second corpus.
+///
+/// @param deleted_syms output map of deleted symbols (in first but
+/// not second), keyed by id_string.
+///
+/// @param added_syms output map of added symbols (in second but not
+/// first), keyed by id_string.
+///
+/// @param lookup_in_corpus a function that looks up whether a symbol
+/// exists in a given corpus.
+template<typename lookup_fn>
+static void
+compute_unreferenced_symbol_set_diff(const corpus_sptr&	first,
+				     const corpus_sptr&	second,
+				     const elf_symbols&	first_syms,
+				     const elf_symbols&	second_syms,
+				     string_elf_symbol_map&	deleted_syms,
+				     string_elf_symbol_map&	added_syms,
+				     lookup_fn			lookup_in_corpus)
+{
+  // Build a map of id_string -> symbol for each side.
+  unordered_map<string, elf_symbol_sptr> first_map, second_map;
+  first_map.reserve(first_syms.size());
+  second_map.reserve(second_syms.size());
+
+  for (const auto& sym : first_syms)
+    first_map[sym->get_id_string()] = sym;
+
+  for (const auto& sym : second_syms)
+    second_map[sym->get_id_string()] = sym;
+
+  // Symbols in first but not in second are candidate deletions.
+  for (const auto& entry : first_map)
+    {
+      if (second_map.find(entry.first) == second_map.end())
+	{
+	  if (!lookup_in_corpus(second, *entry.second))
+	    deleted_syms[entry.first] = entry.second;
+	}
+    }
+
+  // Symbols in second but not in first are candidate additions.
+  for (const auto& entry : second_map)
+    {
+      if (first_map.find(entry.first) != first_map.end())
+	continue;
+
+      if (!lookup_in_corpus(first, *entry.second))
+	{
+	  bool do_add = true;
+	  if (!entry.second->get_version().is_empty()
+	      && entry.second->get_version().is_default())
+	    {
+	      // added_sym has a default version.  If the former
+	      // corpus had a symbol with the same name but with
+	      // *no* version, then it shouldn't be considered as
+	      // newly added.
+	      elf_symbol::version empty_version;
+	      if (lookup_in_corpus(first, entry.second->get_name(),
+				   empty_version))
+		do_add = false;
+	    }
+	  if (do_add)
+	    added_syms[entry.first] = entry.second;
+	}
+    }
+}
+
 /// Compute the diff between two instances of @ref corpus.
 ///
 /// Note that the two corpora must have been created in the same @ref
@@ -12577,9 +12477,6 @@ compute_diff(const corpus_sptr	f,
 	     const corpus_sptr	s,
 	     diff_context_sptr	ctxt)
 {
-  typedef corpus::functions::const_iterator fns_it_type;
-  typedef corpus::variables::const_iterator vars_it_type;
-  typedef elf_symbols::const_iterator symbols_it_type;
   typedef diff_utils::deep_ptr_eq_functor eq_type;
   typedef vector<type_base_wptr>::const_iterator type_base_wptr_it_type;
 
@@ -12600,36 +12497,35 @@ compute_diff(const corpus_sptr	f,
   r->priv_->architectures_equal_ =
     f->get_architecture_name() == s->get_architecture_name();
 
-  // Compute the diff of publicly defined and exported functions
-  diff_utils::compute_diff<fns_it_type, eq_type>(f->get_functions().begin(),
-						 f->get_functions().end(),
-						 s->get_functions().begin(),
-						 s->get_functions().end(),
-						 r->priv_->fns_edit_script_);
-
-  // Compute the diff of publicly defined and exported variables.
-  diff_utils::compute_diff<vars_it_type, eq_type>
-    (f->get_variables().begin(), f->get_variables().end(),
-     s->get_variables().begin(), s->get_variables().end(),
-     r->priv_->vars_edit_script_);
+  // Note: the diff of functions, variables, and unreferenced symbols
+  // is computed by
+  // compare_fns_vars_and_ensure_lookup_tables_populated() below using
+  // hash-based set difference, which avoids the quadratic behavior of
+  // Myers diff when most entries differ between large corpora.
 
   // Compute the diff of function elf symbols not referenced by debug
-  // info.
-  diff_utils::compute_diff<symbols_it_type, eq_type>
-    (f->get_unreferenced_function_symbols().begin(),
-     f->get_unreferenced_function_symbols().end(),
-     s->get_unreferenced_function_symbols().begin(),
-     s->get_unreferenced_function_symbols().end(),
-     r->priv_->unrefed_fn_syms_edit_script_);
+  // info.  Use hash-based set difference instead of Myers diff to
+  // avoid quadratic behavior when most symbols differ.
+  compute_unreferenced_symbol_set_diff
+    (f, s,
+     f->get_unreferenced_function_symbols(),
+     s->get_unreferenced_function_symbols(),
+     r->priv_->deleted_unrefed_fn_syms_,
+     r->priv_->added_unrefed_fn_syms_,
+     [](const auto& c, auto&&... args)
+     {return c->lookup_function_symbol(std::forward<decltype(args)>(args)...);});
 
   // Compute the diff of variable elf symbols not referenced by debug
-  // info.
-    diff_utils::compute_diff<symbols_it_type, eq_type>
-    (f->get_unreferenced_variable_symbols().begin(),
-     f->get_unreferenced_variable_symbols().end(),
-     s->get_unreferenced_variable_symbols().begin(),
-     s->get_unreferenced_variable_symbols().end(),
-     r->priv_->unrefed_var_syms_edit_script_);
+  // info.  Use hash-based set difference instead of Myers diff to
+  // avoid quadratic behavior when most symbols differ.
+  compute_unreferenced_symbol_set_diff
+    (f, s,
+     f->get_unreferenced_variable_symbols(),
+     s->get_unreferenced_variable_symbols(),
+     r->priv_->deleted_unrefed_var_syms_,
+     r->priv_->added_unrefed_var_syms_,
+     [](const auto& c, auto&&... args)
+     {return c->lookup_variable_symbol(std::forward<decltype(args)>(args)...);});
 
     if (ctxt->show_unreachable_types())
       // Compute the diff of types not reachable from public functions
@@ -12641,7 +12537,7 @@ compute_diff(const corpus_sptr	f,
 	 s->get_types_not_reachable_from_public_interfaces().end(),
 	 r->priv_->unreachable_types_edit_script_);
 
-  r->priv_->ensure_lookup_tables_populated();
+  r->priv_->compare_fns_vars_and_ensure_lookup_tables_populated();
 
   return r;
 }
  

Patch

diff --git a/include/abg-comparison.h b/include/abg-comparison.h
index b4ae3f5f..56f75e4f 100644
--- a/include/abg-comparison.h
+++ b/include/abg-comparison.h
@@ -2536,12 +2536,6 @@  public:
   void
   append_child_node(diff_sptr);
 
-  edit_script&
-  function_changes() const;
-
-  edit_script&
-  variable_changes() const;
-
   bool
   soname_changed() const;
 
diff --git a/src/abg-comparison-priv.h b/src/abg-comparison-priv.h
index fbd797c6..fa78e8c6 100644
--- a/src/abg-comparison-priv.h
+++ b/src/abg-comparison-priv.h
@@ -1073,10 +1073,6 @@  struct corpus_diff::priv
   corpus_diff::diff_stats_sptr		diff_stats_;
   bool					sonames_equal_;
   bool					architectures_equal_;
-  edit_script				fns_edit_script_;
-  edit_script				vars_edit_script_;
-  edit_script				unrefed_fn_syms_edit_script_;
-  edit_script				unrefed_var_syms_edit_script_;
   string_function_ptr_map		deleted_fns_;
   string_function_ptr_map		suppressed_deleted_fns_;
   string_function_ptr_map		added_fns_;
diff --git a/src/abg-comparison.cc b/src/abg-comparison.cc
index e7897844..1f84e04f 100644
--- a/src/abg-comparison.cc
+++ b/src/abg-comparison.cc
@@ -9906,60 +9906,46 @@  corpus_diff::priv::ensure_lookup_tables_populated()
 
   diff_context_sptr ctxt = get_context();
 
+  // Use hash-based set difference to classify functions as
+  // added/deleted/changed, avoiding the quadratic Myers diff.
   {
-    edit_script& e = fns_edit_script_;
-
-    for (vector<deletion>::const_iterator it = e.deletions().begin();
-	 it != e.deletions().end();
-	 ++it)
+    // Build a map of function ID -> function_decl* for each corpus.
+    unordered_map<string, const function_decl*> first_fns_map;
+    first_fns_map.reserve(first_->get_functions().size());
+    for (const auto* fn : first_->get_functions())
       {
-	unsigned i = it->index();
-	ABG_ASSERT(i < first_->get_functions().size());
-
-	const function_decl* deleted_fn = first_->get_functions()[i];
-	string n = get_function_id_or_pretty_representation(deleted_fn);
+	string n = get_function_id_or_pretty_representation(fn);
 	ABG_ASSERT(!n.empty());
-	// The below is commented out because there can be several
-	// functions with the same ID in the corpus.  So several
-	// functions with the same ID can be deleted.
-	// ABG_ASSERT(deleted_fns_.find(n) == deleted_fns_.end());
-	deleted_fns_[n] = deleted_fn;
+	first_fns_map[n] = fn;
       }
 
-    for (vector<insertion>::const_iterator it = e.insertions().begin();
-	 it != e.insertions().end();
-	 ++it)
+    for (const auto* fn : second_->get_functions())
       {
-	for (vector<unsigned>::const_iterator iit =
-	       it->inserted_indexes().begin();
-	     iit != it->inserted_indexes().end();
-	     ++iit)
+	string n = get_function_id_or_pretty_representation(fn);
+	ABG_ASSERT(!n.empty());
+	auto j = first_fns_map.find(n);
+	if (j != first_fns_map.end())
 	  {
-	    unsigned i = *iit;
-	    const function_decl* added_fn = second_->get_functions()[i];
-	    string n = get_function_id_or_pretty_representation(added_fn);
-	    ABG_ASSERT(!n.empty());
-	    // The below is commented out because there can be several
-	    // functions with the same ID in the corpus.  So several
-	    // functions with the same ID can be added.
-	    // ABG_ASSERT(added_fns_.find(n) == added_fns_.end());
-	    string_function_ptr_map::const_iterator j =
-	      deleted_fns_.find(n);
-	    if (j != deleted_fns_.end())
+	    // Function exists in both corpora -- check if changed.
+	    if (*j->second != *fn)
 	      {
 		function_decl_sptr f(const_cast<function_decl*>(j->second),
 				     noop_deleter());
-		function_decl_sptr s(const_cast<function_decl*>(added_fn),
+		function_decl_sptr s(const_cast<function_decl*>(fn),
 				     noop_deleter());
 		function_decl_diff_sptr d = compute_diff(f, s, ctxt);
-		if (*j->second != *added_fn)
-		  changed_fns_map_[j->first] = d;
-		deleted_fns_.erase(j);
+		changed_fns_map_[j->first] = d;
 	      }
-	    else
-	      added_fns_[n] = added_fn;
+	    first_fns_map.erase(j);
 	  }
+	else
+	  added_fns_[n] = fn;
       }
+
+    // Remaining entries in first_fns_map are deleted functions.
+    for (const auto& entry : first_fns_map)
+      deleted_fns_[entry.first] = entry.second;
+
     sort_string_function_decl_diff_sptr_map(changed_fns_map_, changed_fns_);
 
     // Now walk the allegedly deleted functions; check if their
@@ -10008,83 +9994,50 @@  corpus_diff::priv::ensure_lookup_tables_populated()
       added_fns_.erase(*i);
   }
 
+  // Use hash-based set difference to classify variables as
+  // added/deleted/changed, avoiding the quadratic Myers diff.
   {
-    edit_script& e = vars_edit_script_;
-
-    for (vector<deletion>::const_iterator it = e.deletions().begin();
-	 it != e.deletions().end();
-	 ++it)
+    // Build a map of variable ID -> var_decl_sptr for the first corpus.
+    unordered_map<string, var_decl_sptr> first_vars_map;
+    first_vars_map.reserve(first_->get_variables().size());
+    for (const auto& var : first_->get_variables())
       {
-	unsigned i = it->index();
-	ABG_ASSERT(i < first_->get_variables().size());
-
-	const var_decl_sptr deleted_var = first_->get_variables()[i];
-	string n = deleted_var->get_id();
+	string n = var->get_id();
 	ABG_ASSERT(!n.empty());
-	// The below is commented out because there can be several
-	// global variables with the same ID in the corpus.  So
-	// several global variables with the same ID can be deleted.
-	//
-	// In general these are static member variables.  There can be
-	// multiple instances of these (one per translation unit) that
-	// appear to be different, but they are put into a COMDAT
-	// section (with CLOOS ELF binding in modern toolchains) in
-	// the end.
-	//
-	// In that case, let's keep track of the first global variable
-	// removed and let's ignore the subsequent ones as they should
-	// all be "merged" into one by the linker.
-	//
-	// ABG_ASSERT(deleted_vars_.find(n) == deleted_vars_.end());
-	string_var_ptr_map::const_iterator j = deleted_vars_.find(n);
-	if (j != deleted_vars_.end())
-	  {
-	    ABG_ASSERT(is_member_decl(j->second)
-		       && get_member_is_static(j->second));
-	    continue;
-	  }
+	// Keep only the first instance of duplicate IDs (static
+	// member variables from multiple translation units).
+	if (first_vars_map.find(n) == first_vars_map.end())
+	  first_vars_map[n] = var;
 	else
-	  deleted_vars_[n] = deleted_var;
+	  ABG_ASSERT(is_member_decl(var) && get_member_is_static(var));
       }
 
-    for (vector<insertion>::const_iterator it = e.insertions().begin();
-	 it != e.insertions().end();
-	 ++it)
+    for (const auto& var : second_->get_variables())
       {
-	for (vector<unsigned>::const_iterator iit =
-	       it->inserted_indexes().begin();
-	     iit != it->inserted_indexes().end();
-	     ++iit)
+	string n = var->get_id();
+	ABG_ASSERT(!n.empty());
+	auto j = first_vars_map.find(n);
+	if (j != first_vars_map.end())
 	  {
-	    unsigned i = *iit;
-	    const var_decl_sptr added_var = second_->get_variables()[i];
-	    string n = added_var->get_id();
-	    ABG_ASSERT(!n.empty());
-	    {
-	      string_var_ptr_map::const_iterator k = added_vars_.find(n);
-	      if ( k != added_vars_.end())
-		{
-		  ABG_ASSERT(is_member_decl(k->second)
-			     && get_member_is_static(k->second));
-		  continue;
-		}
-	    }
-	    string_var_ptr_map::const_iterator j =
-	      deleted_vars_.find(n);
-	    if (j != deleted_vars_.end())
+	    // Variable exists in both corpora -- check if changed.
+	    if (*j->second != *var)
 	      {
-		if (*j->second != *added_var)
-		  {
-		    var_decl_sptr f = j->second;
-		    var_decl_sptr s = added_var;
-		    changed_vars_map_[n] = compute_diff(f, s, ctxt);
-		  }
-		deleted_vars_.erase(j);
+		var_decl_sptr f = j->second;
+		var_decl_sptr s = var;
+		changed_vars_map_[n] = compute_diff(f, s, ctxt);
 	      }
-	    else
-	      added_vars_[n] = added_var;
+	    first_vars_map.erase(j);
 	  }
+	else if (added_vars_.find(n) == added_vars_.end())
+	  added_vars_[n] = var;
+	else
+	  ABG_ASSERT(is_member_decl(var) && get_member_is_static(var));
       }
+
+    // Remaining entries in first_vars_map are deleted variables.
+    for (const auto& entry : first_vars_map)
+      deleted_vars_[entry.first] = entry.second;
+
     sort_string_var_diff_sptr_map(changed_vars_map_,
 				  sorted_changed_vars_);
 
@@ -10132,128 +10085,6 @@  corpus_diff::priv::ensure_lookup_tables_populated()
       added_vars_.erase(*i);
   }
 
-  // Massage the edit script for added/removed function symbols that
-  // were not referenced by any debug info and turn them into maps of
-  // {symbol_name, symbol}.
-  {
-    edit_script& e = unrefed_fn_syms_edit_script_;
-    for (vector<deletion>::const_iterator it = e.deletions().begin();
-	 it != e.deletions().end();
-	 ++it)
-      {
-	unsigned i = it->index();
-	ABG_ASSERT(i < first_->get_unreferenced_function_symbols().size());
-	elf_symbol_sptr deleted_sym =
-	  first_->get_unreferenced_function_symbols()[i];
-	if (!second_->lookup_function_symbol(*deleted_sym))
-	  deleted_unrefed_fn_syms_[deleted_sym->get_id_string()] = deleted_sym;
-      }
-
-    for (vector<insertion>::const_iterator it = e.insertions().begin();
-	 it != e.insertions().end();
-	 ++it)
-      {
-	for (vector<unsigned>::const_iterator iit =
-	       it->inserted_indexes().begin();
-	     iit != it->inserted_indexes().end();
-	     ++iit)
-	  {
-	    unsigned i = *iit;
-	    ABG_ASSERT(i < second_->get_unreferenced_function_symbols().size());
-	    elf_symbol_sptr added_sym =
-	      second_->get_unreferenced_function_symbols()[i];
-	    if ((deleted_unrefed_fn_syms_.find(added_sym->get_id_string())
-		 == deleted_unrefed_fn_syms_.end()))
-	      {
-		if (!first_->lookup_function_symbol(*added_sym))
-		  {
-		    bool do_add = true;
-		    if (! added_sym->get_version().is_empty()
-			&& added_sym->get_version().is_default())
-		      {
-			// So added_seem has a default version.  If
-			// the former corpus had a symbol with the
-			// same name as added_sym but with *no*
-			// version, then added_sym shouldn't be
-			// considered as a newly added symbol.
-			elf_symbol::version empty_version;
-			if (first_->lookup_function_symbol(added_sym->get_name(),
-							   empty_version))
-			  do_add = false;
-		      }
-
-		    if (do_add)
-		      added_unrefed_fn_syms_[added_sym->get_id_string()] =
-			added_sym;
-		  }
-	      }
-	    else
-	      deleted_unrefed_fn_syms_.erase(added_sym->get_id_string());
-	  }
-      }
-  }
-
-  // Massage the edit script for added/removed variable symbols that
-  // were not referenced by any debug info and turn them into maps of
-  // {symbol_name, symbol}.
-  {
-    edit_script& e = unrefed_var_syms_edit_script_;
-    for (vector<deletion>::const_iterator it = e.deletions().begin();
-	 it != e.deletions().end();
-	 ++it)
-      {
-	unsigned i = it->index();
-	ABG_ASSERT(i < first_->get_unreferenced_variable_symbols().size());
-	elf_symbol_sptr deleted_sym =
-	  first_->get_unreferenced_variable_symbols()[i];
-	if (!second_->lookup_variable_symbol(*deleted_sym))
-	  deleted_unrefed_var_syms_[deleted_sym->get_id_string()] = deleted_sym;
-      }
-
-    for (vector<insertion>::const_iterator it = e.insertions().begin();
-	 it != e.insertions().end();
-	 ++it)
-      {
-	for (vector<unsigned>::const_iterator iit =
-	       it->inserted_indexes().begin();
-	     iit != it->inserted_indexes().end();
-	     ++iit)
-	  {
-	    unsigned i = *iit;
-	    ABG_ASSERT(i < second_->get_unreferenced_variable_symbols().size());
-	    elf_symbol_sptr added_sym =
-	      second_->get_unreferenced_variable_symbols()[i];
-	    if (deleted_unrefed_var_syms_.find(added_sym->get_id_string())
-		== deleted_unrefed_var_syms_.end())
-	      {
-		if (!first_->lookup_variable_symbol(*added_sym))
-		  {
-		    bool do_add = true;
-		    if (! added_sym->get_version().is_empty()
-			&& added_sym->get_version().is_default())
-		      {
-			// So added_seem has a default version.  If
-			// the former corpus had a symbol with the
-			// same name as added_sym but with *no*
-			// version, then added_sym shouldn't be
-			// considered as a newly added symbol.
-			elf_symbol::version empty_version;
-			if (first_->lookup_variable_symbol(added_sym->get_name(),
-							   empty_version))
-			  do_add = false;
-		      }
-
-		    if (do_add)
-		      added_unrefed_var_syms_[added_sym->get_id_string()] =
-			added_sym;
-		  }
-	      }
-	    else
-	      deleted_unrefed_var_syms_.erase(added_sym->get_id_string());
-	  }
-      }
-  }
-
   // Handle the unreachable_types_edit_script_
   {
     edit_script& e = unreachable_types_edit_script_;
@@ -11879,18 +11710,6 @@  corpus_diff::append_child_node(diff_sptr d)
     }
 }
 
-/// @return the bare edit script of the functions changed as recorded
-/// by the diff.
-edit_script&
-corpus_diff::function_changes() const
-{return priv_->fns_edit_script_;}
-
-/// @return the bare edit script of the variables changed as recorded
-/// by the diff.
-edit_script&
-corpus_diff::variable_changes() const
-{return priv_->vars_edit_script_;}
-
 /// Test if the soname of the underlying corpus has changed.
 ///
 /// @return true iff the soname has changed.
@@ -12560,6 +12379,88 @@  corpus_diff::traverse(diff_node_visitor& v)
   return true;
 }
 
+/// Use a hash-based set difference to compute added/deleted
+/// unreferenced symbols between two corpora.  This replaces the
+/// previous Myers diff approach which was O(N*D) and became quadratic
+/// when most symbols differ.  The downstream consumer only needs
+/// unordered maps keyed by symbol id_string, so ordering is not
+/// required.
+///
+/// @param first the first corpus.
+///
+/// @param second the second corpus.
+///
+/// @param first_syms unreferenced symbols from the first corpus.
+///
+/// @param second_syms unreferenced symbols from the second corpus.
+///
+/// @param deleted_syms output map of deleted symbols (in first but
+/// not second), keyed by id_string.
+///
+/// @param added_syms output map of added symbols (in second but not
+/// first), keyed by id_string.
+///
+/// @param lookup_in_corpus a function that looks up whether a symbol
+/// exists in a given corpus.
+template<typename lookup_fn>
+static void
+compute_unreferenced_symbol_set_difference
+(const corpus_sptr&		first,
+ const corpus_sptr&		second,
+ const elf_symbols&		first_syms,
+ const elf_symbols&		second_syms,
+ string_elf_symbol_map&		deleted_syms,
+ string_elf_symbol_map&		added_syms,
+ lookup_fn			lookup_in_corpus)
+{
+  // Build a map of id_string -> symbol for each side.
+  unordered_map<string, elf_symbol_sptr> first_map, second_map;
+  first_map.reserve(first_syms.size());
+  second_map.reserve(second_syms.size());
+
+  for (const auto& sym : first_syms)
+    first_map[sym->get_id_string()] = sym;
+
+  for (const auto& sym : second_syms)
+    second_map[sym->get_id_string()] = sym;
+
+  // Symbols in first but not in second are candidate deletions.
+  for (const auto& entry : first_map)
+    {
+      if (second_map.find(entry.first) == second_map.end())
+	{
+	  if (!lookup_in_corpus(second, *entry.second))
+	    deleted_syms[entry.first] = entry.second;
+	}
+    }
+
+  // Symbols in second but not in first are candidate additions.
+  for (const auto& entry : second_map)
+    {
+      if (first_map.find(entry.first) != first_map.end())
+	continue;
+
+      if (!lookup_in_corpus(first, *entry.second))
+	{
+	  bool do_add = true;
+	  if (!entry.second->get_version().is_empty()
+	      && entry.second->get_version().is_default())
+	    {
+	      // added_sym has a default version.  If the former
+	      // corpus had a symbol with the same name but with
+	      // *no* version, then it shouldn't be considered as
+	      // newly added.
+	      elf_symbol::version empty_version;
+	      if (lookup_in_corpus(first, entry.second->get_name(),
+				   empty_version))
+		do_add = false;
+	    }
+	  if (do_add)
+	    added_syms[entry.first] = entry.second;
+	}
+    }
+}
+
 /// Compute the diff between two instances of @ref corpus.
 ///
 /// Note that the two corpora must have been created in the same @ref
@@ -12577,9 +12478,6 @@  compute_diff(const corpus_sptr	f,
 	     const corpus_sptr	s,
 	     diff_context_sptr	ctxt)
 {
-  typedef corpus::functions::const_iterator fns_it_type;
-  typedef corpus::variables::const_iterator vars_it_type;
-  typedef elf_symbols::const_iterator symbols_it_type;
   typedef diff_utils::deep_ptr_eq_functor eq_type;
   typedef vector<type_base_wptr>::const_iterator type_base_wptr_it_type;
 
@@ -12600,36 +12498,34 @@  compute_diff(const corpus_sptr	f,
   r->priv_->architectures_equal_ =
     f->get_architecture_name() == s->get_architecture_name();
 
-  // Compute the diff of publicly defined and exported functions
-  diff_utils::compute_diff<fns_it_type, eq_type>(f->get_functions().begin(),
-						 f->get_functions().end(),
-						 s->get_functions().begin(),
-						 s->get_functions().end(),
-						 r->priv_->fns_edit_script_);
-
-  // Compute the diff of publicly defined and exported variables.
-  diff_utils::compute_diff<vars_it_type, eq_type>
-    (f->get_variables().begin(), f->get_variables().end(),
-     s->get_variables().begin(), s->get_variables().end(),
-     r->priv_->vars_edit_script_);
+  // Note: the diff of functions, variables, and unreferenced symbols
+  // is computed by ensure_lookup_tables_populated() below using
+  // hash-based set difference, which avoids the quadratic behavior of
+  // Myers diff when most entries differ between large corpora.
 
   // Compute the diff of function elf symbols not referenced by debug
-  // info.
-  diff_utils::compute_diff<symbols_it_type, eq_type>
-    (f->get_unreferenced_function_symbols().begin(),
-     f->get_unreferenced_function_symbols().end(),
-     s->get_unreferenced_function_symbols().begin(),
-     s->get_unreferenced_function_symbols().end(),
-     r->priv_->unrefed_fn_syms_edit_script_);
+  // info.  Use hash-based set difference instead of Myers diff to
+  // avoid quadratic behavior when most symbols differ.
+  compute_unreferenced_symbol_set_difference
+    (f, s,
+     f->get_unreferenced_function_symbols(),
+     s->get_unreferenced_function_symbols(),
+     r->priv_->deleted_unrefed_fn_syms_,
+     r->priv_->added_unrefed_fn_syms_,
+     [](const auto& c, auto&&... args)
+     { return c->lookup_function_symbol(std::forward<decltype(args)>(args)...); });
 
   // Compute the diff of variable elf symbols not referenced by debug
-  // info.
-    diff_utils::compute_diff<symbols_it_type, eq_type>
-    (f->get_unreferenced_variable_symbols().begin(),
-     f->get_unreferenced_variable_symbols().end(),
-     s->get_unreferenced_variable_symbols().begin(),
-     s->get_unreferenced_variable_symbols().end(),
-     r->priv_->unrefed_var_syms_edit_script_);
+  // info.  Use hash-based set difference instead of Myers diff to
+  // avoid quadratic behavior when most symbols differ.
+  compute_unreferenced_symbol_set_difference
+    (f, s,
+     f->get_unreferenced_variable_symbols(),
+     s->get_unreferenced_variable_symbols(),
+     r->priv_->deleted_unrefed_var_syms_,
+     r->priv_->added_unrefed_var_syms_,
+     [](const auto& c, auto&&... args)
+     { return c->lookup_variable_symbol(std::forward<decltype(args)>(args)...); });
 
     if (ctxt->show_unreachable_types())
       // Compute the diff of types not reachable from public functions