[26/40] Optimize .gdb_index symbol name searching

Message ID 1496406158-12663-27-git-send-email-palves@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves June 2, 2017, 12:22 p.m. UTC
  As mentioned in the previous patch, .gdb_index name lookup got
significantly slower with the previous patch.

This patch addresses that, and in the process makes .gdb_index name
searching faster than what we had before the previous patch, even.
Using the same test:

 $ cat script.cmd
 set pagination off
 set $count = 0
 while $count < 400
   complete b string_prin
   printf "count = %d\n", $count
   set $count = $count + 1
 end

 $ time gdb --batch -q ./gdb-with-index -ex "source script.cmd"

I got, before the previous patch (-O2, x86-64):

 real    0m1.773s
 user    0m1.737s
 sys     0m0.040s

and after this patch:

 real    0m1.361s
 user    0m1.315s
 sys     0m0.040s

The basic idea here is simple: instead of always iterating over all
the symbol names in the index, we build an accelerator/sorted name
table and binary search names in it.

Later in the series, we'll want to support wild matching for C++ too,
so this mechanism already considers that.  For example, say that
you're looking up functions/methods named "func", no matter the
containing namespace/class.  If we sorted the table by qualified name,
then we obviously wouldn't be able to find those symbols with a binary
search:

  func
  ns1::a::b::func
  ns1::b::func
  ns2::func

(function symbol names in .gdb_index have no parameter info, like psymbols)

To address that out, we put an entry for each name component in the
sorted table.  something like this:

  Table Entry       Actual symbol
  ---------------------------------
  func              func

  func              ns1::a::b::func
  b::func           ns1::a::b::func
  a::b::func        ns1::a::b::func
  ns1::a::b::func   ns1::a::b::func

  func              ns1::b::func
  b::func           ns1::b::func
  ns1::b::func      ns1::b::func

  func              ns2::func
  ns2::func         ns2::func

Which sorted results in this:

  Table Entry       Actual symbol
  ---------------------------------
  a::b::func        ns1::a::b::func
  b::func           ns1::a::b::func
  b::func           ns1::b::func
  func              func
  func              ns1::a::b::func
  func              ns1::b::func
  func              ns2::func
  ns1::a::b::func   ns1::a::b::func
  ns1::b::func      ns1::b::func
  ns2::func         ns2::func

And we can binary search this.

Note that a binary search approach works for both completion and
regular lookup, while a name hashing approach only works for normal
symbol looking, since obviously "fun" and "func" have different
hashes.

At first I was a bit wary of these tables potentially growing GDB's
memory significantly.  But I did an experiment that convinced it's not
a worry at all.  I hacked gdb to count the total number of entries in
all the tables, attached that gdb to my system/Fedora's Firefox
(Fedora's debug packages uses .gdb_index), did "set max-completions
unlimited", and then hit "b [TAB]" to cause everything to expand.

That resulted in 1351355 name_components.  Each entry takes 8 bytes,
so that's 10810840 bytes (ignoring std::vector overhead), or ~10.3 MB.
That's IMO too small to worry about, given GDB was using over 7400MB
total at that point.  I.e., we're talking about 0.1% increase.

dw2_expand_symtabs_matching unit tests covering this will be added in
a follow up patch.

If the size of this table turns out to be a concern, I have an idea to
reduce the size of the table further at the expense of a bit more code
-- the vast majority of the name offsets are either 0 or fit in
8-bits:

 total name_component = 1351355, of which,
 name_component::name_offset instances need  0 bits = 679531
 name_component::name_offset instances need  8 bits = 669526
 name_component::name_offset instances need 16 bits = 2298
 name_component::name_offset instances need 32 bits = 0
 name_component::idx instances need 0 bits  = 51
 name_component::idx instances need 8 bits  = 8361
 name_component::idx instances need 16 bits = 280329
 name_component::idx instances need 32 bits = 1062614

so we could have separate tables for 0 name_offset, 8-bit name_offset
and 32-bit name_offset.  That'd give us roughly:

 679531 * 0 + 669526 * 1 + 2298 * 4 + 1062614 * 4 = 4929174, or ~4.7MB

with only 8-bit and 32-bit tables, that'd be:

 1349057 * 1 + 2298 * 4 + 4 * 1351355 = 6763669 bytes, or ~6.5MB.

I don't think we need to bother though.

I also timed:

 $ time gdb --batch -q -p `pidof firefox`
 $ time gdb --batch -q -p `pidof firefox` -ex "b main"
 $ time gdb --batch -q -p `pidof firefox` -ex "set max-completion unlimited" -ex "complete b "

and compared before previous patch vs this patch, and I didn't see a
significant difference, seemingly because time to read debug info
dominates.  The "complete b " variant of the test takes ~2min
currently...  (I have a follow up series that speeds that up
somewhat.)

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	* dwarf2read.c (byte_swap, MAYBE_SWAP): Move higher up in file.
	(struct name_component): New.
	(mapped_index::name_components): New field.
	(mapped_index::symbol_name_at): New method.
	(create_addrmap_from_index): Call mapped_index ctor.
	(dw2_map_matching_symbols): Add comment about name_components
	table.
	(dw2_expand_symtabs_matching): Factor part to...
	(dw2_expand_symtabs_matching_symbol): ... this new function.
	Build name components table, and lookup symbols in it before
	calling the name matcher.
	(dw2_expand_marked_cus): New, factored out from
	dw2_expand_symtabs_matching.
	(dwarf2_per_objfile_free): Call the mapped_index's dtor.
---
 gdb/dwarf2read.c | 323 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 281 insertions(+), 42 deletions(-)
  

Comments

Keith Seitz Aug. 8, 2017, 8:31 p.m. UTC | #1
On 06/02/2017 05:22 AM, Pedro Alves wrote:

> I got, before the previous patch (-O2, x86-64):
> 
>  real    0m1.773s
>  user    0m1.737s
>  sys     0m0.040s
> 
> and after this patch:
> 
>  real    0m1.361s
>  user    0m1.315s
>  sys     0m0.040s

The results on my computer are slightly more dramatic, running with no
optimization, your test case (using Fedora 21 system gdb w/index debuginfo)
goes from about 15 seconds down to about 2.5 seconds. Very nice!

> That resulted in 1351355 name_components.  Each entry takes 8 bytes,
> so that's 10810840 bytes (ignoring std::vector overhead), or ~10.3 MB.
> That's IMO too small to worry about, given GDB was using over 7400MB
> total at that point.  I.e., we're talking about 0.1% increase.

Indeed. I'd sacrifice that kind of memory for the kind of speed increase
you've achieved -- in a heartbeat!

> with only 8-bit and 32-bit tables, that'd be:
> 
>  1349057 * 1 + 2298 * 4 + 4 * 1351355 = 6763669 bytes, or ~6.5MB.
> 
> I don't think we need to bother though.

I'm all for memory usage optimization and whatnot, but since the benefit is
so small (55% of these new tables saved but only 0.06% of total memory),
I prefer simplicity. So you won't get anything but agreement from me on this.

> I also timed:
> 
>  $ time gdb --batch -q -p `pidof firefox`
>  $ time gdb --batch -q -p `pidof firefox` -ex "b main"
>  $ time gdb --batch -q -p `pidof firefox` -ex "set max-completion unlimited" -ex "complete b "

I'd like to reproduce this, but my computer is incapable of running this test.
I'll take your word for it. ;-)

> gdb/ChangeLog:
> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>
> 
> 	* dwarf2read.c 
> 	(mapped_index::name_components): New field.
> 	(mapped_index::symbol_name_at): New method.

Silly nit: Isn't the form most are using "(tag name) <field>: New field."?
I know I've relied on this several times to find changes in the ChangeLog.

> 	(create_addrmap_from_index): Call mapped_index ctor.

I don't see any changes to this function in the patch -- attributed to wrong
function?

> diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
> index f523326..e955131 100644
> --- a/gdb/dwarf2read.c
> +++ b/gdb/dwarf2read.c
> @@ -178,6 +178,51 @@ DEF_VEC_I (offset_type);
[snip]
> +
> +/* An index into a (C++) symbol name component in a symbol name as
> +   recorded in the mapped_index's symbol table.  For each C++ symbol
> +   in the symbol table, we record one entry for the start of each
> +   component in the symbol in a table of name components, and then
> +   sort the table, in order to be able to binary search symbol names,
> +   ignoring leading namespaces, both completion and regular look up.
> +   For example, for symbol "A::B::C", we'll have an entry that points
> +   to "A::B::C", another that points to "B::C", and another for "C".
> +   Note that function symbols in GDB index have no parameter
> +   information, just the function/method names.  You can convert a
> +   name_component to a "const char *" using the
> +   'mapped_index::symbol_name_at(offset_type)' method.  */

missing nl?

> +struct name_component
> +{
> +  /* Offset in the symbol name where the component starts.  Stored as
> +     a (32-bit) offset instead of a pointer to save memory and improve
> +     locality on 64-bit architectures.  */
> +  offset_type name_offset;
> +
> +  /* The symbol's index in the symbol and constant pool tables of a
> +     mapped_index.  */
> +  offset_type idx;
> +};
> +
>  /* A description of the mapped index.  The file format is described in
>     a comment by the code that writes the index.  */
>  struct mapped_index
> @@ -3390,6 +3424,7 @@ dwarf2_read_index (struct objfile *objfile)
>    create_addrmap_from_index (objfile, &local_map);
>  
>    map = XOBNEW (&objfile->objfile_obstack, struct mapped_index);
> +  map = new (map) mapped_index ();
>    *map = local_map;
>  
>    dwarf2_per_objfile->index_table = map;

This function (dwarf2_read_index) is not mentioned in the ChangeLog. Could
this actually be incorrectly listed in the ChangeLog under
create_addrmap_from_index?

> @@ -4095,6 +4134,22 @@ gdb_index_symbol_name_matcher::matches (const char *symbol_name)
>  }
>  
>  static void
> +dw2_expand_marked_cus
> +  (mapped_index &index, offset_type idx,
> +   struct objfile *objfile,
> +   gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
> +   gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
> +   search_domain kind);
> +
> +static void
> +dw2_expand_symtabs_matching_symbol
> +  (mapped_index &index,
> +   const lookup_name_info &lookup_name_in,
> +   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
> +   enum search_domain kind,
> +   gdb::function_view<void (offset_type)> on_match);

Isn't it rather unusual for us to have forward decls in the middle of a file?

> +
> +static void
>  dw2_expand_symtabs_matching
>    (struct objfile *objfile,
>     gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
> @@ -4186,30 +4239,214 @@ dw2_expand_symtabs_matching
[snip]
> +static void
> +dw2_expand_symtabs_matching_symbol
> +  (mapped_index &index,
> +   const lookup_name_info &lookup_name,
> +   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
> +   enum search_domain kind,
> +   gdb::function_view<void (offset_type)> match_callback)
> +{
[snip]
> +
> +      /* Sort name_comp elements by name.   */

I presume that "name_comp" is really "name_components"?

[snip]

> +  std::vector<offset_type> matches;
> +  matches.reserve (std::distance (lower, upper));
> +
> +  for (;lower != upper; ++lower)
> +    {
> +      const char *qualified = index.symbol_name_at (lower->idx);
> +
> +      if (!lookup_name_matcher.matches (qualified)
> +	  || (symbol_matcher != NULL && !symbol_matcher (qualified)))
>  	continue;
>  
> -      /* The name was matched, now expand corresponding CUs that were
> -	 marked.  */
> -      vec = (offset_type *) (index->constant_pool
> -			     + MAYBE_SWAP (index->symbol_table[idx + 1]));
> +      matches.push_back (lower->idx);
> +    }
> +
> +  std::sort (matches.begin (), matches.end ());
> +
> +  /* Finally call the callback, once per match.  */
> +  ULONGEST prev = -1;
> +  for (offset_type idx : matches)
> +    {
> +      if (prev != idx)
> +	{
> +	  match_callback (idx);
> +	  prev = idx;
> +	}
> +    }

I admit, I'm a little surprised by the number of steps involved here: push
every element in the range into a vector, sort, then de-dup & perform callback.
Had I implemented this, my initial attempt would have been to use a htab_up
and take the sorting time on insertion.

I can imagine that for very large ranges, your approach could outperform
an htab; do you expect these ranges to be that large? Have I overlooked
something? [I'm just curious. Not suggesting any changes need to be made.]

> +
> +  /* Above we use a type wider than idx's for 'prev', since 0 and
> +     (offset_type)-1 are both possible values.  */
> +  static_assert (sizeof (prev) > sizeof (offset_type), "");
> +}
> +
[snip]

Keith
  
Simon Marchi Nov. 18, 2017, 5:23 a.m. UTC | #2
On 2017-06-02 08:22 AM, Pedro Alves wrote:
> @@ -4186,30 +4239,214 @@ dw2_expand_symtabs_matching
>  	}
>      }
>  
> -  gdb_index_symbol_name_matcher lookup_name_matcher (lookup_name);
> +  mapped_index &index = *dwarf2_per_objfile->index_table;
>  
> -  for (iter = 0; iter < index->symbol_table_slots; ++iter)
> +  dw2_expand_symtabs_matching_symbol (index, lookup_name,
> +				      symbol_matcher,
> +				      kind, [&] (offset_type idx)
>      {
> -      offset_type idx = 2 * iter;
> -      const char *name;
> -      offset_type *vec, vec_len, vec_idx;
> -      int global_seen = 0;
> +      dw2_expand_marked_cus (index, idx, objfile, file_matcher,
> +			     expansion_notify, kind);
> +    });
> +}
>  
> -      QUIT;
> +/* Helper for dw2_expand_symtabs_matching that works with a
> +   mapped_index instead of the containing objfile.  This is split to a
> +   separate function in order to be able to unit test the
> +   name_components matching using a mock mapped_index.  For each
> +   symbol name that matches, calls MATCH_CALLBACK, passing it the
> +   symbol's index in the mapped_index symbol table.  */
>  
> -      if (index->symbol_table[idx] == 0 && index->symbol_table[idx + 1] == 0)
> -	continue;
> +static void
> +dw2_expand_symtabs_matching_symbol
> +  (mapped_index &index,
> +   const lookup_name_info &lookup_name,
> +   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
> +   enum search_domain kind,
> +   gdb::function_view<void (offset_type)> match_callback)
> +{
> +  gdb_index_symbol_name_matcher lookup_name_matcher
> +    (lookup_name);
> +
> +  auto *name_cmp = case_sensitivity == case_sensitive_on ? strcmp : strcasecmp;
> +
> +  /* Build the symbol name component sorted vector, if we haven't yet.
> +     The code below only knows how to break apart components of C++
> +     symbol names (and other languages that use '::' as
> +     namespace/module separator).  If we add support for wild matching
> +     to some language that uses some other operator (E.g., Ada, Go and
> +     D use '.'), then we'll need to try splitting the symbol name
> +     according to that language too.  Note that Ada does support wild
> +     matching, but doesn't currently support .gdb_index.  */
> +  if (index.name_components.empty ())
> +    {
> +      for (size_t iter = 0; iter < index.symbol_table_slots; ++iter)
> +	{
> +	  offset_type idx = 2 * iter;
> +
> +	  if (index.symbol_table[idx] == 0
> +	      && index.symbol_table[idx + 1] == 0)
> +	    continue;
> +
> +	  const char *name = index.symbol_name_at (idx);
> +
> +	  /* Add each name component to the name component table.  */
> +	  unsigned int previous_len = 0;
> +	  for (unsigned int current_len = cp_find_first_component (name);
> +	       name[current_len] != '\0';
> +	       current_len += cp_find_first_component (name + current_len))
> +	    {
> +	      gdb_assert (name[current_len] == ':');
> +	      index.name_components.push_back ({previous_len, idx});
> +	      /* Skip the '::'.  */
> +	      current_len += 2;
> +	      previous_len = current_len;
> +	    }
> +	  index.name_components.push_back ({previous_len, idx});
> +	}
> +
> +      /* Sort name_comp elements by name.   */
> +      auto name_comp_compare = [&] (const name_component &left,
> +				    const name_component &right)
> +	{
> +	  const char *left_qualified = index.symbol_name_at (left.idx);
> +	  const char *right_qualified = index.symbol_name_at (right.idx);
> +
> +	  const char *left_name = left_qualified + left.name_offset;
> +	  const char *right_name = right_qualified + right.name_offset;
> +
> +	  return name_cmp (left_name, right_name) < 0;
> +	};
> +
> +      std::sort (index.name_components.begin (),
> +		 index.name_components.end (),
> +		 name_comp_compare);
> +    }
> +
> +  const char *cplus
> +    = lookup_name.cplus ().lookup_name ().c_str ();
>  
> -      name = index->constant_pool + MAYBE_SWAP (index->symbol_table[idx]);
> +  /* Comparison function object for lower_bound that matches against a
> +     given symbol name.  */
> +  auto lookup_compare_lower = [&] (const name_component &elem,
> +				   const char *name)
> +    {
> +      const char *elem_qualified = index.symbol_name_at (elem.idx);
> +      const char *elem_name = elem_qualified + elem.name_offset;
> +      return name_cmp (elem_name, name) < 0;
> +    };
> +
> +  /* Comparison function object for upper_bound that matches against a
> +     given symbol name.  */
> +  auto lookup_compare_upper = [&] (const char *name,
> +				   const name_component &elem)
> +    {
> +      const char *elem_qualified = index.symbol_name_at (elem.idx);
> +      const char *elem_name = elem_qualified + elem.name_offset;
> +      return name_cmp (name, elem_name) < 0;
> +    };
> +
> +  auto begin = index.name_components.begin ();
> +  auto end = index.name_components.end ();
> +
> +  /* Find the lower bound.  */
> +  auto lower = [&] ()
> +    {
> +      if (lookup_name.completion_mode () && cplus[0] == '\0')
> +	return begin;
> +      else
> +	return std::lower_bound (begin, end, cplus, lookup_compare_lower);
> +    } ();
>  
> -      if (!lookup_name_matcher.matches (name)
> -	  || (symbol_matcher != NULL && !symbol_matcher (name)))
> +  /* Find the upper bound.  */
> +  auto upper = [&] ()
> +    {
> +      if (lookup_name.completion_mode ())
> +	{
> +	  /* The string frobbing below won't work if the string is
> +	     empty.  We don't need it then, anyway -- if we're
> +	     completing an empty string, then we want to iterate over
> +	     the whole range.  */
> +	  if (cplus[0] == '\0')
> +	    return end;
> +
> +	  /* In completion mode, increment the last character because
> +	     we want UPPER to point past all symbols names that have
> +	     the same prefix.  */
> +	  std::string after = cplus;
> +
> +	  gdb_assert (after.back () != 0xff);

Hi Pedro,

With Clang, I get this warning:

/home/simark/src/binutils-gdb/gdb/dwarf2read.c:4316:30: error: comparison of constant 255 with expression of type '__gnu_cxx::__alloc_traits<std::allocator<char> >::value_type' (aka 'char') is always true [-Werror,-Wtautological-constant-out-of-range-compare]
          gdb_assert (after.back () != 0xff);
                      ~~~~~~~~~~~~~ ^  ~~~~
/home/simark/src/binutils-gdb/gdb/common/gdb_assert.h:34:13: note: expanded from macro 'gdb_assert'
  ((void) ((expr) ? 0 :                                                       \
            ^~~~

Simon
  
Pedro Alves Nov. 20, 2017, 12:33 a.m. UTC | #3
On 11/18/2017 05:23 AM, Simon Marchi wrote:
> On 2017-06-02 08:22 AM, Pedro Alves wrote:

>> +	  /* In completion mode, increment the last character because
>> +	     we want UPPER to point past all symbols names that have
>> +	     the same prefix.  */
>> +	  std::string after = cplus;
>> +
>> +	  gdb_assert (after.back () != 0xff);
> 
> Hi Pedro,
> 

Hi!

> With Clang, I get this warning:
> 
> /home/simark/src/binutils-gdb/gdb/dwarf2read.c:4316:30: error: comparison of constant 255 with expression of type '__gnu_cxx::__alloc_traits<std::allocator<char> >::value_type' (aka 'char') is always true [-Werror,-Wtautological-constant-out-of-range-compare]
>           gdb_assert (after.back () != 0xff);
>                       ~~~~~~~~~~~~~ ^  ~~~~
> /home/simark/src/binutils-gdb/gdb/common/gdb_assert.h:34:13: note: expanded from macro 'gdb_assert'
>   ((void) ((expr) ? 0 :                                                       \
>             ^~~~
> 

Bleh, I should have known that'd come back and bite.  :-P

I'm sending a 3 patch series in response to this email that fixes that,
and also other things that I ran into in course of fixing it.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index f523326..e955131 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -178,6 +178,51 @@  DEF_VEC_I (offset_type);
     GDB_INDEX_CU_SET_VALUE((cu_index), (value)); \
   } while (0)
 
+#if WORDS_BIGENDIAN
+
+/* Convert VALUE between big- and little-endian.  */
+static offset_type
+byte_swap (offset_type value)
+{
+  offset_type result;
+
+  result = (value & 0xff) << 24;
+  result |= (value & 0xff00) << 8;
+  result |= (value & 0xff0000) >> 8;
+  result |= (value & 0xff000000) >> 24;
+  return result;
+}
+
+#define MAYBE_SWAP(V)  byte_swap (V)
+
+#else
+#define MAYBE_SWAP(V) (V)
+#endif /* WORDS_BIGENDIAN */
+
+/* An index into a (C++) symbol name component in a symbol name as
+   recorded in the mapped_index's symbol table.  For each C++ symbol
+   in the symbol table, we record one entry for the start of each
+   component in the symbol in a table of name components, and then
+   sort the table, in order to be able to binary search symbol names,
+   ignoring leading namespaces, both completion and regular look up.
+   For example, for symbol "A::B::C", we'll have an entry that points
+   to "A::B::C", another that points to "B::C", and another for "C".
+   Note that function symbols in GDB index have no parameter
+   information, just the function/method names.  You can convert a
+   name_component to a "const char *" using the
+   'mapped_index::symbol_name_at(offset_type)' method.  */
+struct name_component
+{
+  /* Offset in the symbol name where the component starts.  Stored as
+     a (32-bit) offset instead of a pointer to save memory and improve
+     locality on 64-bit architectures.  */
+  offset_type name_offset;
+
+  /* The symbol's index in the symbol and constant pool tables of a
+     mapped_index.  */
+  offset_type idx;
+};
+
 /* A description of the mapped index.  The file format is described in
    a comment by the code that writes the index.  */
 struct mapped_index
@@ -202,6 +247,15 @@  struct mapped_index
 
   /* A pointer to the constant pool.  */
   const char *constant_pool;
+
+  /* The name_component table (a sorted vector).  See name_component's
+     description above.  */
+  std::vector<name_component> name_components;
+
+  /* Convenience method to get at the name of the symbol at IDX in the
+     symbol table.  */
+  const char *symbol_name_at (offset_type idx) const
+  { return this->constant_pool + MAYBE_SWAP (this->symbol_table[idx]); }
 };
 
 typedef struct dwarf2_per_cu_data *dwarf2_per_cu_ptr;
@@ -2131,26 +2185,6 @@  line_header_eq_voidp (const void *item_lhs, const void *item_rhs)
 }
 
 
-#if WORDS_BIGENDIAN
-
-/* Convert VALUE between big- and little-endian.  */
-static offset_type
-byte_swap (offset_type value)
-{
-  offset_type result;
-
-  result = (value & 0xff) << 24;
-  result |= (value & 0xff00) << 8;
-  result |= (value & 0xff0000) >> 8;
-  result |= (value & 0xff000000) >> 24;
-  return result;
-}
-
-#define MAYBE_SWAP(V)  byte_swap (V)
-
-#else
-#define MAYBE_SWAP(V) (V)
-#endif /* WORDS_BIGENDIAN */
 
 /* Read the given attribute value as an address, taking the attribute's
    form into account.  */
@@ -3390,6 +3424,7 @@  dwarf2_read_index (struct objfile *objfile)
   create_addrmap_from_index (objfile, &local_map);
 
   map = XOBNEW (&objfile->objfile_obstack, struct mapped_index);
+  map = new (map) mapped_index ();
   *map = local_map;
 
   dwarf2_per_objfile->index_table = map;
@@ -4016,7 +4051,11 @@  dw2_map_matching_symbols (struct objfile *objfile,
 
      Since each language has its own symbol name matching algorithm,
      and we don't know which language is the right one, we must match
-     each symbol against all languages.
+     each symbol against all languages.  This would be a potential
+     performance problem if it were not mitigated by the
+     mapped_index::name_components lookup table, which significantly
+     reduces the number of times we need to call into this matcher,
+     making it a non-issue.
 
    - Symbol names in the index have no overload (parameter)
      information. I.e., in C++, "foo(int)" and "foo(long)" both appear
@@ -4095,6 +4134,22 @@  gdb_index_symbol_name_matcher::matches (const char *symbol_name)
 }
 
 static void
+dw2_expand_marked_cus
+  (mapped_index &index, offset_type idx,
+   struct objfile *objfile,
+   gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
+   gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
+   search_domain kind);
+
+static void
+dw2_expand_symtabs_matching_symbol
+  (mapped_index &index,
+   const lookup_name_info &lookup_name_in,
+   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
+   enum search_domain kind,
+   gdb::function_view<void (offset_type)> on_match);
+
+static void
 dw2_expand_symtabs_matching
   (struct objfile *objfile,
    gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
@@ -4105,14 +4160,12 @@  dw2_expand_symtabs_matching
 {
   int i;
   offset_type iter;
-  struct mapped_index *index;
 
   dw2_setup (objfile);
 
   /* index_table is NULL if OBJF_READNOW.  */
   if (!dwarf2_per_objfile->index_table)
     return;
-  index = dwarf2_per_objfile->index_table;
 
   if (file_matcher != NULL)
     {
@@ -4186,30 +4239,214 @@  dw2_expand_symtabs_matching
 	}
     }
 
-  gdb_index_symbol_name_matcher lookup_name_matcher (lookup_name);
+  mapped_index &index = *dwarf2_per_objfile->index_table;
 
-  for (iter = 0; iter < index->symbol_table_slots; ++iter)
+  dw2_expand_symtabs_matching_symbol (index, lookup_name,
+				      symbol_matcher,
+				      kind, [&] (offset_type idx)
     {
-      offset_type idx = 2 * iter;
-      const char *name;
-      offset_type *vec, vec_len, vec_idx;
-      int global_seen = 0;
+      dw2_expand_marked_cus (index, idx, objfile, file_matcher,
+			     expansion_notify, kind);
+    });
+}
 
-      QUIT;
+/* Helper for dw2_expand_symtabs_matching that works with a
+   mapped_index instead of the containing objfile.  This is split to a
+   separate function in order to be able to unit test the
+   name_components matching using a mock mapped_index.  For each
+   symbol name that matches, calls MATCH_CALLBACK, passing it the
+   symbol's index in the mapped_index symbol table.  */
 
-      if (index->symbol_table[idx] == 0 && index->symbol_table[idx + 1] == 0)
-	continue;
+static void
+dw2_expand_symtabs_matching_symbol
+  (mapped_index &index,
+   const lookup_name_info &lookup_name,
+   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
+   enum search_domain kind,
+   gdb::function_view<void (offset_type)> match_callback)
+{
+  gdb_index_symbol_name_matcher lookup_name_matcher
+    (lookup_name);
+
+  auto *name_cmp = case_sensitivity == case_sensitive_on ? strcmp : strcasecmp;
+
+  /* Build the symbol name component sorted vector, if we haven't yet.
+     The code below only knows how to break apart components of C++
+     symbol names (and other languages that use '::' as
+     namespace/module separator).  If we add support for wild matching
+     to some language that uses some other operator (E.g., Ada, Go and
+     D use '.'), then we'll need to try splitting the symbol name
+     according to that language too.  Note that Ada does support wild
+     matching, but doesn't currently support .gdb_index.  */
+  if (index.name_components.empty ())
+    {
+      for (size_t iter = 0; iter < index.symbol_table_slots; ++iter)
+	{
+	  offset_type idx = 2 * iter;
+
+	  if (index.symbol_table[idx] == 0
+	      && index.symbol_table[idx + 1] == 0)
+	    continue;
+
+	  const char *name = index.symbol_name_at (idx);
+
+	  /* Add each name component to the name component table.  */
+	  unsigned int previous_len = 0;
+	  for (unsigned int current_len = cp_find_first_component (name);
+	       name[current_len] != '\0';
+	       current_len += cp_find_first_component (name + current_len))
+	    {
+	      gdb_assert (name[current_len] == ':');
+	      index.name_components.push_back ({previous_len, idx});
+	      /* Skip the '::'.  */
+	      current_len += 2;
+	      previous_len = current_len;
+	    }
+	  index.name_components.push_back ({previous_len, idx});
+	}
+
+      /* Sort name_comp elements by name.   */
+      auto name_comp_compare = [&] (const name_component &left,
+				    const name_component &right)
+	{
+	  const char *left_qualified = index.symbol_name_at (left.idx);
+	  const char *right_qualified = index.symbol_name_at (right.idx);
+
+	  const char *left_name = left_qualified + left.name_offset;
+	  const char *right_name = right_qualified + right.name_offset;
+
+	  return name_cmp (left_name, right_name) < 0;
+	};
+
+      std::sort (index.name_components.begin (),
+		 index.name_components.end (),
+		 name_comp_compare);
+    }
+
+  const char *cplus
+    = lookup_name.cplus ().lookup_name ().c_str ();
 
-      name = index->constant_pool + MAYBE_SWAP (index->symbol_table[idx]);
+  /* Comparison function object for lower_bound that matches against a
+     given symbol name.  */
+  auto lookup_compare_lower = [&] (const name_component &elem,
+				   const char *name)
+    {
+      const char *elem_qualified = index.symbol_name_at (elem.idx);
+      const char *elem_name = elem_qualified + elem.name_offset;
+      return name_cmp (elem_name, name) < 0;
+    };
+
+  /* Comparison function object for upper_bound that matches against a
+     given symbol name.  */
+  auto lookup_compare_upper = [&] (const char *name,
+				   const name_component &elem)
+    {
+      const char *elem_qualified = index.symbol_name_at (elem.idx);
+      const char *elem_name = elem_qualified + elem.name_offset;
+      return name_cmp (name, elem_name) < 0;
+    };
+
+  auto begin = index.name_components.begin ();
+  auto end = index.name_components.end ();
+
+  /* Find the lower bound.  */
+  auto lower = [&] ()
+    {
+      if (lookup_name.completion_mode () && cplus[0] == '\0')
+	return begin;
+      else
+	return std::lower_bound (begin, end, cplus, lookup_compare_lower);
+    } ();
 
-      if (!lookup_name_matcher.matches (name)
-	  || (symbol_matcher != NULL && !symbol_matcher (name)))
+  /* Find the upper bound.  */
+  auto upper = [&] ()
+    {
+      if (lookup_name.completion_mode ())
+	{
+	  /* The string frobbing below won't work if the string is
+	     empty.  We don't need it then, anyway -- if we're
+	     completing an empty string, then we want to iterate over
+	     the whole range.  */
+	  if (cplus[0] == '\0')
+	    return end;
+
+	  /* In completion mode, increment the last character because
+	     we want UPPER to point past all symbols names that have
+	     the same prefix.  */
+	  std::string after = cplus;
+
+	  gdb_assert (after.back () != 0xff);
+	  after.back ()++;
+
+	  return std::upper_bound (lower, end, after.c_str (),
+				   lookup_compare_upper);
+	}
+      else
+	return std::upper_bound (lower, end, cplus, lookup_compare_upper);
+    } ();
+
+  /* Now for each symbol name in range, check to see if we have a name
+     match, and if so, call the MATCH_CALLBACK callback.  */
+
+  /* The same symbol may appear more than once in the range though.
+     E.g., if we're looking for symbols that complete "w", and we have
+     a symbol named "w1::w2", we'll find the two name components for
+     that same symbol in the range.  To be sure we only call the
+     callback once per symbol, we first collect the symbol name
+     indexes that matched in a temporary vector and ignore
+     duplicates.  */
+  std::vector<offset_type> matches;
+  matches.reserve (std::distance (lower, upper));
+
+  for (;lower != upper; ++lower)
+    {
+      const char *qualified = index.symbol_name_at (lower->idx);
+
+      if (!lookup_name_matcher.matches (qualified)
+	  || (symbol_matcher != NULL && !symbol_matcher (qualified)))
 	continue;
 
-      /* The name was matched, now expand corresponding CUs that were
-	 marked.  */
-      vec = (offset_type *) (index->constant_pool
-			     + MAYBE_SWAP (index->symbol_table[idx + 1]));
+      matches.push_back (lower->idx);
+    }
+
+  std::sort (matches.begin (), matches.end ());
+
+  /* Finally call the callback, once per match.  */
+  ULONGEST prev = -1;
+  for (offset_type idx : matches)
+    {
+      if (prev != idx)
+	{
+	  match_callback (idx);
+	  prev = idx;
+	}
+    }
+
+  /* Above we use a type wider than idx's for 'prev', since 0 and
+     (offset_type)-1 are both possible values.  */
+  static_assert (sizeof (prev) > sizeof (offset_type), "");
+}
+
+/* Helper for dw2_expand_matching symtabs.  Called on each symbol
+   matched, to expand corresponding CUs that were marked.  IDX is the
+   index of the symbol name that matched.  */
+
+static void
+dw2_expand_marked_cus
+  (mapped_index &index, offset_type idx,
+   struct objfile *objfile,
+   gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
+   gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
+   search_domain kind)
+{
+  const char *name;
+  offset_type *vec, vec_len, vec_idx;
+  bool global_seen = false;
+
+  /* XXX reindent code below before pushing.  */
+
+      vec = (offset_type *) (index.constant_pool
+			     + MAYBE_SWAP (index.symbol_table[idx + 1]));
       vec_len = MAYBE_SWAP (vec[0]);
       for (vec_idx = 0; vec_idx < vec_len; ++vec_idx)
 	{
@@ -4225,7 +4462,7 @@  dw2_expand_symtabs_matching
 	     and indices >= 7 may elide them for certain symbols
 	     (gold does this).  */
 	  int attrs_valid =
-	    (index->version >= 7
+	    (index.version >= 7
 	     && symbol_kind != GDB_INDEX_SYMBOL_KIND_NONE);
 
 	  /* Work around gold/15646.  */
@@ -4234,7 +4471,7 @@  dw2_expand_symtabs_matching
 	      if (!is_static && global_seen)
 		continue;
 	      if (!is_static)
-		global_seen = 1;
+		global_seen = true;
 	    }
 
 	  /* Only check the symbol's kind if it has one.  */
@@ -4285,7 +4522,6 @@  dw2_expand_symtabs_matching
 		}
 	    }
 	}
-    }
 }
 
 /* A helper for dw2_find_pc_sect_compunit_symtab which finds the most specific
@@ -23285,6 +23521,9 @@  dwarf2_per_objfile_free (struct objfile *objfile, void *d)
 
   if (data->dwz_file && data->dwz_file->dwz_bfd)
     gdb_bfd_unref (data->dwz_file->dwz_bfd);
+
+  if (data->index_table != NULL)
+    data->index_table->~mapped_index ();
 }