[2/2] arm-tdep: sort mapping symbols after parsing all minimal symbols

Message ID 20190625013748.11003-2-simon.marchi@polymtl.ca
State New, archived
Headers

Commit Message

Simon Marchi June 25, 2019, 1:37 a.m. UTC
  Somebody on IRC reported a while ago that loading a big ARM program in
GDB was very slow.  Their profiling pointed out that a big amount of
time was spent in

    VEC_safe_insert (arm_mapping_symbol_s, *map_p, idx, &new_map_sym);

I was able to verify this as well.

ARM mapping symbols are special ELF symbols named $a, $d and $t
indicating that symbols starting at this address up to the next mapping
symbol (in terms of address) are of type "ARM code", "data" and "Thumb
code", respectively.  GDB records these symbols in vectors (one for each
section) in arm-tdep.c.  These vectors are sorted by symbol address, to
allow for quick lookup.  The current approach is to insert new symbols
at the right position to keep the vectors sorted at all time.  This is
done based on the assumption that mapping symbols come already almost
sorted from the binary, as explains this comment in
arm_record_special_symbol:

  /* Assume that most mapping symbols appear in order of increasing
     value.  If they were randomly distributed, it would be faster to
     always push here and then sort at first use.  */

Well, it turns out this is not the case.  The original reporter
mentioned that mapping symbols in their binaries are not nearly sorted,
and this is not my experience either (at least in the binary used in the
benchmark below).  So if the values don't come nearly sorted, doing
insertions to keep the vectors sorted ends up being of the order of
number_of_mapping_symbols ^ 2.

This patch changes it just like the comment above says, to just append
to the vector in arm_record_special_symbol and sort the vector on first
use.

Benchmark
  

Comments

Tom Tromey June 25, 2019, 2:32 p.m. UTC | #1
>>>>> "Simon" == Simon Marchi <simon.marchi@polymtl.ca> writes:

Simon> Somebody on IRC reported a while ago that loading a big ARM program in
Simon> GDB was very slow.  Their profiling pointed out that a big amount of
Simon> time was spent in

Simon>     VEC_safe_insert (arm_mapping_symbol_s, *map_p, idx, &new_map_sym);

Simon> I was able to verify this as well.

Thank you for doing this.  It looks good to me.

Simon>   insert sorted: 71.12s user 2.71s system 99% cpu 1:14.03 total
Simon>   sort after:    46.42s user 2.60s system 99% cpu  49.147 total

Simon> I tried for fun on my Raspberry Pi 3, the run time of
Simon> elf_symtab_read goes from ~259s to ~9s, reading the same file.

Great results.

Tom
  
Alan Hayward June 25, 2019, 2:33 p.m. UTC | #2
> On 25 Jun 2019, at 02:37, Simon Marchi <simon.marchi@polymtl.ca> wrote:

> 

> Somebody on IRC reported a while ago that loading a big ARM program in

> GDB was very slow.  Their profiling pointed out that a big amount of

> time was spent in

> 

>   VEC_safe_insert (arm_mapping_symbol_s, *map_p, idx, &new_map_sym);

> 

> I was able to verify this as well.

> 

> ARM mapping symbols are special ELF symbols named $a, $d and $t

> indicating that symbols starting at this address up to the next mapping

> symbol (in terms of address) are of type "ARM code", "data" and "Thumb

> code", respectively.  GDB records these symbols in vectors (one for each

> section) in arm-tdep.c.  These vectors are sorted by symbol address, to

> allow for quick lookup.  The current approach is to insert new symbols

> at the right position to keep the vectors sorted at all time.  This is

> done based on the assumption that mapping symbols come already almost

> sorted from the binary, as explains this comment in

> arm_record_special_symbol:

> 

> /* Assume that most mapping symbols appear in order of increasing

>    value.  If they were randomly distributed, it would be faster to

>    always push here and then sort at first use.  */


I’ve been wondering where this assumption came from, whether there is any
evidence to back it up, one way or the other.

Sadly, the original patch doesn’t give anymore clues (it was pushed without
any obvious review):
https://sourceware.org/ml/gdb-patches/2008-05/msg00113.html

I had a look at some binaries:
*gdb itself and binaries built by the test suite - random order.
*system binaries on Aarch32 Linux - no mapping symbols.

I then looked at a binary built with the Arm Compiler, and found that the
symbols are in increasing numerical order.  That might explain where the
assumption came from.


> 

> Well, it turns out this is not the case.  The original reporter

> mentioned that mapping symbols in their binaries are not nearly sorted,

> and this is not my experience either (at least in the binary used in the

> benchmark below).  So if the values don't come nearly sorted, doing

> insertions to keep the vectors sorted ends up being of the order of

> number_of_mapping_symbols ^ 2.

> 

> This patch changes it just like the comment above says, to just append

> to the vector in arm_record_special_symbol and sort the vector on first

> use.

> 

> Benchmark

> =========

> 

> I have done some benchmarks using an --enable-targets=all GDB, compiled

> with -O2, running on x86-64 and parsing file

> dce18d22e5c2ecb6a3a57372f4e6ef614130bc.debug from this package:

> 

> https://launchpad.net/ubuntu/+source/firefox/66.0.3+build1-0ubuntu1/+build/16608691/+files/firefox-dbg_66.0.3+build1-0ubuntu1_armhf.deb

> 

> This file is the separate debug info for libxul.so (part of firefox) for

> ARM.

> 

> I have added some traces to measure the execution time of just

> elf_symtab_read and ran GDB like this:

> 

> ./gdb --data-directory=data-directory -nx -batch .../path/to/usr/lib/debug/.build-id/65/dce18d22e5c2ecb6a3a57372f4e6ef614130bc.debug

> 

> Since the new code sorts the vectors on first use, it would be difficult

> to benchmark it as-is and be fair, since the "before" version does more

> work in elf_symtab_read.  So I have actually benchmarked a version of

> the patch that did sort all the vectors at the end of elf_symtab_read,

> so the sorting would be considered in the measured execution time.

> 

> Here's the measured execution time of elf_symtab_read, averaged on 3

> runs:

> 

> insert sorted (before): 28.678s

> sort after (after):      1.760s

> 

> And here's the total execution time of the command above (just one run).

> The time is now mostly spent in reading DWARF.

> 

> insert sorted: 71.12s user 2.71s system 99% cpu 1:14.03 total

> sort after:    46.42s user 2.60s system 99% cpu  49.147 total

> 

> I tried for fun on my Raspberry Pi 3, the run time of

> elf_symtab_read goes from ~259s to ~9s, reading the same file.


That’s quite a significant change!

What would the effect be with your code if the symbol addresses were already
sorted - would we expect a slowdown compared with the existing version?





I’ve built this patch series and ran the test suite on an Aarch32 box, and
saw no regressions.

As far as the code changes go, the patch below LGTM.

I looked at the previous path, I’m not an expert on std:vector, but the changes
LGTM too.
  
Simon Marchi June 25, 2019, 6:21 p.m. UTC | #3
On 2019-06-25 10:33 a.m., Alan Hayward wrote:
> 
> 
>> On 25 Jun 2019, at 02:37, Simon Marchi <simon.marchi@polymtl.ca> wrote:
>>
>> Somebody on IRC reported a while ago that loading a big ARM program in
>> GDB was very slow.  Their profiling pointed out that a big amount of
>> time was spent in
>>
>>   VEC_safe_insert (arm_mapping_symbol_s, *map_p, idx, &new_map_sym);
>>
>> I was able to verify this as well.
>>
>> ARM mapping symbols are special ELF symbols named $a, $d and $t
>> indicating that symbols starting at this address up to the next mapping
>> symbol (in terms of address) are of type "ARM code", "data" and "Thumb
>> code", respectively.  GDB records these symbols in vectors (one for each
>> section) in arm-tdep.c.  These vectors are sorted by symbol address, to
>> allow for quick lookup.  The current approach is to insert new symbols
>> at the right position to keep the vectors sorted at all time.  This is
>> done based on the assumption that mapping symbols come already almost
>> sorted from the binary, as explains this comment in
>> arm_record_special_symbol:
>>
>> /* Assume that most mapping symbols appear in order of increasing
>>    value.  If they were randomly distributed, it would be faster to
>>    always push here and then sort at first use.  */
> 
> I’ve been wondering where this assumption came from, whether there is any
> evidence to back it up, one way or the other.
> 
> Sadly, the original patch doesn’t give anymore clues (it was pushed without
> any obvious review):
> https://sourceware.org/ml/gdb-patches/2008-05/msg00113.html

Ah I hadn't thought of checking the original patch, thanks.

> I had a look at some binaries:
> *gdb itself and binaries built by the test suite - random order.
> *system binaries on Aarch32 Linux - no mapping symbols.
> 
> I then looked at a binary built with the Arm Compiler, and found that the
> symbols are in increasing numerical order.  That might explain where the
> assumption came from.

Ok, good to know.

>> I tried for fun on my Raspberry Pi 3, the run time of
>> elf_symtab_read goes from ~259s to ~9s, reading the same file.
> 
> That’s quite a significant change!

Yes, a bit too significant I would say!  I would appreciate if somebody else
could run a similar experiment to confirm that it's not just me who messed
something up.

> What would the effect be with your code if the symbol addresses were already
> sorted - would we expect a slowdown compared with the existing version?

If you can give me a binary of significant size that has symbols already sorted (so,
compiled with the ARM compiler), I could do the experiment.  I'd be curious to know.

My guess is that it won't be significantly slower.  I don't know what happens internally
when sorting an already sorted vector, but it seems like it's quite efficient:

https://stackoverflow.com/questions/6567326/does-stdsort-check-if-a-vector-is-already-sorted
https://lemire.me/blog/2016/09/28/sorting-already-sorted-arrays-is-much-faster/

> I’ve built this patch series and ran the test suite on an Aarch32 box, and
> saw no regressions.
> 
> As far as the code changes go, the patch below LGTM.
> 
> I looked at the previous path, I’m not an expert on std:vector, but the changes
> LGTM too.

Thanks, I'll push both patches shortly, with Tom's comments on the first patch
addressed.

Simon
  

Patch

=========

I have done some benchmarks using an --enable-targets=all GDB, compiled
with -O2, running on x86-64 and parsing file
dce18d22e5c2ecb6a3a57372f4e6ef614130bc.debug from this package:

  https://launchpad.net/ubuntu/+source/firefox/66.0.3+build1-0ubuntu1/+build/16608691/+files/firefox-dbg_66.0.3+build1-0ubuntu1_armhf.deb

This file is the separate debug info for libxul.so (part of firefox) for
ARM.

I have added some traces to measure the execution time of just
elf_symtab_read and ran GDB like this:

  ./gdb --data-directory=data-directory -nx -batch  .../path/to/usr/lib/debug/.build-id/65/dce18d22e5c2ecb6a3a57372f4e6ef614130bc.debug

Since the new code sorts the vectors on first use, it would be difficult
to benchmark it as-is and be fair, since the "before" version does more
work in elf_symtab_read.  So I have actually benchmarked a version of
the patch that did sort all the vectors at the end of elf_symtab_read,
so the sorting would be considered in the measured execution time.

Here's the measured execution time of elf_symtab_read, averaged on 3
runs:

  insert sorted (before): 28.678s
  sort after (after):      1.760s

And here's the total execution time of the command above (just one run).
The time is now mostly spent in reading DWARF.

  insert sorted: 71.12s user 2.71s system 99% cpu 1:14.03 total
  sort after:    46.42s user 2.60s system 99% cpu  49.147 total

I tried for fun on my Raspberry Pi 3, the run time of
elf_symtab_read goes from ~259s to ~9s, reading the same file.

gdb/ChangeLog:

	* arm-tdep.c (struct arm_per_objfile) <section_maps_sorted>: New
	field.
	(arm_find_mapping_symbol): Sort mapping symbol vectors on first
	use.
	(arm_record_special_symbol): Don't insert new symbol in sorted
	position, push it at the end.
---
 gdb/arm-tdep.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index 2a890b62d733..81e5617b6f67 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -105,7 +105,8 @@  typedef std::vector<arm_mapping_symbol> arm_mapping_symbol_vec;
 struct arm_per_objfile
 {
   arm_per_objfile (size_t num_sections)
-  : section_maps (new arm_mapping_symbol_vec[num_sections])
+  : section_maps (new arm_mapping_symbol_vec[num_sections]),
+    section_maps_sorted (new bool[num_sections] ())
   {}
 
   /* Information about mapping symbols ($a, $d, $t) in the objfile.
@@ -117,6 +118,10 @@  struct arm_per_objfile
      For each section, the vector of arm_mapping_symbol is sorted by
      symbol value (address).  */
   std::unique_ptr<arm_mapping_symbol_vec[]> section_maps;
+
+  /* For each corresponding element of section_maps above, is this vector
+     sorted.  */
+  std::unique_ptr<bool[]> section_maps_sorted;
 };
 
 /* The list of available "set arm ..." and "show arm ..." commands.  */
@@ -354,10 +359,19 @@  arm_find_mapping_symbol (CORE_ADDR memaddr, CORE_ADDR *start)
 						   arm_objfile_data_key);
       if (data != NULL)
 	{
+	  unsigned int section_idx = sec->the_bfd_section->index;
+	  arm_mapping_symbol_vec &map
+	    = data->section_maps[section_idx];
+
+	  /* Sort the vector on first use.  */
+	  if (!data->section_maps_sorted[section_idx])
+	    {
+	      std::sort (map.begin (), map.end ());
+	      data->section_maps_sorted[section_idx] = true;
+	    }
+
 	  struct arm_mapping_symbol map_key
 	    = { memaddr - obj_section_addr (sec), 0 };
-	  const arm_mapping_symbol_vec &map
-	    = data->section_maps[sec->the_bfd_section->index];
 	  arm_mapping_symbol_vec::const_iterator it
 	    = std::lower_bound (map.begin (), map.end (), map_key);
 
@@ -8545,12 +8559,8 @@  arm_record_special_symbol (struct gdbarch *gdbarch, struct objfile *objfile,
   new_map_sym.value = sym->value;
   new_map_sym.type = name[1];
 
-  /* Assume that most mapping symbols appear in order of increasing
-     value.  If they were randomly distributed, it would be faster to
-     always push here and then sort at first use.  */
-  arm_mapping_symbol_vec::iterator it
-    = std::lower_bound (map.begin (), map.end (), new_map_sym);
-  map.insert (it, new_map_sym);
+  /* Insert at the end, the vector will be sorted on first use.  */
+  map.push_back (new_map_sym);
 }
 
 static void