[review] Create a correctly-sized demangled names hashtable

Message ID gerrit.1574118864000.I1f074d38e1d90af58705ec852f90c84cc034cd2e@gnutoolchain-gerrit.osci.io
State New, archived
Headers

Commit Message

Simon Marchi (Code Review) Nov. 18, 2019, 11:14 p.m. UTC
  Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/686
......................................................................

Create a correctly-sized demangled names hashtable

If we have a minsym count, we know the demangled names hashtable will
be at least that big.  So use that count to create it, so we don't
have to resize/rehash it as much.

This is a 6% improvement in minsym loading time.

2019-11-18  Christian Biesinger  <cbiesinger@google.com>

	* symtab.c (create_demangled_names_hash): Use per_bfd->
	minimal_symbol_count as the initial size, if greater than
	our default size.

Change-Id: I1f074d38e1d90af58705ec852f90c84cc034cd2e
---
M gdb/symtab.c
1 file changed, 9 insertions(+), 2 deletions(-)
  

Comments

Simon Marchi (Code Review) Nov. 18, 2019, 11:19 p.m. UTC | #1
Christian Biesinger has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/686
......................................................................


Patch Set 1:

I guess this patch doesn't *technically* depend on the threading changes; but it will have no benefit without them.
  
Simon Marchi (Code Review) Nov. 18, 2019, 11:24 p.m. UTC | #2
Christian Biesinger has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/686
......................................................................


Patch Set 1:

(1 comment)

| --- gdb/symtab.c
| +++ gdb/symtab.c
| @@ -771,10 +771,17 @@ create_demangled_names_hash (struct objfile_per_bfd_storage *per_bfd)
|       Choosing a much larger table size wastes memory, and saves only about
| -     1% in symbol reading.  */
| +     1% in symbol reading.  However, if the minsym count is already
| +     initialized (e.g. because symbol name setting was deferred to
| +     a background thread) we can initialize the hashtable with that
| +     count, because we will almost certainly have at least that
| +     many entries.  If we have a nonzero number but less than 256,
| +     we still stay with 256 to have some space for psymbols, etc.  */
| +
| +  int count = std::max (per_bfd->minimal_symbol_count, 256);

PS1, Line 779:

Looking at hashtab.c, I should perhaps do something like
((minimal_symbol_count+2)/3)*4, to more completely avoid hashtable
resizings. Let me know if you have thoughts on that.

|  
|    per_bfd->demangled_names_hash.reset (htab_create_alloc
| -    (256, hash_demangled_name_entry, eq_demangled_name_entry,
| +    (count, hash_demangled_name_entry, eq_demangled_name_entry,
|       free_demangled_name_entry, xcalloc, xfree));
|  }
|  
|  /* See symtab.h  */
|
  
Simon Marchi (Code Review) Nov. 20, 2019, 4:42 a.m. UTC | #3
Simon Marchi has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/686
......................................................................


Patch Set 1:

(1 comment)

| --- gdb/symtab.c
| +++ gdb/symtab.c
| @@ -771,10 +771,17 @@ create_demangled_names_hash (struct objfile_per_bfd_storage *per_bfd)
|       Choosing a much larger table size wastes memory, and saves only about
| -     1% in symbol reading.  */
| +     1% in symbol reading.  However, if the minsym count is already
| +     initialized (e.g. because symbol name setting was deferred to
| +     a background thread) we can initialize the hashtable with that
| +     count, because we will almost certainly have at least that
| +     many entries.  If we have a nonzero number but less than 256,
| +     we still stay with 256 to have some space for psymbols, etc.  */
| +
| +  int count = std::max (per_bfd->minimal_symbol_count, 256);

PS1, Line 779:

> Looking at hashtab.c, I should perhaps do something like ((minimal_symbol_count+2)/3)*4, to more completely avoid hashtable resizings. Let me know if you have thoughts on that.

Not sure why the +2, but I think it's a good idea to size the
hashtable correctly from the start.  I'd be fine with the (.../3)*4
with the comment explaining why we do that.  This is performance
critical code, so it's normal to do tweaks like that.

Oh, is the +2 to make sure it rounds up the division by 3?

|  
|    per_bfd->demangled_names_hash.reset (htab_create_alloc
| -    (256, hash_demangled_name_entry, eq_demangled_name_entry,
| +    (count, hash_demangled_name_entry, eq_demangled_name_entry,
|       free_demangled_name_entry, xcalloc, xfree));
|  }
|  
|  /* See symtab.h  */
|
  
Simon Marchi (Code Review) Nov. 20, 2019, 5:34 a.m. UTC | #4
Christian Biesinger has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/686
......................................................................


Patch Set 2:

(1 comment)

BTW, thoughts on me pushing this even without the threading changes, so that I have fewer patches to keep track of? Shouldn't hurt anything.

| --- gdb/symtab.c
| +++ gdb/symtab.c
| @@ -771,10 +771,17 @@ create_demangled_names_hash (struct objfile_per_bfd_storage *per_bfd)
|       Choosing a much larger table size wastes memory, and saves only about
| -     1% in symbol reading.  */
| +     1% in symbol reading.  However, if the minsym count is already
| +     initialized (e.g. because symbol name setting was deferred to
| +     a background thread) we can initialize the hashtable with that
| +     count, because we will almost certainly have at least that
| +     many entries.  If we have a nonzero number but less than 256,
| +     we still stay with 256 to have some space for psymbols, etc.  */
| +
| +  int count = std::max (per_bfd->minimal_symbol_count, 256);

PS1, Line 779:

Yes, +2 is for rounding. Done and added a comment for that.

|  
|    per_bfd->demangled_names_hash.reset (htab_create_alloc
| -    (256, hash_demangled_name_entry, eq_demangled_name_entry,
| +    (count, hash_demangled_name_entry, eq_demangled_name_entry,
|       free_demangled_name_entry, xcalloc, xfree));
|  }
|  
|  /* See symtab.h  */
|
  
Simon Marchi (Code Review) Nov. 22, 2019, 2:50 a.m. UTC | #5
Kevin Buettner has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/686
......................................................................


Patch Set 2: Code-Review+2

> Patch Set 2:
> 
> (1 comment)
> 
> BTW, thoughts on me pushing this even without the threading changes, so that I have fewer patches to keep track of? Shouldn't hurt anything.

I think it makes sense to push this patch ahead of the rest of the series.
  

Patch

diff --git a/gdb/symtab.c b/gdb/symtab.c
index 3502827..e4da065 100644
--- a/gdb/symtab.c
+++ b/gdb/symtab.c
@@ -769,10 +769,17 @@ 
   /* Choose 256 as the starting size of the hash table, somewhat arbitrarily.
      The hash table code will round this up to the next prime number.
      Choosing a much larger table size wastes memory, and saves only about
-     1% in symbol reading.  */
+     1% in symbol reading.  However, if the minsym count is already
+     initialized (e.g. because symbol name setting was deferred to
+     a background thread) we can initialize the hashtable with that
+     count, because we will almost certainly have at least that
+     many entries.  If we have a nonzero number but less than 256,
+     we still stay with 256 to have some space for psymbols, etc.  */
+
+  int count = std::max (per_bfd->minimal_symbol_count, 256);
 
   per_bfd->demangled_names_hash.reset (htab_create_alloc
-    (256, hash_demangled_name_entry, eq_demangled_name_entry,
+    (count, hash_demangled_name_entry, eq_demangled_name_entry,
      free_demangled_name_entry, xcalloc, xfree));
 }