[review] Compute msymbol hash codes in parallel

Message ID gerrit.1572031795000.Ifaa3346e9998f05743bff9e2eaad3f83b954d071@gnutoolchain-gerrit.osci.io
State New, archived
Headers

Commit Message

Simon Marchi (Code Review) Oct. 25, 2019, 7:29 p.m. UTC
  Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308
......................................................................

Compute msymbol hash codes in parallel

This is for the msymbol_hash and msymbol_demangled_hash hashtables
in objfile_per_bfd_storage. This basically computes those hash
codes together with the demangled symbol name in the background,
before it inserts the symbols in the hash table.

gdb/ChangeLog:

2019-09-30  Christian Biesinger  <cbiesinger@google.com>

	* minsyms.c (add_minsym_to_hash_table): Use a previously computed
	hash code if possible.
	(add_minsym_to_demangled_hash_table): Likewise.
	(minimal_symbol_reader::install): Compute the hash codes for msymbol
	on the background thread.
	* symtab.h (struct minimal_symbol) <hash_value, demangled_hash_value>:
	Add these fields.

Change-Id: Ifaa3346e9998f05743bff9e2eaad3f83b954d071
---
M gdb/minsyms.c
1 file changed, 26 insertions(+), 15 deletions(-)
  

Comments

Simon Marchi (Code Review) Oct. 29, 2019, 7:44 p.m. UTC | #1
Tom Tromey has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308
......................................................................


Patch Set 1: Code-Review+1

(3 comments)

Thank you.  I appreciate the work you've done speeding up this area.
Marking this +1 since it needs a few fixes and also depends on
the threading patches.

https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308/1/gdb/minsyms.c 
File gdb/minsyms.c:

https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308/1/gdb/minsyms.c@1251 
PS1, Line 1251: 
1244 |       *copyto++ = *copyfrom++;
1245 |       mcount = copyto - msymbol;
1246 |     }
1247 |   return (mcount);
1248 | }
1249 | 
1250 | struct computed_hash_values {
1251 |   hashval_t mangled_name_hash;
1252 |   unsigned int minsym_hash;
1253 |   unsigned int minsym_demangled_hash;

It's a shame we need 3, but I guess the minsym ones are
case-insensitive.


https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308/1/gdb/minsyms.c@1263 
PS1, Line 1263: 
1254 | };
1255 | 
1256 | /* Build (or rebuild) the minimal symbol hash tables.  This is necessary
1257 |    after compacting or sorting the table since the entries move around
1258 |    thus causing the internal minimal_symbol pointers to become jumbled.  */
1259 |   
1260 | static void
1261 | build_minimal_symbol_hash_tables
1262 |   (struct objfile *objfile,
1263 |    std::vector<computed_hash_values>& hash_values)

Could be const?


https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308/1/gdb/minsyms.c@1269 
PS1, Line 1269: 
1261 | build_minimal_symbol_hash_tables
     | ...
1264 | {
1265 |   int i;
1266 |   struct minimal_symbol *msym;
1267 | 
1268 |   /* Clear the hash tables.  */
1269 |   for (i = 0; i < MINIMAL_SYMBOL_HASH_SIZE; i++)
1270 |     {
1271 |       objfile->per_bfd->msymbol_hash[i] = 0;
1272 |       objfile->per_bfd->msymbol_demangled_hash[i] = 0;
1273 |     }

Not part of your patch but this is only needed when multiple
symbol readers install minsyms, which is pretty rare.  So,
maybe it could be moved elsewhere to speed up the normal case.

This also applies to the 2 lines in this function that clear out
the "hash_next" and "demangled_hash_next" fields, though I suspect
it's not worth much to remove these.
  
Simon Marchi (Code Review) Oct. 29, 2019, 9:56 p.m. UTC | #2
Christian Biesinger has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308
......................................................................


Uploaded patch set 2: Patch Set 1 was rebased.

(3 comments)

https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308/1/gdb/minsyms.c 
File gdb/minsyms.c:

https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308/1/gdb/minsyms.c@1251 
PS1, Line 1251: 
1244 |       *copyto++ = *copyfrom++;
1245 |       mcount = copyto - msymbol;
1246 |     }
1247 |   return (mcount);
1248 | }
1249 | 
1250 | struct computed_hash_values {
1251 |   hashval_t mangled_name_hash;
1252 |   unsigned int minsym_hash;
1253 |   unsigned int minsym_demangled_hash;

> It's a shame we need 3, but I guess the minsym ones are […]

Yeah, that made me sad too :(


https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308/1/gdb/minsyms.c@1263 
PS1, Line 1263: 
1254 | };
1255 | 
1256 | /* Build (or rebuild) the minimal symbol hash tables.  This is necessary
1257 |    after compacting or sorting the table since the entries move around
1258 |    thus causing the internal minimal_symbol pointers to become jumbled.  */
1259 |   
1260 | static void
1261 | build_minimal_symbol_hash_tables
1262 |   (struct objfile *objfile,
1263 |    std::vector<computed_hash_values>& hash_values)

> Could be const?

Done


https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308/1/gdb/minsyms.c@1269 
PS1, Line 1269: 
1261 | build_minimal_symbol_hash_tables
     | ...
1264 | {
1265 |   int i;
1266 |   struct minimal_symbol *msym;
1267 | 
1268 |   /* Clear the hash tables.  */
1269 |   for (i = 0; i < MINIMAL_SYMBOL_HASH_SIZE; i++)
1270 |     {
1271 |       objfile->per_bfd->msymbol_hash[i] = 0;
1272 |       objfile->per_bfd->msymbol_demangled_hash[i] = 0;
1273 |     }

> Not part of your patch but this is only needed when multiple […]

I'll make a separate patch for that, though I don't recall that showing up in a profile.
  
Simon Marchi (Code Review) Nov. 26, 2019, 10:14 p.m. UTC | #3
Tom Tromey has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308
......................................................................


Patch Set 5:

(2 comments)

Thanks for doing this.  I agree this is a good direction to go.

I had one minor nit, but also one that's more substantive.
Let me know what you think.

| --- gdb/minsyms.c
| +++ gdb/minsyms.c
| @@ -1256,17 +1254,19 @@ clear_minimal_symbol_hash_tables (struct objfile *objfile)
|        objfile->per_bfd->msymbol_hash[i] = 0;
|        objfile->per_bfd->msymbol_demangled_hash[i] = 0;
|      }
|  }
|  
|  struct computed_hash_values
|  {
|    size_t name_length;
|    hashval_t mangled_name_hash;
| +  unsigned int minsym_hash;
| +  unsigned int minsym_demangled_hash;

PS5, Line 1264:

Needs comments, like the previous patch.

|  };
|  
|  /* Build (or rebuild) the minimal symbol hash tables.  This is necessary
|     after compacting or sorting the table since the entries move around
|     thus causing the internal minimal_symbol pointers to become jumbled.  */
|    
|  static void
| -build_minimal_symbol_hash_tables (struct objfile *objfile)
| +build_minimal_symbol_hash_tables

 ...

| @@ -1396,16 +1401,21 @@ #endif
|  		     (msym, demangled_name,
|  		      &m_objfile->per_bfd->storage_obstack);
|  		   msym->name_set = 1;
|  
|  		   hash_values[idx].mangled_name_hash
|  		     = fast_hash (msym->name, hash_values[idx].name_length);
|  		 }
| +	       hash_values[idx].minsym_hash
| +		 = msymbol_hash (msym->linkage_name ());
| +	       hash_values[idx].minsym_demangled_hash
| +		 = search_name_hash (MSYMBOL_LANGUAGE (msym),
| +				     msym->search_name ());

PS5, Line 1412:

This seems like it should only be computed if there is a demangled
form.

|  	     }
|  	   {
|  	     /* To limit how long we hold the lock, we only acquire it here
|  	        and not while we demangle the names above.  */
|  #if CXX_STD_THREAD
|  	     std::lock_guard<std::mutex> guard (demangled_mutex);
|  #endif
|  	     for (minimal_symbol *msym = start; msym < end; ++msym)
|  	       {
  
Simon Marchi (Code Review) Nov. 26, 2019, 10:25 p.m. UTC | #4
Christian Biesinger has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308
......................................................................


Patch Set 6:

(2 comments)

| --- gdb/minsyms.c
| +++ gdb/minsyms.c
| @@ -1256,17 +1254,19 @@ clear_minimal_symbol_hash_tables (struct objfile *objfile)
|        objfile->per_bfd->msymbol_hash[i] = 0;
|        objfile->per_bfd->msymbol_demangled_hash[i] = 0;
|      }
|  }
|  
|  struct computed_hash_values
|  {
|    size_t name_length;
|    hashval_t mangled_name_hash;
| +  unsigned int minsym_hash;
| +  unsigned int minsym_demangled_hash;

PS5, Line 1264:

Done

|  };
|  
|  /* Build (or rebuild) the minimal symbol hash tables.  This is necessary
|     after compacting or sorting the table since the entries move around
|     thus causing the internal minimal_symbol pointers to become jumbled.  */
|    
|  static void
| -build_minimal_symbol_hash_tables (struct objfile *objfile)
| +build_minimal_symbol_hash_tables

 ...

| @@ -1396,16 +1401,21 @@ #endif
|  		     (msym, demangled_name,
|  		      &m_objfile->per_bfd->storage_obstack);
|  		   msym->name_set = 1;
|  
|  		   hash_values[idx].mangled_name_hash
|  		     = fast_hash (msym->name, hash_values[idx].name_length);
|  		 }
| +	       hash_values[idx].minsym_hash
| +		 = msymbol_hash (msym->linkage_name ());
| +	       hash_values[idx].minsym_demangled_hash
| +		 = search_name_hash (MSYMBOL_LANGUAGE (msym),
| +				     msym->search_name ());

PS5, Line 1412:

Done (that is, I am computing this is search_name != linkage_name
matching the code in build_minimal_symbol_hash_tables)

|  	     }
|  	   {
|  	     /* To limit how long we hold the lock, we only acquire it here
|  	        and not while we demangle the names above.  */
|  #if CXX_STD_THREAD
|  	     std::lock_guard<std::mutex> guard (demangled_mutex);
|  #endif
|  	     for (minimal_symbol *msym = start; msym < end; ++msym)
|  	       {
  
Simon Marchi (Code Review) Nov. 27, 2019, 6:04 p.m. UTC | #5
Tom Tromey has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/308
......................................................................


Patch Set 7: Code-Review+2

Looks good.  Thanks for doing this.
  

Patch

diff --git a/gdb/minsyms.c b/gdb/minsyms.c
index 51894b2..9c134e8 100644
--- a/gdb/minsyms.c
+++ b/gdb/minsyms.c
@@ -141,12 +141,12 @@ 
 /* Add the minimal symbol SYM to an objfile's minsym hash table, TABLE.  */
 static void
 add_minsym_to_hash_table (struct minimal_symbol *sym,
-			  struct minimal_symbol **table)
+			  struct minimal_symbol **table,
+			  unsigned int hash_value)
 {
   if (sym->hash_next == NULL)
     {
-      unsigned int hash
-	= msymbol_hash (MSYMBOL_LINKAGE_NAME (sym)) % MINIMAL_SYMBOL_HASH_SIZE;
+      unsigned int hash = hash_value % MINIMAL_SYMBOL_HASH_SIZE;
 
       sym->hash_next = table[hash];
       table[hash] = sym;
@@ -157,18 +157,16 @@ 
    TABLE.  */
 static void
 add_minsym_to_demangled_hash_table (struct minimal_symbol *sym,
-				    struct objfile *objfile)
+				    struct objfile *objfile,
+				    unsigned int hash_value)
 {
   if (sym->demangled_hash_next == NULL)
     {
-      unsigned int hash = search_name_hash (MSYMBOL_LANGUAGE (sym),
-					    MSYMBOL_SEARCH_NAME (sym));
-
       objfile->per_bfd->demangled_hash_languages.set (MSYMBOL_LANGUAGE (sym));
 
       struct minimal_symbol **table
 	= objfile->per_bfd->msymbol_demangled_hash;
-      unsigned int hash_index = hash % MINIMAL_SYMBOL_HASH_SIZE;
+      unsigned int hash_index = hash_value % MINIMAL_SYMBOL_HASH_SIZE;
       sym->demangled_hash_next = table[hash_index];
       table[hash_index] = sym;
     }
@@ -1251,6 +1249,8 @@ 
 
 struct computed_hash_values {
   hashval_t mangled_name_hash;
+  unsigned int minsym_hash;
+  unsigned int minsym_demangled_hash;
 };
 
 /* Build (or rebuild) the minimal symbol hash tables.  This is necessary
@@ -1258,7 +1258,9 @@ 
    thus causing the internal minimal_symbol pointers to become jumbled.  */
   
 static void
-build_minimal_symbol_hash_tables (struct objfile *objfile)
+build_minimal_symbol_hash_tables
+  (struct objfile *objfile,
+   std::vector<computed_hash_values>& hash_values)
 {
   int i;
   struct minimal_symbol *msym;
@@ -1271,17 +1273,20 @@ 
     }
 
   /* Now, (re)insert the actual entries.  */
-  for ((i = objfile->per_bfd->minimal_symbol_count,
+  int mcount = objfile->per_bfd->minimal_symbol_count;
+  for ((i = 0,
 	msym = objfile->per_bfd->msymbols.get ());
-       i > 0;
-       i--, msym++)
+       i < mcount;
+       i++, msym++)
     {
       msym->hash_next = 0;
-      add_minsym_to_hash_table (msym, objfile->per_bfd->msymbol_hash);
+      add_minsym_to_hash_table (msym, objfile->per_bfd->msymbol_hash,
+				hash_values[i].minsym_hash);
 
       msym->demangled_hash_next = 0;
       if (MSYMBOL_SEARCH_NAME (msym) != MSYMBOL_LINKAGE_NAME (msym))
-	add_minsym_to_demangled_hash_table (msym, objfile);
+	add_minsym_to_demangled_hash_table
+	  (msym, objfile, hash_values[i].minsym_demangled_hash);
     }
 }
 
@@ -1391,6 +1396,12 @@ 
 		   size_t idx = msym - msymbols;
 		   hash_values[idx].mangled_name_hash = htab_hash_string (msym->name);
 		 }
+		size_t idx = msym - msymbols;
+		hash_values[idx].minsym_hash
+		  = msymbol_hash (MSYMBOL_LINKAGE_NAME (msym));
+		hash_values[idx].minsym_demangled_hash
+		  = search_name_hash (MSYMBOL_LANGUAGE (msym),
+				      MSYMBOL_SEARCH_NAME (msym));
 	     }
 	   {
 	     /* To limit how long we hold the lock, we only acquire it here
@@ -1409,7 +1420,7 @@ 
 	   }
 	 });
 
-      build_minimal_symbol_hash_tables (m_objfile);
+      build_minimal_symbol_hash_tables (m_objfile, hash_values);
     }
 }