[RFA,2/2] Simplify the psymbol hash function

Message ID 20171104161356.17565-3-tom@tromey.com
State New, archived

Commit Message

Tom Tromey Nov. 4, 2017, 4:13 p.m. UTC
This patch simplifies the psymbol_hash function by changing it not to
examine the contents of the symbol's name.  This change just mirrors
what psymbol_compare already does -- it checks the name pointers for
equality, which is presumably ok because symbol names are generally
interned.

This change speeds up psymbol reading.  "gdb -nx -batch gdb"
previously took ~1.8 seconds on my machine, and with this patch it now
takes ~1.7 seconds.

gdb/ChangeLog
2017-11-04  Tom Tromey  <tom@tromey.com>

	* psymtab.c (psymbol_hash): Do not hash string contents.
---
 gdb/ChangeLog | 4 ++++
 gdb/psymtab.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)
  

Comments

Pedro Alves Nov. 8, 2017, 11:12 a.m. UTC | #1
On 11/04/2017 04:13 PM, Tom Tromey wrote:
> This patch simplifies the psymbol_hash function by changing it not to
> examine the contents of the symbol's name.  This change just mirrors
> what psymbol_compare already does -- it checks the name pointers for
> equality, which is presumably ok because symbol names are generally
> interned.

Can you expand a bit more on the "presumably ok" part?  I think
it'd be nice to mention/explain this assumption in a comment
in the code.

> This change speeds up psymbol reading.  "gdb -nx -batch gdb"
> previously took ~1.8 seconds on my machine, and with this patch it now
> takes ~1.7 seconds.

That sounds great; however, I do wonder whether the bug is the other
way around.  What do the statistics say, e.g., when debugging
gdb and firefox?  Do we end up deduping more or fewer
symbols in the bcache?

I read the original thread that added these custom functions,
and the original patch used strcmp in both hash and compare,
and then somehow the end result was what we have today.

See:
 https://sourceware.org/ml/gdb-patches/2010-08/msg00218.html
and:
 https://sourceware.org/ml/gdb-patches/2010-08/msg00242.html

So I'd love it if your patch is correct.  I'd just appreciate
a bit more detail since I'm not awfully familiar with this area.

Thanks,
Pedro Alves
  
Pedro Alves Nov. 8, 2017, 11:42 a.m. UTC | #2
On 11/08/2017 11:12 AM, Pedro Alves wrote:
> On 11/04/2017 04:13 PM, Tom Tromey wrote:
>> This patch simplifies the psymbol_hash function by changing it not to
>> examine the contents of the symbol's name.  This change just mirrors
>> what psymbol_compare already does -- it checks the name pointers for
>> equality, which is presumably ok because symbol names are generally
>> interned.
> 
> Can you expand a bit more on the "presumably ok" part?  I think
> it'd be nice to mention/explain this assumption in a comment
> in the code.
> 
>> This change speeds up psymbol reading.  "gdb -nx -batch gdb"
>> previously took ~1.8 seconds on my machine, and with this patch it now
>> takes ~1.7 seconds.
> 
> That sounds great; however, I do wonder whether the bug is the other
> way around.  What do the statistics say, e.g., when debugging
> gdb and firefox?  Do we end up deduping more or fewer
> symbols in the bcache?

TBC, here I meant compared to changing the compare function
to do strcmp instead of the pointer comparison.

> 
> I read the original thread that added these custom functions,
> and the original patch used strcmp in both hash and compare,
> and then somehow the end result was what we have today.
> 
> See:
>  https://sourceware.org/ml/gdb-patches/2010-08/msg00218.html
> and:
>  https://sourceware.org/ml/gdb-patches/2010-08/msg00242.html
> 
> So I'd love it if your patch is correct.  I'd just appreciate
> a bit more detail since I'm not awfully familiar with this area.

BTW, I didn't quite get it the first time, but I think Sami's
comment at:
 https://sourceware.org/ml/gdb-patches/2010-08/msg00242.html

> "that explains how the old hash function worked"

is related to this:
  https://sourceware.org/ml/gdb-patches/2010-08/msg00245.html

> "A previous patch of mine introduced a bcache regression :D. The
> patch made cplus_specific a pointer to an allocated struct. This is
> because we wanted to store more information in cplus_specific without
> penalizing the other languages. With cplus_specific being a
> pointer, hashing the whole symbol didn't work anymore. This patch is
> an attempt to fix that."

So before Sami's "previous patch", it sounds like we were already doing
pointer comparisons, simply because we hashed the whole symbol as a
block of memory.
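
To make that concrete, here is a minimal sketch of what "hashing the
whole symbol as a block of memory" implies.  Everything in it
(toy_symbol, toy_symbol_init, toy_symbol_same_bytes) is a made-up
stand-in and not GDB's real layout or API; the only point is that a
byte-wise view of a record sees the pointer values stored in it and
never reads the characters those pointers refer to.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified symbol record; not GDB's real layout.  */
struct toy_symbol
{
  const char *name;          /* Address of the name, not its characters.  */
  const void *lang_specific; /* Separately allocated per-language data.  */
};

/* Fill in a record, zeroing it first so that every byte is defined.  */
void
toy_symbol_init (struct toy_symbol *sym, const char *name,
                 const void *lang_specific)
{
  assert (sym != NULL);
  memset (sym, 0, sizeof *sym);
  sym->name = name;
  sym->lang_specific = lang_specific;
}

/* Compare two records as raw memory, the way a byte-oriented cache
   would.  Only the pointer values take part in the comparison; the
   pointed-to strings are never read.  Returns nonzero when the two
   records are byte-for-byte identical.  */
int
toy_symbol_same_bytes (const struct toy_symbol *a,
                       const struct toy_symbol *b)
{
  return memcmp (a, b, sizeof *a) == 0;
}
```

With this toy, two records whose names have the same characters but
live in different buffers are not byte-identical, while two records
that share one name pointer are.  The same reasoning applies to a
separately allocated lang_specific block, which is the kind of
regression the quoted message describes.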

So now I'm thinking that your patch must be correct.  I'd still
like to learn more about where it is that we intern the symbol
names, though, and I still think it'd be great to add a comment
to the code.

Thanks,
Pedro Alves
  
Tom Tromey Nov. 8, 2017, 7:08 p.m. UTC | #3
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:

Pedro> So now I'm thinking that your patch must be correct.  I'd still
Pedro> like to learn more about where it is that we intern the symbol
Pedro> names, though, and I still think it'd be great to add a comment
Pedro> to the code.

It happens via symbol_set_names.  There is some subtlety here in that
sometimes a name won't be copied (this saves memory by referring to
strings in the mapped debuginfo), and also IIRC Ada adds some
complexities too.

I can add comments to the psymbol hash and comparison functions to
mention this.

Tom
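
For readers unfamiliar with interning, the idea can be sketched as
follows.  This is a toy under stated assumptions, not how
symbol_set_names is implemented: the fixed-size table, the linear
search and the names toy_intern / TOY_MAX_NAMES are all invented for
illustration, and unlike the real code it always copies the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_NAMES 64

/* Hypothetical toy intern table; GDB's real name storage differs.  */
static char *toy_names[TOY_MAX_NAMES];
static size_t toy_name_count;

/* Return the canonical copy of NAME, adding one to the table if this
   is the first time the string is seen.  Returns NULL if the table
   is full or memory cannot be allocated.  Entries are never freed
   in this sketch.  */
const char *
toy_intern (const char *name)
{
  assert (name != NULL);

  /* Linear search: an existing entry with the same characters wins.  */
  for (size_t i = 0; i < toy_name_count; i++)
    if (strcmp (toy_names[i], name) == 0)
      return toy_names[i];

  if (toy_name_count == TOY_MAX_NAMES)
    return NULL;

  size_t len = strlen (name) + 1;
  char *copy = malloc (len);
  if (copy == NULL)
    return NULL;
  memcpy (copy, name, len);
  toy_names[toy_name_count++] = copy;
  return copy;
}
```

Once every name has gone through such a table, two names with equal
characters share one address, so comparing or hashing the pointer is
enough.  The caveat raised above still applies: if some names bypass
the table (for example, names left uncopied in mapped debuginfo),
equal strings can end up at different addresses.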
  

Patch

diff --git a/gdb/psymtab.c b/gdb/psymtab.c
index f848990867..d22f70a0c0 100644
--- a/gdb/psymtab.c
+++ b/gdb/psymtab.c
@@ -1515,7 +1515,7 @@  psymbol_hash (const void *addr, int length)
   h = hash_continue (&lang, sizeof (unsigned int), h);
   h = hash_continue (&domain, sizeof (unsigned int), h);
   h = hash_continue (&theclass, sizeof (unsigned int), h);
-  h = hash_continue (psymbol->ginfo.name, strlen (psymbol->ginfo.name), h);
+  h = hash_continue (&psymbol->ginfo.name, sizeof (psymbol->ginfo.name), h);
 
   return h;
 }
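
As a rough illustration of the one-line change, the sketch below
contrasts hashing a name's characters with hashing the pointer value
itself.  The toy_* functions are hypothetical; in particular
toy_hash_continue is only an FNV-1a-style stand-in with the same
argument order as the hash_continue calls in the diff, not GDB's
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte hash (FNV-1a style) that folds LENGTH bytes at
   ADDR into the running value H.  */
unsigned long
toy_hash_continue (const void *addr, size_t length, unsigned long h)
{
  const unsigned char *p = addr;

  for (size_t i = 0; i < length; i++)
    {
      h ^= p[i];
      h *= 16777619UL;
    }
  return h;
}

/* Old style: walk every character of the name.  */
unsigned long
toy_hash_by_contents (const char *name)
{
  assert (name != NULL);
  return toy_hash_continue (name, strlen (name), 2166136261UL);
}

/* New style: hash only the bytes of the pointer value, so the cost
   does not depend on the length of the name.  */
unsigned long
toy_hash_by_pointer (const char *name)
{
  return toy_hash_continue (&name, sizeof (name), 2166136261UL);
}

/* Equality test that looks only at the addresses.  */
int
toy_compare_by_pointer (const char *a, const char *b)
{
  return a == b;
}
```

Either hash is consistent with a pointer-based comparison, since two
names that compare equal are the same address and therefore hash the
same both ways.  The pointer hash simply skips the strlen and the walk
over the characters.  It would stop being valid if the comparison were
ever changed to strcmp, because equal strings at different addresses
could then land in different buckets.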