Message ID | 20171104161356.17565-3-tom@tromey.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 72851 invoked by alias); 4 Nov 2017 16:14:04 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 72717 invoked by uid 89); 4 Nov 2017 16:14:02 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=Hx-languages-length:1213 X-HELO: gateway22.websitewelcome.com Received: from gateway22.websitewelcome.com (HELO gateway22.websitewelcome.com) (192.185.47.144) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 04 Nov 2017 16:14:00 +0000 Received: from cm17.websitewelcome.com (cm17.websitewelcome.com [100.42.49.20]) by gateway22.websitewelcome.com (Postfix) with ESMTP id 8B80210DB for <gdb-patches@sourceware.org>; Sat, 4 Nov 2017 11:13:59 -0500 (CDT) Received: from box5379.bluehost.com ([162.241.216.53]) by cmsmtp with SMTP id B155eVdw3c72gB155eTVVB; Sat, 04 Nov 2017 11:13:59 -0500 Received: from 71-218-90-63.hlrn.qwest.net ([71.218.90.63]:32796 helo=bapiya.Home) by box5379.bluehost.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89) (envelope-from <tom@tromey.com>) id 1eB155-003mhN-A5; Sat, 04 Nov 2017 11:13:59 -0500 From: Tom Tromey <tom@tromey.com> To: gdb-patches@sourceware.org Cc: Tom Tromey <tom@tromey.com> Subject: [RFA 2/2] Simplify the psymbol hash function Date: Sat, 4 Nov 2017 10:13:56 -0600 Message-Id: <20171104161356.17565-3-tom@tromey.com> In-Reply-To: <20171104161356.17565-1-tom@tromey.com> References: <20171104161356.17565-1-tom@tromey.com> X-BWhitelist: no X-Source-L: No X-Exim-ID: 1eB155-003mhN-A5 X-Source-Sender: 71-218-90-63.hlrn.qwest.net (bapiya.Home) [71.218.90.63]:32796 X-Source-Auth: tom+tromey.com X-Email-Count: 4 X-Source-Cap: ZWx5bnJvYmk7ZWx5bnJvYmk7Ym94NTM3OS5ibHVlaG9zdC5jb20= X-Local-Domain: yes |
Commit Message
Tom Tromey
Nov. 4, 2017, 4:13 p.m. UTC
This patch simplifies the psymbol_hash function, by changing it not to examine the contents of the symbol's name. This change just mirrors what psymbol_compare already does -- it is checking for name equality, which is presumably ok because symbol names are generally interned. This change speeds up psymbol reading. "gdb -nx -batch gdb" previously took ~1.8 seconds on my machine, and with this patch it now takes ~1.7 seconds. gdb/ChangeLog 2017-11-04 Tom Tromey <tom@tromey.com> * psymtab.c (psymbol_hash): Do not hash string contents. --- gdb/ChangeLog | 4 ++++ gdb/psymtab.c | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-)
Comments
On 11/04/2017 04:13 PM, Tom Tromey wrote: > This patch simplifies the psymbol_hash function, by changing it not to > examine the contents of the symbol's name. This change just mirrors > what psymbol_compare already does -- it is checking for name equality, > which is presumably ok because symbol names are generally interned. Can you expand a bit more on the "presumably ok" part? I think it'd be nice to mention/explain this assumption in a comment in the code. > This change speeds up psymbol reading. "gdb -nx -batch gdb" > previously took ~1.8 seconds on my machine, and with this patch it now > takes ~1.7 seconds. That sounds great however I do wonder whether the bug is the other way around though. What do the statistics say, e.g., debugging gdb and firefox? Do we end up deduping more or fewer symbols in the bcache? I read the original thread that added these custom functions, and the original patch used strcmp in both hash and compare, and then somehow the end result was what we have today. See: https://sourceware.org/ml/gdb-patches/2010-08/msg00218.html and: https://sourceware.org/ml/gdb-patches/2010-08/msg00242.html So I'd love it that your patch is correct. I'd just appreciate a bit more detail since I'm not awfully familiar with this area. Thanks, Pedro Alves
On 11/08/2017 11:12 AM, Pedro Alves wrote: > On 11/04/2017 04:13 PM, Tom Tromey wrote: >> This patch simplifies the psymbol_hash function, by changing it not to >> examine the contents of the symbol's name. This change just mirrors >> what psymbol_compare already does -- it is checking for name equality, >> which is presumably ok because symbol names are generally interned. > > Can you expand a bit more on the "presumably ok" part? I think > it'd be nice to mention/explain this assumption in a comment > in the code. > >> This change speeds up psymbol reading. "gdb -nx -batch gdb" >> previously took ~1.8 seconds on my machine, and with this patch it now >> takes ~1.7 seconds. > > That sounds great however I do wonder whether the bug is the other > way around though. What do the statistics say, e.g., debugging > gdb and firefox? Do we end up deduping more or fewer > symbols in the bcache? TBC, here I meant compared to changing the compare function to do strcmp instead of the pointer comparison. > > I read the original thread that added these custom functions, > and the original patch used strcmp in both hash and compare, > and then somehow the end result was what we have today. > > See: > https://sourceware.org/ml/gdb-patches/2010-08/msg00218.html > and: > https://sourceware.org/ml/gdb-patches/2010-08/msg00242.html > > So I'd love it that your patch is correct. I'd just appreciate > a bit more detail since I'm not awfully familiar with this area. BTW, I didn't quite get it the first time, but I think Sami's comment at: https://sourceware.org/ml/gdb-patches/2010-08/msg00242.html > "that explains how the old hash function worked" is related to this: https://sourceware.org/ml/gdb-patches/2010-08/msg00245.html > "A previous patch of mine introduced a bcache regression :D. The > patch made cplus_specifc a pointer to an allocated struct. This is > because we wanted to store more information in cplus_specific without > penalizing the other other languages. With cplus_specific being a > pointer hashing the whole symbol didn't work anymore. This patch is > an attempt to fix that. So before Sami's "previous patch", it sounds like we were already doing pointer comparisons, simply because we hashed the whole symbol as a block of memory. So now I'm thinking that your patch must be correct. I'd still like to learn more about where is it that we intern the symbol names, though, and I still think it'd be great to add a comment to the code. Thanks, Pedro Alves
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:
Pedro> So now I'm thinking that your patch must be correct. I'd still
Pedro> like to learn more about where is it that we intern the symbol
Pedro> names, though, and I still think it'd be great to add a comment
Pedro> to the code.
It happens via symbol_set_names. There is some subtlety here in that
sometimes a name won't be copied (this saves memory by referring to
strings in the mapped debuginfo), and also IIRC Ada adds some
complexities too.
I can add comments to the psymbol hash and comparison functions to
mention this.
Tom
diff --git a/gdb/psymtab.c b/gdb/psymtab.c index f848990867..d22f70a0c0 100644 --- a/gdb/psymtab.c +++ b/gdb/psymtab.c @@ -1515,7 +1515,7 @@ psymbol_hash (const void *addr, int length) h = hash_continue (&lang, sizeof (unsigned int), h); h = hash_continue (&domain, sizeof (unsigned int), h); h = hash_continue (&theclass, sizeof (unsigned int), h); - h = hash_continue (psymbol->ginfo.name, strlen (psymbol->ginfo.name), h); + h = hash_continue (&psymbol->ginfo.name, sizeof (psymbol->ginfo.name), h); return h; }