Message ID | 20230407-t-robin-hood-hash-v1-0-900d93ef1510@tromey.com |
---|---|
Headers |
Return-Path: <gdb-patches-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3AFE8385700C for <patchwork@sourceware.org>; Fri, 7 Apr 2023 15:25:57 +0000 (GMT) X-Original-To: gdb-patches@sourceware.org Delivered-To: gdb-patches@sourceware.org Received: from progateway7-pub.mail.pro1.eigbox.com (gproxy5-pub.mail.unifiedlayer.com [67.222.38.55]) by sourceware.org (Postfix) with ESMTPS id 4D1EA3858D32 for <gdb-patches@sourceware.org>; Fri, 7 Apr 2023 15:25:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4D1EA3858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=tromey.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tromey.com Received: from cmgw10.mail.unifiedlayer.com (unknown [10.0.90.125]) by progateway7.mail.pro1.eigbox.com (Postfix) with ESMTP id 9453D1004741A for <gdb-patches@sourceware.org>; Fri, 7 Apr 2023 15:25:40 +0000 (UTC) Received: from box5379.bluehost.com ([162.241.216.53]) by cmsmtp with ESMTP id kny8poKEqFhsVkny8pK5pK; Fri, 07 Apr 2023 15:25:40 +0000 X-Authority-Reason: nr=8 X-Authority-Analysis: v=2.4 cv=ELjDb3VC c=1 sm=1 tr=0 ts=643035f4 a=ApxJNpeYhEAb1aAlGBBbmA==:117 a=ApxJNpeYhEAb1aAlGBBbmA==:17 a=dLZJa+xiwSxG16/P+YVxDGlgEgI=:19 a=IkcTkHD0fZMA:10:nop_charset_1 a=dKHAf1wccvYA:10:nop_rcvd_month_year a=Qbun_eYptAEA:10:endurance_base64_authed_username_1 a=zstS-IiYAAAA:8 a=Wbp1zMlJv4lhXT3toE4A:9 a=QEXdDO2ut3YA:10:nop_charset_2 a=4G6NA9xxw8l3yy4pmD5M:22 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tromey.com; s=default; h=To:Content-Transfer-Encoding:Content-Type:MIME-Version: Message-Id:Date:Subject:From:Sender:Reply-To:Cc:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=NQsILY/2potCo5R4DKylOpCHgjyUCfOQY4VYpsMG0qI=; b=ugxwOgr2eOB6K6G9ohM8mvLwLL R+9poPKNQeO+gRKMxcmKkqs3M+rSQuux89CfG9nRbDr6Tflr0EWu/z0Jq9nOZ1VRQDwVxCO3Xwkm1 Ji7RZEVoryDoOxuJqPQitOg7x; Received: from 75-166-159-36.hlrn.qwest.net ([75.166.159.36]:60392 helo=[192.168.0.21]) by box5379.bluehost.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from <tom@tromey.com>) id 1pkny8-001hDU-EQ for gdb-patches@sourceware.org; Fri, 07 Apr 2023 09:25:40 -0600 From: Tom Tromey <tom@tromey.com> Subject: [PATCH 00/19] Add hash table to gdbsupport Date: Fri, 07 Apr 2023 09:25:32 -0600 Message-Id: <20230407-t-robin-hood-hash-v1-0-900d93ef1510@tromey.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit X-B4-Tracking: v=1; b=H4sIAOw1MGQC/x2N0QrCMAwAf2Xk2UC3ygR/RXxI22zJg60kQ4Sxf 7fz8TiO28HZlB3uww7GH3VttcN4GSAL1ZVRS2eYwhTDNdxwQ2tJK0prBYVckMOYU6S5xDJD7xI 5YzKqWc7yRb6xneJtvOj3P3s8j+MHcIaSynwAAAA= To: gdb-patches@sourceware.org X-Mailer: b4 0.12.1 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - box5379.bluehost.com X-AntiAbuse: Original Domain - sourceware.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - tromey.com X-BWhitelist: no X-Source-IP: 75.166.159.36 X-Source-L: No X-Exim-ID: 1pkny8-001hDU-EQ X-Source: X-Source-Args: X-Source-Dir: X-Source-Sender: 75-166-159-36.hlrn.qwest.net ([192.168.0.21]) [75.166.159.36]:60392 X-Source-Auth: tom+tromey.com X-Email-Count: 1 X-Source-Cap: ZWx5bnJvYmk7ZWx5bnJvYmk7Ym94NTM3OS5ibHVlaG9zdC5jb20= X-Local-Domain: yes X-Spam-Status: No, score=-3020.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, JMQ_SPF_NEUTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>, <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>, <mailto:gdb-patches-request@sourceware.org?subject=subscribe> Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org Sender: "Gdb-patches" <gdb-patches-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
Add hash table to gdbsupport
|
|
Message
Tom Tromey
April 7, 2023, 3:25 p.m. UTC
I recently read an article about hash tables and was inspired to write a new one for gdb. I haven't converted all the libiberty htab uses in gdb, but this series does change enough of them to, I think, show that the new implementation is workable. The benefits of this approach are explained in the first patch. Regression tested on x86-64 Fedora 36. This found a latent bug in one use of htab_t, see the typedefs patch. Let me know what you think. --- Tom Tromey (19): Add a hash table to gdbsupport Convert compile-c-symbols.c to new hash table Convert filename-seen-cache.h to new hash table Convert linespec.c to new hash table Convert target-descriptions.c to new hash table Convert dwarf2/macro.c to new hash table Convert breakpoint.c to new hash table Convert py-framefilter.c to new hash table Convert disasm.c to new hash table Convert compile/compile.c to new hash table Convert type copying to new hash table Convert static links to new hash table Convert gnu-v3-abi.c to new hash table Convert abbrev cache to new hash table Convert abbrevs to new hash table Convert typedef hash to new hash table Convert all_bfds to new hash table Convert more DWARF code to new hash table Convert gdb_bfd.c to new hash table gdb/Makefile.in | 2 +- gdb/breakpoint.c | 16 +- gdb/compile/compile-c-symbols.c | 54 ++-- gdb/compile/compile-object-run.c | 4 +- gdb/compile/compile.c | 151 +-------- gdb/compile/compile.h | 10 +- gdb/disasm.c | 72 ++--- gdb/dwarf2/abbrev-cache.c | 35 +- gdb/dwarf2/abbrev-cache.h | 37 ++- gdb/dwarf2/abbrev.c | 46 --- gdb/dwarf2/abbrev.h | 54 +++- gdb/dwarf2/cu.c | 50 +-- gdb/dwarf2/cu.h | 10 +- gdb/dwarf2/macro.c | 21 +- gdb/dwarf2/read.c | 55 ++-- gdb/extension-priv.h | 3 +- gdb/extension.c | 3 +- gdb/extension.h | 4 +- gdb/filename-seen-cache.c | 59 ---- gdb/filename-seen-cache.h | 35 +- gdb/gdb_bfd.c | 137 +++----- gdb/gdbtypes.c | 59 +--- gdb/gdbtypes.h | 7 +- gdb/gnu-v3-abi.c | 101 +++--- gdb/guile/guile-internal.h | 2 +- gdb/guile/scm-type.c | 9 +- gdb/guile/scm-value.c | 3 +- gdb/linespec.c | 53 +-- gdb/objfiles.c | 74 +---- gdb/objfiles.h | 4 +- gdb/python/py-framefilter.c | 25 +- gdb/python/py-type.c | 6 +- gdb/python/py-value.c | 3 +- gdb/python/python-internal.h | 2 +- gdb/symfile.c | 2 +- gdb/target-descriptions.c | 17 +- gdb/testsuite/gdb.cp/ptype-flags.exp | 25 +- gdb/testsuite/gdb.gdb/python-helper.exp | 3 +- gdb/typeprint.c | 89 +----- gdb/typeprint.h | 29 +- gdb/unittests/hash-table-selftests.c | 128 ++++++++ gdb/value.c | 17 +- gdb/value.h | 4 +- gdbsupport/Makefile.am | 1 + gdbsupport/Makefile.in | 19 +- gdbsupport/hash-table.cc | 75 +++++ gdbsupport/hash-table.h | 551 ++++++++++++++++++++++++++++++++ 47 files changed, 1199 insertions(+), 967 deletions(-) --- base-commit: 929a05081ec2ca6448927b96f673b0cd9633a342 change-id: 20230407-t-robin-hood-hash-e01cb3a6d3d6 Best regards,
Comments
On 4/7/23 8:25 AM, Tom Tromey wrote: > I recently read an article about hash tables and was inspired to write > a new one for gdb. I haven't converted all the libiberty htab uses in > gdb, but this series does change enough of them to, I think, show that > the new implementation is workable. > > The benefits of this approach are explained in the first patch. > > Regression tested on x86-64 Fedora 36. This found a latent bug in one > use of htab_t, see the typedefs patch. > > Let me know what you think. I just have one possibly naive question which is why not use std::unordered_map<> directly in the various callers? At least, I think it might be useful to explain why GDB includes a bespoke hash table now that we require C++11, it was more obvious I think when GDB was still written in C.
>>>>> "John" == John Baldwin <jhb@FreeBSD.org> writes: John> On 4/7/23 8:25 AM, Tom Tromey wrote: >> I recently read an article about hash tables and was inspired to write >> a new one for gdb. I haven't converted all the libiberty htab uses in >> gdb, but this series does change enough of them to, I think, show that >> the new implementation is workable. >> The benefits of this approach are explained in the first patch. >> Regression tested on x86-64 Fedora 36. This found a latent bug in >> one >> use of htab_t, see the typedefs patch. >> Let me know what you think. John> I just have one possibly naive question which is why not use John> std::unordered_map<> directly in the various callers? At least, I think John> it might be useful to explain why GDB includes a bespoke hash table now John> that we require C++11, it was more obvious I think when GDB was still John> written in C. std::unordered_map has some issues. They may not affect every spot using hash tables in gdb, but they probably affect some of them. unordered_map is necessarily a chained hash table, not an open addressed hash table. And, it has to allocate nodes for entries, because it promises pointer stability. So, it has higher overhead than the hash table in this series. Also, I don't think there's a way to use one type as the key in unordered_map but do lookups using a different type. gdb uses this in some places. I'll add some text to this effect to patch #1. Tom
>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes:
Tom> I recently read an article about hash tables and was inspired to write
Tom> a new one for gdb. I haven't converted all the libiberty htab uses in
Tom> gdb, but this series does change enough of them to, I think, show that
Tom> the new implementation is workable.
Tom> The benefits of this approach are explained in the first patch.
Tom> Regression tested on x86-64 Fedora 36. This found a latent bug in one
Tom> use of htab_t, see the typedefs patch.
Tom> Let me know what you think.
I'd like to move forward with this.
I think most of the patches here are straightforward. The real question
is whether we want it. I've laid out my rationale in patch #1, with
some additions (explaining why this is better than unordered_map /
unordered_set) in a follow-up email.
Tom
>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes:
[ new hash table ]
Tom> I'd like to move forward with this.
Tom> I think most of the patches here are straightforward. The real question
Tom> is whether we want it. I've laid out my rationale in patch #1, with
Tom> some additions (explaining why this is better than unordered_map /
Tom> unordered_set) in a follow-up email.
I tried converting a few more libiberty hash tables to this one.
For a lot of things this is fine, big improvement, etc.
However, this hash table doesn't really admit the possibility that an
element might not have a natural sentinel value. This particularly
comes up with hash maps, where if you want a std::string->whatever map,
you have to introduce some kind of explicit sentinel (in my case I chose
empty string) and also then have to write a trait class.
So, this is one area where unordered_map has the advantage. Now, it
would be possible to handle this in some way. The hash table could
store a sentinel flag, for instance. Or, we could just accept this
drawback and use unordered_map when convenience is important.
Anyway, I tend to think it's not an enormous issue, but I figured I'd
point it out.
Tom
On 1/11/24 10:07 AM, Tom Tromey wrote: >>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes: > > [ new hash table ] > > Tom> I'd like to move forward with this. > > Tom> I think most of the patches here are straightforward. The real question > Tom> is whether we want it. I've laid out my rationale in patch #1, with > Tom> some additions (explaining why this is better than unordered_map / > Tom> unordered_set) in a follow-up email. > > I tried converting a few more libiberty hash tables to this one. > > For a lot of things this is fine, big improvement, etc. > > However, this hash table doesn't really admit the possibility that an > element might not have a natural sentinel value. This particularly > comes up with hash maps, where if you want a std::string->whatever map, > you have to introduce some kind of explicit sentinel (in my case I chose > empty string) and also then have to write a trait class. > > So, this is one area where unordered_map has the advantage. Now, it > would be possible to handle this in some way. The hash table could > store a sentinel flag, for instance. Or, we could just accept this > drawback and use unordered_map when convenience is important.> > Anyway, I tend to think it's not an enormous issue, but I figured I'd > point it out. This seems fine to me.
On 2024-01-11 13:07, Tom Tromey wrote: >>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes: > > [ new hash table ] > > Tom> I'd like to move forward with this. > > Tom> I think most of the patches here are straightforward. The real question > Tom> is whether we want it. I've laid out my rationale in patch #1, with > Tom> some additions (explaining why this is better than unordered_map / > Tom> unordered_set) in a follow-up email. > > I tried converting a few more libiberty hash tables to this one. > > For a lot of things this is fine, big improvement, etc. > > However, this hash table doesn't really admit the possibility that an > element might not have a natural sentinel value. This particularly > comes up with hash maps, where if you want a std::string->whatever map, > you have to introduce some kind of explicit sentinel (in my case I chose > empty string) and also then have to write a trait class. > > So, this is one area where unordered_map has the advantage. Now, it > would be possible to handle this in some way. The hash table could > store a sentinel flag, for instance. Or, we could just accept this > drawback and use unordered_map when convenience is important. > > Anyway, I tend to think it's not an enormous issue, but I figured I'd > point it out. > > Tom I spent a bit of time reading the interface of your hash table, and that was a point I found unfortunate. The type V must have a default constructor and an "empty" state (I guess that's what you mean by sentinel), even if it doesn't make sense for the rest of the program. How do other implementations (of open addessing hash tables) typically deal with this? From what I recally, other implementations I have used in the past didn't have this requirement. And that makes me think that a question I did not see answered is: I don't want to undermine your work, but what is the rationale for implementing it ourselves? Efficient C++ hash maps is a topic that comes up all the time on various programming news sites (I found this recent blog post after a quick google search [1]), surely that's a solved problem. Is there not a popular hash map library we can use as-is? Simon [1] https://martin.ankerl.com/2022/08/27/hashmap-bench-01/
Simon> I spent a bit of time reading the interface of your hash table, and that Simon> was a point I found unfortunate. The type V must have a default Simon> constructor and an "empty" state (I guess that's what you mean by Simon> sentinel), even if it doesn't make sense for the rest of the program. It's maybe worth noting that this is not a step backward from htab_t, which only stores pointers. Simon> How do other implementations (of open addessing hash tables) typically Simon> deal with this? From what I recally, other implementations I have used Simon> in the past didn't have this requirement. I don't know in general. I looked at one and it uses a separate vector of flag bytes to indicate which values are valid. Simon> And that makes me think that a question I did not see answered is: I Simon> don't want to undermine your work, but what is the rationale for Simon> implementing it ourselves? In my view, integrating an external C++ package like this is pain both legally (at least, you have to find one with the appropriate license) and technically. Even just trying to integrate the GCC hash table was a pain. The typical problem, IMO, is that it's unusual to write a standalone class like this, instead one introduces dependencies on a variety of other things. Tom
On 1/12/24 13:22, Tom Tromey wrote: > Simon> I spent a bit of time reading the interface of your hash table, and that > Simon> was a point I found unfortunate. The type V must have a default > Simon> constructor and an "empty" state (I guess that's what you mean by > Simon> sentinel), even if it doesn't make sense for the rest of the program. > > It's maybe worth noting that this is not a step backward from htab_t, > which only stores pointers. That's true, in the sense that you can still decide to store pointers in the gdb::traited_hash_table. But if you want to change it so that it stores objects directly (which I guess is an advantage because it's one less indirection), then you have to introduce this new "invalid" or "empty" state for objects for that class. It therefore removes the guarantee you previously had for objects of that class to always be valid, and it becomes possible for objects used elsewhere to be (by mistake or not) in that new "empty" state. It's one more thing to reason about, and some more potential pitfalls. > Simon> How do other implementations (of open addessing hash tables) typically > Simon> deal with this? From what I recally, other implementations I have used > Simon> in the past didn't have this requirement. > > I don't know in general. I looked at one and it uses a separate vector > of flag bytes to indicate which values are valid. I looked at this one: https://github.com/martinus/unordered_dense/tree/main From what I understand, the underlying storage is a vector, and they somehow make it so all vector elements are always occupied, there are no empty slots. > Simon> And that makes me think that a question I did not see answered is: I > Simon> don't want to undermine your work, but what is the rationale for > Simon> implementing it ourselves? > > In my view, integrating an external C++ package like this is pain both > legally (at least, you have to find one with the appropriate license) > and technically. Even just trying to integrate the GCC hash table was a > pain. The typical problem, IMO, is that it's unusual to write a > standalone class like this, instead one introduces dependencies on a > variety of other things. I did have some experiences (with folly notably) where trying to use one data structure pulled in a bunch of files and it quickly became unmanageable, but that was not the norm. But in my experience, most of the time it's just one header file to copy over, with no dependency. The one I linked above seems to be just that (I gave it a quick try in GDB). And the license being MIT, I don't think that's a problem. My understanding is that it's permitted in this direction, and we use other MIT-licensed stuff. Simon