[1/2] gdb/dwarf: use gdb::unordered_map for dwarf2_per_bfd::{quick_file_names_table, type_unit_groups}
Checks
Commit Message
From: Simon Marchi <simon.marchi@efficios.com>
Change these two hash tables to use gdb::unordered_map. I changed these
two at the same time because they both use the same key, a
stmt_list_hash. Unlike other previous patches that used a
gdb::unordered_set, use an unordered_map here because the key isn't
found in the element itself (well, it was before, because of how htab
works, but it didn't need to be).
You'll notice that the type_unit_group structure is empty. That
structure isn't really needed. It is removed in the following patch.
Regression tested on Debian 12 amd64 with a bunch of DWARF target
boards.
Change-Id: Iec2289958d0f755cab8198f5b72ecab48358ba11
---
gdb/dwarf2/read-debug-names.c | 5 +-
gdb/dwarf2/read-gdb-index.c | 2 -
gdb/dwarf2/read.c | 189 ++++++++--------------------------
gdb/dwarf2/read.h | 21 ++--
4 files changed, 60 insertions(+), 157 deletions(-)
base-commit: a7e5d97c123b5164460a604a154a239fbcfadd86
Comments
>>>>> "Simon" == simon marchi <simon.marchi@polymtl.ca> writes:
Simon> From: Simon Marchi <simon.marchi@efficios.com>
Simon> Change these two hash tables to use gdb::unordered_map. I changed these
Simon> two at the same time because they both use the same key, a
Simon> stmt_list_hash. Unlike other previous patches that used a
Simon> gdb::unordered_set, use an unordered_map here because the key isn't
Simon> found in the element itself (well, it was before, because of how htab
Simon> works, but it didn't need to be).
Simon> You'll notice that the type_unit_group structure is empty. That
Simon> structure isn't really needed. It is removed in the following patch.
Simon> Regression tested on Debian 12 amd64 with a bunch of DWARF target
Simon> boards.
Thanks for doing this.
Simon> +struct stmt_list_hash;
Simon> +
Simon> +struct stmt_list_hash_hash
Simon> +{
Simon> + using is_avalanching = void;
I mildly suspect that this isn't really the case, but at the same time I
wonder whether it matters.
Approved-By: Tom Tromey <tom@tromey.com>
Tom
On 2025-03-18 09:02, Tom Tromey wrote:
>>>>>> "Simon" == simon marchi <simon.marchi@polymtl.ca> writes:
>
> Simon> From: Simon Marchi <simon.marchi@efficios.com>
> Simon> Change these two hash tables to use gdb::unordered_map. I changed these
> Simon> two at the same time because they both use the same key, a
> Simon> stmt_list_hash. Unlike other previous patches that used a
> Simon> gdb::unordered_set, use an unordered_map here because the key isn't
> Simon> found in the element itself (well, it was before, because of how htab
> Simon> works, but it didn't need to be).
>
> Simon> You'll notice that the type_unit_group structure is empty. That
> Simon> structure isn't really needed. It is removed in the following patch.
>
> Simon> Regression tested on Debian 12 amd64 with a bunch of DWARF target
> Simon> boards.
>
> Thanks for doing this.
>
> Simon> +struct stmt_list_hash;
> Simon> +
> Simon> +struct stmt_list_hash_hash
> Simon> +{
> Simon> + using is_avalanching = void;
>
> I mildly suspect that this isn't really the case, but at the same time I
> wonder whether it matters.
The hash function uses ankerl::unordered_dense::hash (i.e. wyhash),
why would it not be avalanching?
Simon
>>>>> "Simon" == Simon Marchi <simon.marchi@efficios.com> writes:
>> I mildly suspect that this isn't really the case, but at the same time I
>> wonder whether it matters.
Simon> The hash function uses ankerl::unordered_dense::hash (i.e. wyhash),
Simon> why would it not be avalanching?
It sums a few hash values. Is that sufficiently avalanching?
I really don't know but I sort of suspect it isn't in some theoretical
way.
Tom
On 3/18/25 2:52 PM, Tom Tromey wrote:
>>>>>> "Simon" == Simon Marchi <simon.marchi@efficios.com> writes:
>
>>> I mildly suspect that this isn't really the case, but at the same time I
>>> wonder whether it matters.
>
> Simon> The hash function uses ankerl::unordered_dense::hash (i.e. wyhash),
> Simon> why would it not be avalanching?
>
> It sums a few hash values. Is that sufficiently avalanching?
> I really don't know but I sort of suspect it isn't in some theoretical
> way.
I was also wondering, and I would like to get an informed opinion about
that.
My intuition is that if you do:
avalanching_hash(x) + avalanching_hash(y)
... then the result must also be avalanching. If you change just 1 bit
of x, then the result of avalanching_hash(x) will be completely
unrelated to the previous result of avalanching_hash(x) (that's the
definition of avalanching). So the result of the sum will be as
unpredictable. If that's not the case, I'd like to understand why.
Also, would it make a difference to use xor instead of plus?
I will push the series, since it's really just a detail, but I'd like to
know the answer to the question above, just to know what the best
practices are.
Simon
>>>>> "Simon" == Simon Marchi <simon.marchi@efficios.com> writes:
Simon> I was also wondering, and I would like to get an informed opinion about
Simon> that.
From https://en.wikipedia.org/wiki/Avalanche_effect
The strict avalanche criterion (SAC) is a formalization of the avalanche
effect. It is satisfied if, whenever a single input bit is complemented,
each of the output bits changes with a 50% probability.
I think it's reasonably clear that this doesn't hold for +.
Unclear whether unordered_dense is referring to strict avalanche or some
weaker form.
Simon> I will push the series, since it's really just a detail, but I'd like to
Simon> know the answer to the question above, just to know what the best
Simon> practices are.
It's mildly weird IMO that this hash table cares about avalanching. I
mean, that's definitely important in some use cases, but not generally
in gdb, I think. Maybe we should just always set is_avalanching as a
policy, to avoid double hashing.
One example where you don't really need this in gdb is abbrevs: these
are normally sequential, and using the abbrev number as the hash is
totally reasonable. With typical DWARF there's never a clash and the
hash table is really just a fancy way of writing a vector. Using a more
sophisticated hash here wouldn't cause a bug, but it would be spending
some CPU for no benefit.
Tom
@@ -424,13 +424,12 @@ cooked_index_worker_debug_names::do_reading ()
exceptions.push_back (std::move (exc));
}
- dwarf2_per_bfd *per_bfd = m_per_objfile->per_bfd;
- per_bfd->quick_file_names_table
- = create_quick_file_names_table (per_bfd->all_units.size ());
m_results.emplace_back (nullptr,
complaint_handler.release (),
std::move (exceptions),
parent_map ());
+
+ dwarf2_per_bfd *per_bfd = m_per_objfile->per_bfd;
cooked_index *table
= (gdb::checked_static_cast<cooked_index *>
(per_bfd->index_table.get ()));
@@ -1558,8 +1558,6 @@ dwarf2_read_gdb_index
set_main_name_from_gdb_index (per_objfile, map.get ());
per_bfd->index_table = std::move (map);
- per_bfd->quick_file_names_table =
- create_quick_file_names_table (per_bfd->all_units.size ());
return true;
}
@@ -270,6 +270,8 @@ struct loclists_rnglists_header
struct stmt_list_hash
{
+ bool operator== (const stmt_list_hash &other) const noexcept;
+
/* The DWO unit this table is from or NULL if there is none. */
struct dwo_unit *dwo_unit;
@@ -284,12 +286,8 @@ struct stmt_list_hash
struct type_unit_group
{
- /* The data used to construct the hash key. */
- struct stmt_list_hash hash {};
};
-using type_unit_group_up = std::unique_ptr<type_unit_group>;
-
/* These sections are what may appear in a (real or virtual) DWO file. */
struct dwo_sections
@@ -1535,9 +1533,6 @@ dwarf2_per_bfd::start_reading (dwarf_scanner_base_up new_table)
line_header when we're done and don't need to record it here. */
struct quick_file_names
{
- /* The data used to construct the hash key. */
- struct stmt_list_hash hash;
-
/* The number of entries in file_names, real_names. */
unsigned int num_file_names;
@@ -1579,64 +1574,34 @@ struct readnow_functions : public dwarf2_base_index_functions
}
};
-/* Utility hash function for a stmt_list_hash. */
-
-static hashval_t
-hash_stmt_list_entry (const struct stmt_list_hash *stmt_list_hash)
-{
- hashval_t v = 0;
-
- if (stmt_list_hash->dwo_unit != NULL)
- v += (uintptr_t) stmt_list_hash->dwo_unit->dwo_file;
- v += to_underlying (stmt_list_hash->line_sect_off);
- return v;
-}
-
-/* Utility equality function for a stmt_list_hash. */
+/* See read.h. */
-static int
-eq_stmt_list_entry (const struct stmt_list_hash *lhs,
- const struct stmt_list_hash *rhs)
+std::uint64_t
+stmt_list_hash_hash::operator() (const stmt_list_hash &key) const noexcept
{
- if ((lhs->dwo_unit != NULL) != (rhs->dwo_unit != NULL))
- return 0;
- if (lhs->dwo_unit != NULL
- && lhs->dwo_unit->dwo_file != rhs->dwo_unit->dwo_file)
- return 0;
-
- return lhs->line_sect_off == rhs->line_sect_off;
-}
+ std::uint64_t v = 0;
-/* Hash function for a quick_file_names. */
+ if (key.dwo_unit != nullptr)
+ v += ankerl::unordered_dense::hash<dwo_file *> () (key.dwo_unit->dwo_file);
-static hashval_t
-hash_file_name_entry (const void *e)
-{
- const struct quick_file_names *file_data
- = (const struct quick_file_names *) e;
-
- return hash_stmt_list_entry (&file_data->hash);
+ v += (ankerl::unordered_dense::hash<std::uint64_t> ()
+ (to_underlying (key.line_sect_off)));
+ return v;
}
-/* Equality function for a quick_file_names. */
+/* See read.h. */
-static int
-eq_file_name_entry (const void *a, const void *b)
+bool
+stmt_list_hash::operator== (const stmt_list_hash &rhs) const noexcept
{
- const struct quick_file_names *ea = (const struct quick_file_names *) a;
- const struct quick_file_names *eb = (const struct quick_file_names *) b;
-
- return eq_stmt_list_entry (&ea->hash, &eb->hash);
-}
+ if ((this->dwo_unit != nullptr) != (rhs.dwo_unit != nullptr))
+ return false;
-/* See read.h. */
+ if (this->dwo_unit != nullptr
+ && this->dwo_unit->dwo_file != rhs.dwo_unit->dwo_file)
+ return false;
-htab_up
-create_quick_file_names_table (unsigned int nr_initial_entries)
-{
- return htab_up (htab_create_alloc (nr_initial_entries,
- hash_file_name_entry, eq_file_name_entry,
- nullptr, xcalloc, xfree));
+ return this->line_sect_off == rhs.line_sect_off;
}
/* Read in CU (dwarf2_cu object) for PER_CU in the context of PER_OBJFILE. This
@@ -1759,9 +1724,7 @@ dw2_get_file_names_reader (dwarf2_cu *cu, die_info *comp_unit_die)
{
dwarf2_per_cu *this_cu = cu->per_cu;
dwarf2_per_objfile *per_objfile = cu->per_objfile;
- struct attribute *attr;
- void **slot;
- struct quick_file_names *qfn;
+ dwarf2_per_bfd *per_bfd = per_objfile->per_bfd;
gdb_assert (! this_cu->is_debug_types);
@@ -1771,29 +1734,24 @@ dw2_get_file_names_reader (dwarf2_cu *cu, die_info *comp_unit_die)
if (comp_unit_die->tag == DW_TAG_partial_unit)
return;
- slot = NULL;
-
line_header_up lh;
- sect_offset line_offset {};
file_and_directory &fnd = find_file_and_directory (comp_unit_die, cu);
+ std::optional<stmt_list_hash> stmt_list_hash_key;
+ attribute *attr = dwarf2_attr (comp_unit_die, DW_AT_stmt_list, cu);
- attr = dwarf2_attr (comp_unit_die, DW_AT_stmt_list, cu);
if (attr != nullptr && attr->form_is_unsigned ())
{
- struct quick_file_names find_entry;
-
- line_offset = (sect_offset) attr->as_unsigned ();
+ sect_offset line_offset = (sect_offset) attr->as_unsigned ();
/* We may have already read in this line header (TU line header sharing).
If we have we're done. */
- find_entry.hash.dwo_unit = cu->dwo_unit;
- find_entry.hash.line_sect_off = line_offset;
- slot = htab_find_slot (per_objfile->per_bfd->quick_file_names_table.get (),
- &find_entry, INSERT);
- if (*slot != NULL)
+ stmt_list_hash_key = {cu->dwo_unit, line_offset};
+
+ if (auto it = per_bfd->quick_file_names_table.find (*stmt_list_hash_key);
+ it != per_bfd->quick_file_names_table.end ())
{
- this_cu->file_names = (struct quick_file_names *) *slot;
+ this_cu->file_names = it->second;
return;
}
@@ -1806,12 +1764,11 @@ dw2_get_file_names_reader (dwarf2_cu *cu, die_info *comp_unit_die)
else if (lh == nullptr)
return;
- qfn = XOBNEW (&per_objfile->per_bfd->obstack, struct quick_file_names);
- qfn->hash.dwo_unit = cu->dwo_unit;
- qfn->hash.line_sect_off = line_offset;
+ auto *qfn = XOBNEW (&per_bfd->obstack, quick_file_names);
+
/* There may not be a DW_AT_stmt_list. */
- if (slot != nullptr)
- *slot = qfn;
+ if (stmt_list_hash_key.has_value ())
+ per_bfd->quick_file_names_table.emplace (*stmt_list_hash_key, qfn);
std::vector<const char *> include_names;
if (lh != nullptr)
@@ -1831,9 +1788,8 @@ dw2_get_file_names_reader (dwarf2_cu *cu, die_info *comp_unit_die)
qfn->num_file_names = offset + include_names.size ();
qfn->comp_dir = fnd.intern_comp_dir (per_objfile->objfile);
- qfn->file_names =
- XOBNEWVEC (&per_objfile->per_bfd->obstack, const char *,
- qfn->num_file_names);
+ qfn->file_names
+ = XOBNEWVEC (&per_bfd->obstack, const char *, qfn->num_file_names);
if (offset != 0)
qfn->file_names[0] = per_objfile->objfile->intern (fnd.get_name ());
@@ -2348,9 +2304,6 @@ dwarf2_initialize_objfile (struct objfile *objfile,
dwarf_read_debug_printf ("readnow requested");
create_all_units (per_objfile);
- per_bfd->quick_file_names_table
- = create_quick_file_names_table (per_bfd->all_units.size ());
-
objfile->qf.emplace_front (new readnow_functions);
}
/* Was a GDB index already read when we processed an objfile sharing
@@ -3342,56 +3295,12 @@ cutu_reader::cutu_reader (dwarf2_per_cu *this_cu,
together. A future step could be to put the types in the same symtab as
the CU the types ultimately came from. */
-static hashval_t
-hash_type_unit_group (const void *item)
-{
- const struct type_unit_group *tu_group
- = (const struct type_unit_group *) item;
-
- return hash_stmt_list_entry (&tu_group->hash);
-}
-
-static int
-eq_type_unit_group (const void *item_lhs, const void *item_rhs)
-{
- const struct type_unit_group *lhs = (const struct type_unit_group *) item_lhs;
- const struct type_unit_group *rhs = (const struct type_unit_group *) item_rhs;
-
- return eq_stmt_list_entry (&lhs->hash, &rhs->hash);
-}
-
-/* Allocate a hash table for type unit groups. */
-
-static htab_up
-allocate_type_unit_groups_table ()
-{
- return htab_up (htab_create_alloc (3,
- hash_type_unit_group,
- eq_type_unit_group,
- htab_delete_entry<type_unit_group>,
- xcalloc, xfree));
-}
-
/* Type units that don't have DW_AT_stmt_list are grouped into their own
partial symtabs. We combine several TUs per psymtab to not let the size
of any one psymtab grow too big. */
#define NO_STMT_LIST_TYPE_UNIT_PSYMTAB (1 << 31)
#define NO_STMT_LIST_TYPE_UNIT_PSYMTAB_SIZE 10
-/* Helper routine for get_type_unit_group.
- Create the type_unit_group object used to hold one or more TUs. */
-
-static type_unit_group_up
-create_type_unit_group (struct dwarf2_cu *cu, sect_offset line_offset_struct)
-{
- auto tu_group = std::make_unique<type_unit_group> ();
-
- tu_group->hash.dwo_unit = cu->dwo_unit;
- tu_group->hash.line_sect_off = line_offset_struct;
-
- return tu_group;
-}
-
/* Look up the type_unit_group for type unit CU, and create it if necessary.
STMT_LIST is a DW_AT_stmt_list attribute. */
@@ -3399,14 +3308,9 @@ static struct type_unit_group *
get_type_unit_group (struct dwarf2_cu *cu, const struct attribute *stmt_list)
{
dwarf2_per_objfile *per_objfile = cu->per_objfile;
+ dwarf2_per_bfd *per_bfd = per_objfile->per_bfd;
struct tu_stats *tu_stats = &per_objfile->per_bfd->tu_stats;
- struct type_unit_group *tu_group;
- void **slot;
unsigned int line_offset;
- struct type_unit_group type_unit_group_for_lookup;
-
- if (per_objfile->per_bfd->type_unit_groups == NULL)
- per_objfile->per_bfd->type_unit_groups = allocate_type_unit_groups_table ();
/* Do we need to create a new group, or can we use an existing one? */
@@ -3428,21 +3332,16 @@ get_type_unit_group (struct dwarf2_cu *cu, const struct attribute *stmt_list)
++tu_stats->nr_stmt_less_type_units;
}
- type_unit_group_for_lookup.hash.dwo_unit = cu->dwo_unit;
- type_unit_group_for_lookup.hash.line_sect_off = (sect_offset) line_offset;
- slot = htab_find_slot (per_objfile->per_bfd->type_unit_groups.get (),
- &type_unit_group_for_lookup, INSERT);
- if (*slot == nullptr)
+ stmt_list_hash key {cu->dwo_unit, static_cast<sect_offset> (line_offset)};
+ auto [it, inserted] = per_bfd->type_unit_groups.emplace (key, nullptr);
+
+ if (inserted)
{
- sect_offset line_offset_struct = (sect_offset) line_offset;
- auto grp = create_type_unit_group (cu, line_offset_struct);
- *slot = grp.release ();
+ (*it).second = std::make_unique<type_unit_group> ();
++tu_stats->nr_symtabs;
}
- tu_group = (struct type_unit_group *) *slot;
- gdb_assert (tu_group != nullptr);
- return tu_group;
+ return it->second.get ();
}
/* Subroutine of dwarf2_build_psymtabs_hard to simplify it.
@@ -3548,7 +3447,7 @@ build_type_psymtabs (dwarf2_per_objfile *per_objfile,
sect_offset abbrev_offset;
/* It's up to the caller to not call us multiple times. */
- gdb_assert (per_objfile->per_bfd->type_unit_groups == NULL);
+ gdb_assert (per_objfile->per_bfd->type_unit_groups.empty ());
if (per_objfile->per_bfd->all_type_units.size () == 0)
return;
@@ -3803,8 +3702,6 @@ cooked_index_worker_debug_info::do_reading ()
create_all_units (m_per_objfile);
build_type_psymtabs (m_per_objfile, &m_index_storage);
- per_bfd->quick_file_names_table
- = create_quick_file_names_table (per_bfd->all_units.size ());
if (!per_bfd->debug_aranges.empty ())
read_addrmap_from_aranges (m_per_objfile, &per_bfd->debug_aranges,
m_index_storage.get_addrmap (),
@@ -467,6 +467,17 @@ struct dwp_file;
using dwp_file_up = std::unique_ptr<dwp_file>;
+struct stmt_list_hash;
+
+struct stmt_list_hash_hash
+{
+ using is_avalanching = void;
+
+ std::uint64_t operator() (const stmt_list_hash &key) const noexcept;
+};
+
+using type_unit_group_up = std::unique_ptr<type_unit_group>;
+
/* Some DWARF data can be shared across objfiles who share the same BFD,
this data is stored in this object.
@@ -605,7 +616,8 @@ struct dwarf2_per_bfd
/* Table of struct type_unit_group objects.
The hash key is the DW_AT_stmt_list value. */
- htab_up type_unit_groups;
+ gdb::unordered_map<stmt_list_hash, type_unit_group_up, stmt_list_hash_hash>
+ type_unit_groups;
/* Set of signatured_types, used to look up by signature. */
signatured_type_set signatured_types;
@@ -644,7 +656,8 @@ struct dwarf2_per_bfd
sorted all the TUs into "type unit groups", grouped by their
DW_AT_stmt_list value. Therefore the only sharing done here is with a
CU and its associated TU group if there is one. */
- htab_up quick_file_names_table;
+ gdb::unordered_map<stmt_list_hash, quick_file_names *, stmt_list_hash_hash>
+ quick_file_names_table;
/* The CUs we recently read. */
std::vector<dwarf2_per_cu *> just_read_cus;
@@ -1195,10 +1208,6 @@ extern void finalize_all_units (dwarf2_per_bfd *per_bfd);
extern void create_all_units (dwarf2_per_objfile *per_objfile);
-/* Create a quick_file_names hash table. */
-
-extern htab_up create_quick_file_names_table (unsigned int nr_initial_entries);
-
/* Find the base address of the compilation unit for range lists and
location lists. It will normally be specified by DW_AT_low_pc.
In DWARF-3 draft 4, the base address could be overridden by