gdb/dwarf: fix order of operations when reading .debug_names

Message ID 20260519200917.342813-1-simon.marchi@efficios.com
State New
Headers
Series gdb/dwarf: fix order of operations when reading .debug_names |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed

Commit Message

Simon Marchi May 19, 2026, 8:09 p.m. UTC
  From: Simon Marchi <simon.marchi@polymtl.ca>

PR 34149 shows a hard to reproduce bug.  The problem sometimes shows up
when reading in `.debug_names` indexes, manifesting like this:

    warning: Section .debug_names has incorrect entry in CU table, ignoring .debug_names.

This is shown even if the CU table in the `.debug_names` index
accurately describes the CUs in `.debug_info`.

The bug was introduced by commit b301725d35c1 ("gdb/dwarf: read foreign
type units").

Before b301725d35c1, the steps when reading a `.debug_names` index were:

 - call to create_all_units(), which filled the all_units vector by
   walking `.debug_info` and `.debug_types` and then sorted it.
 - calls to the build_and_check_* functions, which do some lookups in
   the all_units vector, requiring it to be sorted.

All good.  Post-b301725d35c1, we have:

 - call to create_all_units(), which still fills the all_units vector by
   walking `.debug_info` and `.debug_types`, but does not sort it
   anymore.
 - calls to the build_and_check_* functions, which do some lookups in
   the all_units vector, requiring it to be sorted (oops)
 - call to create_foreign_type_units_from_debug_names(), which may add
   more items to the all_units vector.
 - call to finalize_all_units(), to sort the all_units vector.

The sorting of all_units was taken out of create_all_units() on purpose.
Because create_foreign_type_units_from_debug_names() may add more units,
we don't want to sort the vector in create_all_units() and then sort it
again after create_foreign_type_units_from_debug_names().

The build_and_check_* functions do some lookups in the all_units vector
with dwarf2_find_unit(), which uses std::lower_bound() internally.  This
happens while the all_units vector is not sorted according to the
sorting key, all_units_less_than(), which makes the search is bogus.
dwarf2_find_unit() might return "no such unit", when in fact, the given
unit exists.  This leads build_and_check_cu_list_from_debug_names() to
believe that the list of CUs in the `.debug_names` index does not match
what was built from reading `.debug_info`, we get the warning shown
above, and the index gets wrongfully rejected.

For this bug to show up, it is necessary to have type units in a
`.debug_types`.  The sorting key for units in the all_units vector is:
section pointer (`dwarf2_section_info *`), then the unit offset into the
section.  If all units come from the same section (`.debug_info`), then
this bug does not show up because the all_units vector happens to be
sorted after create_all_units() runs.  This happens for example in test
gdb.dwarf2/debug-names-bad-cu-index.exp, where we create some DWARF 4
units and then a `.debug_names` index out of them.

It must also happen for the `dwarf2_section_info *` for `.debug_types`
to be "less than" the `dwarf2_section_info *` for `.debug_info`,
otherwise the all_units also happens to be sorted after
create_all_units() runs.  This entirely depends on the whims of the
memory allocator.

I was not able to reproduce it in my normal GDB dev build, but I was
able in a "thread sanitizer" build, for some reason.  Not because there
is an actual threading issue, but I think because it shuffled the memory
allocations in just the right way to make the bug happen.  And then, it
only happened when doing "make check", I could not get it to reproduce
when running interactively.  Using -D_GLIBCXX_DEBUG was immensely
useful, because it pointed out that the "range must be sorted"
precondition for std::lower_bound() was not met, which made the problem
immediately obvious:

    (gdb) file /home/simark/build/binutils-gdb-tsan/gdb/testsuite/outputs/gdb.dwarf2/debug-names-bad-cu-index/debug-names-bad-cu-index
    Reading symbols from /home/simark/build/binutils-gdb-tsan/gdb/testsuite/outputs/gdb.dwarf2/debug-names-bad-cu-index/debug-names-bad-cu-index...
    /usr/include/c++/16.1.1/bits/stl_algo.h:1981:
    In function:
        constexpr _FIter std::lower_bound(_FIter, _FIter, const _Tp&, _Compare)
        [with _FIter = gnu_debug::_Safe_iterator<gnu_cxx::
        normal_iterator<unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter>*,
        vector<unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter>,
        allocator<unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter> > > >,
        debug::vector<unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter> >,
        random_access_iterator_tag>; _Tp = section_and_offset; _Compare =
        dwarf2_find_unit(const section_and_offset&,
        dwarf2_per_bfd*)::<lambda(const dwarf2_per_cu_up&, const
        section_and_offset&)>]

    Error: elements in iterator range [first, last) are not partitioned by the
    predicate __comp and value __val.

Knowing all this, is it easy to artificially reproduce the problem by
making create_all_units() reverse the all_units vector, like so:

    // Reverse the all_units vector
    std::reverse (per_objfile->per_bfd->all_units.begin (),
                  per_objfile->per_bfd->all_units.end ());

The fix is to reorder the operations in this way

 - call create_all_units(), to fill the all_units vector by walking
   `.debug_info` and `.debug_types`
 - call create_foreign_type_units_from_debug_names(), which may add more
   items to the all_units vector.
 - call finalize_all_units(), to sort the all_units vector.
 - call the build_and_check_* functions, which is fine now that
   all_units is sorted.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=34149
Change-Id: I315f391605af549b55341a167683d2dc6203bcea
---
 gdb/dwarf2/read-debug-names.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)


base-commit: 5100d2b65d6fb6d571893be40b2849b3cb2abe59
  

Comments

Tom Tromey May 20, 2026, 3:15 p.m. UTC | #1
>>>>> "Simon" == Simon Marchi <simon.marchi@efficios.com> writes:

Simon>  - calls to the build_and_check_* functions, which do some lookups in
Simon>    the all_units vector, requiring it to be sorted (oops)

I wonder if either there's some way to assert that the vector must be
sorted on entry, or if there's a way to have a self-sorting data type
here, rendering this kind of bug impossible.

Anyway I think this is ok.
Approved-By: Tom Tromey <tom@tromey.com>

Tom
  
Simon Marchi May 20, 2026, 3:38 p.m. UTC | #2
On 2026-05-20 11:15, Tom Tromey wrote:
>>>>>> "Simon" == Simon Marchi <simon.marchi@efficios.com> writes:
> 
> Simon>  - calls to the build_and_check_* functions, which do some lookups in
> Simon>    the all_units vector, requiring it to be sorted (oops)
> 
> I wonder if either there's some way to assert that the vector must be
> sorted on entry, or if there's a way to have a self-sorting data type
> here, rendering this kind of bug impossible.

assert: in dwarf2_find_unit, we could check that the vector is sorted,
but that would replicate the check done by -D_GLIBCXX_DEBUG.  I'm not
sure we want to pay this price for every dwarf2_find_unit call in
production binaries.

Alternatively, we could maintain a flag that says "is all_units
currently sorted", clear that flag when doing any modification to the
vector, and set it in the sort_all_units method.  All modifications to
the vector should be done through methods to ensure the flag gets
cleared.  I think that would be cheap enough to implement, I'll give it
a try.

self-sorting data type: we could insert the new units at the right place
from the start, but that would be less efficient, because that would be
akin to an insertion sort.  It's more efficient to do one sort at the
end.

We could perhaps use std::set, but there is also the case in
fill_in_sig_entry_from_dwo_file, when the section of a unit becomes
known at a later time.  At this point, the key for the unit would
change, so we would still need to remove it and reinsert it in the set.
I don't see a way to enforce that automatically.

> Anyway I think this is ok.
> Approved-By: Tom Tromey <tom@tromey.com>

Thanks, I'll push it and work on a patch to add the assert.

Simon
  

Patch

diff --git a/gdb/dwarf2/read-debug-names.c b/gdb/dwarf2/read-debug-names.c
index a062c1ad2562..30ad0b21f2da 100644
--- a/gdb/dwarf2/read-debug-names.c
+++ b/gdb/dwarf2/read-debug-names.c
@@ -1075,7 +1075,15 @@  dwarf2_read_debug_names (dwarf2_per_objfile *per_objfile)
 	}
     }
 
+  /* Populate the `all_units` vector.  */
   create_all_units (per_objfile);
+  create_foreign_type_units_from_debug_names (per_objfile->per_bfd, map);
+
+  /* Sort the `all_units` vector.  This must be done before the
+     `build_and_check_*` calls below, since they do binary search lookups in
+     that vector.  */
+  finalize_all_units (per_objfile->per_bfd);
+
   if (!build_and_check_cu_lists_from_debug_names (per_bfd, map, dwz_map))
     return false;
 
@@ -1097,12 +1105,6 @@  dwarf2_read_debug_names (dwarf2_per_objfile *per_objfile)
 	return false;
     }
 
-  create_foreign_type_units_from_debug_names (per_objfile->per_bfd, map);
-
-  /* create_foreign_type_units_from_debug_names may add more entries to the
-     ALL_UNITS vector, so it must be called before finalize_all_units.  */
-  finalize_all_units (per_objfile->per_bfd);
-
   per_bfd->debug_aranges.read (per_objfile->objfile);
 
   /* There is a single address map for the whole index (coming from