[v2,00/13,gdb/symtab,cc-with-dwz] Fix gdb.cp/breakpoint-locs.exp

Message ID 20231212173239.16793-1-tdevries@suse.de

Tom de Vries Dec. 12, 2023, 5:32 p.m. UTC
I. INTRODUCTION

With target board unix, test-case gdb.cp/breakpoint-locs.exp passes reliably.

But when running with target board cc-with-dwz, we sometimes have 2 breakpoint
locations:
...
(gdb) break N1::C1::baz^M
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)^M
(gdb) PASS: gdb.cp/breakpoint-locs.exp: break N1::C1::baz
...
and sometimes only 1:
...
(gdb) break N1::C1::baz^M
Breakpoint 1 at 0x4004db: file breakpoint-locs.h, line 23.^M
(gdb) FAIL: gdb.cp/breakpoint-locs.exp: break N1::C1::baz
...

By using "maint set worker-threads 0" we can make the test-case fail reliably:
...
$ gdb -q -batch \
  -iex "maint set worker-threads 0" \
  outputs/gdb.cp/breakpoint-locs/breakpoint-locs \
  -ex "break N1::C1::baz"
...

And by adding -readnow, we can make it pass reliably.
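
That is, the passing invocation is the failing reproducer with -readnow added
(a sketch; -readnow forces all symtabs to be expanded up front):
...
$ gdb -q -batch \
  -readnow \
  -iex "maint set worker-threads 0" \
  outputs/gdb.cp/breakpoint-locs/breakpoint-locs \
  -ex "break N1::C1::baz"
...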

This is PR symtab/30728, a gdb 12 -> 13 regression.  This series fixes it.

II. ANALYSIS

In order for the breakpoint to have two breakpoint locations, the symtabs for
both CUs, breakpoint-locs.cc and breakpoint-locs-2.cc, need to be expanded
(which explains why adding -readnow fixes it).  Let's call these CU1 and CU2.

CU1 is always expanded during the call:
...
cp_canonicalize_string_no_typedefs (string=string@entry=0x2fbbf90 "N1::C1::baz")
...
due to being the first CU to contain N1.

Then CU2 is expanded or not, depending on which per_cu (CU1 or CU2) this
cooked_index entry refers to (the per_cu is currently not shown in the dump):
...
   [12] ((cooked_index_entry *) 0x7f1200002f00)
    name:       baz
    canonical:  baz
    qualified:  N1::C1::baz
    DWARF tag:  DW_TAG_subprogram
    flags:      0x0 []
    DIE offset: 0x2e
    parent:     ((cooked_index_entry *) 0x7f1200002ed0) [C1]
...

The DIE offset 0x2e corresponds to this DIE in a PU, imported from both CU1
and CU2:
...
 <0><b>: Abbrev Number: 34 (DW_TAG_partial_unit)
 <1><14>: Abbrev Number: 37 (DW_TAG_namespace)
    <15>   DW_AT_name        : N1
 <2><19>: Abbrev Number: 36 (DW_TAG_class_type)
    <1a>   DW_AT_name        : C1
 <3><20>: Abbrev Number: 32 (DW_TAG_subprogram)
    <21>   DW_AT_external    : 1
    <21>   DW_AT_name        : baz
    <27>   DW_AT_linkage_name: _ZN2N12C13bazEv
    <2b>   DW_AT_accessibility: 1       (public)
    <2c>   DW_AT_declaration : 1
 <1><2e>: Abbrev Number: 35 (DW_TAG_subprogram)
    <2f>   DW_AT_specification: <0x20>
    <30>   DW_AT_inline      : 3        (declared as inline and inlined)
...

[ In the target board unix case, there's no PU and both CU1 and CU2 contain a
copy of 0x2e, and consequently both CUs are expanded. ]

Which per_cu the cooked_index entry refers to is decided by this import race in
cooked_indexer::ensure_cu_exists:
...
  /* When scanning, we only want to visit a given CU a single time.
     Doing this check here avoids self-imports as well.  */
  if (for_scanning)
    {
      bool nope = false;
      if (!per_cu->scanned.compare_exchange_strong (nope, true))
        return nullptr;
    }
...
[ Note that in the "maint set worker-threads 0" case, CU1 is guaranteed to win
the import race. ]
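
To illustrate the race mechanics, here's a minimal standalone sketch (not gdb
code; the atomic flag stands in for the per_cu scanned member used above):
...
#include <atomic>
#include <cstdio>
#include <thread>

/* Stand-in for the per-PU "scanned" flag.  */
static std::atomic<bool> scanned {false};

/* Atomically flip scanned from false to true; only the first caller
   succeeds, all later callers lose the race.  */
static bool
claim ()
{
  bool nope = false;
  return scanned.compare_exchange_strong (nope, true);
}

int
main ()
{
  std::thread t1 ([] { if (claim ()) std::puts ("CU1 wins the import race"); });
  std::thread t2 ([] { if (claim ()) std::puts ("CU2 wins the import race"); });
  t1.join ();
  t2.join ();
}
...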

With gdb 12 (the partial symtab case) there was a similar problem, and we
relied on the DW_TAG_inlined_subroutine DIEs for the necessary expansion, see
commit f9b5d5ea18a ("[gdb/symtab] Fix missing breakpoint location for inlined
function").

The CU1 DW_TAG_inlined_subroutine DIE is at 0x148:
...
 <0><ee>: Abbrev Number: 16 (DW_TAG_compile_unit)
    <f3>   DW_AT_language    : 4        (C++)
    <f4>   DW_AT_name        : gdb.cp/breakpoint-locs.cc
 <1><13b>: Abbrev Number: 25 (DW_TAG_subprogram)
    <13c>   DW_AT_specification: <0x115>
    <13d>   DW_AT_low_pc      : 0x4004d7
    <145>   DW_AT_high_pc     : 16
    <146>   DW_AT_frame_base  : 1 byte block: 9c        (DW_OP_call_frame_cfa)
    <148>   DW_AT_GNU_all_call_sites: 1
 <2><148>: Abbrev Number: 24 (DW_TAG_inlined_subroutine)
    <149>   DW_AT_abstract_origin: <0x2e>
    <14d>   DW_AT_low_pc      : 0x4004db
    <155>   DW_AT_high_pc     : 9
    <156>   DW_AT_call_file   : 1
    <157>   DW_AT_call_line   : 22
...
and for CU2 there's a similar one at 0x1cf.

There are no corresponding DW_TAG_inlined_subroutine cooked_index entries,
because the DIEs are skipped during indexing.

So, we can fix this by adding DW_TAG_inlined_subroutine cooked_index entries.
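
Schematically, the indexer decides per tag whether a DIE yields a
cooked_index entry; a hypothetical sketch (not gdb's actual code, which does
this by marking abbrevs as "interesting"):
...
/* enum dwarf_tag and the DW_TAG_* constants come from binutils-gdb's
   include/dwarf2.h.  */
#include "dwarf2.h"

static bool
tag_produces_index_entry (enum dwarf_tag tag)
{
  switch (tag)
    {
    case DW_TAG_subprogram:
    case DW_TAG_variable:
    case DW_TAG_class_type:
      /* ... and other interesting tags.  */
      return true;
    case DW_TAG_inlined_subroutine:
      /* Skipped today; the fix amounts to returning true here, provided
         the resulting entry gets the correct parent (see III. FIX).  */
      return false;
    default:
      return false;
    }
}
...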

III. FIX

It's easy to stop skipping the DW_TAG_inlined_subroutine DIEs during
indexing.  However, that gives us cooked_index entries with incorrect
parents:
...
    [17] ((cooked_index_entry *) 0x2e68e40)
    name:       baz
    canonical:  baz
    qualified:  N1::foo::baz
    DWARF tag:  DW_TAG_inlined_subroutine
    flags:      0x0 []
    DIE offset: 0x148
    parent:     ((cooked_index_entry *) 0x2e68de0) [foo]

    [19] ((cooked_index_entry *) 0x2e68f90)
    name:       baz
    canonical:  baz
    qualified:  N1::bar::baz
    DWARF tag:  DW_TAG_inlined_subroutine
    flags:      0x0 []
    DIE offset: 0x1cf
    parent:     ((cooked_index_entry *) 0x2e68f30) [bar]
...

That is, the qualified name should be N1::C1::baz in both cases.  The C1 can
be found via the DW_AT_abstract_origin, but instead the DIE parents N1::foo
and N1::bar are used.

Again, we can fix this easily, by forcing the parent lookup to go via
DW_AT_abstract_origin, but then the DIE at 0x1cf ends up with a missing
parent:
...
    [17] ((cooked_index_entry *) 0x4854140)
    name:       baz
    canonical:  baz
    qualified:  N1::C1::baz
    DWARF tag:  DW_TAG_inlined_subroutine
    flags:      0x0 []
    DIE offset: 0x148
    parent:     ((cooked_index_entry *) 0x4853f60) [C1]

    [19] ((cooked_index_entry *) 0x4854290)
    name:       baz
    canonical:  baz
    qualified:  baz
    DWARF tag:  DW_TAG_inlined_subroutine
    flags:      0x0 []
    DIE offset: 0x1cf
    parent:     ((cooked_index_entry *) 0)
...

The data structures that matter for tracking parent relations in cooked_index
entries are:
- m_die_range_map (currently known parent relations), and
- m_deferred_entries (deferred cooked_index entries due to unknown parent).

These only have the scope of parsing a single CU (including any PUs for
which that CU won the import race).

Since CU1 wins the import race, it has the information available to get the right
parent.

But that's not the case for CU2.

So in order to get the correct parent for DIE 0x1cf, parent relations in the
cooked index need to be tracked at inter-CU scope, which implies inter-shard
scope as well.
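
For illustration, the per-indexer state looks roughly like this (hypothetical
types and names, not gdb's actual declarations):
...
#include <cstdint>
#include <map>
#include <vector>

struct cooked_index_entry;  /* Opaque in this sketch.  */

/* One instance per CU scan.  Parent relations recorded while CU1 scans
   the PU live in CU1's instance only; CU2's indexer cannot see them.  */
struct cu_indexer_state
{
  /* DIE offset -> parent entry in scope for the range starting there.  */
  std::map<uint64_t, cooked_index_entry *> die_range_map;

  /* Entries whose parent DIE this indexer never walked; resolving them
     requires information from outside this CU, and possibly from
     outside this shard.  */
  std::vector<cooked_index_entry *> deferred_entries;
};
...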

I've filed PR symtab/30846 "[gdb/symtab] incorrect parent for forward spec
(inter-cu case)" for part of the problem, and the series includes two
test-cases that specifically check for this type of problem:
- gdb.dwarf2/backward-spec-inter-cu.exp
- gdb.dwarf2/forward-spec-inter-cu.exp

IV. ALTERNATIVE APPROACH

An alternative way of approaching the problem is to observe that for the
target board unix case we have two cooked_index entries with per_cu CU1 and
CU2:
...
    [18] ((cooked_index_entry *) 0x3631250)
    name:       baz
    canonical:  baz
    qualified:  N1::C1::baz
    DWARF tag:  DW_TAG_subprogram
    flags:      0x0 []
    DIE offset: 0x173
    parent:     ((cooked_index_entry *) 0x3631130) [C1]

    [20] ((cooked_index_entry *) 0x3631490)
    name:       baz
    canonical:  baz
    qualified:  N1::C1::baz
    DWARF tag:  DW_TAG_subprogram
    flags:      0x0 []
    DIE offset: 0x25a
    parent:     ((cooked_index_entry *) 0x3631370) [C1]
...
but for the dwz case we have only a single entry:
...
   [12] ((cooked_index_entry *) 0x7f1200002f00)
    name:       baz
    canonical:  baz
    qualified:  N1::C1::baz
    DWARF tag:  DW_TAG_subprogram
    flags:      0x0 []
    DIE offset: 0x2e
    parent:     ((cooked_index_entry *) 0x7f1200002ed0) [C1]
...
with per_cu either CU1 or CU2.

If in the dwz case we'd have two of those, one with per_cu CU1 and one with
per_cu CU2, then the problem would also be solved.

Put differently, in the target board unix case we have two cooked index
entries, one with per_cu CU1, and one with per_cu CU2, because that's a
one-to-one reflection of what the DWARF looks like.

But in the dwz case, factoring out the DIE into a PU and importing it from
both CUs does not alter the logical contents of the DWARF, so it would make
complete sense to have two entries in this case as well.

Of course, we'd want to exploit the dwz compression inside gdb as well, and
consequently some scheme with a single cooked_index entry whose per_cu is the
actual PU, mapped to the complete set of importing CUs, would be more
efficient.  [ Note that in such a scheme, we'd have to be able to
differentiate between expand-one-CU and expand-all-CUs, to prevent expanding
all importing CUs in the expand-one-CU case. ]
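
The extra bookkeeping such a scheme needs could look like this (a sketch with
hypothetical names; only the dwarf2_per_cu_data type exists in gdb):
...
#include <unordered_map>
#include <vector>

struct dwarf2_per_cu_data;  /* Opaque in this sketch.  */

/* Inverse import table: for each PU, the CUs importing it.  A
   cooked_index entry whose per_cu is a PU would expand a single
   importer (expand-one-CU) or all of them (expand-all-CUs).  */
using pu_importers_map
  = std::unordered_map<dwarf2_per_cu_data *,
                       std::vector<dwarf2_per_cu_data *>>;
...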

But the current internal representation in fact corresponds to dwz dropping
the import for all but one CU (for the cooked index, that is, not for the
expanded symtabs).

I think the rationale is that if you're looking for a type in the symtabs,
the first one will do, so it doesn't make sense to keep track of more than
one.  This however does not work if you're looking for all the occurrences
of a type in the symtabs, as is the case here.

I suppose it could be a matter of taste whether the current internal
representation is:
- a smart solution, or
- an incorrect representation of the DWARF.

The fact is that:
- if we choose the former, then correct dwz handling requires extra care (as
  set out in FIX), and
- if we choose the latter and fix it, then the dwz and non-dwz case are by
  design handled in the same way.

I'm not pursuing this alternative approach in this patch series for now,
because:
- this solution was not chosen for gdb 12 either (though the patch went in
  without comments),
- I'm hoping that the current approach is a bit easier and safer to backport
  to gdb 13, and
- I'm not sure whether other maintainers are supportive of this approach.

V. PATCH SERIES STRUCTURE

The series consists of several parts.

- patches that do some refactoring:
  - [gdb/symtab] Refactor condition in scan_attributes
  - [gdb/symtab] Factor out m_die_range_map usage
  - [gdb/symtab] Handle nullptr parent in parent_map::set_parent
  - [gdb/symtab] Factor out m_deferred_entries usage
- patches that fix resolving deferred entries and add corresponding
  test-cases:
  - [gdb/symtab] Resolve deferred entries, inter-shard case
  - [gdb/testsuite] Add gdb.dwarf2/forward-spec-inter-cu.exp
  - [gdb/testsuite] Add gdb.dwarf2/backward-spec-inter-cu.exp
- patches that add two optimizations to the parent calculation
  (one preparation patch, and two optimization patches):
  - [gdb/symtab] Keep track of processed DIEs in shard
  - [gdb/symtab] Resolve deferred entries, intra-shard case
  - [gdb/symtab] Don't defer backward refs, inter-cu intra-shard case
- patches that fix PR symtab/30728 (logically one patch, but split up to be
  able to assess performance impact more precisely):
  - [gdb/symtab] Recurse into c++ DW_TAG_subprogram DIEs for cooked index
  - [gdb/symtab] Keep track of all parents for cooked index
  - [gdb/symtab] Fix DW_TAG_inlined_subroutine entries in the cooked index

The fix for PR symtab/30728 consists of adding entries to the cooked index for
DW_TAG_inlined_subroutine DIEs, with correct parent entries.

Tested on x86_64-linux, with target boards unix, cc-with-dwz and
cc-with-dwz-m.

I've split out the two added test-cases into separate patches, for better
patch readability.  Unfortunately the patch "[gdb/symtab] Resolve deferred
entries, inter-shard case" is still somewhat large due to moving code about.

VI. PERFORMANCE

I've also done a performance experiment.  The command being run is:
...
$ gdb -q -batch /usr/lib/debug/usr/bin/gdb-12.1-150400.15.9.1.x86_64.debug
...
The .debug file doesn't contain any partial units, so this exercises the
non-dwz scenario.  The command is run 10 times, and the mean value is used.
I also added a 5 second sleep before each run to prevent thermal throttling
from messing up the measurements.

Furthermore, the processor on which I run the experiment is an i7-1250U, with
4 performance cores and 8 efficiency cores, so I run the experiment using
"taskset -c 0-3" to make sure only performance cores are used.

The script I use compares one gdb version to another, so the first column is
100% by definition; it uses the same base version each time, though from a
different run.

This is comparing the base version with itself, which gives an idea about
accuracy:
...
real: mean: 687.00    (100%) mean: 691.00    (100.58%)
user: mean: 2196.00   (100%) mean: 2196.00   (100.00%)
sys : mean: 108.00    (100%) mean: 106.00     (98.15%)
mem : mean: 345220.40 (100%) mean: 346112.40 (100.26%)
...

The mem measure is the "Maximum resident set size of the process during its
lifetime, in Kbytes".  The real/user/sys times are in milliseconds.

And this is comparing the patch series with the base version:
...
real: mean: 687.00    (100%) mean: 931.00    (135.52%)
user: mean: 2204.00   (100%) mean: 2938.00   (133.30%)
sys : mean: 102.00    (100%) mean: 137.00    (134.31%)
mem : mean: 345479.60 (100%) mean: 418396.00 (121.11%)
...

In summary, the overall result is ~36% more real time and ~21% more memory.

The cumulative results of individual patches (leaving out the test-case
patches) are:
...
[gdb/symtab] Refactor condition in scan_attributes

real: mean: 690.00    (100%) mean: 691.00   (100.14%)
user: mean: 2201.00   (100%) mean: 2218.00  (100.77%)
sys : mean: 102.00    (100%) mean: 93.00     (91.18%)
mem : mean: 344846.80 (100%) mean: 344196.40 (99.81%)

[gdb/symtab] Factor out m_die_range_map usage

real: mean: 689.00    (100%) mean: 696.00    (101.02%)
user: mean: 2214.00   (100%) mean: 2212.00    (99.91%)
sys : mean: 98.00     (100%) mean: 91.00      (92.86%)
mem : mean: 345784.00 (100%) mean: 346644.00 (100.25%)

[gdb/symtab] Handle nullptr parent in parent_map::set_parent

real: mean: 688.00    (100%) mean: 690.00    (100.29%)
user: mean: 2215.00   (100%) mean: 2223.00   (100.36%)
sys : mean: 99.00     (100%) mean: 94.00      (94.95%)
mem : mean: 344220.40 (100%) mean: 345640.80 (100.41%)

[gdb/symtab] Factor out m_deferred_entries usage

real: mean: 693.00    (100%) mean: 689.00    (99.42%)
user: mean: 2191.00   (100%) mean: 2201.00  (100.46%)
sys : mean: 103.00    (100%) mean: 101.00    (98.06%)
mem : mean: 346022.00 (100%) mean: 345167.60 (99.75%)

[gdb/symtab] Resolve deferred entries, inter-shard case

real: mean: 688.00    (100%) mean: 711.00    (103.34%)
user: mean: 2204.00   (100%) mean: 2265.00   (102.77%)
sys : mean: 103.00    (100%) mean: 119.00    (115.53%)
mem : mean: 344649.60 (100%) mean: 375080.00 (108.83%)

[gdb/symtab] Keep track of processed DIEs in shard

real: mean: 684.00    (100%) mean: 728.00    (106.43%)
user: mean: 2208.00   (100%) mean: 2301.00   (104.21%)
sys : mean: 96.00     (100%) mean: 124.00    (129.17%)
mem : mean: 344005.60 (100%) mean: 374647.20 (108.91%)

[gdb/symtab] Resolve deferred entries, intra-shard case

real: mean: 693.00    (100%) mean: 724.00    (104.47%)
user: mean: 2213.00   (100%) mean: 2301.00   (103.98%)
sys : mean: 103.00    (100%) mean: 114.00    (110.68%)
mem : mean: 344722.80 (100%) mean: 375021.20 (108.79%)

[gdb/symtab] Don't defer backward refs, inter-cu intra-shard case

real: mean: 686.00    (100%) mean: 725.00    (105.69%)
user: mean: 2199.00   (100%) mean: 2299.00   (104.55%)
sys : mean: 106.00    (100%) mean: 119.00    (112.26%)
mem : mean: 344597.20 (100%) mean: 374937.20 (108.80%)

[gdb/symtab] Recurse into c++ DW_TAG_subprogram DIEs for cooked index

real: mean: 690.00    (100%) mean: 835.00    (121.01%)
user: mean: 2186.00   (100%) mean: 2693.00   (123.19%)
sys : mean: 106.00    (100%) mean: 130.00    (122.64%)
mem : mean: 345059.60 (100%) mean: 409780.00 (118.76%)

[gdb/symtab] Keep track of all parents for cooked index

real: mean: 684.00    (100%) mean: 903.00    (132.02%)
user: mean: 2240.00   (100%) mean: 2921.00   (130.40%)
sys : mean: 93.00     (100%) mean: 141.00    (151.61%)
mem : mean: 346353.20 (100%) mean: 411096.00 (118.69%)

[gdb/symtab] Fix DW_TAG_inlined_subroutine entries in the cooked index

real: mean: 690.00    (100%) mean: 930.00    (134.78%)
user: mean: 2227.00   (100%) mean: 2960.00   (132.91%)
sys : mean: 96.00     (100%) mean: 138.00    (143.75%)
mem : mean: 343766.00 (100%) mean: 417973.60 (121.59%)
...

Ideally, a patch series implementing dwz support would have no impact on a
non-dwz test-case.

But fixing dwz support requires tracking more data, and there's no way of
knowing in advance whether the additional data will be used or not.

Of course this can be accommodated by optimistically assuming that the data
is unnecessary, and partially or completely restarting indexing when it
turns out it was actually needed.  My suspicion is that this approach itself
is going to be complex, so I think it's best avoided.

An optimization that can be added on top of the current approach is to
opportunistically use data from other shards if already available.

An extension of this would be to wait until data from another shard is
available, which would obviate the need to handle inter-shard dependencies
after the parallel_for_each, but I think there's a risk of deadlock there.
The imports as generated by dwz follow a fairly predictable pattern, but
there are other producers of imports, for instance GCC LTO.

VII. TAGS

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30728
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30846

VIII. SUBMISSION HISTORY

v2:
- added refactoring patch at the start
- dropped:
  - [gdb/symtab] Check effect in parent_map::set_parent
    (review comments, concern about speed)
  - [gdb/symtab] Add debug_handle_deferred_entries
    (doesn't produce readable output for worker-threads > 0)
  - [gdb/symtab] Add parent_map::dump
    (only used by dropped patch)
- fixed performance regression in "[gdb/symtab] Keep track of processed DIEs
  in shard"
- split up last patch of v1 into 3 patches to be able to assess performance
  impact more precisely
- use std::unique_ptr for m_die_range_map, m_deferred_entries and
  m_die_range_map_valid
- rewrote the last patch to work for any DIE, not just
  DW_TAG_inlined_subroutine DIEs.
- dropped the "why" narrative from the individual log messages, which proved
  difficult to maintain; they now focus on the "what", leaving the "why" for
  the cover letter.

v1:
- https://sourceware.org/pipermail/gdb-patches/2023-October/202882.html

rfc:
- https://sourceware.org/pipermail/gdb-patches/2023-September/202443.html

Tom de Vries (13):
  [gdb/symtab] Refactor condition in scan_attributes
  [gdb/symtab] Factor out m_die_range_map usage
  [gdb/symtab] Handle nullptr parent in parent_map::set_parent
  [gdb/symtab] Factor out m_deferred_entries usage
  [gdb/symtab] Resolve deferred entries, inter-shard case
  [gdb/testsuite] Add gdb.dwarf2/forward-spec-inter-cu.exp
  [gdb/testsuite] Add gdb.dwarf2/backward-spec-inter-cu.exp
  [gdb/symtab] Keep track of processed DIEs in shard
  [gdb/symtab] Resolve deferred entries, intra-shard case
  [gdb/symtab] Don't defer backward refs, inter-cu intra-shard case
  [gdb/symtab] Recurse into c++ DW_TAG_subprogram DIEs for cooked index
  [gdb/symtab] Keep track of all parents for cooked index
  [gdb/symtab] Fix DW_TAG_inlined_subroutine entries in the cooked index

 gdb/dwarf2/cooked-index.c                     | 113 +++++++++
 gdb/dwarf2/cooked-index.h                     | 157 ++++++++++++-
 gdb/dwarf2/read.c                             | 222 +++++++++++++-----
 .../gdb.dwarf2/backward-spec-inter-cu.exp     | 103 ++++++++
 .../gdb.dwarf2/forward-spec-inter-cu.exp      | 103 ++++++++
 5 files changed, 642 insertions(+), 56 deletions(-)
 create mode 100644 gdb/testsuite/gdb.dwarf2/backward-spec-inter-cu.exp
 create mode 100644 gdb/testsuite/gdb.dwarf2/forward-spec-inter-cu.exp


base-commit: 14f2724f80b156928b1a0b0f9733350558e35e63
  

Comments

Tom Tromey Dec. 12, 2023, 7:05 p.m. UTC | #1
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

[...]

Thank you for this series and the write-up.
You've done a great job on this.

Tom> IV. ALTERNATIVE APPROACH

Tom> I suppose it could be a matter of taste whether the current internal
Tom> representation is:
Tom> - a smart solution, or
Tom> - incorrect representation of the dwarf.

I'm not sure I totally understood this section, but the basic idea
behind this code -- and I think these decisions predate the cooked-index
rewrite -- was to try to exploit dwz compression by also using less
memory inside gdb.

Technically, I think, in DWARF an import can appear at any level and gdb
should probably just be re-reading a bunch of DIEs over and over.

However, this seemed super expensive back when I wrote the dwz support,
and since dwz is the only known compressor and it doesn't do some of the
weird stuff, it looked better to try to implement this optimization.

Shrug, maybe that was a mistake.  If every import caused a re-scan of
the PU, it would mean that only dwz users would pay for this feature.
Just maybe the price would be very high.

Tom> - I'm not sure if other maintainers are supportive of this approach.

I probably just don't understand.

Tom> And this is comparing the patch series with the base version:
Tom> ...
Tom> real: mean: 687.00    (100%) mean: 931.00    (135.52%)
Tom> user: mean: 2204.00   (100%) mean: 2938.00   (133.30%)
Tom> sys : mean: 102.00    (100%) mean: 137.00    (134.31%)
Tom> mem : mean: 345479.60 (100%) mean: 418396.00 (121.11%)
Tom> ...

Tom> In summary, the overall result is ~36% more real time and ~21% more memory.

Pretty heavy.  However I was curious about something:

Tom> The cumulative results of individual patches (leaving out the test-case
Tom> patches) are:

Tom> [gdb/symtab] Recurse into c++ DW_TAG_subprogram DIEs for cooked index

Tom> real: mean: 690.00    (100%) mean: 835.00    (121.01%)
Tom> user: mean: 2186.00   (100%) mean: 2693.00   (123.19%)
Tom> sys : mean: 106.00    (100%) mean: 130.00    (122.64%)
Tom> mem : mean: 345059.60 (100%) mean: 409780.00 (118.76%)

... the big performance jump appears here -- but is it really needed?

IIUC this is done because we want to detect inlined functions in C++.
However, in theory those should also be emitted at the top level with
DW_AT_inline, with a value of either DW_INL_inlined or
DW_INL_declared_inlined.

Would having entries for these be enough to provoke the desired CU
expansion?  If so it may be just a matter of marking some more abbrevs
as "interesting".

Actually, for the simple test case I tried, I already see it in the
index.  So I wonder what's going on there.

Anyway if we could drop this patch then it seems like the overall cost
would be alright.

Tom> Ideally, the impact of a patch series implementing dwz support on a non-dwz
Tom> test-case is none.

Tom> But fixing dwz support requires tracking more data, and there's no way of
Tom> knowing in advance whether the additional data will be used or not.

Yeah.  I loathe this part of DWARF but I have come around to accept that
we just have to handle it.

Tom> Of course this can be accommodated by optimistically assuming that the data is
Tom> unnecessary, and when it turns out it was actually needed, partially or
Tom> completely restart indexing.  My suspicion is that this approach itself is
Tom> going to be complex, so I think it's best avoided.

I agree.

Tom
  
Tom de Vries Dec. 13, 2023, 9:58 a.m. UTC | #2
On 12/12/23 20:05, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
> 
> [...]
> 
> Thank you for this series and the write-up.
> You've done a great job on this.
> 
> Tom> IV. ALTERNATIVE APPROACH
> 
> Tom> I suppose it could be a matter of taste whether the current internal
> Tom> representation is:
> Tom> - a smart solution, or
> Tom> - incorrect representation of the dwarf.
> 
> I'm not sure I totally understood this section, but the basic idea
> behind this code -- and I think these decisions predate the cooked-index
> rewrite -- was to try to exploit dwz compression by also using less
> memory inside gdb.
> 

Makes sense to me.

> Technically, I think, in DWARF an import can appear at any level

That's also my understanding.

> and gdb
> should probably just be re-reading a bunch of DIEs over and over.

Yes, though we could make a distinction between top-level imports (as 
produced by dwz and gcc lto) and otherwise, and handle the top-level 
imports efficiently.

> However, this seemed super expensive back when I wrote the dwz support,
> and since dwz is the only known compressor and it doesn't do some of the
> weird stuff, it looked better to try to implement this optimization.
> 
> Shrug, maybe that was a mistake.  If every import caused a re-scan of
> the PU, it would mean that only dwz users would pay for this feature.
> Just maybe the price would be very high.
> 

The solution I tried to propose here is a middle ground:
- it fixes the PR we're trying to fix in this series, and
- it exploits the dwz compression inside gdb, to avoid the high price
   you mention.

The basic idea is:
- let the first import trigger a read of a PU,
- in the cooked_index entries generated for the PU, set the per_cu to
   the PU (instead of to the importing CU as we do now),
- keep track of importers, allowing to list all importers CUs of a given
   PU,
- when doing expansion for a cooked_index entry, if the per_cu is a PU,
   expand all importer CUs.

There are two notes here:
- there is the risk that this gets us too many expanded symtabs,
   so when using the cooked index we should know whether we're interested
   in expanding all matching symtabs or just one.
- imports can happen from CUs with different languages, so we'll have to
   handle that somehow.  I'm not sure btw whether we're getting that
   correct either atm.

> Tom> - I'm not sure if other maintainers are supportive of this approach.
> 
> I probably just don't understand.
> 

Ok, just let me know if there's anything I can do to clarify things.

> Tom> And this is comparing the patch series with the base version:
> Tom> ...
> Tom> real: mean: 687.00    (100%) mean: 931.00    (135.52%)
> Tom> user: mean: 2204.00   (100%) mean: 2938.00   (133.30%)
> Tom> sys : mean: 102.00    (100%) mean: 137.00    (134.31%)
> Tom> mem : mean: 345479.60 (100%) mean: 418396.00 (121.11%)
> Tom> ...
> 
> Tom> In summary, the overall result is ~36% more real time and ~21% more memory.
> 
> Pretty heavy.  However I was curious about something:
> 
> Tom> The cumulative results of individual patches (leaving out the test-case
> Tom> patches) are:
> 
> Tom> [gdb/symtab] Recurse into c++ DW_TAG_subprogram DIEs for cooked index
> 
> Tom> real: mean: 690.00    (100%) mean: 835.00    (121.01%)
> Tom> user: mean: 2186.00   (100%) mean: 2693.00   (123.19%)
> Tom> sys : mean: 106.00    (100%) mean: 130.00    (122.64%)
> Tom> mem : mean: 345059.60 (100%) mean: 409780.00 (118.76%)
> 
> ... the big performance jump appears here -- but is it really needed?
> 
> IIUC this is done because we want to detect inlined functions in C++.
> However, in theory those should also be emitted at the top level with
> DW_AT_inline, with a value of either DW_INL_inlined or
> DW_INL_declared_inlined.
> 

In the gdb.cp/breakpoint-locs.exp cc-with-dwz case we have:
...
  <1><14>: Abbrev Number: 34 (DW_TAG_namespace)
     <15>   DW_AT_name        : N1
     <18>   DW_AT_sibling     : <0x2e>
  <2><19>: Abbrev Number: 32 (DW_TAG_class_type)
     <1a>   DW_AT_name        : C1
     <1d>   DW_AT_byte_size   : 1
     <1e>   DW_AT_decl_file   : 2
     <1f>   DW_AT_decl_line   : 20
  <3><20>: Abbrev Number: 35 (DW_TAG_subprogram)
     <21>   DW_AT_external    : 1
     <21>   DW_AT_name        : baz
     <25>   DW_AT_decl_file   : 2
     <26>   DW_AT_decl_line   : 23
     <27>   DW_AT_linkage_name: (indirect string, offset: 0x1d3): _ZN2N12C13bazEv
     <2b>   DW_AT_accessibility: 1       (public)
     <2c>   DW_AT_declaration : 1
  <3><2c>: Abbrev Number: 0
  <2><2d>: Abbrev Number: 0
  <1><2e>: Abbrev Number: 37 (DW_TAG_subprogram)
     <2f>   DW_AT_specification: <0x20>
     <30>   DW_AT_inline      : 3        (declared as inline and inlined)
     <31>   DW_AT_sibling     : <0x39>
...
and I think what you're describing matches most closely DIE 0x2e (though 
I haven't found DW_INL_inlined or DW_INL_declared_inlined in the dwarf 
produced for the test-case, also not with gcc 12 and -gdwarf-5).

> Would having entries for these be enough to provoke the desired CU
> expansion?  If so it may be just a matter of marking some more abbrevs
> as "interesting".
> 

The problem is that the DIE 0x2e has been moved to a PU, and as a
consequence in the current implementation will cause expansion of only a
single CU.

> Actually, for the simple test case I tried, I already see it in the
> index.  So I wonder what's going on there.
> 

I wonder if you can reproduce the FAIL I'm seeing? [ Btw, it's not 100% 
reproducible, so I usually run it in a $(seq 1 100) loop. ]

> Anyway if we could drop this patch then it seems like the overall cost
> would be alright.
> 
> Tom> Ideally, the impact of a patch series implementing dwz support on a non-dwz
> Tom> test-case is none.
> 
> Tom> But fixing dwz support requires tracking more data, and there's no way of
> Tom> knowing in advance whether the additional data will be used or not.
> 
> Yeah.  I loathe this part of DWARF but I have come around to accept that
> we just have to handle it.
> 
> Tom> Of course this can be accommodated by optimistically assuming that the data is
> Tom> unnecessary, and when it turns out it was actually needed, partially or
> Tom> completely restart indexing.  My suspicion is that this approach itself is
> Tom> going to be complex, so I think it's best avoided.
> 
> I agree.

Thanks for the review.

- Tom
  
Tom Tromey Dec. 13, 2023, 8:14 p.m. UTC | #3
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

>> However, in theory those should also be emitted at the top level with
>> DW_AT_inline, with a value of either DW_INL_inlined or
>> DW_INL_declared_inlined.

Tom> In the gdb.cp/breakpoint-locs.exp cc-with-dwz case we have:

Tom>     <30>   DW_AT_inline      : 3        (declared as inline and inlined)

Tom> and I think what you're describing matches most closely DIE 0x2e
Tom> (though I haven't found DW_INL_inlined or DW_INL_declared_inlined in
Tom> the dwarf produced for the test-case, also not with gcc 12 and
Tom> -gdwarf-5).

readelf / objdump don't print the symbolic name here, but that's
actually what you're seeing.  From dwarf2.h:

    /* Inline attribute.  */
    enum dwarf_inline_attribute
      {
        DW_INL_not_inlined = 0,
        DW_INL_inlined = 1,
        DW_INL_declared_not_inlined = 2,
        DW_INL_declared_inlined = 3
      };

>> Would having entries for these be enough to provoke the desired CU
>> expansion?  If so it may be just a matter of marking some more abbrevs
>> as "interesting".

Tom> The problem is that the DIE 0x2e has been moved to a PU, and as
Tom> consequence in the current implementation will cause expansion of only
Tom> a single CU.

Ok, I see.  Won't this fail with .gdb_index as well then?  Because IIRC
that attributes symbols from a PU to the canonical includer.

Anyway it seems better to me to record inclusions more precisely and
then expand more often (depending on the lookup) than to spend a lot of
time recursing into C++ subroutines.

Tom