[v2] gdb/linespec: Improve sliding breakpoints

Message ID 20240910145514.29402-1-list+gdb@vahedi.org
State New
Headers
Series [v2] gdb/linespec: Improve sliding breakpoints |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm fail Test failed

Commit Message

Shahab Vahedi Sept. 10, 2024, 2:55 p.m. UTC
  From: Lancelot SIX <lancelot.six@amd.com>

The algorithm used to place a breakpoint by line approximately goes as
follow:

- Look for all the PCs which maps directly to the desired line
- If any PC is found, place a breakpoint at each found PC.
- If no PC maps to the desired line:
  - Search for the lowest line number in the line table which is higher
    than the desired line, and has some object code associated to it.
  - List all PCs which map to this new line number.
  - Place a breakpoint at each PC in the list above.

There are situations where this heuristic is not sufficient.  If we
consider the following c++ code:

    12 template<typename T>
    13 T func (T a, T b)
    14 {
    15   /* Break here.  */
    16   if constexpr (std::is_integral_t<T>)
    17     return a;
    18   else
    19     return b;
    20 }

Given the current algorithm, if the user tries to set a breakpoint at
line 15, here is what happens:

- Search PCs mapping to L15, there is none ;
- Search for the first line number above L15 with code, this is L17 ;
- List all PCs mapping to L17 and a breakpoints at insert a breakpoint at
  each ;

Now lets suppose that this templated function has been instantiated both
for an integral type (int) and non-integral type (float).  If we follow
the algorithm, no breakpoint will ever be inserted in func<float>,
while there will be a breakpoint in func<int>.

I came across a less obvious variation of this case where an inlined
function had different optimization opportunities for different instances.
As a consequence, some instances had the first actual line of code
different than the other, making the breakpoint be missed and the user
confused.

This patch is a proposition to adapt the algorithm in order to cover the
scenario exposed above.  To do so, this patch ends up scanning
independently for the best candidate in the various blocks composing
the line table structures (a block being a succession of
linetable_entries terminated by an entry with line == 0).

The line table for the example program above looks something like:

    (gdb) maintenance info line-table missed
    symtab: /[...]/missed_pb.cc ((struct symtab *) 0x6210000f0500)
    linetable: ((struct linetable *) 0x6210001a1220):
    INDEX  LINE   ADDRESS            IS-STMT PROLOGUE-END
    ...
    17     END    0x00000000000011d3 Y
    18     13     0x00000000000011d3 Y                    <-- func<float>
    19     19     0x00000000000011e5 Y
    20     20     0x00000000000011ea Y
    21     END    0x00000000000011ec Y
    22     13     0x00000000000011ec Y                    <-- func<int>
    23     17     0x00000000000011fa Y
    24     20     0x00000000000011fd Y
    25     END    0x00000000000011ff Y

The first block listed (starting at 0x11d3) is func<float>, and the
second starting at 0x11ec is func<int>.

The proposed algorithm analyzes each block independently.  The first step
is to decide if a block contains the line the user is interested in (a.k.a
the target).  In our example, this is L15.  A block is considered to match
if it contains at least one instruction mapped to a line before the target
and at least one instruction mapped to a line after the target.  In other
words this means that will select all blocks where:

- there is at least one instruction mapped to a line L such as L <= target
- there is at least one instruction mapped to a line M such as M >= target

For each block which satisfies this condition, we select the lowest line
number above our target, and select all instructions mapping to this line
number (there might be multiple).  In our example the func<float> block
satisfies the condition (we have line 13 and 19), so we insert breakpoints
at the PCs mapped to L19.  Similarly, the func<int> block is selected
(thanks to L13 and L17), and we insert breakpoints at all PCs mapped to
L17.

The core of this heuristic is implemented in find_best_le_for_symtab_line.

There are however some cases which are still problematic, where the
first line after the target is the first line of the block we are
looking for.  In such situation the algorithm would find nothing where
we would want to select 0x11d3 and 0x11ec (both mapping to L13).  In
order to solve this, the algorithm first looks for the lowest line
number following the user specified target as this is done before this
patch (here the first line with associated code following L12 which is
L13).  Once this is figured out, this new target (L13) is passed to the
algorithm above (find_best_le_for_symtab_line).

This is safe to do because we know that there is no line between the user
target (included) and this new target.  Therefore, whatever
find_best_le_for_symtab_line returns, it must map to lines numbers above
(inclusive) the new target.

Finally, the testsuite showed an interesting regression.  Let's suppose
we have:

    1 struct Foo { Foo () {} };
    2 Foo f1;
    3 void bar ()
    4 {
    5   /* break here.  */
    6   do_something ();
    7 }
    8 Foo f2;

Inspecting the linetable, we have a code block containing L2 and L8 (so
over line 5).  If a user was to insert a breakpoint somewhere in L5, the
algorithm shown above would also insert a breakpoint at L8.  To solve
this, this patch uses the fact that L2 and L8 are, from dwarf's
perspective in a artificial block which corresponds to the static
initialization.  So if among multiple candidates, some are part of a
virtual function and some are not, we only keep the ones in the non
virtual function.

This proposed new algorithm will not solve all possible cases.  For
example, if the func function is inlined multiple times in the same block
for different template parameters, the lookup will process all of those as
part of the same scope.  Solving such situation would require heavier
works.  I think that this change, while not covering every scenarios is
still a step in the right direction.

Regression tested on x86_64 GNU/Linux.

Change-Id: I8291f9e1a3522934a0f418dfc565cebb86cff730
Reviewed-by: Guinevere Larsen <blarsen@redhat.com>
---

ChangeLog v2:
- Rebased and retested on latest binutils-gdb
- Incorporated the remarks from the first submission [1]

[1] [PATCH] gdb/linespec: Improve sliding breakpoints
https://inbox.sourceware.org/gdb-patches/20220617160746.1193587-1-lancelot.six@amd.com/

 gdb/linespec.c                                | 79 ++++++++++++++++++-
 gdb/symtab.c                                  | 45 +++++++++++
 gdb/symtab.h                                  | 18 +++++
 .../gdb.linespec/breakpoint-sliding.cc        | 67 ++++++++++++++++
 .../gdb.linespec/breakpoint-sliding.exp       | 69 ++++++++++++++++
 5 files changed, 275 insertions(+), 3 deletions(-)
 create mode 100644 gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
 create mode 100644 gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
  

Comments

Shahab Vahedi Sept. 26, 2024, 2:52 p.m. UTC | #1
Gentle reminder.
  
Gerlicher, Klaus Sept. 27, 2024, 7:08 a.m. UTC | #2
Hi,

Thanks for taking a stab at this. I haven't done any reviews so far but I worked a bit in this code area. There's 
only one or two things I'd like to point out, inline.

Thanks
Klaus

> -----Original Message-----
> From: Shahab Vahedi <list+gdb@vahedi.org>
> Sent: Tuesday, September 10, 2024 4:55 PM
> To: gdb-patches@sourceware.org
> Cc: Shahab Vahedi <list+gdb@vahedi.org>; Shahab Vahedi
> <shahab.vahedi@amd.com>; Lancelot Six <lancelot.six@amd.com>; Guinevere
> Larsen <blarsen@redhat.com>
> Subject: [PATCH v2] gdb/linespec: Improve sliding breakpoints
> 
> From: Lancelot SIX <lancelot.six@amd.com>
> 
> The algorithm used to place a breakpoint by line approximately goes as
> follow:
> 
> - Look for all the PCs which maps directly to the desired line
> - If any PC is found, place a breakpoint at each found PC.
> - If no PC maps to the desired line:
>   - Search for the lowest line number in the line table which is higher
>     than the desired line, and has some object code associated to it.
>   - List all PCs which map to this new line number.
>   - Place a breakpoint at each PC in the list above.
> 
> There are situations where this heuristic is not sufficient.  If we
> consider the following c++ code:
> 
>     12 template<typename T>
>     13 T func (T a, T b)
>     14 {
>     15   /* Break here.  */
>     16   if constexpr (std::is_integral_t<T>)
>     17     return a;
>     18   else
>     19     return b;
>     20 }
> 
> Given the current algorithm, if the user tries to set a breakpoint at
> line 15, here is what happens:
> 
> - Search PCs mapping to L15, there is none ;
> - Search for the first line number above L15 with code, this is L17 ;
> - List all PCs mapping to L17 and a breakpoints at insert a breakpoint at
>   each ;
> 
> Now lets suppose that this templated function has been instantiated both
> for an integral type (int) and non-integral type (float).  If we follow
> the algorithm, no breakpoint will ever be inserted in func<float>,
> while there will be a breakpoint in func<int>.
> 
> I came across a less obvious variation of this case where an inlined
> function had different optimization opportunities for different instances.
> As a consequence, some instances had the first actual line of code
> different than the other, making the breakpoint be missed and the user
> confused.
> 
> This patch is a proposition to adapt the algorithm in order to cover the
> scenario exposed above.  To do so, this patch ends up scanning
> independently for the best candidate in the various blocks composing
> the line table structures (a block being a succession of
> linetable_entries terminated by an entry with line == 0).
> 
> The line table for the example program above looks something like:
> 
>     (gdb) maintenance info line-table missed
>     symtab: /[...]/missed_pb.cc ((struct symtab *) 0x6210000f0500)
>     linetable: ((struct linetable *) 0x6210001a1220):
>     INDEX  LINE   ADDRESS            IS-STMT PROLOGUE-END
>     ...
>     17     END    0x00000000000011d3 Y
>     18     13     0x00000000000011d3 Y                    <-- func<float>
>     19     19     0x00000000000011e5 Y
>     20     20     0x00000000000011ea Y
>     21     END    0x00000000000011ec Y
>     22     13     0x00000000000011ec Y                    <-- func<int>
>     23     17     0x00000000000011fa Y
>     24     20     0x00000000000011fd Y
>     25     END    0x00000000000011ff Y
> 
> The first block listed (starting at 0x11d3) is func<float>, and the
> second starting at 0x11ec is func<int>.
> 
> The proposed algorithm analyzes each block independently.  The first step
> is to decide if a block contains the line the user is interested in (a.k.a
> the target).  In our example, this is L15.  A block is considered to match
> if it contains at least one instruction mapped to a line before the target
> and at least one instruction mapped to a line after the target.  In other
> words this means that will select all blocks where:
> 
> - there is at least one instruction mapped to a line L such as L <= target
> - there is at least one instruction mapped to a line M such as M >= target
> 
> For each block which satisfies this condition, we select the lowest line
> number above our target, and select all instructions mapping to this line
> number (there might be multiple).  In our example the func<float> block
> satisfies the condition (we have line 13 and 19), so we insert breakpoints
> at the PCs mapped to L19.  Similarly, the func<int> block is selected
> (thanks to L13 and L17), and we insert breakpoints at all PCs mapped to
> L17.
> 
> The core of this heuristic is implemented in find_best_le_for_symtab_line.
> 
> There are however some cases which are still problematic, where the
> first line after the target is the first line of the block we are
> looking for.  In such situation the algorithm would find nothing where
> we would want to select 0x11d3 and 0x11ec (both mapping to L13).  In
> order to solve this, the algorithm first looks for the lowest line
> number following the user specified target as this is done before this
> patch (here the first line with associated code following L12 which is
> L13).  Once this is figured out, this new target (L13) is passed to the
> algorithm above (find_best_le_for_symtab_line).
> 
> This is safe to do because we know that there is no line between the user
> target (included) and this new target.  Therefore, whatever
> find_best_le_for_symtab_line returns, it must map to lines numbers above
> (inclusive) the new target.
> 
> Finally, the testsuite showed an interesting regression.  Let's suppose
> we have:
> 
>     1 struct Foo { Foo () {} };
>     2 Foo f1;
>     3 void bar ()
>     4 {
>     5   /* break here.  */
>     6   do_something ();
>     7 }
>     8 Foo f2;
> 
> Inspecting the linetable, we have a code block containing L2 and L8 (so
> over line 5).  If a user was to insert a breakpoint somewhere in L5, the
> algorithm shown above would also insert a breakpoint at L8.  To solve
> this, this patch uses the fact that L2 and L8 are, from dwarf's
> perspective in a artificial block which corresponds to the static
> initialization.  So if among multiple candidates, some are part of a
> virtual function and some are not, we only keep the ones in the non
> virtual function.
> 
> This proposed new algorithm will not solve all possible cases.  For
> example, if the func function is inlined multiple times in the same block
> for different template parameters, the lookup will process all of those as
> part of the same scope.  Solving such situation would require heavier
> works.  I think that this change, while not covering every scenarios is
> still a step in the right direction.
> 
> Regression tested on x86_64 GNU/Linux.
> 
> Change-Id: I8291f9e1a3522934a0f418dfc565cebb86cff730
> Reviewed-by: Guinevere Larsen <blarsen@redhat.com>
> ---
> 
> ChangeLog v2:
> - Rebased and retested on latest binutils-gdb
> - Incorporated the remarks from the first submission [1]
> 
> [1] [PATCH] gdb/linespec: Improve sliding breakpoints
> https://inbox.sourceware.org/gdb-patches/20220617160746.1193587-1-
> lancelot.six@amd.com/
> 
>  gdb/linespec.c                                | 79 ++++++++++++++++++-
>  gdb/symtab.c                                  | 45 +++++++++++
>  gdb/symtab.h                                  | 18 +++++
>  .../gdb.linespec/breakpoint-sliding.cc        | 67 ++++++++++++++++
>  .../gdb.linespec/breakpoint-sliding.exp       | 69 ++++++++++++++++
>  5 files changed, 275 insertions(+), 3 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
>  create mode 100644 gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
> 
> diff --git a/gdb/linespec.c b/gdb/linespec.c
> index be85074b01d..f2c213a6c49 100644
> --- a/gdb/linespec.c
> +++ b/gdb/linespec.c
> @@ -2076,9 +2076,43 @@ create_sals_line_offset (struct linespec_state
> *self,
>        if (intermediate_results.empty () && best_entry != NULL)
>  	{
>  	  was_exact = false;
> -	  intermediate_results = decode_digits_ordinary (self, ls,
> -							 best_entry->line,
> -							 &best_entry);
> +	  /* We did not find an exact match for the line we are looking for.
> +	     As a consequence, we are now looking for the next best line.
> +	     However, the next best line does not have to be the same across
> +	     the entire program.  If we consider the following code:
> +
> +	       12 template<typename T>
> +	       13 T select (T a, T b)
> +	       14 {
> +	       15   // Break here.
> +	       16   if constexpr (std::is_integral_t<T>)
> +	       17     return a;
> +	       18   else
> +	       19     return b;
> +	       20 }
> +
> +	     For a function like the above, the next best line is L17 if T is
> +	     an integral type, but should be L19 if T is not.  As a
> +	     consequence we try to search for the next best line independently
> +	     in each code block (a code block being a contiguous section of a
> +	     line table terminated by an entry with line == 0).  */
> +	  for (const auto &symtab : ls->file_symtabs)
> +	    {
> +	      program_space *pspace
> +		= symtab->compunit ()->objfile ()->pspace ();
> +	      for (const linetable_entry *le
> +		   : find_best_le_for_symtab_line (symtab->linetable (),
> +						   best_entry->line))
> +		{
> +		  symtab_and_line sal;
> +		  sal.pspace = pspace;
> +		  sal.symtab = symtab;
> +		  sal.line = le->line;
> +		  sal.explicit_line = true;
> +		  sal.pc = le->pc (symtab->compunit ()->objfile ());
> +		  intermediate_results.push_back (sal);
> +		}
> +	    }
>  	}
> 
>        /* For optimized code, the compiler can scatter one source line
> @@ -2094,6 +2128,11 @@ create_sals_line_offset (struct linespec_state
> *self,
>        gdb::def_vector<int> filter (intermediate_results.size ());
>        gdb::def_vector<const block *> blocks (intermediate_results.size ());
> 
> +      /* Is there any non-artificial block?
> +	 Artificial blocks are marked by DW_AT_artificial attribute and
> +	 correspond to static initializations.  */
> +      bool any_nonartificial = false;
> +
>        for (i = 0; i < intermediate_results.size (); ++i)
>  	{
>  	  set_current_program_space (intermediate_results[i].pspace);
> @@ -2101,6 +2140,13 @@ create_sals_line_offset (struct linespec_state
> *self,
>  	  filter[i] = 1;
>  	  blocks[i] = block_for_pc_sect (intermediate_results[i].pc,
>  					 intermediate_results[i].section);
> +
> +	  if (!any_nonartificial && blocks[i] != nullptr)
> +	    {
> +	      const struct symbol *func = blocks[i]->containing_function ();
> +	      any_nonartificial
> +		= (func != nullptr && !func->is_artificial ());
> +	    }
>  	}
> 
>        for (i = 0; i < intermediate_results.size (); ++i)
> @@ -2156,6 +2202,33 @@ create_sals_line_offset (struct linespec_state
> *self,
>  		&& val.line < sym->line ())
>  	      continue;
> 
> +	    /* If we did not find an exact match, the above heuristic might
> +	       find some false positives.
> +
> +	       Lets consider (with Foo having a non trivial ctor):
> +
> +		    9 struct Foo { Foo () {} };
> +		   10 Foo f1;
> +		   11 int bar ()
> +		   12 {
> +		   13   // Break here.
> +		   14   return 42;
> +		   15 }
> +		   16 Foo f2;
> +
> +		When the user tries to insert a breakpoint at L13, the
> +		algorithm sees a code path from L10 to L16 (in static
> +		initialization), and will consider L16 as a candidate.
> +		However the user clearly intended to break in bar.  Therefore,
> +		if we see that there is at least 1 breakpoint in a non
> +		artificial function (bar), ignore the hit in the static
> +		initialization.  */
> +	    if (!was_exact
> +		&& any_nonartificial
> +		&& sym != nullptr
> +		&& sym->is_artificial ())
> +	      continue;
> +
>  	    if (self->funfirstline)
>  	      skip_prologue_sal (&sal);
> 
> diff --git a/gdb/symtab.c b/gdb/symtab.c
> index 3d8dcac63bb..ecd827570a8 100644
> --- a/gdb/symtab.c
> +++ b/gdb/symtab.c
> @@ -3605,6 +3605,51 @@ find_pcs_for_symtab_line (struct symtab
> *symtab, int line,
>    return result;
>  }
> 
> +/* See symtab.h.  */
> +

find_pcs_for_symtab_line and find_line_symtab seemsto be very similar to what is being added here. And since we already
have something like this I would rather improve the existing code instead of adding an entirely new function. 

It also looks like decode_digits_ordinary() does not have any users anymore after this patch.

As a side note it is also interesting that find_line_symtab goes through all symtabs associated with the source file which
find_pcs_for_symtab_line used by decode_digits_ordinary does not do.

My concern also extends to other functionality like "info line" which does not produce all locations correctly. Ideally we would have one lookup 
function for both setting breakpoints and looking up line information.

> +std::vector<const linetable_entry *>
> +find_best_le_for_symtab_line (const linetable *lt, int line)
> +{
> +  std::vector<const linetable_entry *> res;
> +
> +  if (lt == nullptr || line < 0)
> +    return res;
> +
> +  /* Tells if in the current block we found at least one PC mapping to a line
> +     before LINE.  */
> +  bool found_preceding = false;
> +  const int nitems = lt->nitems;
> +  std::vector<const linetable_entry *> best_candidates;
> +
> +  for (int i = 0; i < nitems; ++i)
> +    {
> +      const linetable_entry &item = lt->item[i];
> +      found_preceding
> +	= found_preceding || (item.line > 0 && item.line <= line);
> +
> +      if (item.is_stmt && item.line >= line
> +	  && (best_candidates.empty ()
> +	      || item.line <= best_candidates.front ()->line))
> +	{
> +	  if (!best_candidates.empty ()
> +	      && item.line < best_candidates.front ()->line)
> +	    best_candidates.clear ();
> +	  best_candidates.push_back (&item);
> +	}
> +      else if (item.line == 0)
> +	{
> +	  if (found_preceding)
> +	    res.insert (res.end (), best_candidates.begin (),
> +			best_candidates.end ());
> +
> +	  found_preceding = false;
> +	  best_candidates.clear ();
> +	}
> +    }
> +
> +  return res;
> +}
> +
> 
> 
> 
>  /* Set the PC value for a given source file and line number and return true.
>     Returns false for invalid line number (and sets the PC to 0).
> diff --git a/gdb/symtab.h b/gdb/symtab.h
> index d615fdf1e52..d9273a1e69a 100644
> --- a/gdb/symtab.h
> +++ b/gdb/symtab.h
> @@ -2815,6 +2815,24 @@ void iterate_over_symtabs (const char *name,
>  std::vector<CORE_ADDR> find_pcs_for_symtab_line
>      (struct symtab *symtab, int line, const linetable_entry **best_entry);
> 
> +/* Given a linetable LT, return the best entries to map to line LINE.
> +   An entry must follow these conditions to be valid:
> +
> +   1) It must be a part of a block (i.e. a sequence of entries terminated
> +   by an entry with line == 0) containing at least one statement above LINE
> +   and one equal to or below LINE.
> +
> +   2) The entry must be a statement.
> +
> +   3) The line associated with this statement must be equal to or higher
> +   than LINE
> +
> +   The best entry for any given block will then be the one with the lowest
> +   line. */
> +
> +std::vector<const linetable_entry *> find_best_le_for_symtab_line
> +    (const linetable *lt, int line);
> +
>  /* Prototype for callbacks for LA_ITERATE_OVER_SYMBOLS.  The callback
>     is called once per matching symbol SYM.  The callback should return
>     true to indicate that LA_ITERATE_OVER_SYMBOLS should continue
> diff --git a/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
> b/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
> new file mode 100644
> index 00000000000..04e88891279
> --- /dev/null
> +++ b/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
> @@ -0,0 +1,67 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright (C) 2022-2024 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#include <type_traits>
> +
> +template<typename T>
> +T
> +foo (T a, T b)
> +{
> +  T ret;
> +  /* In foo break here.  */
> +  if constexpr (std::is_integral_v<T>)
> +    ret = a; /* break in foo<int>.  */
> +  else
> +    ret = b; /* break in foo<NonIntegral>.  */
> +
> +  return ret;
> +}
> +
> +struct NonIntegral
> +{
> +  int m;
> +};
> +
> +struct St
> +{
> +  St ()
> +  { /* Empty.  */ }
> +};
> +
> +St st1;
> +
> +static int
> +between_static_init ()
> +{
> +  /* between_static_init break here.  */
> +  return 42; /* break in between_static_init.  */
> +}
> +
> +St st2;
> +
> +int main ()
> +{
> +
> +#if __cplusplus >= 201703L
> +  foo<int> (1, 2);
> +  foo<NonIntegral> ({ 1 }, { 2 });
> +#endif
> +
> +  between_static_init ();
> +
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
> b/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
> new file mode 100644
> index 00000000000..f84e2fd08ae
> --- /dev/null
> +++ b/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
> @@ -0,0 +1,69 @@
> +# Copyright (C) 2022-2024 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# This testcase exercises breakpoint sliding when the user tries to set
> +# a breakpoint at a line number which has no code associated to it.  GDB
> +# should then place breakpoint at the first line following the one given
> +# by the user in each code block composing the program space, given that
> +# the code block "covers" the line asked by the user.
> +
> +require allow_cplus_tests
> +
> +standard_testfile .cc
> +
> +# This test relies on having a C++17 capable compiler.  Since Sep-2023,
> +# C++17 is required to build gdb/gdbserver/gdbsupport (see f74dc26792a),
> +# so that should not be be a problem.
> +if {[prepare_for_testing "failed to prepare with c++17" $testfile \
> +			 $srcfile {debug c++ additional_flags=-std=c++17}]} {
> +    return -1
> +}
> +
> +proc_with_prefix test_sliding_per_block {} {
> +    set loc [gdb_get_line_number "In foo break here."]
> +    gdb_test "break $loc" \
> +	"Breakpoint $::decimal.* \\(2 locations\\)" \
> +	"slide to if constexpr"
> +
> +    set loc_in_int_instance [gdb_get_line_number "break in foo<int>"]
> +    set loc_in_struct_instance [gdb_get_line_number "break in
> foo<NonIntegral>"]
> +    set seen_int_pb 0
> +    set seen_struct_bp 0
> +    gdb_test_multiple "info breakpoint" "info breakpoint" {
> +	-re "in foo<int>\\(int, int\\) at
> \[^\n\r\]*$::srcfile:$loc_in_int_instance" {
> +	    set seen_int_pb 1
> +	    exp_continue
> +	}
> +	-re "in foo<NonIntegral>\\(NonIntegral, NonIntegral\\) at
> \[^\n\r\]*$::srcfile:$loc_in_struct_instance" {
> +	    set seen_struct_bp 1
> +	    exp_continue
> +	}
> +	-re -wrap "" {
> +	    gdb_assert {$seen_int_pb && $seen_struct_bp} $gdb_test_name
> +	}
> +    }
> +}
> +
> +proc_with_prefix test_ignore_artificial_block {} {
> +    set loc  [gdb_get_line_number "between_static_init break here"]
> +    set expected_loc [gdb_get_line_number "break in between_static_init."]
> +    gdb_test "break $loc" \
> +	"Breakpoint $::decimal at $::hex: file .*/$::srcfile, line
> $expected_loc\\." \
> +	"ignore static block"
> +}
> +
> +clean_restart $::binfile
> +test_sliding_per_block
> +test_ignore_artificial_block
> --
> 2.46.0
> 

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
  
Luis Machado Sept. 27, 2024, 8:11 a.m. UTC | #3
On 9/10/24 15:55, Shahab Vahedi wrote:
> From: Lancelot SIX <lancelot.six@amd.com>
> 
> The algorithm used to place a breakpoint by line approximately goes as
> follow:
> 
> - Look for all the PCs which maps directly to the desired line
> - If any PC is found, place a breakpoint at each found PC.
> - If no PC maps to the desired line:
>   - Search for the lowest line number in the line table which is higher
>     than the desired line, and has some object code associated to it.
>   - List all PCs which map to this new line number.
>   - Place a breakpoint at each PC in the list above.
> 
> There are situations where this heuristic is not sufficient.  If we
> consider the following c++ code:
> 
>     12 template<typename T>
>     13 T func (T a, T b)
>     14 {
>     15   /* Break here.  */
>     16   if constexpr (std::is_integral_t<T>)
>     17     return a;
>     18   else
>     19     return b;
>     20 }
> 
> Given the current algorithm, if the user tries to set a breakpoint at
> line 15, here is what happens:
> 
> - Search PCs mapping to L15, there is none ;
> - Search for the first line number above L15 with code, this is L17 ;
> - List all PCs mapping to L17 and a breakpoints at insert a breakpoint at
>   each ;
> 
> Now lets suppose that this templated function has been instantiated both
> for an integral type (int) and non-integral type (float).  If we follow
> the algorithm, no breakpoint will ever be inserted in func<float>,
> while there will be a breakpoint in func<int>.
> 
> I came across a less obvious variation of this case where an inlined
> function had different optimization opportunities for different instances.
> As a consequence, some instances had the first actual line of code
> different than the other, making the breakpoint be missed and the user
> confused.

Out of curiosity, how much of this issue is down to the generated debug info not
depicting things in a more precise way?

Also, in practical terms, why would we place a breakpoint at 15 when 16 would be
a better option?

We also have rbreak for pattern-based breakpoint insertion, which should hopefully
cover multiple instantiations of templates.
  

Patch

diff --git a/gdb/linespec.c b/gdb/linespec.c
index be85074b01d..f2c213a6c49 100644
--- a/gdb/linespec.c
+++ b/gdb/linespec.c
@@ -2076,9 +2076,43 @@  create_sals_line_offset (struct linespec_state *self,
       if (intermediate_results.empty () && best_entry != NULL)
 	{
 	  was_exact = false;
-	  intermediate_results = decode_digits_ordinary (self, ls,
-							 best_entry->line,
-							 &best_entry);
+	  /* We did not find an exact match for the line we are looking for.
+	     As a consequence, we are now looking for the next best line.
+	     However, the next best line does not have to be the same across
+	     the entire program.  If we consider the following code:
+
+	       12 template<typename T>
+	       13 T select (T a, T b)
+	       14 {
+	       15   // Break here.
+	       16   if constexpr (std::is_integral_t<T>)
+	       17     return a;
+	       18   else
+	       19     return b;
+	       20 }
+
+	     For a function like the above, the next best line is L17 if T is
+	     an integral type, but should be L19 if T is not.  As a
+	     consequence we try to search for the next best line independently
+	     in each code block (a code block being a contiguous section of a
+	     line table terminated by an entry with line == 0).  */
+	  for (const auto &symtab : ls->file_symtabs)
+	    {
+	      program_space *pspace
+		= symtab->compunit ()->objfile ()->pspace ();
+	      for (const linetable_entry *le
+		   : find_best_le_for_symtab_line (symtab->linetable (),
+						   best_entry->line))
+		{
+		  symtab_and_line sal;
+		  sal.pspace = pspace;
+		  sal.symtab = symtab;
+		  sal.line = le->line;
+		  sal.explicit_line = true;
+		  sal.pc = le->pc (symtab->compunit ()->objfile ());
+		  intermediate_results.push_back (sal);
+		}
+	    }
 	}
 
       /* For optimized code, the compiler can scatter one source line
@@ -2094,6 +2128,11 @@  create_sals_line_offset (struct linespec_state *self,
       gdb::def_vector<int> filter (intermediate_results.size ());
       gdb::def_vector<const block *> blocks (intermediate_results.size ());
 
+      /* Is there any non-artificial block?
+	 Artificial blocks are marked by DW_AT_artificial attribute and
+	 correspond to static initializations.  */
+      bool any_nonartificial = false;
+
       for (i = 0; i < intermediate_results.size (); ++i)
 	{
 	  set_current_program_space (intermediate_results[i].pspace);
@@ -2101,6 +2140,13 @@  create_sals_line_offset (struct linespec_state *self,
 	  filter[i] = 1;
 	  blocks[i] = block_for_pc_sect (intermediate_results[i].pc,
 					 intermediate_results[i].section);
+
+	  if (!any_nonartificial && blocks[i] != nullptr)
+	    {
+	      const struct symbol *func = blocks[i]->containing_function ();
+	      any_nonartificial
+		= (func != nullptr && !func->is_artificial ());
+	    }
 	}
 
       for (i = 0; i < intermediate_results.size (); ++i)
@@ -2156,6 +2202,33 @@  create_sals_line_offset (struct linespec_state *self,
 		&& val.line < sym->line ())
 	      continue;
 
+	    /* If we did not find an exact match, the above heuristic might
+	       find some false positives.
+
+	       Lets consider (with Foo having a non trivial ctor):
+
+		    9 struct Foo { Foo () {} };
+		   10 Foo f1;
+		   11 int bar ()
+		   12 {
+		   13   // Break here.
+		   14   return 42;
+		   15 }
+		   16 Foo f2;
+
+		When the user tries to insert a breakpoint at L13, the
+		algorithm sees a code path from L10 to L16 (in static
+		initialization), and will consider L16 as a candidate.
+		However the user clearly intended to break in bar.  Therefore,
+		if we see that there is at least 1 breakpoint in a non
+		artificial function (bar), ignore the hit in the static
+		initialization.  */
+	    if (!was_exact
+		&& any_nonartificial
+		&& sym != nullptr
+		&& sym->is_artificial ())
+	      continue;
+
 	    if (self->funfirstline)
 	      skip_prologue_sal (&sal);
 
diff --git a/gdb/symtab.c b/gdb/symtab.c
index 3d8dcac63bb..ecd827570a8 100644
--- a/gdb/symtab.c
+++ b/gdb/symtab.c
@@ -3605,6 +3605,51 @@  find_pcs_for_symtab_line (struct symtab *symtab, int line,
   return result;
 }
 
+/* See symtab.h.  */
+
+std::vector<const linetable_entry *>
+find_best_le_for_symtab_line (const linetable *lt, int line)
+{
+  std::vector<const linetable_entry *> res;
+
+  if (lt == nullptr || line < 0)
+    return res;
+
+  /* Tells if in the current block we found at least one PC mapping to a line
+     before LINE.  */
+  bool found_preceding = false;
+  const int nitems = lt->nitems;
+  std::vector<const linetable_entry *> best_candidates;
+
+  for (int i = 0; i < nitems; ++i)
+    {
+      const linetable_entry &item = lt->item[i];
+      found_preceding
+	= found_preceding || (item.line > 0 && item.line <= line);
+
+      if (item.is_stmt && item.line >= line
+	  && (best_candidates.empty ()
+	      || item.line <= best_candidates.front ()->line))
+	{
+	  if (!best_candidates.empty ()
+	      && item.line < best_candidates.front ()->line)
+	    best_candidates.clear ();
+	  best_candidates.push_back (&item);
+	}
+      else if (item.line == 0)
+	{
+	  if (found_preceding)
+	    res.insert (res.end (), best_candidates.begin (),
+			best_candidates.end ());
+
+	  found_preceding = false;
+	  best_candidates.clear ();
+	}
+    }
+
+  return res;
+}
+
 
 /* Set the PC value for a given source file and line number and return true.
    Returns false for invalid line number (and sets the PC to 0).
diff --git a/gdb/symtab.h b/gdb/symtab.h
index d615fdf1e52..d9273a1e69a 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -2815,6 +2815,24 @@  void iterate_over_symtabs (const char *name,
 std::vector<CORE_ADDR> find_pcs_for_symtab_line
     (struct symtab *symtab, int line, const linetable_entry **best_entry);
 
+/* Given a linetable LT, return the best entries to map to line LINE.
+   An entry must follow these conditions to be valid:
+
+   1) It must be a part of a block (i.e. a sequence of entries terminated
+   by an entry with line == 0) containing at least one statement above LINE
+   and one equal to or below LINE.
+
+   2) The entry must be a statement.
+
+   3) The line associated with this statement must be equal to or higher
+   than LINE
+
+   The best entry for any given block will then be the one with the lowest
+   line. */
+
+std::vector<const linetable_entry *> find_best_le_for_symtab_line
+    (const linetable *lt, int line);
+
 /* Prototype for callbacks for LA_ITERATE_OVER_SYMBOLS.  The callback
    is called once per matching symbol SYM.  The callback should return
    true to indicate that LA_ITERATE_OVER_SYMBOLS should continue
diff --git a/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc b/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
new file mode 100644
index 00000000000..04e88891279
--- /dev/null
+++ b/gdb/testsuite/gdb.linespec/breakpoint-sliding.cc
@@ -0,0 +1,67 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright (C) 2022-2024 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <type_traits>
+
+template<typename T>
+T
+foo (T a, T b)
+{
+  T ret;
+  /* In foo break here.  */
+  if constexpr (std::is_integral_v<T>)
+    ret = a; /* break in foo<int>.  */
+  else
+    ret = b; /* break in foo<NonIntegral>.  */
+
+  return ret;
+}
+
+struct NonIntegral
+{
+  int m;
+};
+
+struct St
+{
+  St ()
+  { /* Empty.  */ }
+};
+
+St st1;
+
+static int
+between_static_init ()
+{
+  /* between_static_init break here.  */
+  return 42; /* break in between_static_init.  */
+}
+
+St st2;
+
+int main ()
+{
+
+#if __cplusplus >= 201703L
+  foo<int> (1, 2);
+  foo<NonIntegral> ({ 1 }, { 2 });
+#endif
+
+  between_static_init ();
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp b/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
new file mode 100644
index 00000000000..f84e2fd08ae
--- /dev/null
+++ b/gdb/testsuite/gdb.linespec/breakpoint-sliding.exp
@@ -0,0 +1,69 @@ 
+# Copyright (C) 2022-2024 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This testcase exercises breakpoint sliding when the user tries to set
+# a breakpoint at a line number which has no code associated to it.  GDB
+# should then place breakpoint at the first line following the one given
+# by the user in each code block composing the program space, given that
+# the code block "covers" the line asked by the user.
+
+require allow_cplus_tests
+
+standard_testfile .cc
+
+# This test relies on having a C++17 capable compiler.  Since Sep-2023,
+# C++17 is required to build gdb/gdbserver/gdbsupport (see f74dc26792a),
+# so that should not be be a problem.
+if {[prepare_for_testing "failed to prepare with c++17" $testfile \
+			 $srcfile {debug c++ additional_flags=-std=c++17}]} {
+    return -1
+}
+
+proc_with_prefix test_sliding_per_block {} {
+    set loc [gdb_get_line_number "In foo break here."]
+    gdb_test "break $loc" \
+	"Breakpoint $::decimal.* \\(2 locations\\)" \
+	"slide to if constexpr"
+
+    set loc_in_int_instance [gdb_get_line_number "break in foo<int>"]
+    set loc_in_struct_instance [gdb_get_line_number "break in foo<NonIntegral>"]
+    set seen_int_pb 0
+    set seen_struct_bp 0
+    gdb_test_multiple "info breakpoint" "info breakpoint" {
+	-re "in foo<int>\\(int, int\\) at \[^\n\r\]*$::srcfile:$loc_in_int_instance" {
+	    set seen_int_pb 1
+	    exp_continue
+	}
+	-re "in foo<NonIntegral>\\(NonIntegral, NonIntegral\\) at \[^\n\r\]*$::srcfile:$loc_in_struct_instance" {
+	    set seen_struct_bp 1
+	    exp_continue
+	}
+	-re -wrap "" {
+	    gdb_assert {$seen_int_pb && $seen_struct_bp} $gdb_test_name
+	}
+    }
+}
+
+proc_with_prefix test_ignore_artificial_block {} {
+    set loc  [gdb_get_line_number "between_static_init break here"]
+    set expected_loc [gdb_get_line_number "break in between_static_init."]
+    gdb_test "break $loc" \
+	"Breakpoint $::decimal at $::hex: file .*/$::srcfile, line $expected_loc\\." \
+	"ignore static block"
+}
+
+clean_restart $::binfile
+test_sliding_per_block
+test_ignore_artificial_block