[3/8] Break at each iteration for breakpoints placed on a while statement

Message ID 20150917185751.22fe2937@pinnacle.lan
State New, archived
Headers

Commit Message

Kevin Buettner Sept. 18, 2015, 1:57 a.m. UTC
  On Tue, 25 Aug 2015 13:09:55 +0100
Pedro Alves <palves@redhat.com> wrote:

> On 08/19/2015 08:03 AM, Kevin Buettner wrote:
> 
> > @@ -1933,6 +1956,18 @@ create_sals_line_offset (struct linespec_state *self,
> >  	    struct symbol *sym = (blocks[i]
> >  				  ? block_containing_function (blocks[i])
> >  				  : NULL);
> > +	    CORE_ADDR branch_addr = gdbarch_unconditional_branch_address
> > +	      (get_current_arch (), intermediate_results.sals[i].pc);
> 
> It'd be nice to have a comment here explaining why we do this.  You have
> it in the commit log, but it'd be useful going forward to have it in
> the sources.
> 
> Nit, this is about the branch _destination_ not the branch instruction's
> address, right?  I'd suggest
> s/gdbarch_unconditional_branch_address/gdbarch_unconditional_branch_destination/
> 
> Also, there should be a better gdbarch to use than get_current_arch().
> E.g., get_objfile_arch (SYMTAB_OBJFILE (sals[i].symtab)), the sal's pspace's
> gdbarch, etc.
> 
> > +
> > +	    /* Only use branch if it's in the same block and is also
> 
> And here "use branch destination".
> 
> > +	       within one of the sals from the initial list.  */
> > +	    if (branch_addr != 0 && blocks[i]->startaddr <= branch_addr
> > +	        && branch_addr < blocks[i]->endaddr
> > +		&& addr_in_sals (branch_addr, intermediate_results.nelts,
> > +		                 intermediate_results.sals))
> > +	      {
> > +		intermediate_results.sals[i].pc = branch_addr;
> > +	      }
> 
> Also, we really shouldn't be adding code that assumes 0 to mean invalid
> address, as on some ports, it is a valid address.  You could e.g.,
> change gdbarch_unconditional_branch_address to return an int and
> take an output CORE_ADDR * parameter, and/or make it an M gdbarch
> method, and check gdbarch_unconditional_branch_address_p before
> use.

I've fixed each of these in the new patch.  See below.

> I wonder whether we have to worry about the case of a goto
> jumping to the same line?  Like:
> 
>   goto label; foo (); label: bar ();
> 
> Not the most usual or beautiful code to write, though I'd guess code
> generators could output things like that.  Writing it in several lines
> but behind a #define might be another way to get everything in
> the same line.

Here is what I tried:

volatile int v;
...
  v = 0;
  goto d; c: v++; d: if (v < 3) goto c;	/* Loop 5 */

Here's what gcc generates for x86_64 (on my machine, anyway):

=> 0x4005c2 <loop_test+210>:    movl   $0x0,0x200a64(%rip)        # 0x601030 <v>
   0x4005cc <loop_test+220>:    nop
   0x4005cd <loop_test+221>:    mov    0x200a5d(%rip),%eax        # 0x601030 <v>
   0x4005d3 <loop_test+227>:    cmp    $0x2,%eax
   0x4005d6 <loop_test+230>:    jg     0x4005ea <loop_test+250>
   0x4005d8 <loop_test+232>:    nop
   0x4005d9 <loop_test+233>:    mov    0x200a51(%rip),%eax        # 0x601030 <v>
   0x4005df <loop_test+239>:    add    $0x1,%eax
   0x4005e2 <loop_test+242>:    mov    %eax,0x200a48(%rip)        # 0x601030 <v>
   0x4005e8 <loop_test+248>:    jmp    0x4005cd <loop_test+221>

If I place a breakpoint on the Loop 5 line, it ends up on on 0x4005cc,
which is a nop.  It appears that gcc has rewritten this code so that
the the comparison comes first, which exactly the opposite of what
happens when one writes a while loop.

The presence of the nop means that things work as expected when we
place a breakpoint on that line.  It will stop just once on the
statement instead of once for each time through the loop.

This was an interesting exercise, but it doesn't really answer the
question of what would happen if gcc would instead place an
actual branch at the beginning.  I would expect my patch
to cause the breakpoint to be placed within the loop instead
of at the beginning of the statement.  I don't think this is
desirable, but I can't think of a way (within the current work) to
stop it from happening.

Here's the new patch:

    Break at each iteration for breakpoints placed on a while statement.
    
    This patch changes create_sals_line_offset() in linespec.c so that, for
    a given SAL, if that SAL's address (pc) refers to an unconditional
    branch instruction whose branch target also refers to the same SAL, then
    the branch target is used for the SAL instead.
    
    The pratical effect of this is that a breakpoint placed on a while
    loop will break at the evaluation of the condition instead of at the
    unconditional branch which transfers control to the starting address
    for the evaluation of the condition.
    
    Consider the following code snippet (which is taken from one of the
    new tests for this patch set):
    
        9         while (v < 3)                         /* Loop 1 condition */
        10          {
        11            v++;                              /* Loop 1 increment */
        12          }
    
    This is compiled as the following x86_64 code:
    
        0x000000000040059e <loop_test+14>:   jmp    0x4005af <loop_test+31>
        0x00000000004005a0 <loop_test+16>:   mov    0x200a8a(%rip),%eax # 0x601030 <v>
        0x00000000004005a6 <loop_test+22>:   add    $0x1,%eax
        0x00000000004005a9 <loop_test+25>:   mov    %eax,0x200a81(%rip) # 0x601030 <v>
        0x00000000004005af <loop_test+31>:   mov    0x200a7b(%rip),%eax # 0x601030 <v>
        0x00000000004005b5 <loop_test+37>:   cmp    $0x2,%eax
        0x00000000004005b8 <loop_test+40>:   jle    0x4005a0 <loop_test+16>
    
    If a breakpoint is placed on line 9, which begins at loop_test+14, this
    change/patch causes the breakpoint to be placed on loop_test+31, which is
    the starting address for the evaluation of the condition.
    
    In order for this to work, an architecture specific method,
    unconditional_branch_destination, was introduced in an earlier patch in
    the set.  I've implemented this method for x86_64, i386, arm, thumb,
    powerpc, rx, and rl78.
    
    I've tested on each of these architectures and see no regressions.
    
    gdb/ChangeLog:
    	* linespec.c (addr_in_sals): New function.
    	(create_sals_line_offset): Adjust SAL whose pc refers to an
    	unconditional branch whose destination is the same line.
---
 gdb/linespec.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)
  

Comments

Pedro Alves Sept. 30, 2015, 12:16 p.m. UTC | #1
Hi Kevin,

Sorry for the delay.

On 09/18/2015 02:57 AM, Kevin Buettner wrote:

> If I place a breakpoint on the Loop 5 line, it ends up on on 0x4005cc,
> which is a nop.  It appears that gcc has rewritten this code so that
> the the comparison comes first, which exactly the opposite of what
> happens when one writes a while loop.
> 
> The presence of the nop means that things work as expected when we
> place a breakpoint on that line.  It will stop just once on the
> statement instead of once for each time through the loop.
> 
> This was an interesting exercise, but it doesn't really answer the
> question of what would happen if gcc would instead place an
> actual branch at the beginning.  I would expect my patch
> to cause the breakpoint to be placed within the loop instead
> of at the beginning of the statement.  I don't think this is
> desirable, but I can't think of a way (within the current work) to
> stop it from happening.

Yeah, me neither.  This is always going to be brittle.  :-/
Really the best would be DWARF is_stmt.

> 
> Here's the new patch:
> 
>     Break at each iteration for breakpoints placed on a while statement.
>     
>     This patch changes create_sals_line_offset() in linespec.c so that, for
>     a given SAL, if that SAL's address (pc) refers to an unconditional
>     branch instruction whose branch target also refers to the same SAL, then
>     the branch target is used for the SAL instead.
>     
>     The pratical effect of this is that a breakpoint placed on a while

"practical"

>     loop will break at the evaluation of the condition instead of at the
>     unconditional branch which transfers control to the starting address
>     for the evaluation of the condition.
>     
>     Consider the following code snippet (which is taken from one of the
>     new tests for this patch set):
>     
>         9         while (v < 3)                         /* Loop 1 condition */
>         10          {
>         11            v++;                              /* Loop 1 increment */
>         12          }
>     
>     This is compiled as the following x86_64 code:
>     
>         0x000000000040059e <loop_test+14>:   jmp    0x4005af <loop_test+31>
>         0x00000000004005a0 <loop_test+16>:   mov    0x200a8a(%rip),%eax # 0x601030 <v>
>         0x00000000004005a6 <loop_test+22>:   add    $0x1,%eax
>         0x00000000004005a9 <loop_test+25>:   mov    %eax,0x200a81(%rip) # 0x601030 <v>
>         0x00000000004005af <loop_test+31>:   mov    0x200a7b(%rip),%eax # 0x601030 <v>
>         0x00000000004005b5 <loop_test+37>:   cmp    $0x2,%eax
>         0x00000000004005b8 <loop_test+40>:   jle    0x4005a0 <loop_test+16>
>     
>     If a breakpoint is placed on line 9, which begins at loop_test+14, this
>     change/patch causes the breakpoint to be placed on loop_test+31, which is
>     the starting address for the evaluation of the condition.
>     
>     In order for this to work, an architecture specific method,
>     unconditional_branch_destination, was introduced in an earlier patch in
>     the set.  I've implemented this method for x86_64, i386, arm, thumb,
>     powerpc, rx, and rl78.
>     
>     I've tested on each of these architectures and see no regressions.
>     
>     gdb/ChangeLog:
>     	* linespec.c (addr_in_sals): New function.
>     	(create_sals_line_offset): Adjust SAL whose pc refers to an
>     	unconditional branch whose destination is the same line.
> ---
>  gdb/linespec.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
> 
> diff --git a/gdb/linespec.c b/gdb/linespec.c
> index 4c29c12..4b5cde9 100644
> --- a/gdb/linespec.c
> +++ b/gdb/linespec.c
> @@ -1808,6 +1808,29 @@ canonicalize_linespec (struct linespec_state *state, const linespec_p ls)
>      }
>  }
>  
> +/* Return 1 if one of the SALS between 0 and NELTS contains ADDR. 
> +   Return 0 otherwise.  */
> +
> +static int
> +addr_in_sals (CORE_ADDR addr, int nelts, struct symtab_and_line *sals)
> +{
> +  int i;
> +
> +  for (i = 0; i < nelts; i++)
> +    {
> +      struct symtab_and_line sal;
> +
> +      if (sals[i].end == 0)
> +        sal = find_pc_sect_line (sals[i].pc, sals[i].section, 0);
> +      else
> +        sal = sals[i];
> +
> +      if (sal.pc <= addr && addr < sal.end)
> +	return 1;
> +    }
> +  return 0;
> +}
> +
>  /* Given a line offset in LS, construct the relevant SALs.  */
>  
>  static struct symtabs_and_lines
> @@ -1933,6 +1956,59 @@ create_sals_line_offset (struct linespec_state *self,
>  	    struct symbol *sym = (blocks[i]
>  				  ? block_containing_function (blocks[i])
>  				  : NULL);
> +	    int is_branch;
> +	    CORE_ADDR branch_dest;
> +	    
> +	    /* This next bit of code examines the instruction at the
> +	       SAL's address (pc).  If the instruction is an
> +	       unconditional branch whose branch destination also
> +	       refers to the same SAL, then the branch destination is
> +	       used for the SAL's pc value instead.
> +
> +	       The pratical effect of this is that a breakpoint placed

"practical".

> +	       on a while loop will break at the evaluation of the
> +	       condition instead of at an unconditional branch which
> +	       transfers control to the starting address for the
> +	       evaluation of the condition.
> +
> +	       Consider the following code snippet (which is taken
> +	       from the test case for gdb.base/loop-break.exp.)
> +
> +		 while (v < 3)
> +                  {
> +                    v++; 
> +                  }
> +
> +	       On x86_64, this might be compiled as follows:
> +
> +		 0x40059e <loop_test+14>:   jmp    0x4005af <loop_test+31>
> +		 0x4005a0 <loop_test+16>:   mov    0x200a8a(%rip),%eax
> +		 0x4005a6 <loop_test+22>:   add    $0x1,%eax
> +		 0x4005a9 <loop_test+25>:   mov    %eax,0x200a81(%rip)
> +		 0x4005af <loop_test+31>:   mov    0x200a7b(%rip),%eax
> +		 0x4005b5 <loop_test+37>:   cmp    $0x2,%eax
> +		 0x4005b8 <loop_test+40>:   jle    0x4005a0 <loop_test+16>
> +
> +	       If a breakpoint is placed on the while statement line
> +	       which begins at loop_test+14, the following code causes
> +	       the breakpoint to be (instead) placed on loop_test+31, which
> +	       is the starting address for the evaluation of the condition.  */
> +
> +	    is_branch = gdbarch_unconditional_branch_destination
> +	      (get_objfile_arch
> +	        (SYMTAB_OBJFILE (intermediate_results.sals[i].symtab)),
> +	       intermediate_results.sals[i].pc,
> +	       &branch_dest);
> +
> +	    /* Only use branch destination if it's in the same block and
> +	       is also within one of the sals from the initial list.  */
> +	    if (is_branch && blocks[i]->startaddr <= branch_dest
> +	        && branch_dest < blocks[i]->endaddr
> +		&& addr_in_sals (branch_dest, intermediate_results.nelts,
> +		                 intermediate_results.sals))

I believe intermediate_results can contain sals of different
pspaces / inferiors.  Shouldn't we take that into account somehow?

> +	      {
> +		intermediate_results.sals[i].pc = branch_dest;
> +	      }

Unnecessary braces.

>  
>  	    if (self->funfirstline)
>  	      skip_prologue_sal (&intermediate_results.sals[i]);
> 

Thanks,
Pedro Alves
  
Doug Evans Oct. 1, 2015, 4:09 a.m. UTC | #2
On Wed, Sep 30, 2015 at 5:16 AM, Pedro Alves <palves@redhat.com> wrote:
> Hi Kevin,
>
> Sorry for the delay.
>
> On 09/18/2015 02:57 AM, Kevin Buettner wrote:
>
>> If I place a breakpoint on the Loop 5 line, it ends up on on 0x4005cc,
>> which is a nop.  It appears that gcc has rewritten this code so that
>> the the comparison comes first, which exactly the opposite of what
>> happens when one writes a while loop.
>>
>> The presence of the nop means that things work as expected when we
>> place a breakpoint on that line.  It will stop just once on the
>> statement instead of once for each time through the loop.
>>
>> This was an interesting exercise, but it doesn't really answer the
>> question of what would happen if gcc would instead place an
>> actual branch at the beginning.  I would expect my patch
>> to cause the breakpoint to be placed within the loop instead
>> of at the beginning of the statement.  I don't think this is
>> desirable, but I can't think of a way (within the current work) to
>> stop it from happening.
>
> Yeah, me neither.  This is always going to be brittle.  :-/
> Really the best would be DWARF is_stmt.

I hate to enter into this late, but holy cow,
is this really what the dwarf committee and gcc developers
expect gdb to do?

Assuming, for the moment, that dwarf handles this just fine,
can't we push back on gcc?
  

Patch

diff --git a/gdb/linespec.c b/gdb/linespec.c
index 4c29c12..4b5cde9 100644
--- a/gdb/linespec.c
+++ b/gdb/linespec.c
@@ -1808,6 +1808,29 @@  canonicalize_linespec (struct linespec_state *state, const linespec_p ls)
     }
 }
 
+/* Return 1 if one of the SALS between 0 and NELTS contains ADDR. 
+   Return 0 otherwise.  */
+
+static int
+addr_in_sals (CORE_ADDR addr, int nelts, struct symtab_and_line *sals)
+{
+  int i;
+
+  for (i = 0; i < nelts; i++)
+    {
+      struct symtab_and_line sal;
+
+      if (sals[i].end == 0)
+        sal = find_pc_sect_line (sals[i].pc, sals[i].section, 0);
+      else
+        sal = sals[i];
+
+      if (sal.pc <= addr && addr < sal.end)
+	return 1;
+    }
+  return 0;
+}
+
 /* Given a line offset in LS, construct the relevant SALs.  */
 
 static struct symtabs_and_lines
@@ -1933,6 +1956,59 @@  create_sals_line_offset (struct linespec_state *self,
 	    struct symbol *sym = (blocks[i]
 				  ? block_containing_function (blocks[i])
 				  : NULL);
+	    int is_branch;
+	    CORE_ADDR branch_dest;
+	    
+	    /* This next bit of code examines the instruction at the
+	       SAL's address (pc).  If the instruction is an
+	       unconditional branch whose branch destination also
+	       refers to the same SAL, then the branch destination is
+	       used for the SAL's pc value instead.
+
+	       The pratical effect of this is that a breakpoint placed
+	       on a while loop will break at the evaluation of the
+	       condition instead of at an unconditional branch which
+	       transfers control to the starting address for the
+	       evaluation of the condition.
+
+	       Consider the following code snippet (which is taken
+	       from the test case for gdb.base/loop-break.exp.)
+
+		 while (v < 3)
+                  {
+                    v++; 
+                  }
+
+	       On x86_64, this might be compiled as follows:
+
+		 0x40059e <loop_test+14>:   jmp    0x4005af <loop_test+31>
+		 0x4005a0 <loop_test+16>:   mov    0x200a8a(%rip),%eax
+		 0x4005a6 <loop_test+22>:   add    $0x1,%eax
+		 0x4005a9 <loop_test+25>:   mov    %eax,0x200a81(%rip)
+		 0x4005af <loop_test+31>:   mov    0x200a7b(%rip),%eax
+		 0x4005b5 <loop_test+37>:   cmp    $0x2,%eax
+		 0x4005b8 <loop_test+40>:   jle    0x4005a0 <loop_test+16>
+
+	       If a breakpoint is placed on the while statement line
+	       which begins at loop_test+14, the following code causes
+	       the breakpoint to be (instead) placed on loop_test+31, which
+	       is the starting address for the evaluation of the condition.  */
+
+	    is_branch = gdbarch_unconditional_branch_destination
+	      (get_objfile_arch
+	        (SYMTAB_OBJFILE (intermediate_results.sals[i].symtab)),
+	       intermediate_results.sals[i].pc,
+	       &branch_dest);
+
+	    /* Only use branch destination if it's in the same block and
+	       is also within one of the sals from the initial list.  */
+	    if (is_branch && blocks[i]->startaddr <= branch_dest
+	        && branch_dest < blocks[i]->endaddr
+		&& addr_in_sals (branch_dest, intermediate_results.nelts,
+		                 intermediate_results.sals))
+	      {
+		intermediate_results.sals[i].pc = branch_dest;
+	      }
 
 	    if (self->funfirstline)
 	      skip_prologue_sal (&intermediate_results.sals[i]);