gdb/record-full: disable range stepping when resuming threads

Message ID 20230427185407.203300-1-simon.marchi@efficios.com
State New
Headers
Series gdb/record-full: disable range stepping when resuming threads |

Commit Message

Simon Marchi April 27, 2023, 6:54 p.m. UTC
  I see these failures, when running with the native-gdbserver of
native-extended-gdbserver boards:

    Running /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.reverse/finish-reverse-next.exp ...
    FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 1 LEP from function body
    FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 2 at b = 5, from function body
    FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 1 GEP call from function body
    FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 2 at b = 50 from function body

Let's use this simpler program to illustrate the problem:

    int main()
    {
      int a = 362;
      a = a * 17;
      return a;
    }

It compiles down to:

    int a = 362;
    401689:       c7 45 fc 6a 01 00 00    movl   $0x16a,-0x4(%rbp)
    a = a * 17;
    401690:       8b 55 fc                mov    -0x4(%rbp),%edx
    401693:       89 d0                   mov    %edx,%eax
    401695:       c1 e0 04                shl    $0x4,%eax
    401698:       01 d0                   add    %edx,%eax
    40169a:       89 45 fc                mov    %eax,-0x4(%rbp)
    return a;
    40169d:       8b 45 fc                mov    -0x4(%rbp),%eax

When single stepping these lines, debugging locally, while recording,
these are the recorded instructions (basically one for each instruction
shown above):

    (gdb) maintenance print record-instruction 0
    4 bytes of memory at address 0x00007fffffffdc5c changed from: 6a 01 00 00
    Register rip changed: (void (*)()) 0x40169a <main+21>
    (gdb) maintenance print record-instruction -1
    Register rax changed: 5792
    Register eflags changed: [ PF AF IF ]
    Register rip changed: (void (*)()) 0x401698 <main+19>
    (gdb) maintenance print record-instruction -2
    Register rax changed: 362
    Register eflags changed: [ PF ZF IF ]
    Register rip changed: (void (*)()) 0x401695 <main+16>
    (gdb) maintenance print record-instruction -3
    Register rax changed: 4200069
    Register rip changed: (void (*)()) 0x401693 <main+14>
    (gdb) maintenance print record-instruction -4
    Register rdx changed: 140737488346696
    Register rip changed: (void (*)()) 0x401690 <main+11>
    (gdb) maintenance print record-instruction -5
    4 bytes of memory at address 0x00007fffffffdc5c changed from: 00 00 00 00
    Register rip changed: (void (*)()) 0x401689 <main+4>
    (gdb) maintenance print record-instruction -6
    Not enough recorded history

But when debugging remotely:

    (gdb) maintenance print record-instruction 0
    Register rdx changed: 140737488346728
    Register rip changed: (void (*)()) 0x401690 <main+11>
    (gdb) maintenance print record-instruction -1
    4 bytes of memory at address 0x00007fffffffdc7c changed from: 00 00 00 00
    Register rip changed: (void (*)()) 0x401689 <main+4>
    (gdb) maintenance print record-instruction -2
    Not enough recorded history

In this list, we only have entries for the beginning of each line.  This
is because of the remote target's support for range stepping.  The
record-full layer can only record instructions when the underlying
process target reports a stop.  With range stepping, the remote target
single-steps multiple instructions at a time, so the record-full target
doesn't get to see and record them all.

Fix this by making the record-full layer disable range-stepping
before handing the resume request to the beneath layer, forcing the
remote target to report stops for each instruction.

Change-Id: Ia95ea62720bbcd0b6536a904360ffbf839eb823d
---
 gdb/record-full.c | 7 +++++++
 1 file changed, 7 insertions(+)
  

Comments

Keith Seitz April 28, 2023, 5:33 p.m. UTC | #1
On 4/27/23 11:54, Simon Marchi via Gdb-patches wrote:
> I see these failures, when running with the native-gdbserver of
> native-extended-gdbserver boards:
> 
>      Running /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.reverse/finish-reverse-next.exp ...
>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 1 LEP from function body
>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 2 at b = 5, from function body
>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 1 GEP call from function body
>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 2 at b = 50 from function body
> 

Thank you for the explanation. It was simple enough for me to understand. :-)

> ---
>   gdb/record-full.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/gdb/record-full.c b/gdb/record-full.c
> index 15c5b7d682ed..026c309b674c 100644
> --- a/gdb/record-full.c
> +++ b/gdb/record-full.c
> @@ -1094,6 +1094,13 @@ record_full_target::resume (ptid_t ptid, int step, enum gdb_signal signal)
>         /* Make sure the target beneath reports all signals.  */
>         target_pass_signals ({});
>   
> +      /* Disable range-stepping, forcing the process target to report stops for
> +	 all executed instructions, so we can record them all.  */
> +      process_stratum_target *proc_target
> +	= current_inferior ()->process_target ();
> +      for (thread_info *thread : all_non_exited_threads (proc_target, ptid))
> +	thread->control.may_range_step = 0;
> +
>         this->beneath ()->resume (ptid, step, signal);
>       }
>   }

This makes sense to me, and I've also regression tested it here and detected no
problems. [You likely have, too, but it's my habit to test (a lot).]

Keith
  
Simon Marchi April 28, 2023, 6:29 p.m. UTC | #2
On 4/28/23 13:33, Keith Seitz wrote:
> On 4/27/23 11:54, Simon Marchi via Gdb-patches wrote:
>> I see these failures, when running with the native-gdbserver of
>> native-extended-gdbserver boards:
>>
>>      Running /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.reverse/finish-reverse-next.exp ...
>>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 1 LEP from function body
>>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 2 at b = 5, from function body
>>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 1 GEP call from function body
>>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 2 at b = 50 from function body
>>
> 
> Thank you for the explanation. It was simple enough for me to understand. :-)
> 
>> ---
>>   gdb/record-full.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/gdb/record-full.c b/gdb/record-full.c
>> index 15c5b7d682ed..026c309b674c 100644
>> --- a/gdb/record-full.c
>> +++ b/gdb/record-full.c
>> @@ -1094,6 +1094,13 @@ record_full_target::resume (ptid_t ptid, int step, enum gdb_signal signal)
>>         /* Make sure the target beneath reports all signals.  */
>>         target_pass_signals ({});
>>   +      /* Disable range-stepping, forcing the process target to report stops for
>> +     all executed instructions, so we can record them all.  */
>> +      process_stratum_target *proc_target
>> +    = current_inferior ()->process_target ();
>> +      for (thread_info *thread : all_non_exited_threads (proc_target, ptid))
>> +    thread->control.may_range_step = 0;
>> +
>>         this->beneath ()->resume (ptid, step, signal);
>>       }
>>   }
> 
> This makes sense to me, and I've also regression tested it here and detected no
> problems. [You likely have, too, but it's my habit to test (a lot).]

Hi Keith,

Thanks for taking a look.  I'm not 100% sure about the change, but given
that:

 - it makes sense to you
 - it is isolated in the record-full code
 - record-full is somewhat on life support
 - it fixes some annoying failures I'd like to see disappear

... I'll go ahead and push this patch.  If someone thinks there is a
better way to do it, I'll be happy to revisit the fix.

Simon
  
Guinevere Larsen May 2, 2023, 2:27 p.m. UTC | #3
On 27/04/2023 20:54, Simon Marchi via Gdb-patches wrote:
> I see these failures, when running with the native-gdbserver of
> native-extended-gdbserver boards:
>
>      Running /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.reverse/finish-reverse-next.exp ...
>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 1 LEP from function body
>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 2 at b = 5, from function body
>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 1 GEP call from function body
>      FAIL: gdb.reverse/finish-reverse-next.exp: reverse next 2 at b = 50 from function body
>
> Let's use this simpler program to illustrate the problem:
>
>      int main()
>      {
>        int a = 362;
>        a = a * 17;
>        return a;
>      }
>
> It compiles down to:
>
>      int a = 362;
>      401689:       c7 45 fc 6a 01 00 00    movl   $0x16a,-0x4(%rbp)
>      a = a * 17;
>      401690:       8b 55 fc                mov    -0x4(%rbp),%edx
>      401693:       89 d0                   mov    %edx,%eax
>      401695:       c1 e0 04                shl    $0x4,%eax
>      401698:       01 d0                   add    %edx,%eax
>      40169a:       89 45 fc                mov    %eax,-0x4(%rbp)
>      return a;
>      40169d:       8b 45 fc                mov    -0x4(%rbp),%eax
>
> When single stepping these lines, debugging locally, while recording,
> these are the recorded instructions (basically one for each instruction
> shown above):
>
>      (gdb) maintenance print record-instruction 0
>      4 bytes of memory at address 0x00007fffffffdc5c changed from: 6a 01 00 00
>      Register rip changed: (void (*)()) 0x40169a <main+21>
>      (gdb) maintenance print record-instruction -1
>      Register rax changed: 5792
>      Register eflags changed: [ PF AF IF ]
>      Register rip changed: (void (*)()) 0x401698 <main+19>
>      (gdb) maintenance print record-instruction -2
>      Register rax changed: 362
>      Register eflags changed: [ PF ZF IF ]
>      Register rip changed: (void (*)()) 0x401695 <main+16>
>      (gdb) maintenance print record-instruction -3
>      Register rax changed: 4200069
>      Register rip changed: (void (*)()) 0x401693 <main+14>
>      (gdb) maintenance print record-instruction -4
>      Register rdx changed: 140737488346696
>      Register rip changed: (void (*)()) 0x401690 <main+11>
>      (gdb) maintenance print record-instruction -5
>      4 bytes of memory at address 0x00007fffffffdc5c changed from: 00 00 00 00
>      Register rip changed: (void (*)()) 0x401689 <main+4>
>      (gdb) maintenance print record-instruction -6
>      Not enough recorded history
>
> But when debugging remotely:
>
>      (gdb) maintenance print record-instruction 0
>      Register rdx changed: 140737488346728
>      Register rip changed: (void (*)()) 0x401690 <main+11>
>      (gdb) maintenance print record-instruction -1
>      4 bytes of memory at address 0x00007fffffffdc7c changed from: 00 00 00 00
>      Register rip changed: (void (*)()) 0x401689 <main+4>
>      (gdb) maintenance print record-instruction -2
>      Not enough recorded history
>
> In this list, we only have entries for the beginning of each line.  This
> is because of the remote target's support for range stepping.  The
> record-full layer can only record instructions when the underlying
> process target reports a stop.  With range stepping, the remote target
> single-steps multiple instructions at a time, so the record-full target
> doesn't get to see and record them all.
>
> Fix this by making the record-full layer disable range-stepping
> before handing the resume request to the beneath layer, forcing the
> remote target to report stops for each instruction.
>
> Change-Id: Ia95ea62720bbcd0b6536a904360ffbf839eb823d
> ---
>   gdb/record-full.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/gdb/record-full.c b/gdb/record-full.c
> index 15c5b7d682ed..026c309b674c 100644
> --- a/gdb/record-full.c
> +++ b/gdb/record-full.c
> @@ -1094,6 +1094,13 @@ record_full_target::resume (ptid_t ptid, int step, enum gdb_signal signal)
>         /* Make sure the target beneath reports all signals.  */
>         target_pass_signals ({});
>   
> +      /* Disable range-stepping, forcing the process target to report stops for
> +	 all executed instructions, so we can record them all.  */
> +      process_stratum_target *proc_target
> +	= current_inferior ()->process_target ();
> +      for (thread_info *thread : all_non_exited_threads (proc_target, ptid))
> +	thread->control.may_range_step = 0;
> +
>         this->beneath ()->resume (ptid, step, signal);
>       }
>   }

Hey! Nice catch on this one, with this change you also managed to fix record/29745 (https://sourceware.org/bugzilla/show_bug.cgi?id=29745).

For future improvement, I'm wondering: is gdbserver aware that the execution is being recorded at all? That way it could record the execution by itself and once the stop is reported, it sends the recorded instructions (which I hope should be faster). I'm asking before adding it to my backlog, just in case it is very difficult to do.
  
Tom Tromey May 2, 2023, 3:03 p.m. UTC | #4
>>>>> "Bruno" == Bruno Larsen via Gdb-patches <gdb-patches@sourceware.org> writes:

Bruno> For future improvement, I'm wondering: is gdbserver aware that the
Bruno> execution is being recorded at all?

I don't think so, or at least, not for full record.  (It seems to have
some btrace knowledge.)

Bruno> That way it could record the
Bruno> execution by itself and once the stop is reported, it sends the
Bruno> recorded instructions (which I hope should be faster). I'm asking
Bruno> before adding it to my backlog, just in case it is very difficult to
Bruno> do.

I guess at minimum it would require making it possible to build the
record machinery into gdbserver, and also adding the needed requests to
the remote protocol.

Tom
  
Guinevere Larsen May 2, 2023, 3:57 p.m. UTC | #5
On 02/05/2023 17:03, Tom Tromey wrote:
>>>>>> "Bruno" == Bruno Larsen via Gdb-patches <gdb-patches@sourceware.org> writes:
> Bruno> For future improvement, I'm wondering: is gdbserver aware that the
> Bruno> execution is being recorded at all?
>
> I don't think so, or at least, not for full record.  (It seems to have
> some btrace knowledge.)
>
> Bruno> That way it could record the
> Bruno> execution by itself and once the stop is reported, it sends the
> Bruno> recorded instructions (which I hope should be faster). I'm asking
> Bruno> before adding it to my backlog, just in case it is very difficult to
> Bruno> do.
>
> I guess at minimum it would require making it possible to build the
> record machinery into gdbserver, and also adding the needed requests to
> the remote protocol.

Ah, that's unfortunate. I was hoping it would already bt there and this 
would've mostly been a protocol addition. Oh well... thanks for taking 
the time to answer :-)
  
Simon Marchi May 2, 2023, 6:03 p.m. UTC | #6
On 5/2/23 11:57, Bruno Larsen wrote:
> On 02/05/2023 17:03, Tom Tromey wrote:
>>>>>>> "Bruno" == Bruno Larsen via Gdb-patches <gdb-patches@sourceware.org> writes:
>> Bruno> For future improvement, I'm wondering: is gdbserver aware that the
>> Bruno> execution is being recorded at all?
>>
>> I don't think so, or at least, not for full record.  (It seems to have
>> some btrace knowledge.)
>>
>> Bruno> That way it could record the
>> Bruno> execution by itself and once the stop is reported, it sends the
>> Bruno> recorded instructions (which I hope should be faster). I'm asking
>> Bruno> before adding it to my backlog, just in case it is very difficult to
>> Bruno> do.
>>
>> I guess at minimum it would require making it possible to build the
>> record machinery into gdbserver, and also adding the needed requests to
>> the remote protocol.
> 
> Ah, that's unfortunate. I was hoping it would already bt there and this would've mostly been a protocol addition. Oh well... thanks for taking the time to answer :-)

Also, I think that the built-in record mechanism (the record-full
target) is mostly obsolete at this point.  It hasn't been updated in
ages to support new CPU instructions, and doesn't support multi-threaded
programs.  To anybody using the record-full target, I would suggest
trying out something like rr [1] instead.

Simon

[1] https://rr-project.org/
  

Patch

diff --git a/gdb/record-full.c b/gdb/record-full.c
index 15c5b7d682ed..026c309b674c 100644
--- a/gdb/record-full.c
+++ b/gdb/record-full.c
@@ -1094,6 +1094,13 @@  record_full_target::resume (ptid_t ptid, int step, enum gdb_signal signal)
       /* Make sure the target beneath reports all signals.  */
       target_pass_signals ({});
 
+      /* Disable range-stepping, forcing the process target to report stops for
+	 all executed instructions, so we can record them all.  */
+      process_stratum_target *proc_target
+	= current_inferior ()->process_target ();
+      for (thread_info *thread : all_non_exited_threads (proc_target, ptid))
+	thread->control.may_range_step = 0;
+
       this->beneath ()->resume (ptid, step, signal);
     }
 }