[2/2] Enable range stepping for ARM on GDBServer

Message ID 20160831171406.24057-2-antoine.tremblay@ericsson.com
State New, archived
Headers

Commit Message

Antoine Tremblay Aug. 31, 2016, 5:14 p.m. UTC
  This patch enables range stepping to be done in GDBServer with an ARM
target.

There is one problem however with the gdb.threads/non-stop-fair-events.exp
test.

Since single stepping is done in software and that displaced stepping is
not supported, threads end up hitting each others breakpoints and the main
thread can't make any progress passed a number of threads on my system.

Thus we get:
FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=5: thread 1
broke out of loop (timeout)

Note that even letting it go an hour doesn't help so bumping the timeout
is out of the question.

I'm not sure what to do about this one ? kfail ? ideas ?

No regressions, tested on ubuntu 14.04 ARMv7 and x86.
With gdbserver-native.

gdb/gdbserver/ChangeLog:

	* linux-arm-low.c (arm_supports_range_stepping): New function.
	(linux_target_ops the_low_target)<supports_range_stepping>: Initialize.
---
 gdb/gdbserver/linux-arm-low.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
  

Comments

Pedro Alves Aug. 31, 2016, 5:50 p.m. UTC | #1
On 08/31/2016 06:14 PM, Antoine Tremblay wrote:
> This patch enables range stepping to be done in GDBServer with an ARM
> target.
> 
> There is one problem however with the gdb.threads/non-stop-fair-events.exp
> test.
> 
> Since single stepping is done in software and that displaced stepping is
> not supported, threads end up hitting each others breakpoints and the main
> thread can't make any progress passed a number of threads on my system.
> 
> Thus we get:
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=5: thread 1
> broke out of loop (timeout)
> 
> Note that even letting it go an hour doesn't help so bumping the timeout
> is out of the question.
> 
> I'm not sure what to do about this one ? kfail ? ideas ?

Hmm, this is exactly the sort of problem the test is meant to
catch, and the reason we do event thread randomization:

 # Test that GDB in non-stop mode gives roughly equal priority to
 # events of all threads.

Why does it work without range stepping, but not with?

E.g., back when we did:

 commit 1ed415e2b9b985aac087c35949d0e1e489ab496d
 Author:     Pedro Alves <palves@redhat.com>
 AuthorDate: Wed Sep 16 15:51:36 2015 +0100

    non-stop-fair-events.exp slower on software single-step && !displ-step targets
    
    On software single-step targets that don't support displaced stepping,
    threads keep hitting each other's single-step breakpoints, and then
    GDB needs to pause all threads to step past those.  The end result is
    that progress in the main thread will be slower and it may take a bit
    longer for the signal to be queued.  This patch bumps the timeout on
    such targets.
    
    gdb/testsuite/ChangeLog:
    2015-09-16  Pedro Alves  <palves@redhat.com>
            Sandra Loosemore <sandra@codesourcery.com>
    [...]


... the test always managed to complete on sss && !displ-step targets.

Thanks,
Pedro Alves
  
Antoine Tremblay Aug. 31, 2016, 6:15 p.m. UTC | #2
Pedro Alves writes:

> On 08/31/2016 06:14 PM, Antoine Tremblay wrote:
>> This patch enables range stepping to be done in GDBServer with an ARM
>> target.
>> 
>> There is one problem however with the gdb.threads/non-stop-fair-events.exp
>> test.
>> 
>> Since single stepping is done in software and that displaced stepping is
>> not supported, threads end up hitting each others breakpoints and the main
>> thread can't make any progress passed a number of threads on my system.
>> 
>> Thus we get:
>> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=5: thread 1
>> broke out of loop (timeout)
>> 
>> Note that even letting it go an hour doesn't help so bumping the timeout
>> is out of the question.
>> 
>> I'm not sure what to do about this one ? kfail ? ideas ?
>
> Hmm, this is exactly the sort of problem the test is meant to
> catch, and the reason we do event thread randomization:
>
>  # Test that GDB in non-stop mode gives roughly equal priority to
>  # events of all threads.
>
> Why does it work without range stepping, but not with?
>

If I recall correctly GDBServer will report each stepi to GDB
without range stepping but will continue in gdbserver when range
stepping is on, thus the run control is different.

And that GDBServer's run control is not as fair as GDB's for such
situations.

Maybe it could be made to work, I remember trying a few things but after
spending quite some time on it I could not wrap my head around it. Thus
my call for ideas here.

(It's been a few weeks since I've touched this, please forgive my lack
of details)

> E.g., back when we did:
>
>  commit 1ed415e2b9b985aac087c35949d0e1e489ab496d
>  Author:     Pedro Alves <palves@redhat.com>
>  AuthorDate: Wed Sep 16 15:51:36 2015 +0100
>
>     non-stop-fair-events.exp slower on software single-step && !displ-step targets
>     
>     On software single-step targets that don't support displaced stepping,
>     threads keep hitting each other's single-step breakpoints, and then
>     GDB needs to pause all threads to step past those.  The end result is
>     that progress in the main thread will be slower and it may take a bit
>     longer for the signal to be queued.  This patch bumps the timeout on
>     such targets.
>     
>     gdb/testsuite/ChangeLog:
>     2015-09-16  Pedro Alves  <palves@redhat.com>
>             Sandra Loosemore <sandra@codesourcery.com>
>     [...]
>
>
> ... the test always managed to complete on sss && !displ-step targets.
>

But that was with GDB only right ? Since ARM is the only target with
software single step in GDBServer.
  
Pedro Alves Aug. 31, 2016, 6:39 p.m. UTC | #3
On 08/31/2016 07:15 PM, Antoine Tremblay wrote:
> 
> Pedro Alves writes:
> 
>> On 08/31/2016 06:14 PM, Antoine Tremblay wrote:
>>> This patch enables range stepping to be done in GDBServer with an ARM
>>> target.
>>>
>>> There is one problem however with the gdb.threads/non-stop-fair-events.exp
>>> test.
>>>
>>> Since single stepping is done in software and that displaced stepping is
>>> not supported, threads end up hitting each others breakpoints and the main
>>> thread can't make any progress passed a number of threads on my system.
>>>
>>> Thus we get:
>>> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=5: thread 1
>>> broke out of loop (timeout)
>>>
>>> Note that even letting it go an hour doesn't help so bumping the timeout
>>> is out of the question.
>>>
>>> I'm not sure what to do about this one ? kfail ? ideas ?
>>
>> Hmm, this is exactly the sort of problem the test is meant to
>> catch, and the reason we do event thread randomization:
>>
>>  # Test that GDB in non-stop mode gives roughly equal priority to
>>  # events of all threads.
>>
>> Why does it work without range stepping, but not with?
>>
> 
> If I recall correctly GDBServer will report each stepi to GDB
> without range stepping but will continue in gdbserver when range
> stepping is on, thus the run control is different.
> 
> And that GDBServer's run control is not as fair as GDB's for such
> situations.
> 
> Maybe it could be made to work, I remember trying a few things but after
> spending quite some time on it I could not wrap my head around it. Thus
> my call for ideas here.

It's between hard and impossible to give out ideas without some sort
of high level analysis of the typical loop of events gdbserver is getting
stuck in, causing the main thread to never be given a chance to
make progress.

It sounds like gdbserver's event starvation avoidance isn't really
working on sss targets with range stepping enabled.  E.g., are we
doing the randomization too late?

Thanks,
Pedro Alves
  
Antoine Tremblay Aug. 31, 2016, 7:14 p.m. UTC | #4
Pedro Alves writes:

> On 08/31/2016 07:15 PM, Antoine Tremblay wrote:
>> 
>> Pedro Alves writes:
>> 
>>> On 08/31/2016 06:14 PM, Antoine Tremblay wrote:
>>>> This patch enables range stepping to be done in GDBServer with an ARM
>>>> target.
>>>>
>>>> There is one problem however with the gdb.threads/non-stop-fair-events.exp
>>>> test.
>>>>
>>>> Since single stepping is done in software and that displaced stepping is
>>>> not supported, threads end up hitting each others breakpoints and the main
>>>> thread can't make any progress passed a number of threads on my system.
>>>>
>>>> Thus we get:
>>>> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=5: thread 1
>>>> broke out of loop (timeout)
>>>>
>>>> Note that even letting it go an hour doesn't help so bumping the timeout
>>>> is out of the question.
>>>>
>>>> I'm not sure what to do about this one ? kfail ? ideas ?
>>>
>>> Hmm, this is exactly the sort of problem the test is meant to
>>> catch, and the reason we do event thread randomization:
>>>
>>>  # Test that GDB in non-stop mode gives roughly equal priority to
>>>  # events of all threads.
>>>
>>> Why does it work without range stepping, but not with?
>>>
>> 
>> If I recall correctly GDBServer will report each stepi to GDB
>> without range stepping but will continue in gdbserver when range
>> stepping is on, thus the run control is different.
>> 
>> And that GDBServer's run control is not as fair as GDB's for such
>> situations.
>> 
>> Maybe it could be made to work, I remember trying a few things but after
>> spending quite some time on it I could not wrap my head around it. Thus
>> my call for ideas here.
>
> It's between hard and impossible to give out ideas without some sort
> of high level analysis of the typical loop of events gdbserver is getting
> stuck in, causing the main thread to never be given a chance to
> make progress.
>

I agree, my point is that after spending days analyzing the thing, I
can't make sense of it in resonable time and it's complicated enought
that describing the process in an email and asking for help on some
particular pieces of code doesn't seems like a process that would work.

So by ideas I mean, if someone has checked that out already, or is ready
to look into it from scratch or there's a reason that this should not be
relevant to range stepping on ARM.

> It sounds like gdbserver's event starvation avoidance isn't really
> working on sss targets with range stepping enabled.  E.g., are we
> doing the randomization too late?
>

I would need to get back into it for a few days which I can't do at the
moment to answer that. From what I recall the way the events are
selected made it so that the signal was never delivered.

I'm sorry I can't be more helpful at the moment but I wanted to post
this issue before I have to leave for a while.

However I wonder if range stepping or ARM depends on this of if we
should treat it as two different issues ?

Thank you,
Antoine
  
Pedro Alves Sept. 1, 2016, 1:37 p.m. UTC | #5
On 08/31/2016 08:14 PM, Antoine Tremblay wrote:

> I'm sorry I can't be more helpful at the moment but I wanted to post
> this issue before I have to leave for a while.

Understood.  Does enabling range stepping unblock something else?

> However I wonder if range stepping or ARM depends on this of if we
> should treat it as two different issues ?

Offhand, the knee-jerk reaction is that if enabling range stepping
causes a regression, then it sounds like range stepping has a
problem that should be fixed and it may be premature to enable it.

I see a parallel here with all the all-stop-on-top-of-non-stop
work, which exposed a ton of such latent problems that were treated
as dependencies that needed to be addressed first.  That's what
resulted in the creation of this test (see 'git log ede9f622af1f').

as-ns is enabled by default on native, but not on remote.  It sounds
like testing with as-ns enabled on remote could reveal the same
range stepping problems, but all over the testsuite instead.  :-/

Thanks,
Pedro Alves
  
Antoine Tremblay Sept. 1, 2016, 3:21 p.m. UTC | #6
Pedro Alves writes:

> On 08/31/2016 08:14 PM, Antoine Tremblay wrote:
>
>> I'm sorry I can't be more helpful at the moment but I wanted to post
>> this issue before I have to leave for a while.
>
> Understood.  Does enabling range stepping unblock something else?

It would unblock ARM tracepoints, as per Yao's requirements...

>
>> However I wonder if range stepping or ARM depends on this of if we
>> should treat it as two different issues ?
>
> Offhand, the knee-jerk reaction is that if enabling range stepping
> causes a regression, then it sounds like range stepping has a
> problem that should be fixed and it may be premature to enable it.
>
> I see a parallel here with all the all-stop-on-top-of-non-stop
> work, which exposed a ton of such latent problems that were treated
> as dependencies that needed to be addressed first.  That's what
> resulted in the creation of this test (see 'git log ede9f622af1f').
>
> as-ns is enabled by default on native, but not on remote.  It sounds
> like testing with as-ns enabled on remote could reveal the same
> range stepping problems, but all over the testsuite instead.  :-/
>

I see, in that sense we could consider it unsupported for now in remote
and fix it along the rest of the issues as non-stop gains support ?

Yao any comments on this?

Thanks,
Antoine
  
Pedro Alves Sept. 1, 2016, 3:59 p.m. UTC | #7
On 09/01/2016 04:21 PM, Antoine Tremblay wrote:
> 
> Pedro Alves writes:
> 
>> On 08/31/2016 08:14 PM, Antoine Tremblay wrote:
>>
>>> I'm sorry I can't be more helpful at the moment but I wanted to post
>>> this issue before I have to leave for a while.
>>
>> Understood.  Does enabling range stepping unblock something else?
> 
> It would unblock ARM tracepoints, as per Yao's requirements...

Tracepoints make gdbserver single-step and then not report the event
to gdb, so I do see the parallel with range-stepping.  Throwing
while-stepping into the equation would make it even more clear.

But maybe we can paralyze?  If enabling tracepoints without range
stepping causes no known regression, but enabling range stepping with
no tracepoints causes regressions, seems to me like we could put
tracepoints in first, and fix whatever range stepping problems
in parallel.

Skipping the test sounds far from ideal to me, since the test has a
tendency of catching problems.  Witness patch 1/2 in this very
series, for example...

Thanks,
Pedro Alves
  
Antoine Tremblay Sept. 1, 2016, 4:46 p.m. UTC | #8
Pedro Alves writes:

> On 09/01/2016 04:21 PM, Antoine Tremblay wrote:
>> 
>> Pedro Alves writes:
>> 
>>> On 08/31/2016 08:14 PM, Antoine Tremblay wrote:
>>>
>>>> I'm sorry I can't be more helpful at the moment but I wanted to post
>>>> this issue before I have to leave for a while.
>>>
>>> Understood.  Does enabling range stepping unblock something else?
>> 
>> It would unblock ARM tracepoints, as per Yao's requirements...
>
> Tracepoints make gdbserver single-step and then not report the event
> to gdb, so I do see the parallel with range-stepping.  Throwing
> while-stepping into the equation would make it even more clear.
>
> But maybe we can paralyze?  If enabling tracepoints without range
> stepping causes no known regression, but enabling range stepping with
> no tracepoints causes regressions, seems to me like we could put
> tracepoints in first, and fix whatever range stepping problems
> in parallel.
>

I would totally agree with that. (tracepoints do not cause any
regressions without range stepping)

Yao ?

> Skipping the test sounds far from ideal to me, since the test has a
> tendency of catching problems.  Witness patch 1/2 in this very
> series, for example...

Indeed.

Thanks,
Antoine
  
Yao Qi Sept. 18, 2016, 7:58 p.m. UTC | #9
On Wed, Aug 31, 2016 at 7:39 PM, Pedro Alves <palves@redhat.com> wrote:
> On 08/31/2016 07:15 PM, Antoine Tremblay wrote:
>
> It sounds like gdbserver's event starvation avoidance isn't really
> working on sss targets with range stepping enabled.  E.g., are we
> doing the randomization too late?
>

We only randomize the events to be reported to GDB, however, we
don't randomize the pending events.  Every time, GDBserver select
pending events from the first one in thread list, threads in the end of
the list are likely starved.  This issue can be fixed by introducing pending
events randomization.

My patches are ready, and being regression tested.  During the
regression tests, some other issues are found (without my patches), so
I need to take a look at them first.
  

Patch

diff --git a/gdb/gdbserver/linux-arm-low.c b/gdb/gdbserver/linux-arm-low.c
index e1261e5..cadda98 100644
--- a/gdb/gdbserver/linux-arm-low.c
+++ b/gdb/gdbserver/linux-arm-low.c
@@ -943,6 +943,14 @@  arm_gdbserver_get_next_pcs (struct regcache *regcache)
   return next_pcs;
 }
 
+/* Support for range stepping.  */
+
+static int
+arm_supports_range_stepping (void)
+{
+  return 1;
+}
+
 /* Support for hardware single step.  */
 
 static int
@@ -1063,7 +1071,7 @@  struct linux_target_ops the_low_target = {
   NULL, /* install_fast_tracepoint_jump_pad */
   NULL, /* emit_ops */
   NULL, /* get_min_fast_tracepoint_insn_len */
-  NULL, /* supports_range_stepping */
+  arm_supports_range_stepping, /* supports_range_stepping */
   arm_breakpoint_kind_from_current_state,
   arm_supports_hardware_single_step,
   arm_get_syscall_trapinfo,