Message ID | 20161129120702.9490-2-antoine.tremblay@ericsson.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 26822 invoked by alias); 29 Nov 2016 12:07:44 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 26795 invoked by uid 89); 29 Nov 2016 12:07:41 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=BAYES_00, KAM_LAZY_DOMAIN_SECURITY autolearn=no version=3.3.2 spammy=stepped, null_ptid, instuction, sk:step_ov X-HELO: usplmg21.ericsson.net Received: from usplmg21.ericsson.net (HELO usplmg21.ericsson.net) (198.24.6.65) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 29 Nov 2016 12:07:31 +0000 Received: from EUSAAHC003.ericsson.se (Unknown_Domain [147.117.188.81]) by (Symantec Mail Security) with SMTP id 22.2C.02571.D0B1D385; Tue, 29 Nov 2016 07:07:10 +0100 (CET) Received: from elxa4wqvvz1.ca.am.ericsson.se (147.117.188.8) by smtps-am.internal.ericsson.com (147.117.188.81) with Microsoft SMTP Server (TLS) id 14.3.319.2; Tue, 29 Nov 2016 07:07:15 -0500 From: Antoine Tremblay <antoine.tremblay@ericsson.com> To: <gdb-patches@sourceware.org> CC: Antoine Tremblay <antoine.tremblay@ericsson.com> Subject: [PATCH 2/2] Avoid step-over infinite loop in GDBServer Date: Tue, 29 Nov 2016 07:07:02 -0500 Message-ID: <20161129120702.9490-2-antoine.tremblay@ericsson.com> In-Reply-To: <20161129120702.9490-1-antoine.tremblay@ericsson.com> References: <20161129120702.9490-1-antoine.tremblay@ericsson.com> MIME-Version: 1.0 Content-Type: text/plain X-IsSubscribed: yes |
Commit Message
Antoine Tremblay
Nov. 29, 2016, 12:07 p.m. UTC
Before this patch, GDBServer always executed a step-over if it found a thread that needed one. This could be a problem in a situation exposed by non-stop-fair-events.exp where the code and the breakpoint placement is like so: instruction A : has a single-step breakpoint installed for thread 1 and 2 instruction B : has a single-step breakpoint installed for thread 3 and is a branch to A. In this particular case: - GDBServer stops on instruction A in thread 1. - Deletes thread 1 single-step breakpoint. - Starts a step-over of thread 1 to step-over the thread 2 breakpoint. - GDBServer finishes a step-over and is at instruction B. - GDBserver starts a step-over of thread 1 to step-over the thread 3 breakpoint at instruction B. - GDBServer stops on instuction A in thread 1. - GDBServer is now in an infinite loop. This patch avoids this issue by counting the number of times a thread does a step-over consecutively. If the thread reaches a threshold, which is currently 32, GDBServer will not step-over but rather restart all the threads. I chose a threshold of 32, so to trigger this there needs to be 32 consecutive instructions with breakpoints installed that one thread needs to step over. I think this should be rare enought to trigger only in this case but suggestions on the threshold value are welcome. If the threshold is hit and the step-over is delayed, when the thread that needed a step-over runs it will simply hit the breakpoint it needed to step-over and retry to start a step-over. This makes it possible for other threads to run and start a step-over dance of their own or pending events like signals to be handled. This patch fixes the intermittent FAILs for gdb.threads/non-stop-fair-events.exp on ARM like discussed here: https://sourceware.org/ml/gdb-patches/2016-11/msg00132.html No regressions, tested on ubuntu 14.04 ARMv7. With gdbserver-native/-m{arm,thumb} gdb/gdbserver/ChangeLog: * linux-low.c (class step_over_limiter): New class. (_step_over_limiter): New global variable. (linux_wait_1): Count step-overs. (proceed_all_lwps): Delay step-over if threshold is reached. --- gdb/gdbserver/linux-low.c | 76 +++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 70 insertions(+), 6 deletions(-)
Comments
Ping. Antoine Tremblay writes: > Before this patch, GDBServer always executed a step-over if it found a > thread that needed one. > > This could be a problem in a situation exposed by non-stop-fair-events.exp > where the code and the breakpoint placement is like so: > > instruction A : has a single-step breakpoint installed for thread 1 and 2 > instruction B : has a single-step breakpoint installed for thread 3 > and is a branch to A. > > In this particular case: > > - GDBServer stops on instruction A in thread 1. > - Deletes thread 1 single-step breakpoint. > - Starts a step-over of thread 1 to step-over the thread 2 breakpoint. > - GDBServer finishes a step-over and is at instruction B. > - GDBserver starts a step-over of thread 1 to step-over the thread 3 > breakpoint at instruction B. > - GDBServer stops on instuction A in thread 1. > - GDBServer is now in an infinite loop. > > This patch avoids this issue by counting the number of times a thread does > a step-over consecutively. If the thread reaches a threshold, which is > currently 32, GDBServer will not step-over but rather restart all the > threads. > > I chose a threshold of 32, so to trigger this there needs to be 32 > consecutive instructions with breakpoints installed that one thread needs > to step over. I think this should be rare enought to trigger only in this > case but suggestions on the threshold value are welcome. > > If the threshold is hit and the step-over is delayed, when the thread > that needed a step-over runs it will simply hit the breakpoint it needed > to step-over and retry to start a step-over. > > This makes it possible for other threads to run and start a step-over > dance of their own or pending events like signals to be handled. > > This patch fixes the intermittent FAILs for > gdb.threads/non-stop-fair-events.exp on ARM like discussed here: > https://sourceware.org/ml/gdb-patches/2016-11/msg00132.html > > No regressions, tested on ubuntu 14.04 ARMv7. > With gdbserver-native/-m{arm,thumb} > > gdb/gdbserver/ChangeLog: > > * linux-low.c (class step_over_limiter): New class. > (_step_over_limiter): New global variable. > (linux_wait_1): Count step-overs. > (proceed_all_lwps): Delay step-over if threshold is reached. > --- > gdb/gdbserver/linux-low.c | 76 +++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 70 insertions(+), 6 deletions(-) > > diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c > index 15fb726..b84b62a 100644 > --- a/gdb/gdbserver/linux-low.c > +++ b/gdb/gdbserver/linux-low.c > @@ -282,6 +282,53 @@ static int proceed_one_lwp (struct inferior_list_entry *entry, void *except); > being stepped. */ > ptid_t step_over_bkpt; > > +/* Class limiting the number of consecutive step-overs for a thread. */ > + > +class step_over_limiter > +{ > + public: > + > + step_over_limiter () : m_ptid (null_ptid), m_count (0), m_max_count (32) {} > + > + void step_over_done (struct lwp_info *lwp) > + { > + ptid_t ptid = lwp->thread->entry.id; > + > + if (!ptid_equal (ptid, m_ptid)) > + { > + m_ptid = ptid; > + m_count = 0; > + } > + > + m_count++; > + } > + > + bool can_step_over (struct lwp_info *lwp) > + { > + if (!ptid_equal(lwp->thread->entry.id, m_ptid) > + || m_count < m_max_count) > + return true; > + else > + { > + /* Reset here so that the step_over is retried. */ > + m_ptid = null_ptid; > + m_count = 0; > + return false; > + } > + } > + > + private: > + > + ptid_t m_ptid; > + int m_count; > + > + /* Maximum step overs for a thread, before all threads are run instead of > + stepping over. */ > + const int m_max_count; > +}; > + > +step_over_limiter _step_over_limiter; > + > /* True if the low target can hardware single-step. */ > > static int > @@ -3607,6 +3654,8 @@ linux_wait_1 (ptid_t ptid, > doesn't lose it. */ > enqueue_pending_signal (event_child, WSTOPSIG (w), info_p); > > + _step_over_limiter.step_over_done (event_child); > + > proceed_all_lwps (); > } > else > @@ -3694,6 +3743,8 @@ linux_wait_1 (ptid_t ptid, > We're going to keep waiting, so use proceed, which > handles stepping over the next breakpoint. */ > unsuspend_all_lwps (event_child); > + > + _step_over_limiter.step_over_done (event_child); > } > else > { > @@ -5400,13 +5451,26 @@ proceed_all_lwps (void) > > if (need_step_over != NULL) > { > - if (debug_threads) > - debug_printf ("proceed_all_lwps: found " > - "thread %ld needing a step-over\n", > - lwpid_of (need_step_over)); > + /* Don't step over if we're looping too much. */ > + if (_step_over_limiter.can_step_over > + (get_thread_lwp (need_step_over))) > + { > + if (debug_threads) > + debug_printf ("proceed_all_lwps: found " > + "thread %ld needing a step-over\n", > + lwpid_of (need_step_over)); > > - start_step_over (get_thread_lwp (need_step_over)); > - return; > + start_step_over (get_thread_lwp (need_step_over)); > + return; > + } > + else > + { > + if (debug_threads) > + debug_printf ("proceed_all_lwps: found " > + "thread %ld needing a step-over " > + "but max consecutive step-overs reached\n", > + lwpid_of (need_step_over)); > + } > } > }
Antoine Tremblay writes: > Ping. > FYI, I just realised that this can break traceframes and breakpoint counts, I'll try to fix that or find a better way.
On 11/29/2016 12:07 PM, Antoine Tremblay wrote: > - GDBServer stops on instruction A in thread 1. > - Deletes thread 1 single-step breakpoint. > - Starts a step-over of thread 1 to step-over the thread 2 breakpoint. > - GDBServer finishes a step-over and is at instruction B. > - GDBserver starts a step-over of thread 1 to step-over the thread 3 > breakpoint at instruction B. > - GDBServer stops on instuction A in thread 1. > - GDBServer is now in an infinite loop. This sounds to me very much like a fairness issue. There were three threads stopped that needed to move past a breakpoint, but gdbserver always picks thread 1. Why? Thanks, Pedro Alves
Pedro Alves writes: > On 11/29/2016 12:07 PM, Antoine Tremblay wrote: >> - GDBServer stops on instruction A in thread 1. >> - Deletes thread 1 single-step breakpoint. >> - Starts a step-over of thread 1 to step-over the thread 2 breakpoint. >> - GDBServer finishes a step-over and is at instruction B. >> - GDBserver starts a step-over of thread 1 to step-over the thread 3 >> breakpoint at instruction B. >> - GDBServer stops on instuction A in thread 1. >> - GDBServer is now in an infinite loop. > > This sounds to me very much like a fairness issue. There were > three threads stopped that needed to move past a breakpoint, but > gdbserver always picks thread 1. Why? It is a fairness issue but not between the stepping over threads, gdbserver could pick any thread rather than thread 1 and still get into the same loop. The problem is one of fairness between stepping over threads and non-stepping-over threads. For example if all the instructions in the thread's execution path have a breakpoint on it such that this thread always needs to step-over and that the only way to break that loop is to let a thread that doesn't need a step-over run then we're in an infinite loop since GDBServer always gives precedence to the threads needing a step-over. Thus this patch created more fairness by giving non-stepping over threads a chance to run. But it doesn't work... I'm not sure this can be fixed it might be more a problem like putting a breakpoint on a mutex and wanting a thread that depends on that mutex to continue... so more a user problem... but non-stop-fair-events is done in a way that this can happen... Ideas are welcome :)
On Tue, Nov 29, 2016 at 12:07 PM, Antoine Tremblay <antoine.tremblay@ericsson.com> wrote: > Before this patch, GDBServer always executed a step-over if it found a > thread that needed one. > > This could be a problem in a situation exposed by non-stop-fair-events.exp > where the code and the breakpoint placement is like so: > > instruction A : has a single-step breakpoint installed for thread 1 and 2 > instruction B : has a single-step breakpoint installed for thread 3 > and is a branch to A. > Is instruction B following instruction A? Is it like .L1: nop b .L1 > In this particular case: > > - GDBServer stops on instruction A in thread 1. > - Deletes thread 1 single-step breakpoint. > - Starts a step-over of thread 1 to step-over the thread 2 breakpoint. > - GDBServer finishes a step-over and is at instruction B. > - GDBserver starts a step-over of thread 1 to step-over the thread 3 > breakpoint at instruction B. Why does GDBserver starts a step-over again? is it because need_step_over_p doing checks like this, if (breakpoint_here (pc) || fast_tracepoint_jump_here (pc)) { /* Don't step over a breakpoint that GDB expects to hit though. If the condition is being evaluated on the target's side and it evaluate to false, step over this breakpoint as well. */ if (gdb_breakpoint_here (pc) && gdb_condition_true_at_breakpoint (pc) && gdb_no_commands_at_breakpoint (pc)) { if (debug_threads) debug_printf ("Need step over [LWP %ld]? yes, but found" " GDB breakpoint at 0x%s; skipping step over\n", lwpid_of (thread), paddress (pc)); current_thread = saved_thread; return 0; } else { if (debug_threads) debug_printf ("Need step over [LWP %ld]? yes, " "found breakpoint at 0x%s\n", lwpid_of (thread), paddress (pc)); /* We've found an lwp that needs stepping over --- return 1 so that find_inferior stops looking. */ current_thread = saved_thread; return 1; } } there is a single step breakpoint on pc, and it is obviously not a gdb breakpoint, so 1 is returned. > - GDBServer stops on instuction A in thread 1. > - GDBServer is now in an infinite loop. > I am wondering can we take the information that we've already step over a breakpoint for thread A into need_step_over_p, if we see pc is on another single step breakpoint for thread B, don't do step over.
Yao Qi writes: > On Tue, Nov 29, 2016 at 12:07 PM, Antoine Tremblay > <antoine.tremblay@ericsson.com> wrote: >> Before this patch, GDBServer always executed a step-over if it found a >> thread that needed one. >> >> This could be a problem in a situation exposed by non-stop-fair-events.exp >> where the code and the breakpoint placement is like so: >> >> instruction A : has a single-step breakpoint installed for thread 1 and 2 >> instruction B : has a single-step breakpoint installed for thread 3 >> and is a branch to A. >> > > Is instruction B following instruction A? Is it like > > .L1: > nop > b .L1 > Yes, assuming A is nop an B is b .L1. I reduced it to that from the real world case in non-stop-fair-events.c with : 0x0000000000400767 <+38>: mov 0x200963(%rip),%eax # 0x6010d0 <got_sig> 0x000000000040076d <+44>: test %eax,%eax 0x000000000040076f <+46>: je 0x400767 <child_function+38> And a single-step breakpoint on all instructions. >> In this particular case: >> >> - GDBServer stops on instruction A in thread 1. >> - Deletes thread 1 single-step breakpoint. >> - Starts a step-over of thread 1 to step-over the thread 2 breakpoint. >> - GDBServer finishes a step-over and is at instruction B. >> - GDBserver starts a step-over of thread 1 to step-over the thread 3 >> breakpoint at instruction B. > > Why does GDBserver starts a step-over again? is it because > need_step_over_p doing checks like this, > > if (breakpoint_here (pc) || fast_tracepoint_jump_here (pc)) > { > /* Don't step over a breakpoint that GDB expects to hit > though. If the condition is being evaluated on the target's side > and it evaluate to false, step over this breakpoint as well. */ > if (gdb_breakpoint_here (pc) > && gdb_condition_true_at_breakpoint (pc) > && gdb_no_commands_at_breakpoint (pc)) > { > if (debug_threads) > debug_printf ("Need step over [LWP %ld]? yes, but found" > " GDB breakpoint at 0x%s; skipping step over\n", > lwpid_of (thread), paddress (pc)); > > current_thread = saved_thread; > return 0; > } > else > { > if (debug_threads) > debug_printf ("Need step over [LWP %ld]? yes, " > "found breakpoint at 0x%s\n", > lwpid_of (thread), paddress (pc)); > > /* We've found an lwp that needs stepping over --- return 1 so > that find_inferior stops looking. */ > current_thread = saved_thread; > > return 1; > } > } > > there is a single step breakpoint on pc, and it is obviously not a > gdb breakpoint, so 1 is returned. > Yes. >> - GDBServer stops on instuction A in thread 1. >> - GDBServer is now in an infinite loop. >> > > I am wondering can we take the information that we've already step > over a breakpoint for thread A into need_step_over_p, if we see pc > is on another single step breakpoint for thread B, don't do step over. There's 3 parts to consider here: - What to restart ? My first thought with this was step-over the other threads that needed to step-over in a queue like fashion, but this may not be enough since we may depend on another thread to be able to make progress on the single stepped threads like is the case in non-stop-fair-events. So like this patch does I though it was better to restart all threads... - What does it mean to not step-over ? The don't do step over part is more tricky then it seems since let's say GDBServer: - hits a breakpoint on A with thread 1 - doesn't step-over and restart all threads - any thread hits the breakpoint on A At this point thread 1 has already hit the breakpoint on A even if we do not allow it to step-over, so it's stuck there and could hit the breakpoint a number of times, thus breaking the breakpoints/trace frames counts and maybe some other stuff... I'm not sure about the solution to that, ideas ? How to "undo" a breakpoint hit ? Or some other solution... - When not to step-over ? After some thought I think we could make it so that we expect the single-step breakpoints for PC to be hit in the order of insertion. Since the breakpoints are already a linked list we could make it so that if you hit a single step breakpoint and that this breakpoint is installed on multiple threads then single-step only if the thread stopped matches the thread of the first breakpoint for that PC. Otherwise continue all threads. I think this would be simpler than to record the last executed step-over for all threads and check/reset that flag so that all threads wait for each other, but that may be possible too. WDYT ?
diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c index 15fb726..b84b62a 100644 --- a/gdb/gdbserver/linux-low.c +++ b/gdb/gdbserver/linux-low.c @@ -282,6 +282,53 @@ static int proceed_one_lwp (struct inferior_list_entry *entry, void *except); being stepped. */ ptid_t step_over_bkpt; +/* Class limiting the number of consecutive step-overs for a thread. */ + +class step_over_limiter +{ + public: + + step_over_limiter () : m_ptid (null_ptid), m_count (0), m_max_count (32) {} + + void step_over_done (struct lwp_info *lwp) + { + ptid_t ptid = lwp->thread->entry.id; + + if (!ptid_equal (ptid, m_ptid)) + { + m_ptid = ptid; + m_count = 0; + } + + m_count++; + } + + bool can_step_over (struct lwp_info *lwp) + { + if (!ptid_equal(lwp->thread->entry.id, m_ptid) + || m_count < m_max_count) + return true; + else + { + /* Reset here so that the step_over is retried. */ + m_ptid = null_ptid; + m_count = 0; + return false; + } + } + + private: + + ptid_t m_ptid; + int m_count; + + /* Maximum step overs for a thread, before all threads are run instead of + stepping over. */ + const int m_max_count; +}; + +step_over_limiter _step_over_limiter; + /* True if the low target can hardware single-step. */ static int @@ -3607,6 +3654,8 @@ linux_wait_1 (ptid_t ptid, doesn't lose it. */ enqueue_pending_signal (event_child, WSTOPSIG (w), info_p); + _step_over_limiter.step_over_done (event_child); + proceed_all_lwps (); } else @@ -3694,6 +3743,8 @@ linux_wait_1 (ptid_t ptid, We're going to keep waiting, so use proceed, which handles stepping over the next breakpoint. */ unsuspend_all_lwps (event_child); + + _step_over_limiter.step_over_done (event_child); } else { @@ -5400,13 +5451,26 @@ proceed_all_lwps (void) if (need_step_over != NULL) { - if (debug_threads) - debug_printf ("proceed_all_lwps: found " - "thread %ld needing a step-over\n", - lwpid_of (need_step_over)); + /* Don't step over if we're looping too much. */ + if (_step_over_limiter.can_step_over + (get_thread_lwp (need_step_over))) + { + if (debug_threads) + debug_printf ("proceed_all_lwps: found " + "thread %ld needing a step-over\n", + lwpid_of (need_step_over)); - start_step_over (get_thread_lwp (need_step_over)); - return; + start_step_over (get_thread_lwp (need_step_over)); + return; + } + else + { + if (debug_threads) + debug_printf ("proceed_all_lwps: found " + "thread %ld needing a step-over " + "but max consecutive step-overs reached\n", + lwpid_of (need_step_over)); + } } }