Regression with default scheduler-locking=step [Re: [pushed] Consecutive step-overs trigger internal error.]

Message ID 53A1998D.7090006@redhat.com
State Committed
Headers

Commit Message

Pedro Alves June 18, 2014, 1:52 p.m. UTC
  On 06/17/2014 08:24 PM, Jan Kratochvil wrote:
> On Tue, 22 Apr 2014 20:24:28 +0200, Pedro Alves wrote:
>> Tested on x86_64 Fedora 17, native and gdbserver, and also native on
>> top of my "software single-step on x86_64" series.
> 
> 483805cf9ea5a6dace41415d8830e93fccc49c43 is the first bad commit
> commit 483805cf9ea5a6dace41415d8830e93fccc49c43
> Author: Pedro Alves <palves@redhat.com>
> Date:   Tue Apr 22 15:00:56 2014 +0100
>     Consecutive step-overs trigger internal error.
>     
> (gdb) next^M
> [Thread 0x7ffff7fda700 (LWP 27168) exited]^M
> [New LWP 27168]^M
> [Thread 0x7ffff74ee700 (LWP 27174) exited]^M
> process 27168 is executing new program: /home/jkratoch/redhat/gdb-clean/gdb/testsuite/gdb.threads/thread-execl^M
> [Thread debugging using libthread_db enabled]^M
> Using host libthread_db library "/lib64/libthread_db.so.1".^M
> infrun.c:5225: internal-error: switch_back_to_stepped_thread: Assertion `!schedlock_applies (1)' failed.^M
> A problem internal to GDB has been detected,^M
> further debugging may prove unreliable.^M
> Quit this debugging session? (y or n) FAIL: gdb.threads/thread-execl.exp: get to main in new image (GDB internal error)
> Resyncing due to internal error.

Thanks Jan.

> 
> The regressions happens only with the attached patch which I am not sure if it
> is considered as a valid FSF GDB regression or not but I think it is.

If it worked before, then it's certainly a regression.  The user is
free to do "set scheduler-locking step" herself.

Here's a fix.  Let me know what you think.

8<---------------------------------
From f717378c16cb04f8350935a1336767d2541b36a5 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 18 Jun 2014 14:20:31 +0100
Subject: [PATCH] Fix next over threaded execl with "set scheduler-locking
 step".

Running gdb.threads/thread-execl.exp with scheduler-locking set to
"step" reveals a problem:

 (gdb) next^M
 [Thread 0x7ffff7fda700 (LWP 27168) exited]^M
 [New LWP 27168]^M
 [Thread 0x7ffff74ee700 (LWP 27174) exited]^M
 process 27168 is executing new program: /home/jkratoch/redhat/gdb-clean/gdb/testsuite/gdb.threads/thread-execl^M
 [Thread debugging using libthread_db enabled]^M
 Using host libthread_db library "/lib64/libthread_db.so.1".^M
 infrun.c:5225: internal-error: switch_back_to_stepped_thread: Assertion `!schedlock_applies (1)' failed.^M
 A problem internal to GDB has been detected,^M
 further debugging may prove unreliable.^M
 Quit this debugging session? (y or n) FAIL: gdb.threads/thread-execl.exp: schedlock step: get to main in new image (GDB internal error)

The assertion is correct.  The issue is that GDB is mistakenly trying
to switch back to an exited thread, that was previously stepping when
it exited.  This is exactly the sort of thing the test wants to make
sure doesn't happen:

	# Now set a breakpoint at `main', and step over the execl call.  The
	# breakpoint at main should be reached.  GDB should not try to revert
	# back to the old thread from the old image and resume stepping it

We don't see this bug with schedlock off only because a different
sequence of events makes GDB manage to delete the thread instead of
marking it exited.

This particular internal error can be fixed by making the loop over
all threads in switch_back_to_stepped_thread skip exited threads.
But, looking over other ALL_THREADS users, all either can or should be
skipping exited threads too.  So for simplicity, this patch replaces
ALL_THREADS with a new macro that skips exited threads itself, and
updates everything to use it.

Tested on x86_64 Fedora 20.

gdb/
2014-06-18  Pedro Alves  <palves@redhat.com>

	* gdbthread.h (ALL_THREADS): Delete.
	(ALL_NON_EXITED_THREADS): New macro.
	* btrace.c (btrace_free_objfile): Use ALL_NON_EXITED_THREADS
	instead of ALL_THREADS.
	* infrun.c (find_thread_needs_step_over)
	(switch_back_to_stepped_thread): Use ALL_NON_EXITED_THREADS
	instead of ALL_THREADS.
	* record-btrace.c (record_btrace_open)
	(record_btrace_stop_recording, record_btrace_close)
	(record_btrace_is_replaying, record_btrace_resume)
	(record_btrace_find_thread_to_move, record_btrace_wait): Likewise.
	* remote.c (append_pending_thread_resumptions): Likewise.

gdb/testsuite/
2014-06-18  Pedro Alves  <palves@redhat.com>

	* gdb.threads/thread-execl.exp (do_test): New procedure, factored
	out from ...
	(top level): ... here.  Iterate running tests under different
	scheduler-locking settings.
---
 gdb/btrace.c                               |  2 +-
 gdb/gdbthread.h                            |  8 ++++--
 gdb/infrun.c                               |  4 +--
 gdb/record-btrace.c                        | 14 +++++-----
 gdb/remote.c                               |  2 +-
 gdb/testsuite/gdb.threads/thread-execl.exp | 44 ++++++++++++++++++++----------
 gdb/thread.c                               |  2 +-
 7 files changed, 46 insertions(+), 30 deletions(-)
  

Comments

Jan Kratochvil June 18, 2014, 5:57 p.m. UTC | #1
On Wed, 18 Jun 2014 15:52:13 +0200, Pedro Alves wrote:
> Here's a fix.  Let me know what you think.

The change looks OK and as it fixes the assertion I am for the patch.


Thanks,
Jan
  
Pedro Alves June 19, 2014, 11:30 a.m. UTC | #2
On 06/18/2014 06:57 PM, Jan Kratochvil wrote:
> On Wed, 18 Jun 2014 15:52:13 +0200, Pedro Alves wrote:
>> Here's a fix.  Let me know what you think.
> 
> The change looks OK and as it fixes the assertion I am for the patch.

Thanks.  Now pushed to both master and gdb-7.8-branch.
  

Patch

diff --git a/gdb/btrace.c b/gdb/btrace.c
index 601eb41..87a171e 100644
--- a/gdb/btrace.c
+++ b/gdb/btrace.c
@@ -958,7 +958,7 @@  btrace_free_objfile (struct objfile *objfile)
 
   DEBUG ("free objfile");
 
-  ALL_THREADS (tp)
+  ALL_NON_EXITED_THREADS (tp)
     btrace_clear (tp);
 }
 
diff --git a/gdb/gdbthread.h b/gdb/gdbthread.h
index 9f5dee6..64e37c3 100644
--- a/gdb/gdbthread.h
+++ b/gdb/gdbthread.h
@@ -317,10 +317,12 @@  void thread_change_ptid (ptid_t old_ptid, ptid_t new_ptid);
 typedef int (*thread_callback_func) (struct thread_info *, void *);
 extern struct thread_info *iterate_over_threads (thread_callback_func, void *);
 
-/* Traverse all threads.  */
+/* Traverse all threads, except those that have THREAD_EXITED
+   state.  */
 
-#define ALL_THREADS(T)				\
-  for (T = thread_list; T; T = T->next)
+#define ALL_NON_EXITED_THREADS(T)				\
+  for (T = thread_list; T; T = T->next) \
+    if ((T)->state != THREAD_EXITED)
 
 extern int thread_count (void);
 
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 4e808d9..e8e26e0 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2160,7 +2160,7 @@  find_thread_needs_step_over (int step, struct thread_info *except)
       return NULL;
     }
 
-  ALL_THREADS (tp)
+  ALL_NON_EXITED_THREADS (tp)
     {
       /* Ignore the EXCEPT thread.  */
       if (tp == except)
@@ -5204,7 +5204,7 @@  switch_back_to_stepped_thread (struct execution_control_state *ecs)
 	 step/next/etc.  */
       stepping_thread = NULL;
       step_over = NULL;
-      ALL_THREADS (tp)
+      ALL_NON_EXITED_THREADS (tp)
         {
 	  /* Ignore threads of processes we're not resuming.  */
 	  if (!sched_multi
diff --git a/gdb/record-btrace.c b/gdb/record-btrace.c
index e0c537a8..6a9bfe1 100644
--- a/gdb/record-btrace.c
+++ b/gdb/record-btrace.c
@@ -206,7 +206,7 @@  record_btrace_open (char *args, int from_tty)
   gdb_assert (record_btrace_thread_observer == NULL);
 
   disable_chain = make_cleanup (null_cleanup, NULL);
-  ALL_THREADS (tp)
+  ALL_NON_EXITED_THREADS (tp)
     if (args == NULL || *args == 0 || number_is_in_list (args, tp->num))
       {
 	btrace_enable (tp);
@@ -238,7 +238,7 @@  record_btrace_stop_recording (struct target_ops *self)
 
   record_btrace_auto_disable ();
 
-  ALL_THREADS (tp)
+  ALL_NON_EXITED_THREADS (tp)
     if (tp->btrace.target != NULL)
       btrace_disable (tp);
 }
@@ -259,7 +259,7 @@  record_btrace_close (struct target_ops *self)
 
   /* We should have already stopped recording.
      Tear down btrace in case we have not.  */
-  ALL_THREADS (tp)
+  ALL_NON_EXITED_THREADS (tp)
     btrace_teardown (tp);
 }
 
@@ -835,7 +835,7 @@  record_btrace_is_replaying (struct target_ops *self)
 {
   struct thread_info *tp;
 
-  ALL_THREADS (tp)
+  ALL_NON_EXITED_THREADS (tp)
     if (btrace_is_replaying (tp))
       return 1;
 
@@ -1522,7 +1522,7 @@  record_btrace_resume (struct target_ops *ops, ptid_t ptid, int step,
 
   /* Stop replaying other threads if the thread to resume is not replaying.  */
   if (!btrace_is_replaying (tp) && execution_direction != EXEC_REVERSE)
-    ALL_THREADS (other)
+    ALL_NON_EXITED_THREADS (other)
       record_btrace_stop_replaying (other);
 
   /* As long as we're not replaying, just forward the request.  */
@@ -1572,7 +1572,7 @@  record_btrace_find_thread_to_move (ptid_t ptid)
     return tp;
 
   /* Otherwise, find one other thread that has been resumed.  */
-  ALL_THREADS (tp)
+  ALL_NON_EXITED_THREADS (tp)
     if ((tp->btrace.flags & BTHR_MOVE) != 0)
       return tp;
 
@@ -1777,7 +1777,7 @@  record_btrace_wait (struct target_ops *ops, ptid_t ptid,
 
   /* Stop all other threads. */
   if (!non_stop)
-    ALL_THREADS (other)
+    ALL_NON_EXITED_THREADS (other)
       other->btrace.flags &= ~BTHR_MOVE;
 
   /* Start record histories anew from the current position.  */
diff --git a/gdb/remote.c b/gdb/remote.c
index 6915dd8..b5318f1 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -4626,7 +4626,7 @@  append_pending_thread_resumptions (char *p, char *endp, ptid_t ptid)
 {
   struct thread_info *thread;
 
-  ALL_THREADS (thread)
+  ALL_NON_EXITED_THREADS (thread)
     if (ptid_match (thread->ptid, ptid)
 	&& !ptid_equal (inferior_ptid, thread->ptid)
 	&& thread->suspend.stop_signal != GDB_SIGNAL_0
diff --git a/gdb/testsuite/gdb.threads/thread-execl.exp b/gdb/testsuite/gdb.threads/thread-execl.exp
index d837fe4..14b96d2 100644
--- a/gdb/testsuite/gdb.threads/thread-execl.exp
+++ b/gdb/testsuite/gdb.threads/thread-execl.exp
@@ -28,19 +28,33 @@  if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" \
     return -1
 }
 
-clean_restart ${binfile}
-
-runto_main
-
-# Get ourselves to the thread that execs
-gdb_breakpoint "thread_execler"
-gdb_test "continue" ".*thread_execler.*" "continue to thread start"
-
-# Now set a breakpoint at `main', and step over the execl call.  The
-# breakpoint at main should be reached.  GDB should not try to revert
-# back to the old thread from the old image and resume stepping it
-# (since it is gone).
-gdb_breakpoint "main"
-gdb_test "next" ".*main.*" "get to main in new image"
+# Run the test proper.  SCHEDLOCK specifies what scheduler-locking
+# should be set to.
+
+proc do_test { schedlock } {
+    global binfile
+
+    with_test_prefix "schedlock $schedlock" {
+	clean_restart ${binfile}
+
+	if ![runto_main] {
+	    return 0
+	}
+
+	# Get ourselves to the thread that execs.
+	gdb_breakpoint "thread_execler"
+	gdb_test "continue" ".*thread_execler.*" "continue to thread start"
+
+	# Now set a breakpoint at `main', and step over the execl call.  The
+	# breakpoint at main should be reached.  GDB should not try to revert
+	# back to the old thread from the old image and resume stepping it
+	# (since it is gone).
+	gdb_breakpoint "main"
+	gdb_test_no_output "set scheduler-locking $schedlock"
+	gdb_test "next" ".*main.*" "get to main in new image"
+    }
+}
 
-return 0
+foreach schedlock {"off" "step" "on"} {
+    do_test $schedlock
+}
diff --git a/gdb/thread.c b/gdb/thread.c
index 7bc5271..532149d 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1243,7 +1243,7 @@  thread_apply_all_command (char *cmd, int from_tty)
       ta_cleanup.tp_array = tp_array;
       ta_cleanup.count = tc;
 
-      ALL_THREADS (tp)
+      ALL_NON_EXITED_THREADS (tp)
         {
           tp_array[i] = tp;
           tp->refcount++;