Regression with default scheduler-locking=step [Re: [pushed] Consecutive step-overs trigger internal error.]

From: Pedro Alves <palves@redhat.com>

On 06/17/2014 08:24 PM, Jan Kratochvil wrote:
> On Tue, 22 Apr 2014 20:24:28 +0200, Pedro Alves wrote:
>> Tested on x86_64 Fedora 17, native and gdbserver, and also native on
>> top of my "software single-step on x86_64" series.
> 
> 483805cf9ea5a6dace41415d8830e93fccc49c43 is the first bad commit
> commit 483805cf9ea5a6dace41415d8830e93fccc49c43
> Author: Pedro Alves <palves@redhat.com>
> Date:   Tue Apr 22 15:00:56 2014 +0100
>     Consecutive step-overs trigger internal error.
>     
> (gdb) next^M
> [Thread 0x7ffff7fda700 (LWP 27168) exited]^M
> [New LWP 27168]^M
> [Thread 0x7ffff74ee700 (LWP 27174) exited]^M
> process 27168 is executing new program: /home/jkratoch/redhat/gdb-clean/gdb/testsuite/gdb.threads/thread-execl^M
> [Thread debugging using libthread_db enabled]^M
> Using host libthread_db library "/lib64/libthread_db.so.1".^M
> infrun.c:5225: internal-error: switch_back_to_stepped_thread: Assertion `!schedlock_applies (1)' failed.^M
> A problem internal to GDB has been detected,^M
> further debugging may prove unreliable.^M
> Quit this debugging session? (y or n) FAIL: gdb.threads/thread-execl.exp: get to main in new image (GDB internal error)
> Resyncing due to internal error.

Thanks Jan.

> 
> The regression happens only with the attached patch.  I am not sure whether
> that counts as a valid FSF GDB regression, but I think it does.

If it worked before, then it's certainly a regression.  The user is
free to do "set scheduler-locking step" herself.

Here's a fix.  Let me know what you think.

8<---------------------------------
From f717378c16cb04f8350935a1336767d2541b36a5 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 18 Jun 2014 14:20:31 +0100
Subject: [PATCH] Fix next over threaded execl with "set scheduler-locking
 step".

Running gdb.threads/thread-execl.exp with scheduler-locking set to
"step" reveals a problem:

 (gdb) next^M
 [Thread 0x7ffff7fda700 (LWP 27168) exited]^M
 [New LWP 27168]^M
 [Thread 0x7ffff74ee700 (LWP 27174) exited]^M
 process 27168 is executing new program: /home/jkratoch/redhat/gdb-clean/gdb/testsuite/gdb.threads/thread-execl^M
 [Thread debugging using libthread_db enabled]^M
 Using host libthread_db library "/lib64/libthread_db.so.1".^M
 infrun.c:5225: internal-error: switch_back_to_stepped_thread: Assertion `!schedlock_applies (1)' failed.^M
 A problem internal to GDB has been detected,^M
 further debugging may prove unreliable.^M
 Quit this debugging session? (y or n) FAIL: gdb.threads/thread-execl.exp: schedlock step: get to main in new image (GDB internal error)

The assertion is correct.  The issue is that GDB mistakenly tries to
switch back to an exited thread, one that was in the middle of a step
when it exited.  This is exactly the sort of thing the test wants to
make sure doesn't happen:

	# Now set a breakpoint at `main', and step over the execl call.  The
	# breakpoint at main should be reached.  GDB should not try to revert
	# back to the old thread from the old image and resume stepping it

With schedlock off, the bug stays hidden only because a different
sequence of events leads GDB to delete the thread outright instead of
marking it exited.

This particular internal error can be fixed by making the loop over
all threads in switch_back_to_stepped_thread skip exited threads.
However, every other ALL_THREADS user either can or should skip exited
threads too.  So for simplicity, this patch replaces ALL_THREADS with
a new macro that skips exited threads itself, and updates all users to
use it.
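
For illustration only, here is a minimal standalone sketch of the idea
of an iteration macro that skips exited threads.  It is not the code
from the patch: the struct layout, the explicit LIST argument, the
"stepping" flag and the find_stepping_thread helper are all simplified
assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GDB's thread bookkeeping.  */
enum thread_state { THREAD_STOPPED, THREAD_RUNNING, THREAD_EXITED };

struct thread_info
{
  struct thread_info *next;
  enum thread_state state;
  int stepping;   /* Nonzero if the thread was in the middle of a step.  */
};

/* Walk every thread in LIST, running the following statement only for
   threads that have not exited.  (Assumed shape; the real macro walks
   GDB's global thread list.)  */
#define ALL_NON_EXITED_THREADS(LIST, T)                 \
  for ((T) = (LIST); (T) != NULL; (T) = (T)->next)      \
    if ((T)->state != THREAD_EXITED)

/* Return the first live thread that was stepping, or NULL if there is
   none.  Exited threads are never considered, even if their stepping
   flag is still set.  */
static struct thread_info *
find_stepping_thread (struct thread_info *list)
{
  struct thread_info *tp;

  ALL_NON_EXITED_THREADS (list, tp)
    {
      /* The macro guarantees we never see an exited thread here.  */
      assert (tp->state != THREAD_EXITED);
      if (tp->stepping)
        return tp;
    }

  return NULL;
}
```

With a list holding a running thread, an exited thread that still has
its stepping flag set, and a stopped stepping thread, the helper
returns the stopped thread and never the exited one.  Filtering in the
macro itself means callers cannot forget the check.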

Tested on x86_64 Fedora 20.

gdb/
2014-06-18  Pedro Alves  <palves@redhat.com>

	* gdbthread.h (ALL_THREADS): Delete.
	(ALL_NON_EXITED_THREADS): New macro.
	* btrace.c (btrace_free_objfile): Use ALL_NON_EXITED_THREADS
	instead of ALL_THREADS.
	* infrun.c (find_thread_needs_step_over)
	(switch_back_to_stepped_thread): Use ALL_NON_EXITED_THREADS
	instead of ALL_THREADS.
	* record-btrace.c (record_btrace_open)
	(record_btrace_stop_recording, record_btrace_close)
	(record_btrace_is_replaying, record_btrace_resume)
	(record_btrace_find_thread_to_move, record_btrace_wait): Likewise.
	* remote.c (append_pending_thread_resumptions): Likewise.

gdb/testsuite/
2014-06-18  Pedro Alves  <palves@redhat.com>

	* gdb.threads/thread-execl.exp (do_test): New procedure, factored
	out from ...
	(top level): ... here.  Iterate running tests under different
	scheduler-locking settings.
---
 gdb/btrace.c                               |  2 +-
 gdb/gdbthread.h                            |  8 ++++--
 gdb/infrun.c                               |  4 +--
 gdb/record-btrace.c                        | 14 +++++-----
 gdb/remote.c                               |  2 +-
 gdb/testsuite/gdb.threads/thread-execl.exp | 44 ++++++++++++++++++++----------
 gdb/thread.c                               |  2 +-
 7 files changed, 46 insertions(+), 30 deletions(-)
