Tighten gdb.base/disp-step-syscall.exp (was: Re: [PATCH v5 0/6] Remote fork events)

Message ID 54F4C5A4.40503@redhat.com
State New, archived

Commit Message

Pedro Alves March 2, 2015, 8:18 p.m. UTC
On 02/27/2015 12:46 AM, Don Breazeal wrote:
>  - There are a couple of tests that show new failures that actually
>    fail in the current mainline.  Details of these are as follows:
> 
>    * when vfork events are enabled, gdb.base/disp-step-syscall.exp
>      shows PASS => FAIL in .sum diffs.  The test actually always
>      fails.  With native/master, we see
> 
>       stepi^M
>       FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (timeout)
> 

Hmm, I don't see that here.  I get a full pass on x86_64 Fedora 20.
Can you try "set debug infrun 1" / "set debug lin-lwp 1" / "set debug displaced 1"
to check if there's a gdb or kernel bug here?

>      With remote and extended-remote/master, we see a bogus PASS result:
>       stepi^M
>       [Inferior 1 (process 9399) exited normally]^M
>       (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn
> 
>     The criteria to pass that test are pretty lax:
>       gdb_test "stepi" ".*" "stepi $syscall insn"

Yeah.  I see several other problems.  Here's a patch to improve it.

Comments?

Unfortunately, with your full series applied, I get this:

 (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc"
 stepi
 Detaching from process 29944
 Killing process(es): 29942 29944
 /home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c:998: A problem internal to GDBserver has been detected.
 kill_wait_lwp: Assertion `res > 0' failed.
 /home/pedro/gdb/mygit/src/gdb/thread.c:1182: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (GDB internal error)
 Resyncing due to internal error.
 n

Note, you'll need this one:

 https://sourceware.org/ml/gdb-patches/2015-03/msg00045.html

for that internal error to result in a quick bail...

----------
From 1f825812d3f17a2940065d0de38592700e7437bc Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Mon, 2 Mar 2015 20:16:23 +0000
Subject: [PATCH] Tighten gdb.base/disp-step-syscall.exp

This fixes several problems with this test.

E.g., with --target_board=native-extended-gdbserver on
x86_64 Fedora 20, I get:

 Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/disp-step-syscall.exp ...
 FAIL: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc" (timeout)
 FAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork final pc
 FAIL: gdb.base/disp-step-syscall.exp: vfork: delete break vfork insn
 FAIL: gdb.base/disp-step-syscall.exp: vfork: continue to marker (vfork) (the program is no longer running)

And with --target_board=native-gdbserver, I get:

 Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/disp-step-syscall.exp ...
 KPASS: gdb.base/disp-step-syscall.exp: vfork: single step over vfork (PRMS server/13796)
 FAIL: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc" (timeout)
 FAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork final pc
 FAIL: gdb.base/disp-step-syscall.exp: vfork: delete break vfork insn
 FAIL: gdb.base/disp-step-syscall.exp: vfork: continue to marker (vfork) (the program is no longer running)

First, the lack of fork support on remote targets is supposed to be
kfailed, so the KPASS is obviously bogus.  The extended-remote board
should have KFAILed too.

The problem is that the test is using "is_remote" instead of
gdb_is_target_remote.

And then, I get:

 (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: set displaced-stepping on
 stepi

 Program terminated with signal SIGSEGV, Segmentation fault.
 The program no longer exists.
 (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: single step over vfork

Obviously, that should be a FAIL.  The problem is that the test only
expects SIGILL, not SIGSEGV.  It also doesn't bail correctly if an
internal error or some other pattern caught by gdb_test_multiple
matches.  The test doesn't really need to match specific exit/crash
patterns if the PASS regex is improved.

That test and the other "stepi" tests are a bit too lax, passing on
".*".  This patch tightens them up to expect the "x/i" display and
the "=>" current-PC indicator, as in:

 1: x/i $pc
 => 0x3b36abc9e2 <vfork+34>:     syscall

On x86_64 Fedora 20, I now get a quick KFAIL instead of timeouts with
both the native-extended-gdbserver and native-gdbserver boards:

 PASS: gdb.base/disp-step-syscall.exp: vfork: delete break vfork
 PASS: gdb.base/disp-step-syscall.exp: vfork: continue to syscall insn vfork
 PASS: gdb.base/disp-step-syscall.exp: vfork: set displaced-stepping on
 KFAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork (PRMS: server/13796)

and a full pass with native testing.

gdb/testsuite/
2015-03-02  Pedro Alves  <palves@redhat.com>

	* gdb.base/disp-step-syscall.exp (disp_step_cross_syscall):
	Use gdb_is_target_remote instead of is_remote.  Use
	gdb_test_multiple instead of gdb_expect.  Exit early if
	gdb_test_multiple hits its internal matches.  Tighten stepi tests
	expected output.  Fail on exit with any signal, instead of just
	SIGILL.
---
 gdb/testsuite/gdb.base/disp-step-syscall.exp | 53 ++++++++++++++--------------
 1 file changed, 26 insertions(+), 27 deletions(-)
  

Comments

Don Breazeal March 3, 2015, 6:20 a.m. UTC | #1
On 3/2/2015 12:18 PM, Pedro Alves wrote:
> On 02/27/2015 12:46 AM, Don Breazeal wrote:
>>  - There are a couple of tests that show new failures that actually
>>    fail in the current mainline.  Details of these are as follows:
>>
>>    * when vfork events are enabled, gdb.base/disp-step-syscall.exp
>>      shows PASS => FAIL in .sum diffs.  The test actually always
>>      fails.  With native/master, we see
>>
>>       stepi^M
>>       FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (timeout)
>>
> 
> Hmm, I don't see that here.  I get a full pass on x86_64 Fedora 20.
> Can you try "set debug infrun 1" / "set debug lin-lwp 1" / "set debug displaced 1"
> to check if there's a gdb or kernel bug here?

I am traveling this week, so I haven't had a chance to look at this
much.  With your two patches below applied, I get a full pass on x86_64
Ubuntu 14.04 (which I hadn't tried before) and I see the timeout failure
on x86_64 Ubuntu 10.04.  Here is the relevant portion of the gdb.log
file from my failing run on Ubuntu 10.04 with the debugging turned on,
if you want to look at it.  Otherwise I will try to look at it in the
next day or so.

(gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc"
stepi^M
infrun: clear_proceed_status_thread (process 6486)^M
infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT, step=1)^M
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [process 6486] at 0x2aaaaaffcf02^M
LLR: Preparing to step process 6486, 0, inferior_ptid process 6486^M
LLR: PTRACE_SINGLESTEP process 6486, 0 (resume event thread)^M
sigchld^M
linux_nat_wait: [process -1], [TARGET_WNOHANG]^M
LLW: enter^M
LNW: waitpid(-1, ...) returned 6837, No child processes^M
LLW: waitpid 6837 received Trace/breakpoint trap (stopped)^M
LHEW: saving LWP 6837 status Trace/breakpoint trap (stopped) in stopped_pids list^M
LNW: waitpid(-1, ...) returned 6486, No child processes^M
LLW: waitpid 6486 received Trace/breakpoint trap (stopped)^M
LLW: Handling extended status 0x02057f^M
LNW: waitpid(-1, ...) returned 0, No child processes^M
SEL: Select single-step process 6486^M
LLW: trap ptid is process 6486.^M
LLW: exit^M
sigchld^M
infrun: target_wait (-1, status) =^M
infrun:   6486 [process 6486],^M
infrun:   status->kind = vforked^M
infrun: TARGET_WAITKIND_VFORKED^M
Detaching after vfork from child process 6837.^M
sigchld^M
LCFF: waiting for VFORK_DONE on 6486^M
infrun: resume : clear step^M
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [process 6486] at 0x2aaaaaffcf04^M
LLR: Preparing to resume process 6486, 0, inferior_ptid process 6486^M
LLR: PTRACE_CONT process 6486, 0 (resume event thread)^M
infrun: prepare_to_wait^M
linux_nat_wait: [process -1], [TARGET_WNOHANG]^M
LLW: enter^M
LNW: waitpid(-1, ...) returned 0, No child processes^M
LLW: exit (ignore)^M
infrun: target_wait (-1, status) =^M
infrun:   -1 [process -1],^M
infrun:   status->kind = ignore^M
infrun: TARGET_WAITKIND_IGNORE^M
infrun: prepare_to_wait^M
FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (timeout)
testcase /scratch/dbreazea/sandbox/gdb-with-exec-6/binutils-gdb/gdb/testsuite/gdb.base/disp-step-syscall.exp completed in 21 seconds

> 
>>      With remote and extended-remote/master, we see a bogus PASS result:
>>       stepi^M
>>       [Inferior 1 (process 9399) exited normally]^M
>>       (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn
>>
>>     The criteria to pass that test are pretty lax:
>>       gdb_test "stepi" ".*" "stepi $syscall insn"
> 
> Yeah.  I see several other problems.  Here's a patch to improve it.
> 
> Comments?

I will try to walk through this in the next day or so.

> 
> Unfortunately, with your full series applied, I get this:
> 
>  (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc"
>  stepi
>  Detaching from process 29944
>  Killing process(es): 29942 29944
>  /home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c:998: A problem internal to GDBserver has been detected.
>  kill_wait_lwp: Assertion `res > 0' failed.
>  /home/pedro/gdb/mygit/src/gdb/thread.c:1182: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
>  A problem internal to GDB has been detected,
>  further debugging may prove unreliable.
>  Quit this debugging session? (y or n) FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (GDB internal error)
>  Resyncing due to internal error.
>  n

With the two patches below applied, disp-step-syscall.exp passes
consistently for me using native-extended-gdbserver on x86_64 Ubuntu
10.04 and Ubuntu 14.04.  I haven't looked (yet) at how your patches
might have caused this change in behavior, or at how I might be able to
reproduce the failure you are seeing.

I have seen the "inf != NULL" assertion before, when stopped at a remote
fork/vfork catchpoint and executing "info threads".  In that case
gdbserver was reporting the new thread created by the fork.  It was
added to the host-side thread list, but the new inferior had not been
created yet on the host side.  That specific scenario should be
prevented now in the remote follow fork patch series by not reporting
the forked child's thread until the follow_fork has been completed. (If
I am remembering that right.)

Thanks
--Don

> 
> Note, you'll need this one:
> 
>  https://sourceware.org/ml/gdb-patches/2015-03/msg00045.html
> 
> for that internal error to result in a quick bail...
> 
> ----------
> From 1f825812d3f17a2940065d0de38592700e7437bc Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Mon, 2 Mar 2015 20:16:23 +0000
> Subject: [PATCH] Tighten gdb.base/disp-step-syscall.exp
> 
> This fixes several problems with this test.
> 
> E.g., with --target_board=native-extended-gdbserver on
> x86_64 Fedora 20, I get:
> 
>  Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/disp-step-syscall.exp ...
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc" (timeout)
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork final pc
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: delete break vfork insn
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: continue to marker (vfork) (the program is no longer running)
> 
> And with --target_board=native-gdbserver, I get:
> 
>  Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/disp-step-syscall.exp ...
>  KPASS: gdb.base/disp-step-syscall.exp: vfork: single step over vfork (PRMS server/13796)
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc" (timeout)
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork final pc
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: delete break vfork insn
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: continue to marker (vfork) (the program is no longer running)
> 
> First, the lack of fork support on remote targets is supposed to be
> kfailed, so the KPASS is obviously bogus.  The extended-remote board
> should have KFAILed too.
> 
> The problem is that the test is using "is_remote" instead of
> gdb_is_target_remote.
> 
> And then, I get:
> 
>  (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: set displaced-stepping on
>  stepi
> 
>  Program terminated with signal SIGSEGV, Segmentation fault.
>  The program no longer exists.
>  (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: single step over vfork
> 
> Obviously, that should be a FAIL.  The problem is that the test only
> expects SIGILL, not SIGSEGV.  It also doesn't bail correctly if an
> internal error or some other pattern caught by gdb_test_multiple
> matches.  The test doesn't really need to match specific exit/crash
> patterns if the PASS regex is improved.
> 
> That test and the other "stepi" tests are a bit too lax, passing on
> ".*".  This patch tightens them up to expect the "x/i" display and
> the "=>" current-PC indicator, as in:
> 
>  1: x/i $pc
>  => 0x3b36abc9e2 <vfork+34>:     syscall
> 
> On x86_64 Fedora 20, I now get a quick KFAIL instead of timeouts with
> both the native-extended-gdbserver and native-gdbserver boards:
> 
>  PASS: gdb.base/disp-step-syscall.exp: vfork: delete break vfork
>  PASS: gdb.base/disp-step-syscall.exp: vfork: continue to syscall insn vfork
>  PASS: gdb.base/disp-step-syscall.exp: vfork: set displaced-stepping on
>  KFAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork (PRMS: server/13796)
> 
> and a full pass with native testing.
> 
> gdb/testsuite/
> 2015-03-02  Pedro Alves  <palves@redhat.com>
> 
> 	* gdb.base/disp-step-syscall.exp (disp_step_cross_syscall):
> 	Use gdb_is_target_remote instead of is_remote.  Use
> 	gdb_test_multiple instead of gdb_expect.  Exit early if
> 	gdb_test_multiple hits its internal matches.  Tighten stepi tests
> 	expected output.  Fail on exit with any signal, instead of just
> 	SIGILL.
> ---
>  gdb/testsuite/gdb.base/disp-step-syscall.exp | 53 ++++++++++++++--------------
>  1 file changed, 26 insertions(+), 27 deletions(-)
> 
> diff --git a/gdb/testsuite/gdb.base/disp-step-syscall.exp b/gdb/testsuite/gdb.base/disp-step-syscall.exp
> index ff66f83..b13dce4 100644
> --- a/gdb/testsuite/gdb.base/disp-step-syscall.exp
> +++ b/gdb/testsuite/gdb.base/disp-step-syscall.exp
> @@ -49,6 +49,8 @@ proc disp_step_cross_syscall { syscall } {
>  	    return
>  	}
>  
> +	set is_target_remote [gdb_is_target_remote]
> +
>  	# Delete the breakpoint on main.
>  	gdb_test_no_output "delete break 1"
>  
> @@ -77,27 +79,34 @@ proc disp_step_cross_syscall { syscall } {
>  	gdb_test "display/i \$pc" ".*"
>  
>  
> -	# Single step until we see sysall insn or we reach the upper bound of loop
> -	# iterations.
> -	set see_syscall_insn 0
> -
> -	for {set i 0} {$i < 1000 && $see_syscall_insn == 0} {incr i} {
> -	    send_gdb "stepi\n"
> -	    gdb_expect {
> -		-re ".*$syscall_insn.*$gdb_prompt $" {
> -		    set see_syscall_insn 1
> +	# Single step until we see a syscall insn or we reach the
> +	# upper bound of loop iterations.
> +	set msg "find syscall insn in $syscall"
> +	set steps 0
> +	set max_steps 1000
> +	gdb_test_multiple "stepi" $msg {
> +	    -re ".*$syscall_insn.*$gdb_prompt $" {
> +		pass $msg
> +	    }
> +	    -re "x/i .*=>.*\r\n$gdb_prompt $" {
> +		incr steps
> +		if {$steps == $max_steps} {
> +		    fail $msg
> +		} else {
> +		    send_gdb "stepi\n"
> +		    exp_continue
>  		}
> -		-re ".*$gdb_prompt $" {}
>  	    }
>  	}
>  
> -	if {$see_syscall_insn == 0} then {
> -	    fail "find syscall insn in $syscall"
> +	if {$steps == $max_steps} {
>  	    return -1
>  	}
>  
>  	set syscall_insn_addr [get_hexadecimal_valueof "\$pc" "0"]
> -	gdb_test "stepi" ".*" "stepi $syscall insn"
> +	if {[gdb_test "stepi" "x/i .*=>.*" "stepi $syscall insn"] != 0} {
> +	    return -1
> +	}
>  	set syscall_insn_next_addr [get_hexadecimal_valueof "\$pc" "0"]
>  
>  	gdb_test "continue" "Continuing\\..*Breakpoint \[0-9\]+, (.* in |__libc_|)$syscall \\(\\).*" \
> @@ -121,22 +130,12 @@ proc disp_step_cross_syscall { syscall } {
>  	gdb_test_no_output "set displaced-stepping on"
>  
>  	# Check the address of next instruction of syscall.
> -	if {$syscall == "vfork" && [is_remote target]} {
> +	if {$syscall == "vfork" && $is_target_remote} {
>  	    setup_kfail server/13796 "*-*-*"
>  	}
> -	set test "single step over $syscall"
> -	gdb_test_multiple "stepi" $test {
> -	    -re "Program terminated with signal SIGILL,.*\r\n$gdb_prompt $" {
> -		fail $test
> -		return
> -	    }
> -	    -re "\\\[Inferior .* exited normally\\\].*\r\n$gdb_prompt $" {
> -		fail $test
> -		return
> -	    }
> -	    -re "\r\n$gdb_prompt $" {
> -		pass $test
> -	    }
> +
> +	if {[gdb_test "stepi" "x/i .*=>.*" "single step over $syscall"] != 0} {
> +	    return -1
>  	}
>  
>  	set syscall_insn_next_addr_found [get_hexadecimal_valueof "\$pc" "0"]
>
  
Pedro Alves March 3, 2015, 3 p.m. UTC | #2
On 03/03/2015 06:20 AM, Breazeal, Don wrote:

> infrun: target_wait (-1, status) =^M
> infrun:   6486 [process 6486],^M
> infrun:   status->kind = vforked^M
> infrun: TARGET_WAITKIND_VFORKED^M
> Detaching after vfork from child process 6837.^M
> sigchld^M
> LCFF: waiting for VFORK_DONE on 6486^M
> infrun: resume : clear step^M
> infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [process 6486] at 0x2aaaaaffcf04^M
> LLR: Preparing to resume process 6486, 0, inferior_ptid process 6486^M
> LLR: PTRACE_CONT process 6486, 0 (resume event thread)^M
> infrun: prepare_to_wait^M
> linux_nat_wait: [process -1], [TARGET_WNOHANG]^M
> LLW: enter^M
> LNW: waitpid(-1, ...) returned 0, No child processes^M
> LLW: exit (ignore)^M
> infrun: target_wait (-1, status) =^M
> infrun:   -1 [process -1],^M
> infrun:   status->kind = ignore^M
> infrun: TARGET_WAITKIND_IGNORE^M
> infrun: prepare_to_wait^M
> FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (timeout)
> testcase /scratch/dbreazea/sandbox/gdb-with-exec-6/binutils-gdb/gdb/testsuite/gdb.base/disp-step-syscall.exp completed in 21 seconds
> 

Odd.  Quite possibly a kernel bug.  Looks like ptrace never
reports the VFORK_DONE, or it does but SIGCHLD was never generated
and thus we got stuck in the event loop.

>>
>> Unfortunately, with your full series applied, I get this:
>>
>>  (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc"
>>  stepi
>>  Detaching from process 29944
>>  Killing process(es): 29942 29944
>>  /home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c:998: A problem internal to GDBserver has been detected.
>>  kill_wait_lwp: Assertion `res > 0' failed.
>>  /home/pedro/gdb/mygit/src/gdb/thread.c:1182: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
>>  A problem internal to GDB has been detected,
>>  further debugging may prove unreliable.
>>  Quit this debugging session? (y or n) FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (GDB internal error)
>>  Resyncing due to internal error.
>>  n
> 
> With the two patches below applied, disp-step-syscall.exp passes
> consistently for me using native-extended-gdbserver on x86_64 Ubuntu
> 10.04 and Ubuntu 14.04.  I haven't looked (yet) at how your patches
> might have caused this change in behavior, or at how I might be able to
> reproduce the failure you are seeing.

TBC, I get the internal errors (F20, x86_64) without my patches too.
The only difference is that without my patches the FAIL is followed by
slow timeouts:

 Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/disp-step-syscall.exp ...
 FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (GDB internal error)
 FAIL: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc" (timeout)
 FAIL: gdb.base/disp-step-syscall.exp: vfork: continue to vfork (3rd time) (GDB internal error)
 FAIL: gdb.base/disp-step-syscall.exp: vfork: continue to syscall insn vfork (the program is no longer running)

> 
> I have seen the "inf != NULL" assertion before, when stopped at a remote
> fork/vfork catchpoint and executing "info threads".  In that case
> gdbserver was reporting the new thread created by the fork.  It was
> added to the host-side thread list, but the new inferior had not been
> created yet on the host side.  That specific scenario should be
> prevented now in the remote follow fork patch series by not reporting
> the forked child's thread until the follow_fork has been completed. (If
> I am remembering that right.)

Adding infrun/remote logging, I see:

infrun: target_wait (-1, status) =
infrun:   26217 [Thread 26217.26217],
infrun:   status->kind = vforked
infrun: TARGET_WAITKIND_VFORKED
Sending packet: $z0,400624,1#63...Packet received: OK
Sending packet: $z0,3b36603966,1#6f...Packet received: OK
Sending packet: $z0,3b36613970,1#6b...Packet received: OK
Sending packet: $z0,3b36614891,1#6e...Packet received: OK
Sending packet: $z0,3b36abc9c0,1#23...Packet received: OK
Detaching after vfork from child process 26219.
Sending packet: $D;666b#83...Detaching from process 26219
Killing process(es): 26217 26219
/home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c:998: A problem internal to GDBserver has been detected.
kill_wait_lwp: Assertion `res > 0' failed.
Packet received: E01
/home/pedro/gdb/mygit/src/gdb/thread.c:1182: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (GDB internal error)
Resyncing due to internal error.


The backtrace:

...
#2  0x000000000041451e in internal_verror (file=0x456008 "/home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c", line=998, fmt=0x455fe6 "%s: Assertion `%s' failed.",
    args=0x7fff7b828258) at /home/pedro/gdb/mygit/src/gdb/gdbserver/utils.c:106
#3  0x0000000000426ea8 in internal_error (file=0x456008 "/home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c", line=998, fmt=0x455fe6 "%s: Assertion `%s' failed.")
    at /home/pedro/gdb/mygit/src/gdb/gdbserver/../common/errors.c:55
#4  0x00000000004293cb in kill_wait_lwp (lwp=0x14a38b0) at /home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c:998
#5  0x0000000000429543 in linux_kill (pid=26624) at /home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c:1050
#6  0x00000000004140ea in kill_inferior (pid=26624) at /home/pedro/gdb/mygit/src/gdb/gdbserver/target.c:219
#7  0x00000000004110e1 in detach_or_kill_inferior_callback (entry=0x14a2ad0) at /home/pedro/gdb/mygit/src/gdb/gdbserver/server.c:3087
#8  0x00000000004064da in for_each_inferior (list=0x670110 <all_processes>, action=0x41107f <detach_or_kill_inferior_callback>)
    at /home/pedro/gdb/mygit/src/gdb/gdbserver/inferiors.c:55
#9  0x0000000000411258 in detach_or_kill_for_exit () at /home/pedro/gdb/mygit/src/gdb/gdbserver/server.c:3148
#10 0x0000000000411295 in detach_or_kill_for_exit_cleanup (ignore=0x0) at /home/pedro/gdb/mygit/src/gdb/gdbserver/server.c:3163
#11 0x0000000000427178 in do_my_cleanups (pmy_chain=0x668938 <cleanup_chain>, old_chain=0x44c440 <sentinel_cleanup>)
    at /home/pedro/gdb/mygit/src/gdb/gdbserver/../common/cleanups.c:155
#12 0x00000000004271e5 in do_cleanups (old_chain=0x44c440 <sentinel_cleanup>) at /home/pedro/gdb/mygit/src/gdb/gdbserver/../common/cleanups.c:177
#13 0x00000000004276bf in throw_exception (exception=...) at /home/pedro/gdb/mygit/src/gdb/gdbserver/../common/common-exceptions.c:215
#14 0x0000000000427843 in throw_it (reason=RETURN_ERROR, error=GENERIC_ERROR, fmt=0x4480e7 "%s.", ap=0x7fff7b828648)
    at /home/pedro/gdb/mygit/src/gdb/gdbserver/../common/common-exceptions.c:274
#15 0x000000000042786d in throw_verror (error=GENERIC_ERROR, fmt=0x4480e7 "%s.", ap=0x7fff7b828648)
    at /home/pedro/gdb/mygit/src/gdb/gdbserver/../common/common-exceptions.c:280
#16 0x0000000000414456 in verror (string=0x4480e7 "%s.", args=0x7fff7b828648) at /home/pedro/gdb/mygit/src/gdb/gdbserver/utils.c:85
#17 0x0000000000426e00 in error (fmt=0x4480e7 "%s.") at /home/pedro/gdb/mygit/src/gdb/gdbserver/../common/errors.c:43
#18 0x0000000000414431 in perror_with_name (string=0x4435a6 "Can't determine port") at /home/pedro/gdb/mygit/src/gdb/gdbserver/utils.c:71
#19 0x0000000000407e49 in remote_open (name=0x7fff7b82a6fc ":2347") at /home/pedro/gdb/mygit/src/gdb/gdbserver/remote-utils.c:389
#20 0x0000000000411b12 in captured_main (argc=4, argv=0x7fff7b828a68) at /home/pedro/gdb/mygit/src/gdb/gdbserver/server.c:3414
#21 0x0000000000411ca2 in main (argc=4, argv=0x7fff7b828a68) at /home/pedro/gdb/mygit/src/gdb/gdbserver/server.c:3490
...

Seems like gdb disconnects, and we end up in remote_open again.
Then, probably due to --once (the listen descriptor is closed), that
fails and throws, which runs the "kill or detach everything" cleanup
(detach_or_kill_for_exit_cleanup).  And that ends up in your new
code here:

static int
linux_kill (int pid)
{
  struct process_info *process;
  struct lwp_info *lwp;
  struct target_waitstatus last;
  ptid_t last_ptid;

  /* If we're stopped while forking and we haven't followed yet,
     kill the child task.  We need to do this first because the
     parent will be sleeping if this is a vfork.  */

  get_last_target_status (&last_ptid, &last);

  if (last.kind == TARGET_WAITKIND_FORKED
      || last.kind == TARGET_WAITKIND_VFORKED)
    {
      lwp = find_lwp_pid (last.value.related_pid);
      gdb_assert (lwp != NULL);
      kill_wait_lwp (lwp);
      process = find_process_pid (ptid_get_pid (last.value.related_pid));
      the_target->mourn (process);
    }

trying to kill the vfork child.

Really, relying on get_last_target_status is not a good idea.  It's
already broken on the native side, and we shouldn't spread it to
gdbserver too.  E.g., consider scheduler-locking or non-stop.
Other events on other processes/threads can easily happen
and thus overwrite the last target status, before something
decides to kill the fork parent.

Thanks,
Pedro Alves
  
Don Breazeal March 17, 2015, 9:18 p.m. UTC | #3
On 3/2/2015 12:18 PM, Pedro Alves wrote:
> On 02/27/2015 12:46 AM, Don Breazeal wrote:
>>  - There are a couple of tests that show new failures that actually
>>    fail in the current mainline.  Details of these are as follows:
>>
>>    * when vfork events are enabled, gdb.base/disp-step-syscall.exp
>>      shows PASS => FAIL in .sum diffs.  The test actually always
>>      fails.  With native/master, we see
>>
>>       stepi^M
>>       FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (timeout)
>>
> 
> Hmm, I don't see that here.  I get a full pass on x86_64 Fedora 20.
> Can you try "set debug infrun 1" / "set debug lin-lwp 1" / "set debug displaced 1"
> to check if there's a gdb or kernel bug here?
> 
>>      With remote and extended-remote/master, we see a bogus PASS result:
>>       stepi^M
>>       [Inferior 1 (process 9399) exited normally]^M
>>       (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn
>>
>>     The criteria to pass that test are pretty lax:
>>       gdb_test "stepi" ".*" "stepi $syscall insn"
> 
> Yeah.  I see several other problems.  Here's a patch to improve it.
> 
> Comments?

Hi Pedro,
This patch makes sense to me, and it has been working great for me while
debugging my updates to the follow-fork patchset.  We will need to
update this once the remote follow patches are committed, I guess,
since presumably the kfail 13796 will be resolved then.

> 
> Unfortunately, with your full series applied, I get this:
> 
>  (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc"
>  stepi
>  Detaching from process 29944
>  Killing process(es): 29942 29944
>  /home/pedro/gdb/mygit/src/gdb/gdbserver/linux-low.c:998: A problem internal to GDBserver has been detected.
>  kill_wait_lwp: Assertion `res > 0' failed.
>  /home/pedro/gdb/mygit/src/gdb/thread.c:1182: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
>  A problem internal to GDB has been detected,
>  further debugging may prove unreliable.
>  Quit this debugging session? (y or n) FAIL: gdb.base/disp-step-syscall.exp: vfork: stepi vfork insn (GDB internal error)
>  Resyncing due to internal error.
>  n

The updated patchset just posted should address this issue.
https://sourceware.org/ml/gdb-patches/2015-03/msg00503.html

Thanks,
--Don

> 
> Note, you'll need this one:
> 
>  https://sourceware.org/ml/gdb-patches/2015-03/msg00045.html
> 
> for that internal error to result in a quick bail...
> 
> ----------
> From 1f825812d3f17a2940065d0de38592700e7437bc Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Mon, 2 Mar 2015 20:16:23 +0000
> Subject: [PATCH] Tighten gdb.base/disp-step-syscall.exp
> 
> This fixes several problems with this test.
> 
> E.g., with --target_board=native-extended-gdbserver on
> x86_64 Fedora 20, I get:
> 
>  Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/disp-step-syscall.exp ...
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc" (timeout)
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork final pc
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: delete break vfork insn
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: continue to marker (vfork) (the program is no longer running)
> 
> And with --target_board=native-gdbserver, I get:
> 
>  Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/disp-step-syscall.exp ...
>  KPASS: gdb.base/disp-step-syscall.exp: vfork: single step over vfork (PRMS server/13796)
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: get hexadecimal valueof "$pc" (timeout)
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork final pc
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: delete break vfork insn
>  FAIL: gdb.base/disp-step-syscall.exp: vfork: continue to marker (vfork) (the program is no longer running)
> 
> First, the lack of fork support on remote targets is supposed to be
> kfailed, so the KPASS is obviously bogus.  The extended-remote board
> should have KFAILed too.
> 
> The problem is that the test is using "is_remote" instead of
> gdb_is_target_remote.
> 
> And then, I get:
> 
>  (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: set displaced-stepping on
>  stepi
> 
>  Program terminated with signal SIGSEGV, Segmentation fault.
>  The program no longer exists.
>  (gdb) PASS: gdb.base/disp-step-syscall.exp: vfork: single step over vfork
> 
> Obviously, that should be a FAIL.  The problem is that the test only
> expects SIGILL, not SIGSEGV.  It also doesn't bail correctly if an
> internal error or some other pattern caught by gdb_test_multiple
> matches.  The test doesn't really need to match specific exit/crash
> patterns if the PASS regex is improved.
> 
> That test and the other "stepi" tests are a bit too lax, passing on
> ".*".  This patch tightens them up to expect the "x/i" display and
> the "=>" current-PC indicator, as in:
> 
>  1: x/i $pc
>  => 0x3b36abc9e2 <vfork+34>:     syscall
> 
> On x86_64 Fedora 20, I now get a quick KFAIL instead of timeouts with
> both the native-extended-gdbserver and native-gdbserver boards:
> 
>  PASS: gdb.base/disp-step-syscall.exp: vfork: delete break vfork
>  PASS: gdb.base/disp-step-syscall.exp: vfork: continue to syscall insn vfork
>  PASS: gdb.base/disp-step-syscall.exp: vfork: set displaced-stepping on
>  KFAIL: gdb.base/disp-step-syscall.exp: vfork: single step over vfork (PRMS: server/13796)
> 
> and a full pass with native testing.
> 
> gdb/testsuite/
> 2015-03-02  Pedro Alves  <palves@redhat.com>
> 
> 	* gdb.base/disp-step-syscall.exp (disp_step_cross_syscall):
> 	Use gdb_is_target_remote instead of is_remote.  Use
> 	gdb_test_multiple instead of gdb_expect.  Exit early if
> 	gdb_test_multiple hits its internal matches.  Tighten stepi tests
> 	expected output.  Fail on exit with any signal, instead of just
> 	SIGILL.
> ---
>  gdb/testsuite/gdb.base/disp-step-syscall.exp | 53 ++++++++++++++--------------
>  1 file changed, 26 insertions(+), 27 deletions(-)
> 
> diff --git a/gdb/testsuite/gdb.base/disp-step-syscall.exp b/gdb/testsuite/gdb.base/disp-step-syscall.exp
> index ff66f83..b13dce4 100644
> --- a/gdb/testsuite/gdb.base/disp-step-syscall.exp
> +++ b/gdb/testsuite/gdb.base/disp-step-syscall.exp
> @@ -49,6 +49,8 @@ proc disp_step_cross_syscall { syscall } {
>  	    return
>  	}
>  
> +	set is_target_remote [gdb_is_target_remote]
> +
>  	# Delete the breakpoint on main.
>  	gdb_test_no_output "delete break 1"
>  
> @@ -77,27 +79,34 @@ proc disp_step_cross_syscall { syscall } {
>  	gdb_test "display/i \$pc" ".*"
>  
>  
> -	# Single step until we see sysall insn or we reach the upper bound of loop
> -	# iterations.
> -	set see_syscall_insn 0
> -
> -	for {set i 0} {$i < 1000 && $see_syscall_insn == 0} {incr i} {
> -	    send_gdb "stepi\n"
> -	    gdb_expect {
> -		-re ".*$syscall_insn.*$gdb_prompt $" {
> -		    set see_syscall_insn 1
> +	# Single step until we see a syscall insn or we reach the
> +	# upper bound of loop iterations.
> +	set msg "find syscall insn in $syscall"
> +	set steps 0
> +	set max_steps 1000
> +	gdb_test_multiple "stepi" $msg {
> +	    -re ".*$syscall_insn.*$gdb_prompt $" {
> +		pass $msg
> +	    }
> +	    -re "x/i .*=>.*\r\n$gdb_prompt $" {
> +		incr steps
> +		if {$steps == $max_steps} {
> +		    fail $msg
> +		} else {
> +		    send_gdb "stepi\n"
> +		    exp_continue
>  		}
> -		-re ".*$gdb_prompt $" {}
>  	    }
>  	}
>  
> -	if {$see_syscall_insn == 0} then {
> -	    fail "find syscall insn in $syscall"
> +	if {$steps == $max_steps} {
>  	    return -1
>  	}
>  
>  	set syscall_insn_addr [get_hexadecimal_valueof "\$pc" "0"]
> -	gdb_test "stepi" ".*" "stepi $syscall insn"
> +	if {[gdb_test "stepi" "x/i .*=>.*" "stepi $syscall insn"] != 0} {
> +	    return -1
> +	}
>  	set syscall_insn_next_addr [get_hexadecimal_valueof "\$pc" "0"]
>  
>  	gdb_test "continue" "Continuing\\..*Breakpoint \[0-9\]+, (.* in |__libc_|)$syscall \\(\\).*" \
> @@ -121,22 +130,12 @@ proc disp_step_cross_syscall { syscall } {
>  	gdb_test_no_output "set displaced-stepping on"
>  
>  	# Check the address of next instruction of syscall.
> -	if {$syscall == "vfork" && [is_remote target]} {
> +	if {$syscall == "vfork" && $is_target_remote} {
>  	    setup_kfail server/13796 "*-*-*"
>  	}
> -	set test "single step over $syscall"
> -	gdb_test_multiple "stepi" $test {
> -	    -re "Program terminated with signal SIGILL,.*\r\n$gdb_prompt $" {
> -		fail $test
> -		return
> -	    }
> -	    -re "\\\[Inferior .* exited normally\\\].*\r\n$gdb_prompt $" {
> -		fail $test
> -		return
> -	    }
> -	    -re "\r\n$gdb_prompt $" {
> -		pass $test
> -	    }
> +
> +	if {[gdb_test "stepi" "x/i .*=>.*" "single step over $syscall"] != 0} {
> +	    return -1
>  	}
>  
>  	set syscall_insn_next_addr_found [get_hexadecimal_valueof "\$pc" "0"]
>
  
Pedro Alves March 18, 2015, 7:37 p.m. UTC
Hi Don,

On 03/17/2015 09:18 PM, Breazeal, Don wrote:

> Hi Pedro,
> This patch makes sense to me, and it has been working great for me while
> debugging my updates to the follow-fork patchset.  We will need to
> update this once the remote follow patches are committed, I guess,
> since presumably the kfail 13796 will be resolved then.

Alright, might as well push it in then.  I've done that now.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/testsuite/gdb.base/disp-step-syscall.exp b/gdb/testsuite/gdb.base/disp-step-syscall.exp
index ff66f83..b13dce4 100644
--- a/gdb/testsuite/gdb.base/disp-step-syscall.exp
+++ b/gdb/testsuite/gdb.base/disp-step-syscall.exp
@@ -49,6 +49,8 @@  proc disp_step_cross_syscall { syscall } {
 	    return
 	}
 
+	set is_target_remote [gdb_is_target_remote]
+
 	# Delete the breakpoint on main.
 	gdb_test_no_output "delete break 1"
 
@@ -77,27 +79,34 @@  proc disp_step_cross_syscall { syscall } {
 	gdb_test "display/i \$pc" ".*"
 
 
-	# Single step until we see sysall insn or we reach the upper bound of loop
-	# iterations.
-	set see_syscall_insn 0
-
-	for {set i 0} {$i < 1000 && $see_syscall_insn == 0} {incr i} {
-	    send_gdb "stepi\n"
-	    gdb_expect {
-		-re ".*$syscall_insn.*$gdb_prompt $" {
-		    set see_syscall_insn 1
+	# Single step until we see a syscall insn or we reach the
+	# upper bound of loop iterations.
+	set msg "find syscall insn in $syscall"
+	set steps 0
+	set max_steps 1000
+	gdb_test_multiple "stepi" $msg {
+	    -re ".*$syscall_insn.*$gdb_prompt $" {
+		pass $msg
+	    }
+	    -re "x/i .*=>.*\r\n$gdb_prompt $" {
+		incr steps
+		if {$steps == $max_steps} {
+		    fail $msg
+		} else {
+		    send_gdb "stepi\n"
+		    exp_continue
 		}
-		-re ".*$gdb_prompt $" {}
 	    }
 	}
 
-	if {$see_syscall_insn == 0} then {
-	    fail "find syscall insn in $syscall"
+	if {$steps == $max_steps} {
 	    return -1
 	}
 
 	set syscall_insn_addr [get_hexadecimal_valueof "\$pc" "0"]
-	gdb_test "stepi" ".*" "stepi $syscall insn"
+	if {[gdb_test "stepi" "x/i .*=>.*" "stepi $syscall insn"] != 0} {
+	    return -1
+	}
 	set syscall_insn_next_addr [get_hexadecimal_valueof "\$pc" "0"]
 
 	gdb_test "continue" "Continuing\\..*Breakpoint \[0-9\]+, (.* in |__libc_|)$syscall \\(\\).*" \
@@ -121,22 +130,12 @@  proc disp_step_cross_syscall { syscall } {
 	gdb_test_no_output "set displaced-stepping on"
 
 	# Check the address of next instruction of syscall.
-	if {$syscall == "vfork" && [is_remote target]} {
+	if {$syscall == "vfork" && $is_target_remote} {
 	    setup_kfail server/13796 "*-*-*"
 	}
-	set test "single step over $syscall"
-	gdb_test_multiple "stepi" $test {
-	    -re "Program terminated with signal SIGILL,.*\r\n$gdb_prompt $" {
-		fail $test
-		return
-	    }
-	    -re "\\\[Inferior .* exited normally\\\].*\r\n$gdb_prompt $" {
-		fail $test
-		return
-	    }
-	    -re "\r\n$gdb_prompt $" {
-		pass $test
-	    }
+
+	if {[gdb_test "stepi" "x/i .*=>.*" "single step over $syscall"] != 0} {
+	    return -1
 	}
 
 	set syscall_insn_next_addr_found [get_hexadecimal_valueof "\$pc" "0"]
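The bounded stepping loop in the patch relies on exp_continue to re-arm gdb_test_multiple after each "stepi". Its control flow can be modeled in plain Tcl as below. This is a sketch under stated assumptions: find_syscall_insn is a hypothetical helper that walks a list of made-up per-step outputs, and the syscall pattern is an assumed stand-in, not the exact $syscall_insn value the test uses.

```tcl
# Hypothetical model of the bounded "stepi" loop.  Each element of
# OUTPUTS stands for what GDB prints after one "stepi".
proc find_syscall_insn {outputs syscall_insn max_steps} {
    set steps 0
    foreach out $outputs {
        if {[regexp -- $syscall_insn $out]} {
            # Like the first -re arm: the syscall insn is displayed.
            return [list found $steps]
        } elseif {[regexp {x/i .*=>} $out]} {
            # Like the second -re arm: an ordinary insn, so count it
            # and step again unless the budget is exhausted.
            incr steps
            if {$steps == $max_steps} {
                return [list exhausted $steps]
            }
        } else {
            # Neither arm matches; in the real test, gdb_test_multiple's
            # built-in patterns or its timeout would handle this case.
            return [list unexpected $steps]
        }
    }
    # The simulated output ran out before any decision was reached.
    return [list unexpected $steps]
}

# Made-up outputs; the syscall pattern is an assumption.
set insn     "\[ \t\](syscall|int)\[ \t\]"
set ordinary "1: x/i \$pc\r\n=> 0x400500 <vfork+4>:\tmov    %eax,%edi\r\n"
set sys      "1: x/i \$pc\r\n=> 0x400510 <vfork+20>:\tsyscall \r\n"
set crash    "Program terminated with signal SIGSEGV, Segmentation fault.\r\n"

puts [find_syscall_insn [list $ordinary $ordinary $sys] $insn 1000]   ;# found 2
puts [find_syscall_insn [list $ordinary $ordinary $ordinary] $insn 2] ;# exhausted 2
puts [find_syscall_insn [list $ordinary $crash] $insn 1000]           ;# unexpected 1
```

The old loop's catch-all `-re ".*$gdb_prompt $" {}` arm silently consumed anything, including a crash; with only the two specific arms, unrecognized output is no longer counted as an ordinary step.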