Guard against killing unrelated processes in amd64-disp-step.exp

Message ID yddfs5srrc4.fsf@CeBiTec.Uni-Bielefeld.DE
State New
Headers
Series Guard against killing unrelated processes in amd64-disp-step.exp |

Commit Message

Rainer Orth July 13, 2023, 11:19 a.m. UTC
  When testing current gdb trunk on Solaris/amd64, the whole session was
reliably terminated by make check.  I could trace this to the following
entry in gdb.arch/amd64-disp-step/gdb.log:

FAIL: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=on: get inferior pid
Executing on target: kill -ALRM -1    (timeout = 300)
builtin_spawn -ignore SIGHUP kill -ALRM -1

If $inferior_pid doesn't refer a single process for some reason, this
kill would terminate either a process group or the whole session.

This patch avoids this by ensuring that the pid arg is positive.

Tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu.

Ok for trunk?

	Rainer
  

Comments

Tom Tromey July 13, 2023, 4:34 p.m. UTC | #1
>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:

Rainer> When testing current gdb trunk on Solaris/amd64, the whole session was
Rainer> reliably terminated by make check.  I could trace this to the following
Rainer> entry in gdb.arch/amd64-disp-step/gdb.log:

Thank you for the patch.

Rainer> If $inferior_pid doesn't refer a single process for some reason, this
Rainer> kill would terminate either a process group or the whole session.

I don't mind the patch, it seems like an improvement -- but I wonder why
this ends up as -1, and whether a fix belongs elsewhere.

Tom
  
Rainer Orth July 13, 2023, 5:59 p.m. UTC | #2
Hi Tom,

>>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
>
> Rainer> When testing current gdb trunk on Solaris/amd64, the whole session was
> Rainer> reliably terminated by make check.  I could trace this to the following
> Rainer> entry in gdb.arch/amd64-disp-step/gdb.log:
>
> Thank you for the patch.
>
> Rainer> If $inferior_pid doesn't refer a single process for some reason, this
> Rainer> kill would terminate either a process group or the whole session.
>
> I don't mind the patch, it seems like an improvement -- but I wonder why

That's what I thought: if for whatever reason the pid turns
non-positive, hell breaks lose if that's passed to kill unchecked.

> this ends up as -1, and whether a fix belongs elsewhere.

gdb.log shows

(gdb) PASS: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=off: verify_regs: rdi expected value
jump test_rip_rcx
Continuing at 0x4015b2.

Program terminated with signal SIGALRM, Alarm clock.
The program no longer exists.
[...]
[Current inferior is 1 [<null>] (/vol/obj/gnu/gdb/gdb/11.4-amd64-dist/gdb/testsuite/outputs/gdb.arch/amd64-disp-step/amd64-disp-step)]

lib/gdb.exp (get_inferior_pid) turns this <null> into -1:

proc get_inferior_pid {} {
    set pid -1
    gdb_test_multiple "inferior" "get inferior pid" {
        -re "process (\[0-9\]*).*$::gdb_prompt $" {
            set pid $expect_out(1,string)
            pass $gdb_test_name
        }
    }
    return $pid
}

This is certainly something that shouldn't happen and fixing the
underlying problem(s) would avoid the kill for sure.  However, the
Solaris gdb port is in a terrible state with about 3200 testsuite
failures on amd64-pc-solaris2.11.  I've made a few attempts to fix the
most glaring issues, but that never got me anywere until I've decided
that the gdb codebase is simply beyond my abilities.

	Rainer
  
Pedro Alves July 14, 2023, 5:25 p.m. UTC | #3
On 2023-07-13 18:59, Rainer Orth wrote:
> Hi Tom,
> 
>>>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
>>
>> Rainer> When testing current gdb trunk on Solaris/amd64, the whole session was
>> Rainer> reliably terminated by make check.  I could trace this to the following
>> Rainer> entry in gdb.arch/amd64-disp-step/gdb.log:
>>
>> Thank you for the patch.
>>
>> Rainer> If $inferior_pid doesn't refer a single process for some reason, this
>> Rainer> kill would terminate either a process group or the whole session.
>>
>> I don't mind the patch, it seems like an improvement -- but I wonder why

Agreed.  This is the second time all these years that I'm seeing something like this.
ISTR some test killing everything on OpenBSD many years ago.  :-)

> 
> That's what I thought: if for whatever reason the pid turns
> non-positive, hell breaks lose if that's passed to kill unchecked.
> 
>> this ends up as -1, and whether a fix belongs elsewhere.
> 
> gdb.log shows
> 
> (gdb) PASS: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=off: verify_regs: rdi expected value
> jump test_rip_rcx
> Continuing at 0x4015b2.
> 
> Program terminated with signal SIGALRM, Alarm clock.
> The program no longer exists.
> [...]
> [Current inferior is 1 [<null>] (/vol/obj/gnu/gdb/gdb/11.4-amd64-dist/gdb/testsuite/outputs/gdb.arch/amd64-disp-step/amd64-disp-step)]
> 
> lib/gdb.exp (get_inferior_pid) turns this <null> into -1:
> 

I'm not exactly sure what is the particular problem triggering what you're seeing, but I observe a couple things:

#1 - Solaris doesn't support displaced stepping.  It could, the x86-64 gdbarch displaced step implementation is pretty generic.
But they're only installed on Linux today.  That's because displaced stepping is more useful with non-stop, and Solaris doesn't
support that.

However, I tweaked the testcase to force displaced-stepping off, with:

    -gdb_test "set displaced-stepping on" ""
    +gdb_test "set displaced-stepping off" ""

and the test still passes cleanly on Linux.  So that shouldn't itself be a problem.  GDB will just do the
regular remove-breakpoint, step, re-insert breakpoint dance on Solaris.

#2 - I did notice however something else.  The .S file has this:

 /* test syscall */

	.global test_syscall
	mov $0x27,%eax /* getpid */
 test_syscall:
	syscall
	nop
 test_syscall_end:
	nop

That seems like it is assuming Linux syscalls?  Or is 0x27 getpid on Solaris as well?  If not, I wouldn't
be surprised if that syscall is doing something undefined.  Wonder what happens if you comment out that
code and the corresponding test in the .exp file.

Pedro Alves
  
Andrew Burgess July 15, 2023, 1:38 p.m. UTC | #4
Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:

> When testing current gdb trunk on Solaris/amd64, the whole session was
> reliably terminated by make check.  I could trace this to the following
> entry in gdb.arch/amd64-disp-step/gdb.log:
>
> FAIL: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=on: get inferior pid
> Executing on target: kill -ALRM -1    (timeout = 300)
> builtin_spawn -ignore SIGHUP kill -ALRM -1
>
> If $inferior_pid doesn't refer a single process for some reason, this
> kill would terminate either a process group or the whole session.
>
> This patch avoids this by ensuring that the pid arg is positive.
>
> Tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu.
>
> Ok for trunk?
>
> 	Rainer
>
> -- 
> -----------------------------------------------------------------------------
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step.exp b/gdb/testsuite/gdb.arch/amd64-disp-step.exp
> --- a/gdb/testsuite/gdb.arch/amd64-disp-step.exp
> +++ b/gdb/testsuite/gdb.arch/amd64-disp-step.exp
> @@ -222,7 +222,10 @@ proc rip_test { reg test_start_label tes
>  	    # If we use 'signal' to send the signal GDB doesn't actually do
>  	    # the displaced step, but instead just delivers the signal.
>  	    set inferior_pid [get_inferior_pid]
> -	    remote_exec target "kill -ALRM $inferior_pid"
> +	    # Ensure that $inferior_pid refers to a single process.
> +	    if {$inferior_pid > 0} {
> +		remote_exec target "kill -ALRM $inferior_pid"
> +	    }

Does this not hide the fact that the test is no longer doing what it
expected?

I'm fine with the 'if {$inferior_pid > 0}' being added to ensure we
don't signal some random process(es), but I think we should probably add
something like:

  gdb_assert {[expr $inferior_pid > 0]} \
    "check for a sane inferior pid"
  if {$inferior_pid > 0} {
    remote_exec target "kill -ALRM $inferior_pid"
  }

This way you will still see a FAIL.

Thoughts?

Thanks,
Andrew
  
Rainer Orth July 19, 2023, 12:21 p.m. UTC | #5
Hi Pedro,

>> That's what I thought: if for whatever reason the pid turns
>> non-positive, hell breaks lose if that's passed to kill unchecked.
>> 
>>> this ends up as -1, and whether a fix belongs elsewhere.
>> 
>> gdb.log shows
>> 
>> (gdb) PASS: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=off:
>> verify_regs: rdi expected value
>> jump test_rip_rcx
>> Continuing at 0x4015b2.
>> 
>> Program terminated with signal SIGALRM, Alarm clock.
>> The program no longer exists.
>> [...]
>> [Current inferior is 1 [<null>]
>> (/vol/obj/gnu/gdb/gdb/11.4-amd64-dist/gdb/testsuite/outputs/gdb.arch/amd64-disp-step/amd64-disp-step)]
>> 
>> lib/gdb.exp (get_inferior_pid) turns this <null> into -1:
>> 
>
> I'm not exactly sure what is the particular problem triggering what you're seeing, but I observe a couple things:
>
> #1 - Solaris doesn't support displaced stepping.  It could, the x86-64 gdbarch displaced step implementation is pretty generic.
> But they're only installed on Linux today.  That's because displaced stepping is more useful with non-stop, and Solaris doesn't
> support that.

TBH, given the miserable state of the Solaris port (3000+ failures), I
haven't even looked into what features it is missing.  Unless the basis
is solid, there's probably not much point in that.

> However, I tweaked the testcase to force displaced-stepping off, with:
>
>     -gdb_test "set displaced-stepping on" ""
>     +gdb_test "set displaced-stepping off" ""
>
> and the test still passes cleanly on Linux.  So that shouldn't itself be a problem.  GDB will just do the
> regular remove-breakpoint, step, re-insert breakpoint dance on Solaris.

What had me wondering is this snippet in the test's gdb.log:

(gdb) PASS: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=off: verify_regs: rdi expected value
jump test_rip_rcx
Continuing at 0x401579.

Program terminated with signal SIGALRM, Alarm clock.
The program no longer exists.
(gdb) FAIL: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=on: jump back to test_rip_rcx
[...]
(gdb) set $rdi = 0
No registers.
(gdb) inferior
[Current inferior is 1 [<null>] (/vol/obj/gnu/gdb/gdb/11.4-amd64-dist/gdb/testsuite/outputs/gdb.arch/amd64-disp-step/outputs/gdb.arch/amd64-disp-step/amd64-disp-step)]
(gdb) FAIL: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=on: get inferior pid
Executing on target: kill -ALRM -1    (timeout = 300)
builtin_spawn -ignore SIGHUP kill -ALRM -1

  I really don't know where the first kill -ALRM is coming from.  In
  amd64-disp.step.exp (rip_test), the run for rcx with signal_modes=off
  succeeds, as does the jump to test_rip_rcx.

  On the next iteration, with signal_modes=on, however, before set_regs
  even started, the SIGALRM arrives (which should happen only *after*
  the set_regs IIUC), kills the inferior, and causes the -1
  inferior_pid.

  Quite weird unless I'm fundamentally misunderstanding something
  (highly probable).

> #2 - I did notice however something else.  The .S file has this:
>
>  /* test syscall */
>
> 	.global test_syscall
> 	mov $0x27,%eax /* getpid */
>  test_syscall:
> 	syscall
> 	nop
>  test_syscall_end:
> 	nop
>
> That seems like it is assuming Linux syscalls?  Or is 0x27 getpid on Solaris as well?  If not, I wouldn't
> be surprised if that syscall is doing something undefined.  Wonder what happens if you comment out that
> code and the corresponding test in the .exp file.

0x27 is SYS_pgrpsys on Solaris, with subcodes for the likes of getpgrp,
setpgrp etc.  However, disabling both test and code doesn't make a
difference for the overall outcome of the test.

	Rainer
  
Rainer Orth July 19, 2023, 12:37 p.m. UTC | #6
Hi Andrew,

>> diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step.exp b/gdb/testsuite/gdb.arch/amd64-disp-step.exp
>> --- a/gdb/testsuite/gdb.arch/amd64-disp-step.exp
>> +++ b/gdb/testsuite/gdb.arch/amd64-disp-step.exp
>> @@ -222,7 +222,10 @@ proc rip_test { reg test_start_label tes
>>  	    # If we use 'signal' to send the signal GDB doesn't actually do
>>  	    # the displaced step, but instead just delivers the signal.
>>  	    set inferior_pid [get_inferior_pid]
>> -	    remote_exec target "kill -ALRM $inferior_pid"
>> +	    # Ensure that $inferior_pid refers to a single process.
>> +	    if {$inferior_pid > 0} {
>> +		remote_exec target "kill -ALRM $inferior_pid"
>> +	    }
>
> Does this not hide the fact that the test is no longer doing what it
> expected?

it does.  However, the results for this particular test were so bad
already that I didn't think about one or more FAILs here.

> I'm fine with the 'if {$inferior_pid > 0}' being added to ensure we
> don't signal some random process(es), but I think we should probably add
> something like:
>
>   gdb_assert {[expr $inferior_pid > 0]} \
>     "check for a sane inferior pid"
>   if {$inferior_pid > 0} {
>     remote_exec target "kill -ALRM $inferior_pid"
>   }
>
> This way you will still see a FAIL.

True, but you will also see quite a bunch of PASSes in the working case
that tell you nothing.  Seems like unnecessary noise to me.  Isn't there
another way to convey the failure info without that noise?

	Rainer
  
Rainer Orth Aug. 1, 2023, 2:05 p.m. UTC | #7
Hi Andrew,

>>> diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step.exp
>>> b/gdb/testsuite/gdb.arch/amd64-disp-step.exp
>>> --- a/gdb/testsuite/gdb.arch/amd64-disp-step.exp
>>> +++ b/gdb/testsuite/gdb.arch/amd64-disp-step.exp
>>> @@ -222,7 +222,10 @@ proc rip_test { reg test_start_label tes
>>>  	    # If we use 'signal' to send the signal GDB doesn't actually do
>>>  	    # the displaced step, but instead just delivers the signal.
>>>  	    set inferior_pid [get_inferior_pid]
>>> -	    remote_exec target "kill -ALRM $inferior_pid"
>>> +	    # Ensure that $inferior_pid refers to a single process.
>>> +	    if {$inferior_pid > 0} {
>>> +		remote_exec target "kill -ALRM $inferior_pid"
>>> +	    }
>>
>> Does this not hide the fact that the test is no longer doing what it
>> expected?
>
> it does.  However, the results for this particular test were so bad
> already that I didn't think about one or more FAILs here.
>
>> I'm fine with the 'if {$inferior_pid > 0}' being added to ensure we
>> don't signal some random process(es), but I think we should probably add
>> something like:
>>
>>   gdb_assert {[expr $inferior_pid > 0]} \
>>     "check for a sane inferior pid"
>>   if {$inferior_pid > 0} {
>>     remote_exec target "kill -ALRM $inferior_pid"
>>   }
>>
>> This way you will still see a FAIL.
>
> True, but you will also see quite a bunch of PASSes in the working case
> that tell you nothing.  Seems like unnecessary noise to me.  Isn't there
> another way to convey the failure info without that noise?

how should we proceed with this patch?  It would be a pity to release
GDB 14 with make check killing the whole session on Solaris...

Thanks.
        Rainer
  
Tom Tromey Aug. 2, 2023, 8:56 p.m. UTC | #8
>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:

>>> gdb_assert {[expr $inferior_pid > 0]} \
>>> "check for a sane inferior pid"
>>> if {$inferior_pid > 0} {
>>> remote_exec target "kill -ALRM $inferior_pid"
>>> }
>>> 
>>> This way you will still see a FAIL.
>> 
>> True, but you will also see quite a bunch of PASSes in the working case
>> that tell you nothing.  Seems like unnecessary noise to me.  Isn't there
>> another way to convey the failure info without that noise?

Rainer> how should we proceed with this patch?  It would be a pity to release
Rainer> GDB 14 with make check killing the whole session on Solaris...

I think just adding Andrew's proposed assert to your patch should be
good enough.

The idea behind the assert is so that we can detect the bad case, if it
ever happens, on a platform that is otherwise ok.  The noise of an extra
pass doesn't seem so bad, we have zillions of those already.  The noise
from the fail also shouldn't be too bad since, IIRC, you said this test
is already not fully passing on Solaris.

Anyway to sum up, the assert would be there as a "just in case" for
other platforms, not Solaris.

thanks,
Tom
  
Rainer Orth Aug. 7, 2023, 1:51 p.m. UTC | #9
Hi Tom,

>>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
>
>>>> gdb_assert {[expr $inferior_pid > 0]} \
>>>> "check for a sane inferior pid"
>>>> if {$inferior_pid > 0} {
>>>> remote_exec target "kill -ALRM $inferior_pid"
>>>> }
>>>> 
>>>> This way you will still see a FAIL.
>>> 
>>> True, but you will also see quite a bunch of PASSes in the working case
>>> that tell you nothing.  Seems like unnecessary noise to me.  Isn't there
>>> another way to convey the failure info without that noise?
>
> Rainer> how should we proceed with this patch?  It would be a pity to release
> Rainer> GDB 14 with make check killing the whole session on Solaris...
>
> I think just adding Andrew's proposed assert to your patch should be
> good enough.
>
> The idea behind the assert is so that we can detect the bad case, if it
> ever happens, on a platform that is otherwise ok.  The noise of an extra
> pass doesn't seem so bad, we have zillions of those already.  The noise
> from the fail also shouldn't be too bad since, IIRC, you said this test
> is already not fully passing on Solaris.
>
> Anyway to sum up, the assert would be there as a "just in case" for
> other platforms, not Solaris.

fine with me.  Here's the patch as amended.

Re-tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu.

Ok for master now?

Thanks.
	Rainer
  
Tom Tromey Aug. 7, 2023, 10:14 p.m. UTC | #10
>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:

Rainer> Re-tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu.

Rainer> Ok for master now?

Yes, thank you.

Approved-By: Tom Tromey <tom@tromey.com>

Tom
  

Patch

diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step.exp b/gdb/testsuite/gdb.arch/amd64-disp-step.exp
--- a/gdb/testsuite/gdb.arch/amd64-disp-step.exp
+++ b/gdb/testsuite/gdb.arch/amd64-disp-step.exp
@@ -222,7 +222,10 @@  proc rip_test { reg test_start_label tes
 	    # If we use 'signal' to send the signal GDB doesn't actually do
 	    # the displaced step, but instead just delivers the signal.
 	    set inferior_pid [get_inferior_pid]
-	    remote_exec target "kill -ALRM $inferior_pid"
+	    # Ensure that $inferior_pid refers to a single process.
+	    if {$inferior_pid > 0} {
+		remote_exec target "kill -ALRM $inferior_pid"
+	    }
 	}
 
 	gdb_test "continue" \