Message ID | yddfs5srrc4.fsf@CeBiTec.Uni-Bielefeld.DE |
---|---|
State | New |
Headers |
Return-Path: <gdb-patches-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2F0723857006 for <patchwork@sourceware.org>; Thu, 13 Jul 2023 11:20:14 +0000 (GMT) X-Original-To: gdb-patches@sourceware.org Delivered-To: gdb-patches@sourceware.org Received: from smtp.CeBiTec.Uni-Bielefeld.DE (smtp.CeBiTec.Uni-Bielefeld.DE [129.70.160.84]) by sourceware.org (Postfix) with ESMTPS id 68E4F3858C74 for <gdb-patches@sourceware.org>; Thu, 13 Jul 2023 11:19:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 68E4F3858C74 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=CeBiTec.Uni-Bielefeld.DE Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=cebitec.uni-bielefeld.de Received: from localhost (localhost [127.0.0.1]) by smtp.CeBiTec.Uni-Bielefeld.DE (Postfix) with ESMTP id 41D33AE95B; Thu, 13 Jul 2023 13:19:56 +0200 (CEST) X-Virus-Scanned: amavisd-new at CeBiTec.Uni-Bielefeld.DE Received: from smtp.CeBiTec.Uni-Bielefeld.DE ([127.0.0.1]) by localhost (smtp.cebitec.uni-bielefeld.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id a6zFVASi0Win; Thu, 13 Jul 2023 13:19:55 +0200 (CEST) Received: from manam.CeBiTec.Uni-Bielefeld.DE (p50855614.dip0.t-ipconnect.de [80.133.86.20]) (Authenticated sender: ro) by smtp.CeBiTec.Uni-Bielefeld.DE (Postfix) with ESMTPSA id 9C3FFAEA36; Thu, 13 Jul 2023 13:19:55 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=CeBiTec.Uni-Bielefeld.DE; s=20200306; t=1689247195; bh=1+3gIt2voXk56j5EeaYzp2EaQLi5vDPRWvQxnElWZV0=; h=From:To:Cc:Subject:Date:From; b=PQG85GZP2PIw/wOOepTWXfGa7VMtyj72pbFR4auzc2XSd15TtWK1ggAugLQE3Esi6 1ly7JWn/PUVv7BtsJSj5QaGiICid63KRYLGRip0E9GBNMkNsKSttaAmBq5XqtOwX18 q/g6yNjYeCd4dUb3bCePXjEnTCVi7G8YKZhy8+Ip1RIFmkZ/EMaop2/wQnAWDfdNyc p7vc9ckLvPI5mKnrEnRp55v4ia/rZyAmW/sbVqRqkPQaB0C4bZ0iaz6brhinFRGK6T tKupgw/4yiD6RV17ZRE4uzDFhnosvkxmq/UuTqXE/SXo4RAfCXIXPxxfPek+cX3kMa CeFxzZtsIddiw== From: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> To: gdb-patches@sourceware.org Cc: Andrew Burgess <aburgess@redhat.com> Subject: [PATCH] Guard against killing unrelated processes in amd64-disp-step.exp Date: Thu, 13 Jul 2023 13:19:55 +0200 Message-ID: <yddfs5srrc4.fsf@CeBiTec.Uni-Bielefeld.DE> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1.90 (usg-unix-v) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Status: No, score=-3792.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>, <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>, <mailto:gdb-patches-request@sourceware.org?subject=subscribe> Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org Sender: "Gdb-patches" <gdb-patches-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
Guard against killing unrelated processes in amd64-disp-step.exp
|
|
Commit Message
Rainer Orth
July 13, 2023, 11:19 a.m. UTC
When testing current gdb trunk on Solaris/amd64, the whole session was reliably terminated by make check. I could trace this to the following entry in gdb.arch/amd64-disp-step/gdb.log: FAIL: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=on: get inferior pid Executing on target: kill -ALRM -1 (timeout = 300) builtin_spawn -ignore SIGHUP kill -ALRM -1 If $inferior_pid doesn't refer a single process for some reason, this kill would terminate either a process group or the whole session. This patch avoids this by ensuring that the pid arg is positive. Tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu. Ok for trunk? Rainer
Comments
>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
Rainer> When testing current gdb trunk on Solaris/amd64, the whole session was
Rainer> reliably terminated by make check. I could trace this to the following
Rainer> entry in gdb.arch/amd64-disp-step/gdb.log:
Thank you for the patch.
Rainer> If $inferior_pid doesn't refer a single process for some reason, this
Rainer> kill would terminate either a process group or the whole session.
I don't mind the patch, it seems like an improvement -- but I wonder why
this ends up as -1, and whether a fix belongs elsewhere.
Tom
Hi Tom, >>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes: > > Rainer> When testing current gdb trunk on Solaris/amd64, the whole session was > Rainer> reliably terminated by make check. I could trace this to the following > Rainer> entry in gdb.arch/amd64-disp-step/gdb.log: > > Thank you for the patch. > > Rainer> If $inferior_pid doesn't refer a single process for some reason, this > Rainer> kill would terminate either a process group or the whole session. > > I don't mind the patch, it seems like an improvement -- but I wonder why That's what I thought: if for whatever reason the pid turns non-positive, hell breaks lose if that's passed to kill unchecked. > this ends up as -1, and whether a fix belongs elsewhere. gdb.log shows (gdb) PASS: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=off: verify_regs: rdi expected value jump test_rip_rcx Continuing at 0x4015b2. Program terminated with signal SIGALRM, Alarm clock. The program no longer exists. [...] [Current inferior is 1 [<null>] (/vol/obj/gnu/gdb/gdb/11.4-amd64-dist/gdb/testsuite/outputs/gdb.arch/amd64-disp-step/amd64-disp-step)] lib/gdb.exp (get_inferior_pid) turns this <null> into -1: proc get_inferior_pid {} { set pid -1 gdb_test_multiple "inferior" "get inferior pid" { -re "process (\[0-9\]*).*$::gdb_prompt $" { set pid $expect_out(1,string) pass $gdb_test_name } } return $pid } This is certainly something that shouldn't happen and fixing the underlying problem(s) would avoid the kill for sure. However, the Solaris gdb port is in a terrible state with about 3200 testsuite failures on amd64-pc-solaris2.11. I've made a few attempts to fix the most glaring issues, but that never got me anywere until I've decided that the gdb codebase is simply beyond my abilities. Rainer
On 2023-07-13 18:59, Rainer Orth wrote: > Hi Tom, > >>>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes: >> >> Rainer> When testing current gdb trunk on Solaris/amd64, the whole session was >> Rainer> reliably terminated by make check. I could trace this to the following >> Rainer> entry in gdb.arch/amd64-disp-step/gdb.log: >> >> Thank you for the patch. >> >> Rainer> If $inferior_pid doesn't refer a single process for some reason, this >> Rainer> kill would terminate either a process group or the whole session. >> >> I don't mind the patch, it seems like an improvement -- but I wonder why Agreed. This is the second time all these years that I'm seeing something like this. ISTR some test killing everything on OpenBSD many years ago. :-) > > That's what I thought: if for whatever reason the pid turns > non-positive, hell breaks lose if that's passed to kill unchecked. > >> this ends up as -1, and whether a fix belongs elsewhere. > > gdb.log shows > > (gdb) PASS: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=off: verify_regs: rdi expected value > jump test_rip_rcx > Continuing at 0x4015b2. > > Program terminated with signal SIGALRM, Alarm clock. > The program no longer exists. > [...] > [Current inferior is 1 [<null>] (/vol/obj/gnu/gdb/gdb/11.4-amd64-dist/gdb/testsuite/outputs/gdb.arch/amd64-disp-step/amd64-disp-step)] > > lib/gdb.exp (get_inferior_pid) turns this <null> into -1: > I'm not exactly sure what is the particular problem triggering what you're seeing, but I observe a couple things: #1 - Solaris doesn't support displaced stepping. It could, the x86-64 gdbarch displaced step implementation is pretty generic. But they're only installed on Linux today. That's because displaced stepping is more useful with non-stop, and Solaris doesn't support that. However, I tweaked the testcase to force displaced-stepping off, with: -gdb_test "set displaced-stepping on" "" +gdb_test "set displaced-stepping off" "" and the test still passes cleanly on Linux. So that shouldn't itself be a problem. GDB will just do the regular remove-breakpoint, step, re-insert breakpoint dance on Solaris. #2 - I did notice however something else. The .S file has this: /* test syscall */ .global test_syscall mov $0x27,%eax /* getpid */ test_syscall: syscall nop test_syscall_end: nop That seems like it is assuming Linux syscalls? Or is 0x27 getpid on Solaris as well? If not, I wouldn't be surprised if that syscall is doing something undefined. Wonder what happens if you comment out that code and the corresponding test in the .exp file. Pedro Alves
Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes: > When testing current gdb trunk on Solaris/amd64, the whole session was > reliably terminated by make check. I could trace this to the following > entry in gdb.arch/amd64-disp-step/gdb.log: > > FAIL: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=on: get inferior pid > Executing on target: kill -ALRM -1 (timeout = 300) > builtin_spawn -ignore SIGHUP kill -ALRM -1 > > If $inferior_pid doesn't refer a single process for some reason, this > kill would terminate either a process group or the whole session. > > This patch avoids this by ensuring that the pid arg is positive. > > Tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu. > > Ok for trunk? > > Rainer > > -- > ----------------------------------------------------------------------------- > Rainer Orth, Center for Biotechnology, Bielefeld University > > > diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step.exp b/gdb/testsuite/gdb.arch/amd64-disp-step.exp > --- a/gdb/testsuite/gdb.arch/amd64-disp-step.exp > +++ b/gdb/testsuite/gdb.arch/amd64-disp-step.exp > @@ -222,7 +222,10 @@ proc rip_test { reg test_start_label tes > # If we use 'signal' to send the signal GDB doesn't actually do > # the displaced step, but instead just delivers the signal. > set inferior_pid [get_inferior_pid] > - remote_exec target "kill -ALRM $inferior_pid" > + # Ensure that $inferior_pid refers to a single process. > + if {$inferior_pid > 0} { > + remote_exec target "kill -ALRM $inferior_pid" > + } Does this not hide the fact that the test is no longer doing what it expected? I'm fine with the 'if {$inferior_pid > 0}' being added to ensure we don't signal some random process(es), but I think we should probably add something like: gdb_assert {[expr $inferior_pid > 0]} \ "check for a sane inferior pid" if {$inferior_pid > 0} { remote_exec target "kill -ALRM $inferior_pid" } This way you will still see a FAIL. Thoughts? Thanks, Andrew
Hi Pedro, >> That's what I thought: if for whatever reason the pid turns >> non-positive, hell breaks lose if that's passed to kill unchecked. >> >>> this ends up as -1, and whether a fix belongs elsewhere. >> >> gdb.log shows >> >> (gdb) PASS: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=off: >> verify_regs: rdi expected value >> jump test_rip_rcx >> Continuing at 0x4015b2. >> >> Program terminated with signal SIGALRM, Alarm clock. >> The program no longer exists. >> [...] >> [Current inferior is 1 [<null>] >> (/vol/obj/gnu/gdb/gdb/11.4-amd64-dist/gdb/testsuite/outputs/gdb.arch/amd64-disp-step/amd64-disp-step)] >> >> lib/gdb.exp (get_inferior_pid) turns this <null> into -1: >> > > I'm not exactly sure what is the particular problem triggering what you're seeing, but I observe a couple things: > > #1 - Solaris doesn't support displaced stepping. It could, the x86-64 gdbarch displaced step implementation is pretty generic. > But they're only installed on Linux today. That's because displaced stepping is more useful with non-stop, and Solaris doesn't > support that. TBH, given the miserable state of the Solaris port (3000+ failures), I haven't even looked into what features it is missing. Unless the basis is solid, there's probably not much point in that. > However, I tweaked the testcase to force displaced-stepping off, with: > > -gdb_test "set displaced-stepping on" "" > +gdb_test "set displaced-stepping off" "" > > and the test still passes cleanly on Linux. So that shouldn't itself be a problem. GDB will just do the > regular remove-breakpoint, step, re-insert breakpoint dance on Solaris. What had me wondering is this snippet in the test's gdb.log: (gdb) PASS: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=off: verify_regs: rdi expected value jump test_rip_rcx Continuing at 0x401579. Program terminated with signal SIGALRM, Alarm clock. The program no longer exists. (gdb) FAIL: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=on: jump back to test_rip_rcx [...] (gdb) set $rdi = 0 No registers. (gdb) inferior [Current inferior is 1 [<null>] (/vol/obj/gnu/gdb/gdb/11.4-amd64-dist/gdb/testsuite/outputs/gdb.arch/amd64-disp-step/outputs/gdb.arch/amd64-disp-step/amd64-disp-step)] (gdb) FAIL: gdb.arch/amd64-disp-step.exp: add into rcx: send_signal=on: get inferior pid Executing on target: kill -ALRM -1 (timeout = 300) builtin_spawn -ignore SIGHUP kill -ALRM -1 I really don't know where the first kill -ALRM is coming from. In amd64-disp.step.exp (rip_test), the run for rcx with signal_modes=off succeeds, as does the jump to test_rip_rcx. On the next iteration, with signal_modes=on, however, before set_regs even started, the SIGALRM arrives (which should happen only *after* the set_regs IIUC), kills the inferior, and causes the -1 inferior_pid. Quite weird unless I'm fundamentally misunderstanding something (highly probable). > #2 - I did notice however something else. The .S file has this: > > /* test syscall */ > > .global test_syscall > mov $0x27,%eax /* getpid */ > test_syscall: > syscall > nop > test_syscall_end: > nop > > That seems like it is assuming Linux syscalls? Or is 0x27 getpid on Solaris as well? If not, I wouldn't > be surprised if that syscall is doing something undefined. Wonder what happens if you comment out that > code and the corresponding test in the .exp file. 0x27 is SYS_pgrpsys on Solaris, with subcodes for the likes of getpgrp, setpgrp etc. However, disabling both test and code doesn't make a difference for the overall outcome of the test. Rainer
Hi Andrew, >> diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step.exp b/gdb/testsuite/gdb.arch/amd64-disp-step.exp >> --- a/gdb/testsuite/gdb.arch/amd64-disp-step.exp >> +++ b/gdb/testsuite/gdb.arch/amd64-disp-step.exp >> @@ -222,7 +222,10 @@ proc rip_test { reg test_start_label tes >> # If we use 'signal' to send the signal GDB doesn't actually do >> # the displaced step, but instead just delivers the signal. >> set inferior_pid [get_inferior_pid] >> - remote_exec target "kill -ALRM $inferior_pid" >> + # Ensure that $inferior_pid refers to a single process. >> + if {$inferior_pid > 0} { >> + remote_exec target "kill -ALRM $inferior_pid" >> + } > > Does this not hide the fact that the test is no longer doing what it > expected? it does. However, the results for this particular test were so bad already that I didn't think about one or more FAILs here. > I'm fine with the 'if {$inferior_pid > 0}' being added to ensure we > don't signal some random process(es), but I think we should probably add > something like: > > gdb_assert {[expr $inferior_pid > 0]} \ > "check for a sane inferior pid" > if {$inferior_pid > 0} { > remote_exec target "kill -ALRM $inferior_pid" > } > > This way you will still see a FAIL. True, but you will also see quite a bunch of PASSes in the working case that tell you nothing. Seems like unnecessary noise to me. Isn't there another way to convey the failure info without that noise? Rainer
Hi Andrew, >>> diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step.exp >>> b/gdb/testsuite/gdb.arch/amd64-disp-step.exp >>> --- a/gdb/testsuite/gdb.arch/amd64-disp-step.exp >>> +++ b/gdb/testsuite/gdb.arch/amd64-disp-step.exp >>> @@ -222,7 +222,10 @@ proc rip_test { reg test_start_label tes >>> # If we use 'signal' to send the signal GDB doesn't actually do >>> # the displaced step, but instead just delivers the signal. >>> set inferior_pid [get_inferior_pid] >>> - remote_exec target "kill -ALRM $inferior_pid" >>> + # Ensure that $inferior_pid refers to a single process. >>> + if {$inferior_pid > 0} { >>> + remote_exec target "kill -ALRM $inferior_pid" >>> + } >> >> Does this not hide the fact that the test is no longer doing what it >> expected? > > it does. However, the results for this particular test were so bad > already that I didn't think about one or more FAILs here. > >> I'm fine with the 'if {$inferior_pid > 0}' being added to ensure we >> don't signal some random process(es), but I think we should probably add >> something like: >> >> gdb_assert {[expr $inferior_pid > 0]} \ >> "check for a sane inferior pid" >> if {$inferior_pid > 0} { >> remote_exec target "kill -ALRM $inferior_pid" >> } >> >> This way you will still see a FAIL. > > True, but you will also see quite a bunch of PASSes in the working case > that tell you nothing. Seems like unnecessary noise to me. Isn't there > another way to convey the failure info without that noise? how should we proceed with this patch? It would be a pity to release GDB 14 with make check killing the whole session on Solaris... Thanks. Rainer
>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes: >>> gdb_assert {[expr $inferior_pid > 0]} \ >>> "check for a sane inferior pid" >>> if {$inferior_pid > 0} { >>> remote_exec target "kill -ALRM $inferior_pid" >>> } >>> >>> This way you will still see a FAIL. >> >> True, but you will also see quite a bunch of PASSes in the working case >> that tell you nothing. Seems like unnecessary noise to me. Isn't there >> another way to convey the failure info without that noise? Rainer> how should we proceed with this patch? It would be a pity to release Rainer> GDB 14 with make check killing the whole session on Solaris... I think just adding Andrew's proposed assert to your patch should be good enough. The idea behind the assert is so that we can detect the bad case, if it ever happens, on a platform that is otherwise ok. The noise of an extra pass doesn't seem so bad, we have zillions of those already. The noise from the fail also shouldn't be too bad since, IIRC, you said this test is already not fully passing on Solaris. Anyway to sum up, the assert would be there as a "just in case" for other platforms, not Solaris. thanks, Tom
Hi Tom, >>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes: > >>>> gdb_assert {[expr $inferior_pid > 0]} \ >>>> "check for a sane inferior pid" >>>> if {$inferior_pid > 0} { >>>> remote_exec target "kill -ALRM $inferior_pid" >>>> } >>>> >>>> This way you will still see a FAIL. >>> >>> True, but you will also see quite a bunch of PASSes in the working case >>> that tell you nothing. Seems like unnecessary noise to me. Isn't there >>> another way to convey the failure info without that noise? > > Rainer> how should we proceed with this patch? It would be a pity to release > Rainer> GDB 14 with make check killing the whole session on Solaris... > > I think just adding Andrew's proposed assert to your patch should be > good enough. > > The idea behind the assert is so that we can detect the bad case, if it > ever happens, on a platform that is otherwise ok. The noise of an extra > pass doesn't seem so bad, we have zillions of those already. The noise > from the fail also shouldn't be too bad since, IIRC, you said this test > is already not fully passing on Solaris. > > Anyway to sum up, the assert would be there as a "just in case" for > other platforms, not Solaris. fine with me. Here's the patch as amended. Re-tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu. Ok for master now? Thanks. Rainer
>>>>> "Rainer" == Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
Rainer> Re-tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu.
Rainer> Ok for master now?
Yes, thank you.
Approved-By: Tom Tromey <tom@tromey.com>
Tom
diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step.exp b/gdb/testsuite/gdb.arch/amd64-disp-step.exp --- a/gdb/testsuite/gdb.arch/amd64-disp-step.exp +++ b/gdb/testsuite/gdb.arch/amd64-disp-step.exp @@ -222,7 +222,10 @@ proc rip_test { reg test_start_label tes # If we use 'signal' to send the signal GDB doesn't actually do # the displaced step, but instead just delivers the signal. set inferior_pid [get_inferior_pid] - remote_exec target "kill -ALRM $inferior_pid" + # Ensure that $inferior_pid refers to a single process. + if {$inferior_pid > 0} { + remote_exec target "kill -ALRM $inferior_pid" + } } gdb_test "continue" \