From patchwork Wed Nov 15 19:41:33 2023
X-Patchwork-Submitter: Kevin Buettner
X-Patchwork-Id: 79974
From: Kevin Buettner
To: gdb-patches@sourceware.org
Cc: Kevin Buettner, Andrew Burgess
Subject: [PATCH v2 1/1] Fix detach bug when lwp has exited/terminated
Date: Wed, 15 Nov 2023 12:41:33 -0700
Message-ID: <20231115195458.2938701-2-kevinb@redhat.com>
In-Reply-To: <20231115195458.2938701-1-kevinb@redhat.com>
References: <20231115195458.2938701-1-kevinb@redhat.com>

When using GDB on native Linux, it can happen that, while attempting
to detach an inferior, the inferior has already exited or been
killed, yet is still in the list of lwps.  Should that happen, the
assert in x86_linux_update_debug_registers in
gdb/nat/x86-linux-dregs.c will trigger.  The line in question looks
like this:

  gdb_assert (lwp_is_stopped (lwp));

For this case, the lwp isn't stopped - it's dead.

The bug which brought this problem to my attention is one in which the
pwntools library uses GDB to debug a process; as the script is
shutting things down, it kills the process that GDB is debugging and
also sends GDB a SIGTERM signal, which causes GDB to detach all
inferiors prior to exiting.

Here's a link to the bug:

https://bugzilla.redhat.com/show_bug.cgi?id=2192169

The following shell command mimics part of what the pwntools
reproducer script does (with regard to shutting things down), but
reproduces the bug much less reliably.  I have found it necessary to
run the command a bunch of times before seeing the bug.  (I usually
see it within 5-10 repetitions.)  If you choose to try this command,
make sure that you have no running "cat" or "gdb" processes first!

  cat /dev/null & \
  (sleep 5; (kill -KILL `pgrep cat` & kill -TERM `pgrep gdb`)) & \
  sleep 1 ; \
  gdb -q -iex 'set debuginfod enabled off' -ex 'set height 0' \
      -ex c /usr/bin/cat `pgrep cat`

So, basically, the idea here is to kill both gdb and cat at roughly
the same time.  If we happen to attempt the detach before the process
lwp has been deleted from GDB's (linux native) LWP data structures,
then the assert will trigger.

The relevant part of the backtrace looks like this:

  #8  0x00000000008a83ae in x86_linux_update_debug_registers (lwp=0x1873280)
      at gdb/nat/x86-linux-dregs.c:146
  #9  0x00000000008a862f in x86_linux_prepare_to_resume (lwp=0x1873280)
      at gdb/nat/x86-linux.c:81
  #10 0x000000000048ea42 in x86_linux_nat_target::low_prepare_to_resume (
      this=0x121eee0, lwp=0x1873280)
      at gdb/x86-linux-nat.h:70
  #11 0x000000000081a452 in detach_one_lwp (lp=0x1873280, signo_p=0x7fff8ca3441c)
      at gdb/linux-nat.c:1374
  #12 0x000000000081a85f in linux_nat_target::detach (
      this=0x121eee0, inf=0x16e8f70, from_tty=0)
      at gdb/linux-nat.c:1450
  #13 0x000000000083a23b in thread_db_target::detach (
      this=0x1206ae0, inf=0x16e8f70, from_tty=0)
      at gdb/linux-thread-db.c:1385
  #14 0x0000000000a66722 in target_detach (inf=0x16e8f70, from_tty=0)
      at gdb/target.c:2526
  #15 0x0000000000a8f0ad in kill_or_detach (inf=0x16e8f70, from_tty=0)
      at gdb/top.c:1659
  #16 0x0000000000a8f4fa in quit_force (exit_arg=0x0, from_tty=0)
      at gdb/top.c:1762
  #17 0x000000000070829c in async_sigterm_handler (arg=0x0)
      at gdb/event-top.c:1141

My colleague, Andrew Burgess, has done some recent work on other
problems with detach.  Upon hearing of this problem, he came up with
a test case which reliably reproduces the problem and tests for a few
other problems as well.  In addition to testing detach when the
inferior has terminated due to a signal, it also tests detach when
the inferior has exited normally.  Andrew observed that the
linux-native-only "checkpoint" command would be affected too, so the
test also tests those cases when there's an active checkpoint.

The LWP exit / termination case with no checkpoint is handled via
newly added checks of the waitstatus in detach_one_lwp in
linux-nat.c.

For the checkpoint detach problem, I chose to pass the lwp_info to
linux_fork_detach in linux-fork.c.  With that in place, suitable
tests were added before attempting a PTRACE_DETACH operation.

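
As a rough illustration only (this is not code from the patch), the
waitstatus test used in both places can be read as a single
predicate.  In the sketch below, the enum, the mock_lwp struct, and
the lwp_is_gone helper are all hypothetical, simplified stand-ins for
GDB's target_waitkind and lwp_info; the real types carry far more
state.

```cpp
// Simplified stand-ins for illustration; these are NOT GDB's real
// types.  GDB's own definitions live in target/waitstatus.h and
// linux-nat.h.
enum class waitkind
{
  ignore,
  stopped,
  exited,
  thread_exited,
  signalled
};

// Hypothetical, minimal model of an LWP: just its pending wait status.
struct mock_lwp
{
  waitkind pending = waitkind::ignore;
};

// Hypothetical helper: return true if the pending wait status says the
// LWP has already exited or was terminated by a signal.  In that case
// there is nothing left to detach from, so no PTRACE_DETACH (and no
// "prepare to resume" step) should be attempted.
static bool
lwp_is_gone (const mock_lwp &lp)
{
  switch (lp.pending)
    {
    case waitkind::exited:
    case waitkind::thread_exited:
    case waitkind::signalled:
      return true;
    default:
      return false;
    }
}
```

With such a predicate, detach_one_lwp would take the early-return path
when it is true, and linux_fork_detach would skip the ptrace call.
The patch itself spells the three comparisons out inline at each site
rather than adding a helper.
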
I added a few asserts at the beginning of linux_fork_detach and
modified the caller code so that the newly added asserts shouldn't
trigger.  (That's what the 'pid == inferior_ptid.pid ()' check is
about in gdb/linux-nat.c.)

Lastly, I'll note that the checkpoint code needs some work with regard
to background execution.  This patch doesn't attempt to fix that
problem, but it doesn't make it any worse.  It does slightly improve
the situation with detach because, due to the check noted above,
linux_fork_detach() won't be called for the wrong inferior when there
are multiple inferiors.  (There are at least two other problems with
the checkpoint code when there are multiple inferiors.  See:
https://sourceware.org/bugzilla/show_bug.cgi?id=31065)

This commit also adds a new test, gdb.base/kill-during-detach.exp.
Andrew Burgess is the primary author of this test case.  Its design
is similar to that of gdb.threads/main-thread-exit-during-detach.exp,
which was also written by Andrew.

This test checks that GDB correctly handles several cases that can
occur when GDB attempts to detach an inferior process.  The process
can exit or be terminated (e.g. via SIGKILL) prior to GDB's event loop
getting a chance to remove it from GDB's internal data structures.
To complicate things even more, detach works differently when a
checkpoint (created via GDB's "checkpoint" command) exists for the
inferior.  This test checks all four possibilities: process exit with
no checkpoint, process termination with no checkpoint, process exit
with a checkpoint, and process termination with a checkpoint.

Co-Authored-By: Andrew Burgess
---
 gdb/linux-fork.c                              |  20 ++-
 gdb/linux-fork.h                              |   3 +-
 gdb/linux-nat.c                               |  18 ++-
 gdb/testsuite/gdb.base/kill-during-detach.c   |  32 +++++
 gdb/testsuite/gdb.base/kill-during-detach.exp | 132 ++++++++++++++++++
 5 files changed, 197 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/kill-during-detach.c
 create mode 100644 gdb/testsuite/gdb.base/kill-during-detach.exp

diff --git a/gdb/linux-fork.c b/gdb/linux-fork.c
index 52e385411c7..f23482f6da7 100644
--- a/gdb/linux-fork.c
+++ b/gdb/linux-fork.c
@@ -32,6 +32,7 @@
 
 #include "nat/gdb_ptrace.h"
 #include "gdbsupport/gdb_wait.h"
+#include "target/waitstatus.h"
 #include
 #include
 
@@ -361,15 +362,24 @@ linux_fork_mourn_inferior (void)
    the first available.  */
 
 void
-linux_fork_detach (int from_tty)
+linux_fork_detach (int from_tty, lwp_info *lp)
 {
+  gdb_assert (lp != nullptr);
+  gdb_assert (lp->ptid == inferior_ptid);
+
   /* OK, inferior_ptid is the one we are detaching from.  We need to
      delete it from the fork_list, and switch to the next available
-     fork.  */
+     fork.  But before doing the detach, do make sure that the lwp
+     hasn't exited or been terminated first.  */
 
-  if (ptrace (PTRACE_DETACH, inferior_ptid.pid (), 0, 0))
-    error (_("Unable to detach %s"),
-           target_pid_to_str (inferior_ptid).c_str ());
+  if (lp->waitstatus.kind () != TARGET_WAITKIND_EXITED
+      && lp->waitstatus.kind () != TARGET_WAITKIND_THREAD_EXITED
+      && lp->waitstatus.kind () != TARGET_WAITKIND_SIGNALLED)
+    {
+      if (ptrace (PTRACE_DETACH, inferior_ptid.pid (), 0, 0))
+        error (_("Unable to detach %s"),
+               target_pid_to_str (inferior_ptid).c_str ());
+    }
 
   delete_fork (inferior_ptid);
 
diff --git a/gdb/linux-fork.h b/gdb/linux-fork.h
index 5a593fca91e..e335fb24378 100644
--- a/gdb/linux-fork.h
+++ b/gdb/linux-fork.h
@@ -21,11 +21,12 @@
 #define LINUX_FORK_H
 
 struct fork_info;
+struct lwp_info;
 extern void add_fork (pid_t);
 extern struct fork_info *find_fork_pid (pid_t);
 extern void linux_fork_killall (void);
 extern void linux_fork_mourn_inferior (void);
-extern void linux_fork_detach (int);
+extern void linux_fork_detach (int, lwp_info *);
 extern int forks_exist_p (void);
 extern int linux_fork_checkpointing_p (int);
 
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 8951b34e192..762e742bb0c 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1383,6 +1383,20 @@ detach_one_lwp (struct lwp_info *lp, int *signo_p)
       lp->signalled = 0;
     }
 
+  /* If the lwp has exited or was terminated due to a signal, there's
+     nothing left to do.  */
+  if (lp->waitstatus.kind () == TARGET_WAITKIND_EXITED
+      || lp->waitstatus.kind () == TARGET_WAITKIND_THREAD_EXITED
+      || lp->waitstatus.kind () == TARGET_WAITKIND_SIGNALLED)
+    {
+      linux_nat_debug_printf
+        ("Can't detach %s - it has exited or was terminated: %s.",
+         lp->ptid.to_string ().c_str (),
+         lp->waitstatus.to_string ().c_str ());
+      delete_lwp (lp->ptid);
+      return;
+    }
+
   if (signo_p == NULL)
     {
       /* Pass on any pending signal for this LWP.  */
@@ -1456,13 +1470,13 @@ linux_nat_target::detach (inferior *inf, int from_tty)
   gdb_assert (num_lwps (pid) == 1
               || (target_is_non_stop_p () && num_lwps (pid) == 0));
 
-  if (forks_exist_p ())
+  if (pid == inferior_ptid.pid () && forks_exist_p ())
     {
       /* Multi-fork case.  The current inferior_ptid is being detached
          from, but there are other viable forks to debug.  Detach from
          the current fork, and context-switch to the first
          available.  */
-      linux_fork_detach (from_tty);
+      linux_fork_detach (from_tty, find_lwp_pid (ptid_t (pid)));
     }
   else
     {
diff --git a/gdb/testsuite/gdb.base/kill-during-detach.c b/gdb/testsuite/gdb.base/kill-during-detach.c
new file mode 100644
index 00000000000..2d9cca91e2f
--- /dev/null
+++ b/gdb/testsuite/gdb.base/kill-during-detach.c
@@ -0,0 +1,32 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+
+volatile int dont_exit_just_yet = 1;
+
+int
+main ()
+{
+  alarm (300);
+
+  /* Spin until GDB releases us.  */
+  while (dont_exit_just_yet)
+    usleep (100000);
+
+  _exit (0);
+}
diff --git a/gdb/testsuite/gdb.base/kill-during-detach.exp b/gdb/testsuite/gdb.base/kill-during-detach.exp
new file mode 100644
index 00000000000..26028d5fc34
--- /dev/null
+++ b/gdb/testsuite/gdb.base/kill-during-detach.exp
@@ -0,0 +1,132 @@
+# Copyright 2023 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test checks that GDB correctly handles several cases that can
+# occur when GDB attempts to detach an inferior process.  The process
+# can exit or be terminated (e.g. via SIGKILL) prior to GDB's event
+# loop getting a chance to remove it from GDB's internal data
+# structures.  To complicate things even more, detach works differently
+# when a checkpoint (created via GDB's "checkpoint" command) exists for
+# the inferior.  This test checks all four possibilities: process exit
+# with no checkpoint, process termination with no checkpoint, process
+# exit with a checkpoint, and process termination with a checkpoint.
+
+standard_testfile
+
+# This test requires python.
+require allow_python_tests
+
+# This test attempts to kill a process on the host running GDB, so
+# disallow remote targets.  (Setting --target_board to
+# native-gdbserver or native-extended-gdbserver should still work.)
+require {!is_remote target}
+
+# Checkpoint support only works on native Linux:
+if { [istarget "*-*-linux*"] && [target_info gdb_protocol] == ""} {
+    set has_checkpoint true
+} else {
+    set has_checkpoint false
+}
+
+if {[build_executable "failed to prepare" $testfile $srcfile] == -1} {
+    return -1
+}
+
+# Start an inferior, which blocks in a spin loop.  Set up a Python
+# function that performs an action based on EXIT_P that will cause the
+# inferior to exit, and then, within the same Python function, ask GDB
+# to detach from the inferior.  Use 'continue&' to run the inferior in
+# the background, and then invoke the Python function.  Note, too, that
+# non-stop mode is enabled during the restart; if this is not done,
+# remote_target::putpkt_binary in remote.c will disallow some of the
+# operations necessary for this test.
+#
+# The idea is that GDB's event loop will not get a chance to handle
+# the inferior exiting, so it will only be at the point that we try to
+# detach that we notice that the inferior has exited.
+#
+# When EXIT_P is true the action we perform to terminate the inferior
+# is to set a flag in the inferior, which allows the inferior to break
+# out of its spin loop.
+#
+# When EXIT_P is false the action we perform is to send SIGKILL to the
+# inferior.
+#
+# When CHECKPOINT_P is true, before issuing 'continue&' we use the
+# 'checkpoint' command to create a checkpoint of GDB.
+#
+# When CHECKPOINT_P is false we don't use the 'checkpoint' command.
+proc run_test { exit_p checkpoint_p } {
+    save_vars { ::GDBFLAGS } {
+        append ::GDBFLAGS " -ex \"set non-stop on\""
+        clean_restart $::binfile
+    }
+
+    if {![runto_main]} {
+        return -1
+    }
+
+    if { $checkpoint_p } {
+        gdb_test "checkpoint" \
+            "checkpoint 1: fork returned pid $::decimal\\."
+    }
+
+    # Must get the PID before we resume the inferior.
+    set inf_pid [get_inferior_pid]
+
+    # Put the PID in a python variable so that a numerical PID won't
+    # appear in the PASS/FAIL output.
+    gdb_test_no_output "python inf_pid=$inf_pid" "assign inf_pid"
+
+    gdb_test "continue &"
+
+    if { $exit_p } {
+        set action_line "gdb.execute(\"set variable dont_exit_just_yet=0\")"
+    } else {
+        set action_line "os.kill(inf_pid, signal.SIGKILL)"
+    }
+
+    gdb_test_multiline "Create worker function" \
+        "python" "" \
+        "import time" "" \
+        "import os" "" \
+        "import signal" "" \
+        "def kill_and_detach():" "" \
+        "  $action_line" "" \
+        "  time.sleep(1)" "" \
+        "  gdb.execute(\"detach\")" "" \
+        "end" ""
+
+    if { $checkpoint_p } {
+        # NOTE: The 'checkpoint' system in GDB appears to be a little
+        # iffy.  This detach does seem to restore the checkpoint, but
+        # it leaves the inferior stuck in a running state.
+        gdb_test_no_output "python kill_and_detach()"
+    } else {
+        gdb_test "python kill_and_detach()" \
+            "\\\[Inferior $::decimal \[^\r\n\]+ detached\\\]"
+    }
+}
+
+if { $has_checkpoint } {
+    set checkpoint_iters { true false }
+} else {
+    set checkpoint_iters { false }
+}
+
+foreach_with_prefix exit_p { true false } {
+    foreach_with_prefix checkpoint_p $checkpoint_iters {
+        run_test $exit_p $checkpoint_p
+    }
+}