[03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event

Message ID 20221212203101.1034916-4-pedro@palves.net
State New
Series Step over thread clone and thread exit

Commit Message

Pedro Alves Dec. 12, 2022, 8:30 p.m. UTC
  I noticed that after a following patch ("Step over clone syscall w/
breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the
gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd
end up with four new unexpected GDB core dumps:

		 === gdb Summary ===

 # of unexpected core files      4
 # of expected passes            48

That said patch is making the pre-existing
gdb.threads/step-over-exec.exp testcase (almost silently) expose a
latent problem in gdb/linux-nat.c, resulting in a GDB crash when:

 #1 - a non-leader thread execs
 #2 - the post-exec program stops somewhere
 #3 - you kill the inferior

Instead of #3 directly, the testcase just returns, which ends up in
gdb_exit, tearing down GDB, which kills the inferior, and is thus
equivalent to #3 above.

Vis:

 $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
 ...
 (top-gdb) r
 ...
 (gdb) b main
 ...
 (gdb) r
 ...
 Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
 69        argv0 = argv[0];
 (gdb) c
 Continuing.
 [New Thread 0x7ffff7d89700 (LWP 2506975)]
 Other going in exec.
 Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
 process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd

 Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
 28        foo ();
 (gdb) k
 ...
 Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
 393         return m_suspend.waitstatus_pending_p;
 (top-gdb) bt
 #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
 #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
 #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
 #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
 #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
 #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
 #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
 #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
 #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
 ...

The root of the problem is that when a non-leader LWP execs, it just
changes its tid to the tgid, replacing the pre-exec leader thread,
becoming the new leader.  There's no thread exit event for the execing
thread.  It's as if the old pre-exec LWP vanishes without trace.  The
ptrace man page says:

"PTRACE_O_TRACEEXEC (since Linux 2.5.46)
	Stop the tracee at the next execve(2).  A waitpid(2) by the
	tracer will return a status value such that

	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))

	If the execing thread is not a thread group leader, the thread
	ID is reset to thread group leader's ID before this stop.
	Since Linux 3.0, the former thread ID can be retrieved with
	PTRACE_GETEVENTMSG."

When the core of GDB processes an exec event, it deletes all the
threads of the inferior.  But, that is too late -- deleting the thread
does not delete the corresponding LWP, so we end up leaving the
pre-exec non-leader LWP stale in the LWP list.  That's what leads to
the crash above -- linux_nat_target::kill iterates over all LWPs, and
after the patch in question, that code will look for the corresponding
thread_info for each LWP.  For the pre-exec non-leader LWP still
listed, it won't find one.

This patch fixes it, by deleting the pre-exec non-leader LWP (and
thread) from the LWP/thread lists as soon as we get an exec event out
of ptrace.

GDBserver does not need an equivalent fix, because it is already doing
this, as side effect of mourning the pre-exec process, in
gdbserver/linux-low.cc:

  else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
    {
...
      /* Delete the execing process and all its threads.  */
      mourn (proc);
      switch_to_thread (nullptr);

Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
---
 gdb/linux-nat.c                              | 15 +++++++++++++++
 gdb/testsuite/gdb.threads/step-over-exec.exp |  6 ++++++
 2 files changed, 21 insertions(+)
  

Comments

Andrew Burgess March 21, 2023, 2:50 p.m. UTC | #1
Pedro Alves <pedro@palves.net> writes:

> I noticed that after a following patch ("Step over clone syscall w/
> breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the
> gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd
> end up with four new unexpected GDB core dumps:
>
> 		 === gdb Summary ===
>
>  # of unexpected core files      4
>  # of expected passes            48
>
> That said patch is making the pre-existing
> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>
>  #1 - a non-leader thread execs
>  #2 - the post-exec program stops somewhere
>  #3 - you kill the inferior
>
> Instead of #3 directly, the testcase just returns, which ends up in
> gdb_exit, tearing down GDB, which kills the inferior, and is thus
> equivalent to #3 above.
>
> Vis:
>
>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>  ...
>  (top-gdb) r
>  ...
>  (gdb) b main
>  ...
>  (gdb) r
>  ...
>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>  69        argv0 = argv[0];
>  (gdb) c
>  Continuing.
>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>  Other going in exec.
>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>
>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>  28        foo ();
>  (gdb) k
>  ...
>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  393         return m_suspend.waitstatus_pending_p;
>  (top-gdb) bt
>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>  #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>  #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>  #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>  ...

It wasn't 100% clear to me if the above session was supposed to show a
failure with GDB prior to *this* commit, or was a demonstration of what
would happen if this commit is skipped, and the later commits applied.

I thought it was the second case, but I was so unsure that I tried the
reproducer anyway.   Just in case I'm wrong, the above example doesn't
seem to fail prior to this commit.

>
> The root of the problem is that when a non-leader LWP execs, it just
> changes its tid to the tgid, replacing the pre-exec leader thread,
> becoming the new leader.  There's no thread exit event for the execing
> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
> ptrace man page says:
>
> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
> 	Stop the tracee at the next execve(2).  A waitpid(2) by the
> 	tracer will return a status value such that
>
> 	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>
> 	If the execing thread is not a thread group leader, the thread
> 	ID is reset to thread group leader's ID before this stop.
> 	Since Linux 3.0, the former thread ID can be retrieved with
> 	PTRACE_GETEVENTMSG."
>
> When the core of GDB processes an exec event, it deletes all the
> threads of the inferior.  But, that is too late -- deleting the thread
> does not delete the corresponding LWP, so we end up leaving the
> pre-exec non-leader LWP stale in the LWP list.  That's what leads to
> the crash above -- linux_nat_target::kill iterates over all LWPs, and
> after the patch in question, that code will look for the corresponding
> thread_info for each LWP.  For the pre-exec non-leader LWP still
> listed, it won't find one.
>
> This patch fixes it, by deleting the pre-exec non-leader LWP (and
> thread) from the LWP/thread lists as soon as we get an exec event out
> of ptrace.

Given that we don't have a test *right now* for this issue, and instead
rely on a future patch not failing, I wondered if there was any way
that we could trigger a failure.

So I was poking around looking for places where we iterate over the
all_lwps() list wondering which we could trigger that might cause a
failure...

... and then I thought: why not just have GDB tell us that the
all_lwps() list is broken.

So I hacked up a new 'maint info linux-lwps' command.  It's not very
interesting right now, here's the output in a multi-threaded inferior
prior to the exec:

  (gdb) maintenance info linux-lwps 
  LWP Ptid          Thread ID
  1707218.1707239.0 2           
  1707218.1707218.0 1           

And in your failure case (after the exec):

  (gdb) maintenance info linux-lwps 
  LWP Ptid          Thread ID 
  1708883.1708895.0 None      
  1708883.1708883.0 1         

And then we can check this from the testscript, and now we have a test
that fails before this commit, and passes afterwards.

And in the future we might find other information we want to add in the
new maintenance command.

What are your thoughts on including this, or something like this with
this commit?  My patch, which applies on top of this commit, is included
at the end of this email.  Please feel free to take any changes that you
feel add value.

>
> GDBserver does not need an equivalent fix, because it is already doing
> this, as side effect of mourning the pre-exec process, in
> gdbserver/linux-low.cc:
>
>   else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
>     {
> ...
>       /* Delete the execing process and all its threads.  */
>       mourn (proc);
>       switch_to_thread (nullptr);
>
> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
> ---
>  gdb/linux-nat.c                              | 15 +++++++++++++++
>  gdb/testsuite/gdb.threads/step-over-exec.exp |  6 ++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 9b78fd1f8e8..5ee3227f1b9 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -1986,6 +1986,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>  	 thread execs, it changes its tid to the tgid, and the old
>  	 tgid thread might have not been resumed.  */
>        lp->resumed = 1;
> +
> +      /* All other LWPs are gone now.  We'll have received a thread
> +	 exit notification for all threads other than the execing one.
> +	 That one, if it wasn't the leader, just silently changes its
> +	 tid to the tgid, and the previous leader vanishes.  Since
> +	 Linux 3.0, the former thread ID can be retrieved with
> +	 PTRACE_GETEVENTMSG, but since we support older kernels, don't
> +	 bother with it, and just walk the LWP list.  Even with
> +	 PTRACE_GETEVENTMSG, we'd still need to lookup the
> +	 corresponding LWP object, and it would be an extra ptrace
> +	 syscall, so this way may even be more efficient.  */
> +      for (lwp_info *other_lp : all_lwps_safe ())
> +	if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
> +	  exit_lwp (other_lp);
> +
>        return 0;
>      }
>  
> diff --git a/gdb/testsuite/gdb.threads/step-over-exec.exp b/gdb/testsuite/gdb.threads/step-over-exec.exp
> index 783f865585c..a8b01f8aeda 100644
> --- a/gdb/testsuite/gdb.threads/step-over-exec.exp
> +++ b/gdb/testsuite/gdb.threads/step-over-exec.exp
> @@ -102,6 +102,12 @@ proc do_test { execr_thread different_text_segments displaced_stepping } {
>      gdb_breakpoint foo
>      gdb_test "continue" "Breakpoint $decimal, foo .*" \
>  	"continue to foo"
> +
> +    # Test that GDB is able to kill the inferior.  This may fail if
> +    # e.g., GDB does not dispose of the pre-exec threads properly.
> +    gdb_test "with confirm off -- kill" \
> +	"\\\[Inferior 1 (.*) killed\\\]" \
> +	"kill inferior"
>  }
>

These changes all look good.

Reviewed-By: Andrew Burgess <aburgess@redhat.com>

Thanks,
Andrew

>  foreach_with_prefix displaced_stepping {auto off} {
> -- 
> 2.36.0


---

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 5f67bcbcb4f..9b1e071b5f6 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -4482,6 +4485,49 @@ current_lwp_ptid (void)
   return inferior_ptid;
 }
 
+/* Implement 'maintenance info linux-lwps'.  Displays some basic
+   information about all the current lwp_info objects.  */
+
+static void
+maintenance_info_lwps (const char *arg, int from_tty)
+{
+  if (all_lwps ().size () == 0)
+    {
+      gdb_printf ("No Linux LWPs\n");
+      return;
+    }
+
+  /* Start the width at 8 to match the column heading below, then figure
+     out the widest ptid string.  We'll use this to build our output table
+     below.  */
+  size_t ptid_width = 8;
+  for (lwp_info *lp : all_lwps ())
+    ptid_width = std::max (ptid_width, lp->ptid.to_string ().size ());
+
+  /* Setup the table headers.  */
+  struct ui_out *uiout = current_uiout;
+  ui_out_emit_table table_emitter (uiout, 2, -1, "linux-lwps");
+  uiout->table_header (ptid_width, ui_left, "lwp-ptid", _("LWP Ptid"));
+  uiout->table_header (9, ui_left, "thread-info", _("Thread ID"));
+  uiout->table_body ();
+
+  /* Display one table row for each lwp_info.  */
+  for (lwp_info *lp : all_lwps ())
+    {
+      ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
+
+      struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);
+
+      uiout->field_string ("lwp-ptid", lp->ptid.to_string ().c_str ());
+      if (th == nullptr)
+	uiout->field_string ("thread-info", "None");
+      else
+	uiout->field_string ("thread-info", print_thread_id (th));
+
+      uiout->message ("\n");
+    }
+}
+
 void _initialize_linux_nat ();
 void
 _initialize_linux_nat ()
@@ -4519,6 +4565,9 @@ Enables printf debugging output."),
   sigemptyset (&blocked_mask);
 
   lwp_lwpid_htab_create ();
+
+  add_cmd ("linux-lwps", class_maintenance, maintenance_info_lwps,
+	 _("List the Linux LWPS."), &maintenanceinfolist);
 }
 
 
diff --git a/gdb/testsuite/gdb.threads/step-over-exec.exp b/gdb/testsuite/gdb.threads/step-over-exec.exp
index c9a067b23aa..8ab027f6f08 100644
--- a/gdb/testsuite/gdb.threads/step-over-exec.exp
+++ b/gdb/testsuite/gdb.threads/step-over-exec.exp
@@ -103,6 +103,49 @@ proc do_test { execr_thread different_text_segments displaced_stepping } {
     gdb_test "continue" "Breakpoint $decimal, foo .*" \
 	"continue to foo"
 
+    # If we have a linux target then there used to be a bug that in
+    # some situations we'd leave an orphaned lwp object around.  Check
+    # the 'maint info linux-lwp' output to spot any orphans.
+    #
+    # If linux native support is not built in then we'll get an
+    # undefined maintenance command error, which is fine.  The bug
+    # we're checking for was in linux native code, so we know we're
+    # fine.
+    #
+    # Alternatively, linux native support might be built in, but we
+    # might be using an alternative target (e.g. a remote target), in
+    # which case we'll get a message about 'No Linux LWPs'.  Again
+    # there's nothing that needs testing in this case.
+    gdb_test_multiple "maint info linux-lwp" "" {
+	-re "^maint info linux-lwp\r\n" {
+	    exp_continue
+	}
+
+	-re "^Undefined maintenance info command: \"linux-lwp\"\\.  Try \"help maintenance info\"\\.\r\n$::gdb_prompt $" {
+	    unsupported $gdb_test_name
+	}
+
+	-re "^LWP Ptid\\s+Thread Info\\s*\r\n" {
+	    exp_continue
+	}
+
+	-re "^\\d+\\.\\d+\\.\\d+\\s+\\d+(?:\\.\\d+)?\\s*\r\n" {
+	    exp_continue
+	}
+
+	-re "^\\d+\\.\\d+\\.\\d+\\s+None\\s*\r\n" {
+	    fail $gdb_test_name
+	}
+
+	-re "^No Linux LWPs\r\n$::gdb_prompt" {
+	    unsupported $gdb_test_name
+	}
+
+	-re "^$::gdb_prompt $" {
+	    pass $gdb_test_name
+	}
+    }
+
     # Test that GDB is able to kill the inferior.  This may fail if
     # e.g., GDB does not dispose of the pre-exec threads properly.
     gdb_test "with confirm off -- kill" \
  
Pedro Alves April 4, 2023, 1:57 p.m. UTC | #2
Hi Andrew,

Took me a bit to find time to investigate this.  See below.

On 2023-03-21 2:50 p.m., Andrew Burgess wrote:
> Pedro Alves <pedro@palves.net> writes:
> 
>> I noticed that after a following patch ("Step over clone syscall w/
>> breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the
>> gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd
>> end up with four new unexpected GDB core dumps:
>>
>> 		 === gdb Summary ===
>>
>>  # of unexpected core files      4
>>  # of expected passes            48
>>
>> That said patch is making the pre-existing
>> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
>> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>>
>>  #1 - a non-leader thread execs
>>  #2 - the post-exec program stops somewhere
>>  #3 - you kill the inferior
>>
>> Instead of #3 directly, the testcase just returns, which ends up in
>> gdb_exit, tearing down GDB, which kills the inferior, and is thus
>> equivalent to #3 above.
>>
>> Vis:
>>
>>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>>  ...
>>  (top-gdb) r
>>  ...
>>  (gdb) b main
>>  ...
>>  (gdb) r
>>  ...
>>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>>  69        argv0 = argv[0];
>>  (gdb) c
>>  Continuing.
>>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>>  Other going in exec.
>>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>>
>>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>>  28        foo ();
>>  (gdb) k
>>  ...
>>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>>  393         return m_suspend.waitstatus_pending_p;
>>  (top-gdb) bt
>>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>>  #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>>  #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>>  #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
>>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>>  ...
> 
> It wasn't 100% clear to me if the above session was supposed to show a
> failure with GDB prior to *this* commit, or was a demonstration of what
> would happen if this commit is skipped, and the later commits applied.

Yes, prior to this commit.

> 
> I thought it was the second case, but I was so unsure that I tried the
> reproducer anyway.   Just in case I'm wrong, the above example doesn't
> seem to fail prior to this commit.

This surprised me, and when I tried it myself, I was even more surprised,
for I couldn't reproduce it either!

But I figured it out.

I'm usually using Ubuntu 22.04 for development nowadays, and in that system, indeed I can't
reproduce it.  Right after the exec, GDB traps a load event for "libc.so.6", which leads to
gdb trying to open libthread_db for the post-exec inferior, and, it succeeds.  When we load
libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps,
and then waits to see their stops.  While doing this, GDB detects that the pre-exec stale
LWP is gone, and deletes it.

The logs show:

[linux-nat] linux_nat_wait_1: waitpid 1725529 received SIGTRAP - Trace/breakpoint trap (stopped)
[linux-nat] save_stop_reason: 1725529.1725529.0 stopped by software breakpoint
[linux-nat] linux_nat_wait_1: waitpid(-1, ...) returned 0, ERRNO-OK
[linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725658.0, not stopped
[linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725529.0, has pending status
[linux-nat] linux_nat_wait_1: trap ptid is 1725529.1725529.0.
[linux-nat] linux_nat_wait_1: exit
[linux-nat] stop_callback: kill 1725529.1725658.0 **<SIGSTOP>**
[linux-nat] stop_callback: lwp kill -1 No such process
[linux-nat] wait_lwp: 1725529.1725658.0 vanished.

And the backtrace is:

(top-gdb) bt
#0  wait_lwp (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2069
#1  0x0000555555aa8fbf in stop_wait_callback (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2375
#2  0x0000555555ab12b3 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (__closure=0x0, ecall=..., args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:326
#3  0x0000555555ab12e2 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:320
#4  0x0000555555ab0610 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffca90, args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:289
#5  0x0000555555aa4c2d in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:867
#6  0x0000555555aa8a03 in linux_stop_and_wait_all_lwps () at ../../src/gdb/linux-nat.c:2229
#7  0x0000555555ac8525 in try_thread_db_load_1 (info=0x555556a66dd0) at ../../src/gdb/linux-thread-db.c:923
#8  0x0000555555ac89d5 in try_thread_db_load (library=0x5555560eca27 "libthread_db.so.1", check_auto_load_safe=false) at ../../src/gdb/linux-thread-db.c:1024
#9  0x0000555555ac8eda in try_thread_db_load_from_sdir () at ../../src/gdb/linux-thread-db.c:1108
#10 0x0000555555ac9278 in thread_db_load_search () at ../../src/gdb/linux-thread-db.c:1163
#11 0x0000555555ac9518 in thread_db_load () at ../../src/gdb/linux-thread-db.c:1225
#12 0x0000555555ac95e1 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:1268
#13 0x0000555555ac9657 in thread_db_new_objfile (objfile=0x555556943ed0) at ../../src/gdb/linux-thread-db.c:1297
#14 0x000055555569e2d2 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61
#15 0x000055555569c44a in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111
#16 0x0000555555699d69 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffce50: 0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:290
#17 0x0000555555b5f48b in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555567925d8, __args#0=0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:590
#18 0x0000555555b5eba4 in gdb::observers::observable<objfile*>::notify (this=0x5555565b5680 <gdb::observers::new_objfile>, args#0=0x555556943ed0) at ../../src/gdb/../gdbsupport/observable.h:166
#19 0x0000555555cdd85b in symbol_file_add_with_addrs (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1131
#20 0x0000555555cdd9c5 in symbol_file_add_from_bfd (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1167
#21 0x0000555555c9dd69 in solib_read_symbols (so=0x5555569792d0, flags=...) at ../../src/gdb/solib.c:730
#22 0x0000555555c9e7b7 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at ../../src/gdb/solib.c:1041
#23 0x0000555555c9f61d in handle_solib_event () at ../../src/gdb/solib.c:1315
#24 0x0000555555729c26 in bpstat_stop_status (aspace=0x555556606800, bp_addr=0x7ffff7fe7278, thread=0x555556816bd0, ws=..., stop_chain=0x0) at ../../src/gdb/breakpoint.c:5702
#25 0x0000555555a62e41 in handle_signal_stop (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6517
#26 0x0000555555a61479 in handle_inferior_event (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6000
#27 0x0000555555a5c7b5 in fetch_inferior_event () at ../../src/gdb/infrun.c:4403
#28 0x0000555555a35b65 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:41
#29 0x0000555555aae0c9 in handle_target_event (error=0, client_data=0x0) at ../../src/gdb/linux-nat.c:4231


Now, when I try the same on a Fedora 32 machine, I see the GDB crash due to the stale
LWP still in the LWP list with no corresponding thread_info.  On this
machine, glibc predates the changes that make it possible to use libthread_db with
non-threaded processes, so try_thread_db_load doesn't manage to open a connection
to libthread_db, and thus we don't end up in linux_stop_and_wait_all_lwps, and thus
the stale lwp is not deleted.  And so a subsequent "kill" command crashes.

I wrote that patch originally on an Ubuntu 20.04 machine (vs the Ubuntu 22.04 I have now),
and it must be that that version also predates the glibc change, and thus behaves like
this Fedora 32 box.  You are very likely using a newer Fedora which has the glibc change.

> 
>>
>> The root of the problem is that when a non-leader LWP execs, it just
>> changes its tid to the tgid, replacing the pre-exec leader thread,
>> becoming the new leader.  There's no thread exit event for the execing
>> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
>> ptrace man page says:
>>
>> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
>> 	Stop the tracee at the next execve(2).  A waitpid(2) by the
>> 	tracer will return a status value such that
>>
>> 	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>>
>> 	If the execing thread is not a thread group leader, the thread
>> 	ID is reset to thread group leader's ID before this stop.
>> 	Since Linux 3.0, the former thread ID can be retrieved with
>> 	PTRACE_GETEVENTMSG."
>>
>> When the core of GDB processes an exec event, it deletes all the
>> threads of the inferior.  But, that is too late -- deleting the thread
>> does not delete the corresponding LWP, so we end up leaving the
>> pre-exec non-leader LWP stale in the LWP list.  That's what leads to
>> the crash above -- linux_nat_target::kill iterates over all LWPs, and
>> after the patch in question, that code will look for the corresponding
>> thread_info for each LWP.  For the pre-exec non-leader LWP still
>> listed, it won't find one.
>>
>> This patch fixes it, by deleting the pre-exec non-leader LWP (and
>> thread) from the LWP/thread lists as soon as we get an exec event out
>> of ptrace.
> 
> Given that we don't have a test *right now* for this issue, and instead
> rely on a future patch not failing, I wondered if there was any way
> that we could trigger a failure.
> 
> So I was poking around looking for places where we iterate over the
> all_lwps() list wondering which we could trigger that might cause a
> failure...
> 
> ... and then I thought: why not just have GDB tell us that the
> all_lwps() list is broken.
> 
> So I hacked up a new 'maint info linux-lwps' command.  It's not very
> interesting right now, here's the output in a multi-threaded inferior
> prior to the exec:
> 
>   (gdb) maintenance info linux-lwps 
>   LWP Ptid          Thread ID
>   1707218.1707239.0 2           
>   1707218.1707218.0 1           
> 
> And in your failure case (after the exec):
> 
>   (gdb) maintenance info linux-lwps 
>   LWP Ptid          Thread ID 
>   1708883.1708895.0 None      
>   1708883.1708883.0 1         
> 
> And then we can check this from the testscript, and now we have a test
> that fails before this commit, and passes afterwards.
> 
> And in the future we might find other information we want to add in the
> new maintenance command.
> 
> What are your thoughts on including this, or something like this with
> this commit?  My patch, which applies on top of this commit, is included
> at the end of this email.  Please feel free to take any changes that you
> feel add value.

I'm totally fine with such a command, though the test I had added covers
as much as it would, as the "kill" command fails when the maint command
would fail, and passes when the maint command passes.  But I'll incorporate
it.
  
Pedro Alves April 14, 2023, 7:29 p.m. UTC | #3
Hi!

On 2023-04-04 2:57 p.m., Pedro Alves wrote:
> On 2023-03-21 2:50 p.m., Andrew Burgess wrote:
>>
>> I thought it was the second case, but I was so unsure that I tried the
>> reproducer anyway.   Just in case I'm wrong, the above example doesn't
>> seem to fail prior to this commit.
> 
> This surprised me, and when I tried it myself, I was even more surprised,
> for I couldn't reproduce it either!
> 
> But I figured it out.
> 
> I'm usually using Ubuntu 22.04 for development nowadays, and in that system, indeed I can't
> reproduce it.  Right after the exec, GDB traps a load event for "libc.so.6", which leads to
> gdb trying to open libthread_db for the post-exec inferior, and, it succeeds.  When we load
> libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps,
> and then waits to see their stops.  While doing this, GDB detects that the pre-exec stale
> LWP is gone, and deletes it.
> 
> The logs show:
> 
> [linux-nat] linux_nat_wait_1: waitpid 1725529 received SIGTRAP - Trace/breakpoint trap (stopped)
> [linux-nat] save_stop_reason: 1725529.1725529.0 stopped by software breakpoint
> [linux-nat] linux_nat_wait_1: waitpid(-1, ...) returned 0, ERRNO-OK
> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725658.0, not stopped
> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725529.0, has pending status
> [linux-nat] linux_nat_wait_1: trap ptid is 1725529.1725529.0.
> [linux-nat] linux_nat_wait_1: exit
> [linux-nat] stop_callback: kill 1725529.1725658.0 **<SIGSTOP>**
> [linux-nat] stop_callback: lwp kill -1 No such process
> [linux-nat] wait_lwp: 1725529.1725658.0 vanished.
> 
> And the backtrace is:
> 
> (top-gdb) bt
> #0  wait_lwp (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2069
> #1  0x0000555555aa8fbf in stop_wait_callback (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2375
> #2  0x0000555555ab12b3 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (__closure=0x0, ecall=..., args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:326
> #3  0x0000555555ab12e2 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:320
> #4  0x0000555555ab0610 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffca90, args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:289
> #5  0x0000555555aa4c2d in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:867
> #6  0x0000555555aa8a03 in linux_stop_and_wait_all_lwps () at ../../src/gdb/linux-nat.c:2229
> #7  0x0000555555ac8525 in try_thread_db_load_1 (info=0x555556a66dd0) at ../../src/gdb/linux-thread-db.c:923
> #8  0x0000555555ac89d5 in try_thread_db_load (library=0x5555560eca27 "libthread_db.so.1", check_auto_load_safe=false) at ../../src/gdb/linux-thread-db.c:1024
> #9  0x0000555555ac8eda in try_thread_db_load_from_sdir () at ../../src/gdb/linux-thread-db.c:1108
> #10 0x0000555555ac9278 in thread_db_load_search () at ../../src/gdb/linux-thread-db.c:1163
> #11 0x0000555555ac9518 in thread_db_load () at ../../src/gdb/linux-thread-db.c:1225
> #12 0x0000555555ac95e1 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:1268
> #13 0x0000555555ac9657 in thread_db_new_objfile (objfile=0x555556943ed0) at ../../src/gdb/linux-thread-db.c:1297
> #14 0x000055555569e2d2 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61
> #15 0x000055555569c44a in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111
> #16 0x0000555555699d69 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffce50: 0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:290
> #17 0x0000555555b5f48b in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555567925d8, __args#0=0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:590
> #18 0x0000555555b5eba4 in gdb::observers::observable<objfile*>::notify (this=0x5555565b5680 <gdb::observers::new_objfile>, args#0=0x555556943ed0) at ../../src/gdb/../gdbsupport/observable.h:166
> #19 0x0000555555cdd85b in symbol_file_add_with_addrs (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1131
> #20 0x0000555555cdd9c5 in symbol_file_add_from_bfd (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1167
> #21 0x0000555555c9dd69 in solib_read_symbols (so=0x5555569792d0, flags=...) at ../../src/gdb/solib.c:730
> #22 0x0000555555c9e7b7 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at ../../src/gdb/solib.c:1041
> #23 0x0000555555c9f61d in handle_solib_event () at ../../src/gdb/solib.c:1315
> #24 0x0000555555729c26 in bpstat_stop_status (aspace=0x555556606800, bp_addr=0x7ffff7fe7278, thread=0x555556816bd0, ws=..., stop_chain=0x0) at ../../src/gdb/breakpoint.c:5702
> #25 0x0000555555a62e41 in handle_signal_stop (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6517
> #26 0x0000555555a61479 in handle_inferior_event (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6000
> #27 0x0000555555a5c7b5 in fetch_inferior_event () at ../../src/gdb/infrun.c:4403
> #28 0x0000555555a35b65 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:41
> #29 0x0000555555aae0c9 in handle_target_event (error=0, client_data=0x0) at ../../src/gdb/linux-nat.c:4231
> 
> 
> Now, when I try the same on a Fedora 32 machine, I see the GDB crash due to the stale
> LWP still in the LWP list with no corresponding thread_info.  On this
> machine, glibc predates the changes that make it possible to use libthread_db with
> non-threaded processes, so try_thread_db_load doesn't manage to open a connection
> to libthread_db, and thus we don't end up in linux_stop_and_wait_all_lwps, and thus
> the stale lwp is not deleted.  And so a subsequent "kill" command crashes.
> 
> I wrote that patch originally on an Ubuntu 20.04 machine (vs the Ubuntu 22.04 I have now),
> and it must be that that version also predates the glibc change, and thus behaves like
> this Fedora 32 box.  You are very likely using a newer Fedora which has the glibc change.

...

>> What are your thoughts on including this, or something like this with
>> this commit?  My patch, which applies on top of this commit, is included
>> at the end of this email.  Please feel free to take any changes that you
>> feel add value.
> 
> I'm totally fine with such a command, though the test I had added covers
> as much as it would, as the "kill" command fails when the maint command
> would fail, and passes when the maint command passes.  But I'll incorporate
> it.
> 

I realized that my description of the problem above practically
suggests a way to expose the crash everywhere -- just catch the exec
event with "catch exec", so that the post-exec program doesn't even
get to the libc.so.6 load event, and issue "kill" there, or use "maint info linux-lwps".
So I've adjusted the patch to add a new testcase doing that.  I've attached two
patches, one adding your "maint info linux-lwps", now with NEWS/docs, and
the updated version of the crash fix and testcase.

WDYT?

Pedro Alves
  
Andrew Burgess May 26, 2023, 2:45 p.m. UTC | #4
Hi Pedro,

Sorry for the delay in looking at this again.  I had to find some time
to investigate this a little more as my results were still not aligning
with what you reported, but I think I understand what's going on now...

Pedro Alves <pedro@palves.net> writes:

> Hi Andrew,
>
> Took me a bit to find time to investigate this.  See below.
>
> On 2023-03-21 2:50 p.m., Andrew Burgess wrote:
>> Pedro Alves <pedro@palves.net> writes:
>> 
>>> I noticed that after a following patch ("Step over clone syscall w/
>>> breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the
>>> gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd
>>> end up with four new unexpected GDB core dumps:
>>>
>>> 		 === gdb Summary ===
>>>
>>>  # of unexpected core files      4
>>>  # of expected passes            48
>>>
>>> That said patch is making the pre-existing
>>> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
>>> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>>>
>>>  #1 - a non-leader thread execs
>>>  #2 - the post-exec program stops somewhere
>>>  #3 - you kill the inferior
>>>
>>> Instead of #3 directly, the testcase just returns, which ends up in
>>> gdb_exit, tearing down GDB, which kills the inferior, and is thus
>>> equivalent to #3 above.
>>>
>>> Vis:
>>>
>>>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>>>  ...
>>>  (top-gdb) r
>>>  ...
>>>  (gdb) b main
>>>  ...
>>>  (gdb) r
>>>  ...
>>>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>>>  69        argv0 = argv[0];
>>>  (gdb) c
>>>  Continuing.
>>>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>>>  Other going in exec.
>>>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>>>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>>>
>>>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>>>  28        foo ();
>>>  (gdb) k
>>>  ...
>>>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>>>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>>>  393         return m_suspend.waitstatus_pending_p;
>>>  (top-gdb) bt
>>>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>>>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>>>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>>>  #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>>>  #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>>>  #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>>>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>>>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
>>>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>>>  ...
>> 
>> It wasn't 100% clear to me if the above session was supposed to show a
>> failure with GDB prior to *this* commit, or was a demonstration of what
>> would happen if this commit is skipped, and the later commits applied.
>
> Yes, prior to this commit.

OK, but check your backtrace: notice that frame #2 is in
kill_unfollowed_child_callback.  This is only added in a later patch in
this series.  If we roll back to just this patch then the crash doesn't
reproduce!  But I can explain that...

>
>> 
>> I thought it was the second case, but I was so unsure that I tried the
>> reproducer anyway.   Just in case I'm wrong, the above example doesn't
>> seem to fail prior to this commit.
>
> This surprised me, and when I tried it myself, I was even more surprised,
> for I couldn't reproduce it either!
>
> But I figured it out.
>
> I'm usually using Ubuntu 22.04 for development nowadays, and in that system, indeed I can't
> reproduce it.  Right after the exec, GDB traps a load event for "libc.so.6", which leads to
> gdb trying to open libthread_db for the post-exec inferior, and, it succeeds.  When we load
> libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps,
> and then waits to see their stops.  While doing this, GDB detects that the pre-exec stale
> LWP is gone, and deletes it.
>
> The logs show:
>
> [linux-nat] linux_nat_wait_1: waitpid 1725529 received SIGTRAP - Trace/breakpoint trap (stopped)
> [linux-nat] save_stop_reason: 1725529.1725529.0 stopped by software breakpoint
> [linux-nat] linux_nat_wait_1: waitpid(-1, ...) returned 0, ERRNO-OK
> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725658.0, not stopped
> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725529.0, has pending status
> [linux-nat] linux_nat_wait_1: trap ptid is 1725529.1725529.0.
> [linux-nat] linux_nat_wait_1: exit
> [linux-nat] stop_callback: kill 1725529.1725658.0 **<SIGSTOP>**
> [linux-nat] stop_callback: lwp kill -1 No such process
> [linux-nat] wait_lwp: 1725529.1725658.0 vanished.

When we look at just this patch, the linux_nat_target::kill function
calls kill_unfollowed_fork_children, which, unlike your later updated
kill_unfollowed_child_callback, doesn't require us to look up a
thread_info object.  It is the lookup (and dereference) of the
thread_info object that causes the segfault you are seeing.

Once we skip that code, the current linux_nat_target::kill function
actually starts with exactly the same function calls as
linux_stop_and_wait_all_lwps (maybe we should be calling that
function?)

And so, for _me_ when I 'kill' I end up calling the stop_callback
followed by the stop_wait_callback, which cleans up the rogue thread
just like you see.

>
> And the backtrace is:
>
> (top-gdb) bt
> #0  wait_lwp (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2069
> #1  0x0000555555aa8fbf in stop_wait_callback (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2375
> #2  0x0000555555ab12b3 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (__closure=0x0, ecall=..., args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:326
> #3  0x0000555555ab12e2 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:320
> #4  0x0000555555ab0610 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffca90, args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:289
> #5  0x0000555555aa4c2d in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:867
> #6  0x0000555555aa8a03 in linux_stop_and_wait_all_lwps () at ../../src/gdb/linux-nat.c:2229
> #7  0x0000555555ac8525 in try_thread_db_load_1 (info=0x555556a66dd0) at ../../src/gdb/linux-thread-db.c:923
> #8  0x0000555555ac89d5 in try_thread_db_load (library=0x5555560eca27 "libthread_db.so.1", check_auto_load_safe=false) at ../../src/gdb/linux-thread-db.c:1024
> #9  0x0000555555ac8eda in try_thread_db_load_from_sdir () at ../../src/gdb/linux-thread-db.c:1108
> #10 0x0000555555ac9278 in thread_db_load_search () at ../../src/gdb/linux-thread-db.c:1163
> #11 0x0000555555ac9518 in thread_db_load () at ../../src/gdb/linux-thread-db.c:1225
> #12 0x0000555555ac95e1 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:1268
> #13 0x0000555555ac9657 in thread_db_new_objfile (objfile=0x555556943ed0) at ../../src/gdb/linux-thread-db.c:1297
> #14 0x000055555569e2d2 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61
> #15 0x000055555569c44a in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111
> #16 0x0000555555699d69 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffce50: 0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:290
> #17 0x0000555555b5f48b in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555567925d8, __args#0=0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:590
> #18 0x0000555555b5eba4 in gdb::observers::observable<objfile*>::notify (this=0x5555565b5680 <gdb::observers::new_objfile>, args#0=0x555556943ed0) at ../../src/gdb/../gdbsupport/observable.h:166
> #19 0x0000555555cdd85b in symbol_file_add_with_addrs (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1131
> #20 0x0000555555cdd9c5 in symbol_file_add_from_bfd (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1167
> #21 0x0000555555c9dd69 in solib_read_symbols (so=0x5555569792d0, flags=...) at ../../src/gdb/solib.c:730
> #22 0x0000555555c9e7b7 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at ../../src/gdb/solib.c:1041
> #23 0x0000555555c9f61d in handle_solib_event () at ../../src/gdb/solib.c:1315
> #24 0x0000555555729c26 in bpstat_stop_status (aspace=0x555556606800, bp_addr=0x7ffff7fe7278, thread=0x555556816bd0, ws=..., stop_chain=0x0) at ../../src/gdb/breakpoint.c:5702
> #25 0x0000555555a62e41 in handle_signal_stop (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6517
> #26 0x0000555555a61479 in handle_inferior_event (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6000
> #27 0x0000555555a5c7b5 in fetch_inferior_event () at ../../src/gdb/infrun.c:4403
> #28 0x0000555555a35b65 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:41
> #29 0x0000555555aae0c9 in handle_target_event (error=0, client_data=0x0) at ../../src/gdb/linux-nat.c:4231
>
>
> Now, when I try the same on a Fedora 32 machine, I see the GDB crash due to the stale
> LWP still in the LWP list with no corresponding thread_info.  On this
> machine, glibc predates the changes that make it possible to use libthread_db with
> non-threaded processes, so try_thread_db_load doesn't manage to open a connection
> to libthread_db, and thus we don't end up in linux_stop_and_wait_all_lwps, and thus
> the stale lwp is not deleted.  And so a subsequent "kill" command crashes.
>
> I wrote that patch originally on an Ubuntu 20.04 machine (vs the Ubuntu 22.04 I have now),
> and it must be that that version also predates the glibc change, and thus behaves like
> this Fedora 32 box.  You are very likely using a newer Fedora which has the glibc change.

To my shame I'm actually running an even older Fedora install.  One day
GDB will be finished, and then I'll find time to reinstall :)  So I'm not
cleaning up the thread in the libthread_db code.

>
>> 
>>>
>>> The root of the problem is that when a non-leader LWP execs, it just
>>> changes its tid to the tgid, replacing the pre-exec leader thread,
>>> becoming the new leader.  There's no thread exit event for the execing
>>> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
>>> ptrace man page says:
>>>
>>> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
>>> 	Stop the tracee at the next execve(2).  A waitpid(2) by the
>>> 	tracer will return a status value such that
>>>
>>> 	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>>>
>>> 	If the execing thread is not a thread group leader, the thread
>>> 	ID is reset to thread group leader's ID before this stop.
>>> 	Since Linux 3.0, the former thread ID can be retrieved with
>>> 	PTRACE_GETEVENTMSG."
>>>
>>> When the core of GDB processes an exec event, it deletes all the
>>> threads of the inferior.  But, that is too late -- deleting the thread
>>> does not delete the corresponding LWP, so we end up leaving the
>>> pre-exec non-leader LWP stale in the LWP list.  That's what leads to
>>> the crash above -- linux_nat_target::kill iterates over all LWPs, and
>>> after the patch in question, that code will look for the corresponding
>>> thread_info for each LWP.  For the pre-exec non-leader LWP still
>>> listed, it won't find one.
>>>
>>> This patch fixes it, by deleting the pre-exec non-leader LWP (and
>>> thread) from the LWP/thread lists as soon as we get an exec event out
>>> of ptrace.
>> 
>> Given that we don't have a test *right now* for this issue, and instead
>> rely on a future patch not failing, I wondered if there was any way
>> that we could trigger a failure.
>> 
>> So I was poking around looking for places where we iterate over the
>> all_lwps() list wondering which we could trigger that might cause a
>> failure...
>> 
>> ... and then I thought: why not just have GDB tell us that the
>> all_lwps() list is broken.
>> 
>> So I hacked up a new 'maint info linux-lwps' command.  It's not very
>> interesting right now, here's the output in a multi-threaded inferior
>> prior to the exec:
>> 
>>   (gdb) maintenance info linux-lwps 
>>   LWP Ptid          Thread ID
>>   1707218.1707239.0 2           
>>   1707218.1707218.0 1           
>> 
>> And in your failure case (after the exec):
>> 
>>   (gdb) maintenance info linux-lwps 
>>   LWP Ptid          Thread ID 
>>   1708883.1708895.0 None      
>>   1708883.1708883.0 1         
>> 
>> And then we can check this from the testscript, and now we have a test
>> that fails before this commit, and passes afterwards.
>> 
>> And in the future we might find other information we want to add in the
>> new maintenance command.
>> 
>> What are your thoughts on including this, or something like this with
>> this commit?  My patch, which applies on top of this commit, is included
>> at the end of this email.  Please feel free to take any changes that you
>> feel add value.
>
> I'm totally fine with such a command, though the test I had added covers
> as much as it would, as the "kill" command fails when the maint command
> would fail, and passes when the maint command passes.  But I'll incorporate
> it.

Given the above, this isn't quite right.  It would appear that someone
using the libthread_db code would, as you point out, see both tests
fail.  But without the libthread_db code the 'maint' command will show
the problem, while the 'kill' command isn't going to, even if you
force the 'kill' earlier like you propose in your new patch[1].

This doesn't actually change anything -- you did add the new 'maint'
command in your next patch; it was just that the behaviour I was seeing
didn't align with what you saw, and I think we now know why.

I'll reply separately to [1] with my feedback on that patch.

[1] https://inbox.sourceware.org/gdb-patches/5b80a2c3-3679-fb86-27f3-0dcc9c019562@palves.net/#t


Thanks,
Andrew
  
Andrew Burgess May 26, 2023, 3:04 p.m. UTC | #5
Pedro Alves <pedro@palves.net> writes:

> Hi!
>
> On 2023-04-04 2:57 p.m., Pedro Alves wrote:
>> On 2023-03-21 2:50 p.m., Andrew Burgess wrote:
>>>
>>> I thought it was the second case, but I was so unsure that I tried the
>>> reproducer anyway.   Just in case I'm wrong, the above example doesn't
>>> seem to fail prior to this commit.
>> 
>> This surprised me, and when I tried it myself, I was even more surprised,
>> for I couldn't reproduce it either!
>> 
>> But I figured it out.
>> 
>> I'm usually using Ubuntu 22.04 for development nowadays, and on that system, indeed I can't
>> reproduce it.  Right after the exec, GDB traps a load event for "libc.so.6", which leads to
>> GDB trying to open libthread_db for the post-exec inferior, and it succeeds.  When we load
>> libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps,
>> and then waits to see their stops.  While doing this, GDB detects that the pre-exec stale
>> LWP is gone, and deletes it.
>> 
>> The logs show:
>> 
>> [linux-nat] linux_nat_wait_1: waitpid 1725529 received SIGTRAP - Trace/breakpoint trap (stopped)
>> [linux-nat] save_stop_reason: 1725529.1725529.0 stopped by software breakpoint
>> [linux-nat] linux_nat_wait_1: waitpid(-1, ...) returned 0, ERRNO-OK
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725658.0, not stopped
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725529.0, has pending status
>> [linux-nat] linux_nat_wait_1: trap ptid is 1725529.1725529.0.
>> [linux-nat] linux_nat_wait_1: exit
>> [linux-nat] stop_callback: kill 1725529.1725658.0 **<SIGSTOP>**
>> [linux-nat] stop_callback: lwp kill -1 No such process
>> [linux-nat] wait_lwp: 1725529.1725658.0 vanished.
>> 
>> And the backtrace is:
>> 
>> (top-gdb) bt
>> #0  wait_lwp (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2069
>> #1  0x0000555555aa8fbf in stop_wait_callback (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2375
>> #2  0x0000555555ab12b3 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (__closure=0x0, ecall=..., args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:326
>> #3  0x0000555555ab12e2 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:320
>> #4  0x0000555555ab0610 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffca90, args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:289
>> #5  0x0000555555aa4c2d in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:867
>> #6  0x0000555555aa8a03 in linux_stop_and_wait_all_lwps () at ../../src/gdb/linux-nat.c:2229
>> #7  0x0000555555ac8525 in try_thread_db_load_1 (info=0x555556a66dd0) at ../../src/gdb/linux-thread-db.c:923
>> #8  0x0000555555ac89d5 in try_thread_db_load (library=0x5555560eca27 "libthread_db.so.1", check_auto_load_safe=false) at ../../src/gdb/linux-thread-db.c:1024
>> #9  0x0000555555ac8eda in try_thread_db_load_from_sdir () at ../../src/gdb/linux-thread-db.c:1108
>> #10 0x0000555555ac9278 in thread_db_load_search () at ../../src/gdb/linux-thread-db.c:1163
>> #11 0x0000555555ac9518 in thread_db_load () at ../../src/gdb/linux-thread-db.c:1225
>> #12 0x0000555555ac95e1 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:1268
>> #13 0x0000555555ac9657 in thread_db_new_objfile (objfile=0x555556943ed0) at ../../src/gdb/linux-thread-db.c:1297
>> #14 0x000055555569e2d2 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61
>> #15 0x000055555569c44a in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111
>> #16 0x0000555555699d69 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffce50: 0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:290
>> #17 0x0000555555b5f48b in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555567925d8, __args#0=0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:590
>> #18 0x0000555555b5eba4 in gdb::observers::observable<objfile*>::notify (this=0x5555565b5680 <gdb::observers::new_objfile>, args#0=0x555556943ed0) at ../../src/gdb/../gdbsupport/observable.h:166
>> #19 0x0000555555cdd85b in symbol_file_add_with_addrs (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1131
>> #20 0x0000555555cdd9c5 in symbol_file_add_from_bfd (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1167
>> #21 0x0000555555c9dd69 in solib_read_symbols (so=0x5555569792d0, flags=...) at ../../src/gdb/solib.c:730
>> #22 0x0000555555c9e7b7 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at ../../src/gdb/solib.c:1041
>> #23 0x0000555555c9f61d in handle_solib_event () at ../../src/gdb/solib.c:1315
>> #24 0x0000555555729c26 in bpstat_stop_status (aspace=0x555556606800, bp_addr=0x7ffff7fe7278, thread=0x555556816bd0, ws=..., stop_chain=0x0) at ../../src/gdb/breakpoint.c:5702
>> #25 0x0000555555a62e41 in handle_signal_stop (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6517
>> #26 0x0000555555a61479 in handle_inferior_event (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6000
>> #27 0x0000555555a5c7b5 in fetch_inferior_event () at ../../src/gdb/infrun.c:4403
>> #28 0x0000555555a35b65 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:41
>> #29 0x0000555555aae0c9 in handle_target_event (error=0, client_data=0x0) at ../../src/gdb/linux-nat.c:4231
>> 
>> 
>> Now, when I try the same on a Fedora 32 machine, I see the GDB crash due to the stale
>> LWP still in the LWP list with no corresponding thread_info.  On this
>> machine, glibc predates the changes that make it possible to use libthread_db with
>> non-threaded processes, so try_thread_db_load doesn't manage to open a connection
>> to libthread_db, and thus we don't end up in linux_stop_and_wait_all_lwps, and thus
>> the stale lwp is not deleted.  And so a subsequent "kill" command crashes.
>> 
>> I wrote that patch originally on an Ubuntu 20.04 machine (vs the Ubuntu 22.04 I have now),
>> and it must be that that version also predates the glibc change, and thus behaves like
>> this Fedora 32 box.  You are very likely using a newer Fedora which has the glibc change.
>
> ...
>
>>> What are your thoughts on including this, or something like this with
>>> this commit?  My patch, which applies on top of this commit, is included
>>> at the end of this email.  Please feel free to take any changes that you
>>> feel add value.
>> 
>> I'm totally fine with such a command, though the test I had added covers
>> as much as it would, as the "kill" command fails when the maint command
>> would fail, and passes when the maint command passes.  But I'll incorporate
>> it.
>> 
>
> I realized that my description of the problem above practically
> suggests a way to expose the crash everywhere -- just catch the exec
> event with "catch exec", so that the post-exec program doesn't even
> get to the libc.so.6 load event, and issue "kill" there, or use "maint info linux-lwps".
> So I've adjusted the patch to add a new testcase doing that.  I've attached two
> patches, one adding your "maint info linux-lwps", now with NEWS/docs, and
> the updated version of the crash fix and testcase.
>
> WDYT?
>
> Pedro Alves
> From 450e0133fc884f027cce4ae65378ea5560f6464d Mon Sep 17 00:00:00 2001
> From: Andrew Burgess <aburgess@redhat.com>
> Date: Tue, 4 Apr 2023 14:50:35 +0100
> Subject: [PATCH 1/2] Add "maint info linux-lwps" command
>
> This adds a maintenance command that lets you list all the LWPs under
> control of the linux-nat target.
>
> For example:
>
>  (gdb) maint info linux-lwps
>  LWP Ptid        Thread ID
>  560948.561047.0 None
>  560948.560948.0 1.1
>
> This shows that the "560948.561047.0" LWP doesn't map to any thread_info
> object, which is bogus.  We'll be using this in a testcase in a
> following patch.
>
> Co-Authored-By: Pedro Alves <pedro@palves.net>
> Change-Id: Ic4e9e123385976e5cd054391990124b7a20fb3f5
> ---
>  gdb/NEWS            |  3 +++
>  gdb/doc/gdb.texinfo |  4 ++++
>  gdb/linux-nat.c     | 46 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+)
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index d729aa24056..3747e7d52c1 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -78,6 +78,9 @@ maintenance info frame-unwinders
>  maintenance wait-for-index-cache
>    Wait until all pending writes to the index cache have completed.
>  
> +maintenance info linux-lwps
> +  List all LWPs under control of the linux-nat target.
> +
>  set always-read-ctf on|off
>  show always-read-ctf
>    When off, CTF is only read if DWARF is not present.  When on, CTF is
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 6c811b8be2e..398bbb88af6 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -40605,6 +40605,10 @@ module (@pxref{Disassembly In Python}), and will only be present after
>  that module has been imported.  To force the module to be imported do
>  the following:
>  
> +@kindex maint info linux-lwps
> +@item maint info linux-lwps
> +Print information about LWPs under control of the Linux native target.
> +
>  @smallexample
>  (@value{GDBP}) python import gdb.disassembler
>  @end smallexample
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 944f23de01a..68816ddc999 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -4479,6 +4479,49 @@ current_lwp_ptid (void)
>    return inferior_ptid;
>  }
>  
> +/* Implement 'maintenance info linux-lwps'.  Displays some basic
> +   information about all the current lwp_info objects.  */
> +
> +static void
> +maintenance_info_lwps (const char *arg, int from_tty)
> +{
> +  if (all_lwps ().size () == 0)
> +    {
> +      gdb_printf ("No Linux LWPs\n");
> +      return;
> +    }
> +
> +  /* Start the width at 8 to match the column heading below, then
> +     figure out the widest ptid string.  We'll use this to build our
> +     output table below.  */
> +  size_t ptid_width = 8;
> +  for (lwp_info *lp : all_lwps ())
> +    ptid_width = std::max (ptid_width, lp->ptid.to_string ().size ());
> +
> +  /* Setup the table headers.  */
> +  struct ui_out *uiout = current_uiout;
> +  ui_out_emit_table table_emitter (uiout, 2, -1, "linux-lwps");
> +  uiout->table_header (ptid_width, ui_left, "lwp-ptid", _("LWP Ptid"));
> +  uiout->table_header (9, ui_left, "thread-info", _("Thread ID"));
> +  uiout->table_body ();
> +
> +  /* Display one table row for each lwp_info.  */
> +  for (lwp_info *lp : all_lwps ())
> +    {
> +      ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
> +
> +      struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);

After recent changes this line becomes:

  struct thread_info *th = linux_target->find_thread (lp->ptid);

> +
> +      uiout->field_string ("lwp-ptid", lp->ptid.to_string ().c_str ());
> +      if (th == nullptr)
> +	uiout->field_string ("thread-info", "None");
> +      else
> +	uiout->field_string ("thread-info", print_full_thread_id (th));
> +
> +      uiout->message ("\n");
> +    }
> +}
> +
>  void _initialize_linux_nat ();
>  void
>  _initialize_linux_nat ()
> @@ -4516,6 +4559,9 @@ Enables printf debugging output."),
>    sigemptyset (&blocked_mask);
>  
>    lwp_lwpid_htab_create ();
> +
> +  add_cmd ("linux-lwps", class_maintenance, maintenance_info_lwps,
> +	 _("List the Linux LWPs."), &maintenanceinfolist);
>  }
>  
>  
>
> base-commit: 57573e54afb9f7ed957eec43dfd2830f2384c970
> prerequisite-patch-id: 3a896bfe4b7c66a2e3a6aa668c5ae8395e5d8a52
> -- 
> 2.36.0
>
> From ee0a276c08b829ae504fe0eba5badc4f7faf3676 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <pedro@palves.net>
> Date: Wed, 13 Jul 2022 17:16:38 +0100
> Subject: [PATCH 2/2] gdb/linux: Delete all other LWPs immediately on ptrace
>  exec event
>
> I noticed that on an Ubuntu 20.04 system, after a following patch
> ("Step over clone syscall w/ breakpoint,
> TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
> was passing cleanly, but still, we'd end up with four new unexpected
> GDB core dumps:
>
> 		 === gdb Summary ===
>
>  # of unexpected core files      4
>  # of expected passes            48
>
> That said patch is making the pre-existing
> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>
>  #1 - a non-leader thread execs
>  #2 - the post-exec program stops somewhere
>  #3 - you kill the inferior
>
> Instead of #3 directly, the testcase just returns, which ends up in
> gdb_exit, tearing down GDB, which kills the inferior, and is thus
> equivalent to #3 above.
>
> Vis:
>
>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>  ...
>  (top-gdb) r
>  ...
>  (gdb) b main
>  ...
>  (gdb) r
>  ...
>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>  69        argv0 = argv[0];
>  (gdb) c
>  Continuing.
>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>  Other going in exec.
>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>
>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>  28        foo ();
>  (gdb) k
>  ...
>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  393         return m_suspend.waitstatus_pending_p;
>  (top-gdb) bt
>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>  #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>  #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>  #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>  ...

As I mentioned in my other message, this backtrace includes
kill_unfollowed_child_callback, which doesn't exist yet!  I think that's
OK though, the text before the backtrace does make it clear that you saw
this problem only after applying a later patch.

>
> The root of the problem is that when a non-leader LWP execs, it just
> changes its tid to the tgid, replacing the pre-exec leader thread,
> becoming the new leader.  There's no thread exit event for the execing
> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
> ptrace man page says:
>
> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
> 	Stop the tracee at the next execve(2).  A waitpid(2) by the
> 	tracer will return a status value such that
>
> 	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>
> 	If the execing thread is not a thread group leader, the thread
> 	ID is reset to thread group leader's ID before this stop.
> 	Since Linux 3.0, the former thread ID can be retrieved with
> 	PTRACE_GETEVENTMSG."
>
> When the core of GDB processes an exec event, it deletes all the
> threads of the inferior.  But that is too late -- deleting the thread
> does not delete the corresponding LWP, so we end up leaving the pre-exec
> non-leader LWP stale in the LWP list.  That's what leads to the crash
> above -- linux_nat_target::kill iterates over all LWPs, and after the
> patch in question, that code will look for the corresponding
> thread_info for each LWP.  For the pre-exec non-leader LWP still
> listed, it won't find one.
>
> This patch fixes it, by deleting the pre-exec non-leader LWP (and
> thread) from the LWP/thread lists as soon as we get an exec event out
> of ptrace.
>
> GDBserver does not need an equivalent fix, because it is already doing
> this, as a side effect of mourning the pre-exec process, in
> gdbserver/linux-low.cc:
>
>   else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
>     {
> ...
>       /* Delete the execing process and all its threads.  */
>       mourn (proc);
>       switch_to_thread (nullptr);
>
>
> The crash with gdb.threads/step-over-exec.exp is not observable on
> newer systems, which postdate the glibc change to move "libpthread.so"
> internals to "libc.so.6", because right after the exec, GDB traps a
> load event for "libc.so.6", which leads to GDB trying to open
> libthread_db for the post-exec inferior, and, on such systems, that
> succeeds.  When we load libthread_db, we call
> linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
> lwps, and then waits to see their stops.  While doing this, GDB
> detects that the pre-exec stale LWP is gone, and deletes it.
>
> If we use "catch exec" to stop right at the exec before the
> "libc.so.6" load event ever happens, and issue "kill" right there,
> then GDB crashes on newer systems as well.  So instead of tweaking
> gdb.threads/step-over-exec.exp to cover the fix, add a new
> gdb.threads/threads-after-exec.exp testcase that uses "catch exec".

Maybe it's worth mentioning that, because the crash itself only happens
once a later patch is applied, we use 'maint info linux-lwps' to reveal
the issue for now?

>
> Also tweak a comment in infrun.c:follow_exec referring to how
> linux-nat.c used to behave, as it would become stale otherwise.
>
> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
> ---
>  gdb/infrun.c                                  |  8 +--
>  gdb/linux-nat.c                               | 15 ++++
>  .../gdb.threads/threads-after-exec.exp        | 70 +++++++++++++++++++

Oops, this diff is missing the two source files for this test (.c and
-execd.c).  I was able to figure something out though so I could test
the rest of this patch :)

>  3 files changed, 88 insertions(+), 5 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.threads/threads-after-exec.exp
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index abe49ae0f2f..93edc224622 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -1224,13 +1224,11 @@ follow_exec (ptid_t ptid, const char *exec_file_target)
>       some other thread does the exec, and even if the main thread was
>       stopped or already gone.  We may still have non-leader threads of
>       the process on our list.  E.g., on targets that don't have thread
> -     exit events (like remote); or on native Linux in non-stop mode if
> -     there were only two threads in the inferior and the non-leader
> -     one is the one that execs (and nothing forces an update of the
> -     thread list up to here).  When debugging remotely, it's best to
> +     exit events (like remote) and nothing forces an update of the
> +     thread list up to here.  When debugging remotely, it's best to
>       avoid extra traffic, when possible, so avoid syncing the thread
>       list with the target, and instead go ahead and delete all threads
> -     of the process but one that reported the event.  Note this must
> +     of the process but the one that reported the event.  Note this must
>       be done before calling update_breakpoints_after_exec, as
>       otherwise clearing the threads' resources would reference stale
>       thread breakpoints -- it may have been one of these threads that
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 68816ddc999..90ac94440b8 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2001,6 +2001,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>  	 thread execs, it changes its tid to the tgid, and the old
>  	 tgid thread might have not been resumed.  */
>        lp->resumed = 1;
> +
> +      /* All other LWPs are gone now.  We'll have received a thread
> +	 exit notification for all threads other than the execing one.
> +	 That one, if it wasn't the leader, just silently changes its
> +	 tid to the tgid, and the previous leader vanishes.  Since
> +	 Linux 3.0, the former thread ID can be retrieved with
> +	 PTRACE_GETEVENTMSG, but since we support older kernels, don't
> +	 bother with it, and just walk the LWP list.  Even with
> +	 PTRACE_GETEVENTMSG, we'd still need to lookup the
> +	 corresponding LWP object, and it would be an extra ptrace
> +	 syscall, so this way may even be more efficient.  */
> +      for (lwp_info *other_lp : all_lwps_safe ())
> +	if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
> +	  exit_lwp (other_lp);
> +
>        return 0;
>      }
>  
> diff --git a/gdb/testsuite/gdb.threads/threads-after-exec.exp b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> new file mode 100644
> index 00000000000..824dda349a6
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> @@ -0,0 +1,70 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test that after an exec of a non-leader thread, we don't leave the
> +# non-leader thread listed in internal thread lists, causing problems.
> +
> +standard_testfile .c -execd.c
> +
> +proc do_test { } {
> +    global srcdir subdir srcfile srcfile2 binfile testfile
> +    global decimal
> +
> +    # Compile main binary (the one that does the exec).
> +    if {[gdb_compile_pthreads $srcdir/$subdir/$srcfile $binfile \
> +	     executable {debug}] != "" } {
> +	return -1
> +    }

You can do:

    if {[build_executable "failed to build main executable" \
             $binfile $srcfile {debug pthread}] == -1} {
	return -1
    }

> +
> +    # Compile the second binary (the one that gets exec'd).
> +    if {[gdb_compile $srcdir/$subdir/$srcfile2 $binfile-execd \
> +	     executable {debug}] != "" } {
> +	return -1
> +    }

And:

    if {[build_executable "failed to build execd executable" \
             $binfile-execd $srcfile2 {debug}] == -1} {
	return -1
    }

I thought we were moving away from calling the gdb_compile* functions
directly.

Assuming the missing source files are added, this all looks great.

Reviewed-By: Andrew Burgess <aburgess@redhat.com>

Thanks,
Andrew

> +
> +    clean_restart $binfile
> +
> +    if ![runto_main] {
> +	return
> +    }
> +
> +    gdb_test "catch exec" "Catchpoint $decimal \\(exec\\)"
> +
> +    gdb_test "continue" "Catchpoint $decimal .*" "continue until exec"
> +
> +    # Confirm we only have one thread in the thread list.
> +    gdb_test "info threads" "\\* 1\[ \t\]+\[^\r\n\]+.*"
> +
> +    if {[istarget *-*-linux*] && [gdb_is_target_native]} {
> +	# Confirm there's only one LWP in the list as well, and that
> +	# it is bound to thread 1.1.
> +	set inf_pid [get_inferior_pid]
> +	gdb_test_multiple "maint info linux-lwps" "" {
> +	    -wrap -re "Thread ID *\r\n$inf_pid\.$inf_pid\.0\[ \t\]+1\.1 *" {
> +		pass $gdb_test_name
> +	    }
> +	}
> +    }
> +
> +    # Test that GDB is able to kill the inferior.  This used to crash
> +    # on native Linux as GDB did not dispose of the pre-exec LWP for
> +    # the non-leader (and that LWP did not have a matching thread in
> +    # the core thread list).
> +    gdb_test "with confirm off -- kill" \
> +	"\\\[Inferior 1 (.*) killed\\\]" \
> +	"kill inferior"
> +}
> +
> +do_test
> -- 
> 2.36.0
  
Pedro Alves Nov. 13, 2023, 2:04 p.m. UTC | #6
Hi!

At long last, I've resumed work on this again.  A belated thank you so much for
the reviews.

On 2023-05-26 16:04, Andrew Burgess wrote:
> Pedro Alves <pedro@palves.net> writes:

>> +  /* Display one table row for each lwp_info.  */
>> +  for (lwp_info *lp : all_lwps ())
>> +    {
>> +      ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
>> +
>> +      struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);
> 
> After recent changes this line becomes:
> 
>   struct thread_info *th = linux_target->find_thread (lp->ptid);

Thanks, done.

>> From ee0a276c08b829ae504fe0eba5badc4f7faf3676 Mon Sep 17 00:00:00 2001
>> From: Pedro Alves <pedro@palves.net>
>> Date: Wed, 13 Jul 2022 17:16:38 +0100
>> Subject: [PATCH 2/2] gdb/linux: Delete all other LWPs immediately on ptrace
>>  exec event
>>
>> I noticed that on an Ubuntu 20.04 system, after a following patch
>> ("Step over clone syscall w/ breakpoint,
>> TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
>> was passing cleanly, but still, we'd end up with four new unexpected
>> GDB core dumps:
>>
>> 		 === gdb Summary ===
>>
>>  # of unexpected core files      4
>>  # of expected passes            48
>>
>> That said patch is making the pre-existing
>> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
>> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>>
>>  #1 - a non-leader thread execs
>>  #2 - the post-exec program stops somewhere
>>  #3 - you kill the inferior
>>
>> Instead of #3 directly, the testcase just returns, which ends up in
>> gdb_exit, tearing down GDB, which kills the inferior, and is thus
>> equivalent to #3 above.
>>
>> Vis:
>>
>>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>>  ...
>>  (top-gdb) r
>>  ...
>>  (gdb) b main
>>  ...
>>  (gdb) r
>>  ...
>>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>>  69        argv0 = argv[0];
>>  (gdb) c
>>  Continuing.
>>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>>  Other going in exec.
>>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>>
>>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>>  28        foo ();
>>  (gdb) k
>>  ...
>>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>>  393         return m_suspend.waitstatus_pending_p;
>>  (top-gdb) bt
>>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>>  #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>>  #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>>  #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
>>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>>  ...
> 
> As I mentioned in my other message, this backtrace includes
> kill_unfollowed_child_callback, which doesn't exist yet!  I think that's
> OK though, the text before the backtrace does make it clear that you saw
> this problem only after applying a later patch.

I've tweaked the commit log to make that more explicit.

> 
>>
>> The root of the problem is that when a non-leader LWP execs, it just
>> changes its tid to the tgid, replacing the pre-exec leader thread,
>> becoming the new leader.  There's no thread exit event for the execing
>> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
>> ptrace man page says:
>>
>> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
>> 	Stop the tracee at the next execve(2).  A waitpid(2) by the
>> 	tracer will return a status value such that
>>
>> 	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>>
>> 	If the execing thread is not a thread group leader, the thread
>> 	ID is reset to thread group leader's ID before this stop.
>> 	Since Linux 3.0, the former thread ID can be retrieved with
>> 	PTRACE_GETEVENTMSG."
>>
>> When the core of GDB processes an exec event, it deletes all the
>> threads of the inferior.  But that is too late -- deleting the thread
>> does not delete the corresponding LWP, so we end up leaving the pre-exec
>> non-leader LWP stale in the LWP list.  That's what leads to the crash
>> above -- linux_nat_target::kill iterates over all LWPs, and after the
>> patch in question, that code will look for the corresponding
>> thread_info for each LWP.  For the pre-exec non-leader LWP still
>> listed, it won't find one.
>>
>> This patch fixes it, by deleting the pre-exec non-leader LWP (and
>> thread) from the LWP/thread lists as soon as we get an exec event out
>> of ptrace.
>>
>> GDBserver does not need an equivalent fix, because it is already doing
>> this, as a side effect of mourning the pre-exec process, in
>> gdbserver/linux-low.cc:
>>
>>   else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
>>     {
>> ...
>>       /* Delete the execing process and all its threads.  */
>>       mourn (proc);
>>       switch_to_thread (nullptr);
>>
>>
>> The crash with gdb.threads/step-over-exec.exp is not observable on
>> newer systems, which postdate the glibc change to move "libpthread.so"
>> internals to "libc.so.6", because right after the exec, GDB traps a
>> load event for "libc.so.6", which leads to GDB trying to open
>> libthread_db for the post-exec inferior, and, on such systems, that
>> succeeds.  When we load libthread_db, we call
>> linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
>> lwps, and then waits to see their stops.  While doing this, GDB
>> detects that the pre-exec stale LWP is gone, and deletes it.
>>
>> If we use "catch exec" to stop right at the exec before the
>> "libc.so.6" load event ever happens, and issue "kill" right there,
>> then GDB crashes on newer systems as well.  So instead of tweaking
>> gdb.threads/step-over-exec.exp to cover the fix, add a new
>> gdb.threads/threads-after-exec.exp testcase that uses "catch exec".
> 
> Maybe it's worth mentioning that, because the crash itself only happens
> once a later patch is applied, we use 'maint info linux-lwps' to reveal
> the issue for now?

I've done something like that.

>>
>> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
>> ---
>>  gdb/infrun.c                                  |  8 +--
>>  gdb/linux-nat.c                               | 15 ++++
>>  .../gdb.threads/threads-after-exec.exp        | 70 +++++++++++++++++++
> 
> Oops, this diff is missing the two source files for this test (.c and
> -execd.c).  I was able to figure something out though so I could test
> the rest of this patch :)
> 

Ouch.  And enough time has passed that I completely lost those files.  But as you
discovered, they're trivial enough to rewrite.  Actually, I simplified, and
only use one .c file this time.


>>  3 files changed, 88 insertions(+), 5 deletions(-)
>>  create mode 100644 gdb/testsuite/gdb.threads/threads-after-exec.exp
>>

...

>> +
>> +standard_testfile .c -execd.c
>> +
>> +proc do_test { } {
>> +    global srcdir subdir srcfile srcfile2 binfile testfile
>> +    global decimal
>> +
>> +    # Compile main binary (the one that does the exec).
>> +    if {[gdb_compile_pthreads $srcdir/$subdir/$srcfile $binfile \
>> +	     executable {debug}] != "" } {
>> +	return -1
>> +    }
> 
> You can do:
> 
>     if {[build_executable "failed to build main executable" \
>              $binfile $srcfile {debug pthread}] == -1} {
> 	return -1
>     }
> 
>> +
>> +    # Compile the second binary (the one that gets exec'd).
>> +    if {[gdb_compile $srcdir/$subdir/$srcfile2 $binfile-execd \
>> +	     executable {debug}] != "" } {
>> +	return -1
>> +    }
> 
> And:
> 
>     if {[build_executable "failed to build execd executable" \
>              $binfile-execd $srcfile2 {debug}] == -1} {
> 	return -1
>     }
> 
> I thought we were moving away from calling the gdb_compile* functions
> directly.
> 

Done-ish -- since I only have one test program this time, I can use
prepare_for_testing instead.

> Assuming the missing source files are added, this all looks great.
> 
> Reviewed-By: Andrew Burgess <aburgess@redhat.com>
  

Patch

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 9b78fd1f8e8..5ee3227f1b9 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1986,6 +1986,21 @@  linux_handle_extended_wait (struct lwp_info *lp, int status)
 	 thread execs, it changes its tid to the tgid, and the old
 	 tgid thread might have not been resumed.  */
       lp->resumed = 1;
+
+      /* All other LWPs are gone now.  We'll have received a thread
+	 exit notification for all threads other than the execing one.
+	 That one, if it wasn't the leader, just silently changes its
+	 tid to the tgid, and the previous leader vanishes.  Since
+	 Linux 3.0, the former thread ID can be retrieved with
+	 PTRACE_GETEVENTMSG, but since we support older kernels, don't
+	 bother with it, and just walk the LWP list.  Even with
+	 PTRACE_GETEVENTMSG, we'd still need to lookup the
+	 corresponding LWP object, and it would be an extra ptrace
+	 syscall, so this way may even be more efficient.  */
+      for (lwp_info *other_lp : all_lwps_safe ())
+	if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
+	  exit_lwp (other_lp);
+
       return 0;
     }
 
diff --git a/gdb/testsuite/gdb.threads/step-over-exec.exp b/gdb/testsuite/gdb.threads/step-over-exec.exp
index 783f865585c..a8b01f8aeda 100644
--- a/gdb/testsuite/gdb.threads/step-over-exec.exp
+++ b/gdb/testsuite/gdb.threads/step-over-exec.exp
@@ -102,6 +102,12 @@  proc do_test { execr_thread different_text_segments displaced_stepping } {
     gdb_breakpoint foo
     gdb_test "continue" "Breakpoint $decimal, foo .*" \
 	"continue to foo"
+
+    # Test that GDB is able to kill the inferior.  This may fail if
+    # e.g., GDB does not dispose of the pre-exec threads properly.
+    gdb_test "with confirm off -- kill" \
+	"\\\[Inferior 1 (.*) killed\\\]" \
+	"kill inferior"
 }
 
 foreach_with_prefix displaced_stepping {auto off} {