[PATCHv2,5/8] gdb: don't resume vfork parent while child is still running

Like the last few commits, this fixes yet another vfork related issue.
It is similar to the commit titled:

  gdb: don't restart vfork parent while waiting for child to finish

which addressed a case in linux-nat where we would try to resume a
vfork parent.  This commit addresses a very similar case, but this time
in infrun.c.  Just like with that previous commit, this bug
results in the assert:

  x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.

In this case the issue occurs when target-non-stop is on, but non-stop
is off, and, again, schedule-multiple is on.  As with the previous
commit, GDB is in follow-fork-mode child.  If you have not done so, it
is worth reading the earlier commit, as many of the problems leading to
the failure are the same; they just appear in a different part of GDB.

Here are the steps leading to the assertion failure:

  1. The user performs a 'next' over a vfork, and GDB stops in the
  vfork child,

  2. As we are planning to follow the child, GDB sets the vfork_parent
  and vfork_child member variables in the two inferiors.  The
  thread_waiting_for_vfork_done member is left as nullptr; that member
  is only used when GDB is planning to follow the parent inferior,

  3. The user does 'continue'.  Our expectation is that the vfork child
  should resume, and once that process has exited or execd, GDB should
  detach from the vfork parent.  As a result of the 'continue' GDB
  eventually enters the proceed function,

  4. In proceed we select a ptid_t to resume; because
  schedule-multiple is on we select minus_one_ptid (see
  user_visible_resume_ptid),

  5. As GDB is running in all-stop on top of non-stop mode, in the
  proceed function we iterate over all threads that match the resume
  ptid, which turns out to be all threads, and call
  proceed_resume_thread_checked.  One of the threads we iterate over
  is the vfork parent thread,

  6. As the thread passed to proceed_resume_thread_checked doesn't
  match any of the early return conditions, GDB will resume the
  thread,

  7. As we are resuming one thread at a time, the lower layers (e.g.
  linux-nat) see this thread as the "event thread", which means they
  don't apply any of their checks (e.g. is this thread a vfork
  parent?).  Instead they assume that GDB core knows what it's doing,
  and linux-nat resumes the thread.  We have now incorrectly set the
  vfork parent thread running when it should be waiting for the vfork
  child to complete,

  8. Back in the proceed function GDB continues to iterate over all
  threads, and now (correctly) resumes the vfork child thread,

  9. As the vfork child is still alive, the kernel holds the vfork
  parent stopped,

  10. Eventually the child performs its exec and GDB is sent an EXECD
  event.  However, because the parent is resumed, as soon as the child
  performs its exec the vfork parent also sends a VFORK_DONE event to
  GDB,

  11. Depending on timing, both of these events might seem to arrive in
  GDB at the same time.  Normally GDB expects to see the EXECD or
  EXITED/SIGNALED event from the vfork child before getting the
  VFORK_DONE in the parent.  We know this because it is as a result of
  the EXECD/EXITED/SIGNALED event that GDB detaches from the parent (see
  handle_vfork_child_exec_or_exit for details).  Further, the comment
  in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that
  when we remain attached to the child (not the parent) we should not
  expect to see a VFORK_DONE,

  12. If both events arrive at the same time then GDB will randomly
  choose one event to handle first; in some cases this will be the
  VFORK_DONE.  As described above, upon seeing a VFORK_DONE GDB
  assumes that (a) the vfork child has finished, which in this case is
  only partly true: the child has finished, but GDB has not yet
  processed the event associated with its completion, and (b) GDB is
  remaining attached to the parent, and so it resumes the parent
  process,

  13. GDB now handles the EXECD event.  In our case we are detaching
  from the parent, so GDB calls target_detach (see
  handle_vfork_child_exec_or_exit),

  14. Meanwhile, the vfork parent is executing, and might even exit,

  15. In linux_nat_target::detach the first thing we do is stop all
  threads in the process we're detaching from; the result of the stop
  request is cached on the lwp_info object,

  16. In our case, though, the vfork parent has exited, so when GDB
  waits for the thread, instead of a stop due to a signal we get a
  thread-exited status,

  17. Later in the detach process, just prior to making the ptrace call
  that actually detaches (see detach_one_lwp), we try to resume the
  threads.  As part of resuming a thread we touch some of its
  registers, and before doing this GDB asserts that the thread is
  stopped,

  18. An exited thread is not classified as stopped, and so the assert
  triggers!
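Steps 4 to 7 above can be modelled in a few lines.  The sketch below is
a simplified, hypothetical model, not GDB's code: inferior_model,
thread_model, should_resume_pre_patch and resume_matching_threads are
invented names, and only the vfork_parent, vfork_child and
thread_waiting_for_vfork_done fields mirror members named in this
message.  It assumes that, with schedule-multiple on, every thread
matches the resume ptid, and that the only vfork guard before this
patch covers the follow-parent case.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct thread_model;

// Hypothetical stand-in for GDB's inferior; NOT the real class.
struct inferior_model
{
  inferior_model *vfork_parent = nullptr;  // Set on the vfork child.
  inferior_model *vfork_child = nullptr;   // Set on the vfork parent.
  // Only set when GDB is following the vfork parent.
  thread_model *thread_waiting_for_vfork_done = nullptr;
};

// Hypothetical stand-in for GDB's thread_info; NOT the real class.
struct thread_model
{
  std::string name;
  inferior_model *inf;
};

// Model of the pre-patch decision: the only vfork guard covers the
// follow-parent case, where only the waiting thread may run.
static bool
should_resume_pre_patch (const thread_model &tp)
{
  const thread_model *waiting = tp.inf->thread_waiting_for_vfork_done;
  if (waiting != nullptr && waiting != &tp)
    return false;
  return true;
}

// With schedule-multiple on, the resume ptid (minus_one_ptid) matches
// every thread, so each thread is offered to the check in turn.
static std::vector<std::string>
resume_matching_threads (const std::vector<thread_model> &threads)
{
  std::vector<std::string> resumed;
  for (const thread_model &tp : threads)
    {
      assert (tp.inf != nullptr);  // Every thread belongs to an inferior.
      if (should_resume_pre_patch (tp))
        resumed.push_back (tp.name);
    }
  return resumed;
}
```

Set up as in step 2 (the parent's vfork_child points at the child, the
child's vfork_parent points at the parent, and
thread_waiting_for_vfork_done is left null), resume_matching_threads
returns both the parent and the child thread, mirroring the incorrect
resume of the vfork parent in step 7.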

Just like with the earlier commit, the fix is to spot the vfork parent
status of the thread, and not resume such threads.  Where the earlier
commit fixed this in linux-nat, in this case I think the fix should
live in infrun.c, in proceed_resume_thread_checked.  This function
already has a similar check to not resume the vfork parent in the case
where we are planning to follow the vfork parent; I propose adding a
similar check for the vfork parent when we plan to follow the vfork
child.
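A minimal sketch of the proposed check, written against a simplified,
hypothetical model rather than GDB's real classes: inferior_model,
thread_model and should_resume_post_patch are invented names, and the
real proceed_resume_thread_checked takes a thread_info and performs
other checks (and debug output) not shown here.  The sketch assumes, as
described in step 2, that vfork_child is non-null on the inferior that
is the vfork parent, and that thread_waiting_for_vfork_done is only set
when following the parent.

```cpp
#include <cassert>
#include <string>

struct thread_model;

// Hypothetical stand-in for GDB's inferior; NOT the real class.
struct inferior_model
{
  inferior_model *vfork_parent = nullptr;  // Set on the vfork child.
  inferior_model *vfork_child = nullptr;   // Set on the vfork parent.
  // Only set when GDB is following the vfork parent.
  thread_model *thread_waiting_for_vfork_done = nullptr;
};

// Hypothetical stand-in for GDB's thread_info; NOT the real class.
struct thread_model
{
  std::string name;
  inferior_model *inf;
};

// Model of the post-patch decision: return true if TP may be resumed.
static bool
should_resume_post_patch (const thread_model &tp)
{
  const inferior_model *inf = tp.inf;
  assert (inf != nullptr);

  // Existing guard: GDB follows the parent, so only the thread waiting
  // for VFORK_DONE may run.
  if (inf->thread_waiting_for_vfork_done != nullptr)
    return inf->thread_waiting_for_vfork_done == &tp;

  // New guard: this inferior is a vfork parent but GDB follows the
  // child.  Leave the parent stopped; GDB detaches from it once the
  // child execs or exits.
  if (inf->vfork_child != nullptr)
    return false;

  return true;
}
```

The order of the two guards is a choice made in this sketch: when
thread_waiting_for_vfork_done is set, the existing follow-parent logic
decides, so the new guard only applies in the follow-child case.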

This new check means that at step #6 above GDB doesn't try to resume
the vfork parent thread, which prevents the VFORK_DONE from ever
arriving.  Once GDB sees the EXECD/EXITED/SIGNALED event from the vfork
child, GDB will detach from the parent.

There's no test included in this commit.  In a subsequent commit I
will expand gdb.base/foll-vfork.exp, which will expose this bug.

If you do want to reproduce this failure then you will almost
certainly need to run the gdb.base/foll-vfork.exp test in a loop, as
the failures are all very timing-sensitive.  I've found that running
multiple copies in parallel makes the failure more likely to appear; I
usually run ~6 copies in parallel and expect to see a failure within
10 minutes.
---
 gdb/infrun.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

Message ID	9b3303bed5b6afcbe2e11e65d9696e3b59a61826.1688484032.git.aburgess@redhat.com
State	New
Headers	From: Andrew Burgess <aburgess@redhat.com>
	Date: Tue, 4 Jul 2023 16:22:55 +0100
	To: gdb-patches@sourceware.org
	Cc: Andrew Burgess <aburgess@redhat.com>, tankut.baris.aktemur@intel.com
Series	Some vfork related fixes
	[PATCHv2,0/8] Some vfork related fixes
	[PATCHv2,1/8] gdb: catch more errors in gdb.base/foll-vfork.exp
	[PATCHv2,2/8] gdb: don't restart vfork parent while waiting for child to finish
	[PATCHv2,3/8] gdb: fix an issue with vfork in non-stop mode
	[PATCHv2,4/8] gdb, infrun: refactor part of `proceed` into separate function
	[PATCHv2,5/8] gdb: don't resume vfork parent while child is still running
	[PATCHv2,6/8] gdb/testsuite: expand gdb.base/foll-vfork.exp
	[PATCHv2,7/8] gdb/testsuite: remove use of sleep from gdb.base/foll-vfork.exp
	[PATCHv2,8/8] gdb: additional debug output in infrun.c and linux-nat.c
