Message ID | 20221212203101.1034916-26-pedro@palves.net |
---|---|
State | New |
Headers |
From: Pedro Alves <pedro@palves.net>
To: gdb-patches@sourceware.org
Subject: [PATCH 25/31] Ignore failure to read PC when resuming
Date: Mon, 12 Dec 2022 20:30:55 +0000
Message-Id: <20221212203101.1034916-26-pedro@palves.net>
In-Reply-To: <20221212203101.1034916-1-pedro@palves.net>
References: <20221212203101.1034916-1-pedro@palves.net> |
Series |
Step over thread clone and thread exit
|
|
Commit Message
Pedro Alves
Dec. 12, 2022, 8:30 p.m. UTC
If GDB sets a GDB_THREAD_OPTION_EXIT option on a thread, and the
thread exits, the server reports the corresponding thread exit event,
and forgets about the thread, i.e., removes the exited thread from its
thread list.

On the GDB side, if GDB set the GDB_THREAD_OPTION_EXIT option on a
thread, GDB delays deleting the thread from its thread list until it
sees the corresponding thread exit event, as that event needs special
handling in infrun.

When a thread disappears from the target, but it still exists on GDB's
thread list, in all-stop RSP mode, it can happen that GDB ends up
trying to resume such an already-exited thread that GDB doesn't yet
know is gone.  When that happens, against GDBserver, typically the
ongoing execution command fails with this error:

  ...
  PC register is not available
  (gdb)

At the remote protocol level, we may see e.g., this:

  [remote] Packet received: w0;p97479.978d2
  [remote] wait: exit
  [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
  [infrun] print_target_wait_results: 619641.620754.0 [Thread 619641.620754],
  [infrun] print_target_wait_results: status->kind = THREAD_EXITED, exit_status = 0
  [infrun] handle_inferior_event: status->kind = THREAD_EXITED, exit_status = 0
  [infrun] context_switch: Switching context from 0.0.0 to 619641.620754.0
  [infrun] clear_proceed_status_thread: 619641.620754.0

GDB saw an exit event for thread 619641.620754.  After processing it,
infrun decides to resume the target again.  To do that, infrun picks
some other thread that hasn't exited yet from GDB's perspective,
switches to it, and calls keep_going.  Below, infrun happens to pick
thread p97479.97479, the leader, which has also exited, but GDB
doesn't know that yet:

  ...
  [remote] Sending packet: $Hgp97479.97479#75
  [remote] Packet received: OK
  [remote] Sending packet: $g#67
  [remote] Packet received: xxxxxxxxxxxxxxxxx (...snip...)
  [1120 bytes omitted]
  [infrun] reset: reason=handling event
  [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target remote, no resumed threads
  [infrun] fetch_inferior_event: exit
  PC register is not available
  (gdb)

The Linux backends, both in GDB and in GDBserver, already silently
ignore failures to resume, with the understanding that we'll see an
exit event soon.  GDB's core doesn't do that yet, though.

This patch is a small step in that direction.  It swallows the error
when thrown from within resume_1.  There are likely other spots where
we will need similar treatment, but we can tackle them as we find
them.

After this patch, we'll see something like this instead:

  [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [640478.640478.0] at 0x0
  [infrun] do_target_resume: resume_ptid=640478.0.0, step=0, sig=GDB_SIGNAL_0
  [remote] Sending packet: $vCont;c:p9c5de.-1#78
  [infrun] prepare_to_wait: prepare_to_wait
  [infrun] reset: reason=handling event
  [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote
  [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote
  [infrun] fetch_inferior_event: exit
  [infrun] fetch_inferior_event: enter
  [infrun] scoped_disable_commit_resumed: reason=handling event
  [infrun] random_pending_event_thread: None found.
  [remote] wait: enter
  [remote] Packet received: W0;process:9c5de
  [remote] wait: exit
  [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
  [infrun] print_target_wait_results: 640478.0.0 [process 640478],
  [infrun] print_target_wait_results: status->kind = EXITED, exit_status = 0
  [infrun] handle_inferior_event: status->kind = EXITED, exit_status = 0
  [Inferior 1 (process 640478) exited normally]
  [infrun] stop_waiting: stop_waiting
  [infrun] reset: reason=handling event
  (gdb) [infrun] fetch_inferior_event: exit

Change-Id: I7f1c7610923435c4e98e70acc5ebe5ebbac581e2
---
 gdb/infrun.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)
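For readers matching the packets to the infrun lines in the logs: the remote protocol writes multiprocess thread IDs as hexadecimal `pPID.TID` pairs, while infrun prints the same IDs in decimal, so `p97479.978d2` is thread 619641.620754 and `p9c5de.-1` means all threads of process 640478. The following is a minimal sketch of that decoding, assuming only the `pPID.TID` form; `rsp_thread_id`, `parse_hex_field` and `parse_rsp_thread_id` are hypothetical names for illustration, not GDB functions.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper type for illustration; not a GDB structure.
struct rsp_thread_id
{
  long pid;  // -1 means "all processes"
  long tid;  // -1 means "all threads"
};

// Parse one hexadecimal field, honouring the special value "-1".
static std::optional<long>
parse_hex_field (const std::string &s)
{
  if (s.empty ())
    return std::nullopt;
  if (s == "-1")
    return -1L;

  char *end = nullptr;
  long value = std::strtol (s.c_str (), &end, 16);
  if (*end != '\0' || value < 0)
    return std::nullopt;
  return value;
}

// Parse the multiprocess form "pPID.TID".  The bare "TID" form is
// deliberately not handled in this sketch.
std::optional<rsp_thread_id>
parse_rsp_thread_id (const std::string &text)
{
  if (text.size () < 2 || text[0] != 'p')
    return std::nullopt;

  std::string::size_type dot = text.find ('.');
  if (dot == std::string::npos)
    return std::nullopt;

  std::optional<long> pid = parse_hex_field (text.substr (1, dot - 1));
  std::optional<long> tid = parse_hex_field (text.substr (dot + 1));
  if (!pid.has_value () || !tid.has_value ())
    return std::nullopt;

  return rsp_thread_id { *pid, *tid };
}
```

Calling `parse_rsp_thread_id ("p97479.97479")` on the ID from the `Hg` packet gives pid and tid both equal to 619641, which is how the log shows that infrun picked the leader thread.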
Comments
Pedro Alves <pedro@palves.net> writes:

> If GDB sets a GDB_THREAD_OPTION_EXIT option on a thread, and the
> thread exits, the server reports the corresponding thread exit event,
> and forgets about the thread, i.e., removes the exited thread from its
> thread list.
>
> On the GDB side, GDB set the GDB_THREAD_OPTION_EXIT option on a
> thread, GDB delays deleting the thread from its thread list until it
> sees the corresponding thread exit event, as that event needs special
> handling in infrun.
>
> When a thread disappears from the target, but it still exists on GDB's
> thread list, in all-stop RSP mode, it can happen that GDB ends up
> trying to resume such an already-exited-thread that GDB doesn't yet
> know is gone.  When that happens, against GDBserver, typically the
> ongoing execution command fails with this error:

I'm slightly confused here.  If GDB doesn't know the thread has exited,
doesn't that mean the server hasn't yet reported the exit, and so should
be holding onto the thread?

I wanted to investigate this a bit more to try and understand more about
what's going on, but I couldn't find a test that was triggering the code
added in this patch.  Do you know if there's a test I can run to see
this issue?

>
>   ...
>   PC register is not available
>   (gdb)
>
> At the remote protocol level, we may see e.g., this:
>
>   [remote] Packet received: w0;p97479.978d2
>   [remote] wait: exit
>   [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>   [infrun] print_target_wait_results: 619641.620754.0 [Thread 619641.620754],
>   [infrun] print_target_wait_results: status->kind = THREAD_EXITED, exit_status = 0
>   [infrun] handle_inferior_event: status->kind = THREAD_EXITED, exit_status = 0
>   [infrun] context_switch: Switching context from 0.0.0 to 619641.620754.0
>   [infrun] clear_proceed_status_thread: 619641.620754.0
>
> GDB saw an exit event for thread 619641.620754.  After processing it,
> infrun decides to re-resume the target again.  To do that, infrun
> picks some other thread that isn't exited yet from GDB's perspective,
> switches to it, and calls keep_going.  Below, infrun happens to pick
> thread p97479.97479, the leader, which also exited, but GDB doesn't
> know yet:
>
>   ...
>   [remote] Sending packet: $Hgp97479.97479#75
>   [remote] Packet received: OK
>   [remote] Sending packet: $g#67
>   [remote] Packet received: xxxxxxxxxxxxxxxxx (...snip...) [1120 bytes omitted]
>   [infrun] reset: reason=handling event
>   [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target remote, no resumed threads
>   [infrun] fetch_inferior_event: exit
>   PC register is not available
>   (gdb)
>
> The Linux backends, both in GDB and in GDBserver, already silently
> ignore failures to resume, with the understanding that we'll see an
> exit event soon.  Core of GDB doesn't do that yet, though.
>
> This patch is a small step in that direction.  It swallows the error
> when thrown from within resume_1.  There are likely are spots where we
> will need similar treatment, but we can tackle them as we find them.
>
> After this patch, we'll see something like this instead:
>
>   [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [640478.640478.0] at 0x0
>   [infrun] do_target_resume: resume_ptid=640478.0.0, step=0, sig=GDB_SIGNAL_0
>   [remote] Sending packet: $vCont;c:p9c5de.-1#78

I'm confused by this example.  I would have expected it to start off
with the same intro as the above, that is, send the '$g#67' packet, get
back the xxxx...etc... but then do things differently.

>   [infrun] prepare_to_wait: prepare_to_wait
>   [infrun] reset: reason=handling event
>   [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote
>   [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote
>   [infrun] fetch_inferior_event: exit
>   [infrun] fetch_inferior_event: enter
>   [infrun] scoped_disable_commit_resumed: reason=handling event
>   [infrun] random_pending_event_thread: None found.
>   [remote] wait: enter
>   [remote] Packet received: W0;process:9c5de
>   [remote] wait: exit
>   [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>   [infrun] print_target_wait_results: 640478.0.0 [process 640478],
>   [infrun] print_target_wait_results: status->kind = EXITED, exit_status = 0
>   [infrun] handle_inferior_event: status->kind = EXITED, exit_status = 0
>   [Inferior 1 (process 640478) exited normally]
>   [infrun] stop_waiting: stop_waiting
>   [infrun] reset: reason=handling event
>   (gdb) [infrun] fetch_inferior_event: exit
>
> Change-Id: I7f1c7610923435c4e98e70acc5ebe5ebbac581e2
> ---
>  gdb/infrun.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 09391d85256..21e5aa0f50e 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -2595,7 +2595,28 @@ resume_1 (enum gdb_signal sig)
>        step = false;
>      }
>
> -  CORE_ADDR pc = regcache_read_pc (regcache);
> +  CORE_ADDR pc = 0;

I don't think we should be picking some arbitrary $pc value (0 in this
case) and just using that as a default.  Instead, I think it would be
better to change the type of pc to gdb::optional<CORE_ADDR>, and then
update the rest of this function to only do the $pc relevant parts if
we have a $pc value.

> +  try
> +    {
> +      pc = regcache_read_pc (regcache);
> +    }
> +  catch (const gdb_exception_error &err)
> +    {
> +      /* Swallow errors as it may be that the current thread exited
> +         and we've haven't seen its exit status yet.  Let the
> +         resumption continue and we'll collect the exit event
> +         shortly.  */
> +      if (err.error == TARGET_CLOSE_ERROR)
> +        throw;
> +
> +      if (debug_infrun)
> +        {
> +          string_file buf;
> +          exception_print (&buf, err);
> +          infrun_debug_printf ("resume: swallowing error: %s",
> +                               buf.string ().c_str ());
> +        }

I guess this is the best we can probably do without changing the remote
protocol.

My worry would be that there could be other reasons that the read of
$pc fails, which we are now just ignoring.  It looks like you already
ran into one such case with TARGET_CLOSE_ERROR, but maybe there are
others?

It almost feels like the ideal solution would invert the logic, so we
could write:

  catch (const gdb_exception_error &err)
    {
      /* I just invent a new error type here...  */
      if (err.error != INFERIOR_EXITED_ERROR)
        throw;

      // ... etc ...
    }

To use something like this we could have the H packet send back
something other than "OK" when GDB asks to switch to a thread that has
already exited.  Maybe sending back the stop reply could be made to
work?

I say all that really just to check if you agree or not.  I think for
now I'd be happy to go with what you present here.  I think the gains
this series brings to GDB are worth some rough edges that we might want
to address in the future.

Would love to hear your thoughts,

Thanks,
Andrew

> +    }
>
>    infrun_debug_printf ("step=%d, signal=%s, trap_expected=%d, "
>                         "current thread [%s] at %s",
> --
> 2.36.0
On 2023-06-10 11:33, Andrew Burgess wrote:
> Pedro Alves <pedro@palves.net> writes:
>
>> If GDB sets a GDB_THREAD_OPTION_EXIT option on a thread, and the
>> thread exits, the server reports the corresponding thread exit event,
>> and forgets about the thread, i.e., removes the exited thread from its
>> thread list.
>>
>> On the GDB side, GDB set the GDB_THREAD_OPTION_EXIT option on a
>> thread, GDB delays deleting the thread from its thread list until it
>> sees the corresponding thread exit event, as that event needs special
>> handling in infrun.
>>
>> When a thread disappears from the target, but it still exists on GDB's
>> thread list, in all-stop RSP mode, it can happen that GDB ends up
>> trying to resume such an already-exited-thread that GDB doesn't yet
>> know is gone.  When that happens, against GDBserver, typically the
>> ongoing execution command fails with this error:
>
> I'm slightly confused here.  If GDB doesn't know the thread has exited
> doesn't that mean the server hasn't yet reported the exit, and so should
> be holding onto the thread?
>
> I wanted to investigate this a bit more to try and understand more about
> what's going on, but I couldn't find a test that was triggering the code
> added in this patch.  Do you know if there's a test I can run to see
> this issue?

I think there was an existing testcase that would sometimes fail because
of this problem, but it looks like I didn't write that down anywhere,
and now I don't remember... :-/  Sorry about this.

I wasn't able to reproduce the problem in a few test runs, so I will
drop this patch from the series for now until I find a better
rationale, and we can discuss how to fix it then.

Sorry again, and many thanks for the review and ideas.

Pedro Alves
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 09391d85256..21e5aa0f50e 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2595,7 +2595,28 @@ resume_1 (enum gdb_signal sig)
       step = false;
     }
 
-  CORE_ADDR pc = regcache_read_pc (regcache);
+  CORE_ADDR pc = 0;
+  try
+    {
+      pc = regcache_read_pc (regcache);
+    }
+  catch (const gdb_exception_error &err)
+    {
+      /* Swallow errors as it may be that the current thread exited
+         and we haven't seen its exit status yet.  Let the
+         resumption continue and we'll collect the exit event
+         shortly.  */
+      if (err.error == TARGET_CLOSE_ERROR)
+        throw;
+
+      if (debug_infrun)
+        {
+          string_file buf;
+          exception_print (&buf, err);
+          infrun_debug_printf ("resume: swallowing error: %s",
+                               buf.string ().c_str ());
+        }
+    }
 
   infrun_debug_printf ("step=%d, signal=%s, trap_expected=%d, "
                        "current thread [%s] at %s",