[v4,2/2] gdb: raise and handle NOT_AVAILABLE_ERROR when accessing frame PC
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gdb_build--master-arm |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gdb_check--master-arm |
fail
|
Testing failed
|
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 |
fail
|
Testing failed
|
Commit Message
This patch can be considered a continuation of
commit 4778a5f87d253399083565b4919816f541ebe414
Author: Tom de Vries <tdevries@suse.de>
Date: Tue Apr 21 15:45:57 2020 +0200
[gdb] Fix hang after ext sigkill
and
commit 47f1aceffa02be4726b854082d7587eb259136e0
Author: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Date: Thu May 14 13:59:54 2020 +0200
gdb/infrun: handle already-exited threads when attempting to stop
If a process dies before GDB reports the exit error to the user, we
may see the "Couldn't get registers: No such process." error message
in various places. For instance:
(gdb) start
...
(gdb) info inferior
Num Description Connection Executable
* 1 process 31943 1 (native) /tmp/a.out
(gdb) shell kill -9 31943
(gdb) maintenance flush register-cache
Register cache flushed.
Couldn't get registers: No such process.
(gdb) info threads
Id Target Id Frame
* 1 process 31943 "a.out" Couldn't get registers: No such process.
(gdb) backtrace
Python Exception <class 'gdb.error'>: Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) inferior 1
Couldn't get registers: No such process.
(gdb) thread
[Current thread is 1 (process 31943)]
Couldn't get registers: No such process.
(gdb)
The gdb.threads/killed-outside.exp, gdb.multi/multi-kill.exp, and
gdb.multi/multi-exit.exp tests also check related scenarios.
To improve the situation,
1. when printing the frame info, catch and process a NOT_AVAILABLE_ERROR.
2. when accessing the target to fetch registers, if the operation
fails, raise a NOT_AVAILABLE_ERROR instead of a generic error, so
that clients can attempt to recover accordingly. This patch updates
the amd64_linux_nat_target and remote_target in this direction.
With this patch, we obtain the following behavior:
(gdb) start
...
(gdb) info inferior
Num Description Connection Executable
* 1 process 748 1 (native) /tmp/a.out
(gdb) shell kill -9 748
(gdb) maintenance flush register-cache
Register cache flushed.
(gdb) info threads
Id Target Id Frame
* 1 process 748 "a.out" <PC register is not available>
(gdb) backtrace
#0 <PC register is not available>
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) inferior 1
[Switching to inferior 1 [process 748] (/tmp/a.out)]
[Switching to thread 1 (process 748)]
#0 <PC register is not available>
(gdb) thread
[Current thread is 1 (process 748)]
(gdb)
Here is another "before/after" case. Suppose we have two inferiors,
each having its own remote target underneath. Before this patch, we
get the following output:
# Create two inferiors on two remote targets, resume both until
# termination. Exit event from one of them is shown first, but the
# other also exited -- just not yet shown.
(gdb) maint set target-non-stop on
(gdb) target remote | gdbserver - ./a.out
(gdb) add-inferior -no-connection
(gdb) inferior 2
(gdb) target remote | gdbserver - ./a.out
(gdb) set schedule-multiple on
(gdb) continue
...
[Inferior 2 (process 22127) exited normally]
(gdb) inferior 1
[Switching to inferior 1 [process 22111] (target:/tmp/a.out)]
[Switching to thread 1.1 (Thread 22111.22111)]
Could not read registers; remote failure reply 'E01'
(gdb) info threads
Id Target Id Frame
* 1.1 Thread 22111.22111 "a.out" Could not read registers; remote failure reply 'E01'
(gdb) backtrace
Python Exception <class 'gdb.error'>: Could not read registers; remote failure reply 'E01'
Could not read registers; remote failure reply 'E01'
(gdb) thread
[Current thread is 1.1 (Thread 22111.22111)]
Could not read registers; remote failure reply 'E01'
(gdb)
With this patch, it becomes:
...
[Inferior 1 (process 11759) exited normally]
(gdb) inferior 2
[Switching to inferior 2 [process 13440] (target:/path/to/a.out)]
[Switching to thread 2.1 (Thread 13440.13440)]
#0 <unavailable> in ?? ()
(gdb) info threads
Id Target Id Frame
* 2.1 Thread 13440.13440 "a.out" <unavailable> in ?? ()
(gdb) backtrace
#0 <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) thread
[Current thread is 2.1 (Thread 13440.13440)]
(gdb)
Finally, together with its predecessor, this patch also fixes PR gdb/26877.
Regression-tested on X86_64-Linux.
---
gdb/amd64-linux-nat.c | 5 +-
gdb/remote.c | 15 ++--
gdb/stack.c | 33 ++++++-
gdb/testsuite/gdb.threads/killed-outside.exp | 8 +-
.../gdb.tui/multi-exit-remove-inferior.c | 21 +++++
.../gdb.tui/multi-exit-remove-inferior.exp | 86 +++++++++++++++++++
6 files changed, 156 insertions(+), 12 deletions(-)
create mode 100644 gdb/testsuite/gdb.tui/multi-exit-remove-inferior.c
create mode 100644 gdb/testsuite/gdb.tui/multi-exit-remove-inferior.exp
Comments
On 12/18/23 6:40 AM, Tankut Baris Aktemur wrote:
> This patch can be considered a continuation of
>
> commit 4778a5f87d253399083565b4919816f541ebe414
> Author: Tom de Vries <tdevries@suse.de>
> Date: Tue Apr 21 15:45:57 2020 +0200
>
> [gdb] Fix hang after ext sigkill
>
> and
>
> commit 47f1aceffa02be4726b854082d7587eb259136e0
> Author: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
> Date: Thu May 14 13:59:54 2020 +0200
>
> gdb/infrun: handle already-exited threads when attempting to stop
>
> If a process dies before GDB reports the exit error to the user, we
> may see the "Couldn't get registers: No such process." error message
> in various places. For instance:
>
> (gdb) start
> ...
> (gdb) info inferior
> Num Description Connection Executable
> * 1 process 31943 1 (native) /tmp/a.out
> (gdb) shell kill -9 31943
> (gdb) maintenance flush register-cache
> Register cache flushed.
> Couldn't get registers: No such process.
> (gdb) info threads
> Id Target Id Frame
> * 1 process 31943 "a.out" Couldn't get registers: No such process.
> (gdb) backtrace
> Python Exception <class 'gdb.error'>: Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) inferior 1
> Couldn't get registers: No such process.
> (gdb) thread
> [Current thread is 1 (process 31943)]
> Couldn't get registers: No such process.
> (gdb)
>
> The gdb.threads/killed-outside.exp, gdb.multi/multi-kill.exp, and
> gdb.multi/multi-exit.exp tests also check related scenarios.
>
> To improve the situation,
>
> 1. when printing the frame info, catch and process a NOT_AVAILABLE_ERROR.
>
> 2. when accessing the target to fetch registers, if the operation
> fails, raise a NOT_AVAILABLE_ERROR instead of a generic error, so
> that clients can attempt to recover accordingly. This patch updates
> the amd64_linux_nat_target and remote_target in this direction.
>
> With this patch, we obtain the following behavior:
>
> (gdb) start
> ...
> (gdb) info inferior
> Num Description Connection Executable
> * 1 process 748 1 (native) /tmp/a.out
> (gdb) shell kill -9 748
> (gdb) maintenance flush register-cache
> Register cache flushed.
> (gdb) info threads
> Id Target Id Frame
> * 1 process 748 "a.out" <PC register is not available>
> (gdb) backtrace
> #0 <PC register is not available>
> Backtrace stopped: not enough registers or memory available to unwind further
> (gdb) inferior 1
> [Switching to inferior 1 [process 748] (/tmp/a.out)]
> [Switching to thread 1 (process 748)]
> #0 <PC register is not available>
> (gdb) thread
> [Current thread is 1 (process 748)]
> (gdb)
>
> Here is another "before/after" case. Suppose we have two inferiors,
> each having its own remote target underneath. Before this patch, we
> get the following output:
>
> # Create two inferiors on two remote targets, resume both until
> # termination. Exit event from one of them is shown first, but the
> # other also exited -- just not yet shown.
> (gdb) maint set target-non-stop on
> (gdb) target remote | gdbserver - ./a.out
> (gdb) add-inferior -no-connection
> (gdb) inferior 2
> (gdb) target remote | gdbserver - ./a.out
> (gdb) set schedule-multiple on
> (gdb) continue
> ...
> [Inferior 2 (process 22127) exited normally]
> (gdb) inferior 1
> [Switching to inferior 1 [process 22111] (target:/tmp/a.out)]
> [Switching to thread 1.1 (Thread 22111.22111)]
> Could not read registers; remote failure reply 'E01'
> (gdb) info threads
> Id Target Id Frame
> * 1.1 Thread 22111.22111 "a.out" Could not read registers; remote failure reply 'E01'
> (gdb) backtrace
> Python Exception <class 'gdb.error'>: Could not read registers; remote failure reply 'E01'
> Could not read registers; remote failure reply 'E01'
> (gdb) thread
> [Current thread is 1.1 (Thread 22111.22111)]
> Could not read registers; remote failure reply 'E01'
> (gdb)
>
> With this patch, it becomes:
>
> ...
> [Inferior 1 (process 11759) exited normally]
> (gdb) inferior 2
> [Switching to inferior 2 [process 13440] (target:/path/to/a.out)]
> [Switching to thread 2.1 (Thread 13440.13440)]
> #0 <unavailable> in ?? ()
> (gdb) info threads
> Id Target Id Frame
> * 2.1 Thread 13440.13440 "a.out" <unavailable> in ?? ()
> (gdb) backtrace
> #0 <unavailable> in ?? ()
> Backtrace stopped: not enough registers or memory available to unwind further
> (gdb) thread
> [Current thread is 2.1 (Thread 13440.13440)]
> (gdb)
>
> Finally, together with its predecessor, this patch also fixes PR gdb/26877.
>
> Regression-tested on X86_64-Linux.
> ---
> gdb/amd64-linux-nat.c | 5 +-
> gdb/remote.c | 15 ++--
> gdb/stack.c | 33 ++++++-
> gdb/testsuite/gdb.threads/killed-outside.exp | 8 +-
> .../gdb.tui/multi-exit-remove-inferior.c | 21 +++++
> .../gdb.tui/multi-exit-remove-inferior.exp | 86 +++++++++++++++++++
> 6 files changed, 156 insertions(+), 12 deletions(-)
> create mode 100644 gdb/testsuite/gdb.tui/multi-exit-remove-inferior.c
> create mode 100644 gdb/testsuite/gdb.tui/multi-exit-remove-inferior.exp
>
> diff --git a/gdb/amd64-linux-nat.c b/gdb/amd64-linux-nat.c
> index f7f9a483def..aa9b10c52d1 100644
> --- a/gdb/amd64-linux-nat.c
> +++ b/gdb/amd64-linux-nat.c
> @@ -223,7 +223,10 @@ amd64_linux_nat_target::fetch_registers (struct regcache *regcache, int regnum)
> elf_gregset_t regs;
>
> if (ptrace (PTRACE_GETREGS, tid, 0, (long) ®s) < 0)
> - perror_with_name (_("Couldn't get registers"));
> + {
> + std::string msg = perror_string (_("Couldn't get registers"));
> + throw_error (NOT_AVAILABLE_ERROR, "%s", msg.c_str ());
> + }
Should other nat backends for other OS's (and other arches) also make this change when
failing to fetch registers?
> Date: Wed, 20 Dec 2023 14:00:03 -0800
> From: John Baldwin <jhb@FreeBSD.org>
>
> > --- a/gdb/amd64-linux-nat.c
> > +++ b/gdb/amd64-linux-nat.c
> > @@ -223,7 +223,10 @@ amd64_linux_nat_target::fetch_registers (struct regcache *regcache, int regnum)
> > elf_gregset_t regs;
> >
> > if (ptrace (PTRACE_GETREGS, tid, 0, (long) ®s) < 0)
> > - perror_with_name (_("Couldn't get registers"));
> > + {
> > + std::string msg = perror_string (_("Couldn't get registers"));
> > + throw_error (NOT_AVAILABLE_ERROR, "%s", msg.c_str ());
> > + }
>
> Should other nat backends for other OS's (and other arches) also
> make this change when failing to fetch registers?
That's probably a good idea, yes.
But I also wonder whether the difference between
Couldn't get registers: No such process.
or
Could not read registers; remote failure reply 'E01'
and
<PC register is not available>
or
Backtrace stopped: not enough registers or memory available to unwind further
is such a significant improvement in terms of UX. Both the "before"
and "after" messages allude to issues that are extremely technical and
obscure, IMO. If anything, "No such process" sounds better to me than
"<PC register is not available>", because the former explicitly
explains the reason in high-level terms understood by anyone, while
the latter alludes to the PC register, which is a GDB abstraction, and
the fact that some register is not available doesn't necessarily tell
me what is wrong in practical terms.
IOW, it seems to me that, when we catch the error, we ought to produce
some meaningful message, and with this change we don't yet do that.
Does this make sense?
On Thursday, December 21, 2023 7:42 AM, Eli Zaretskii wrote:
> > Date: Wed, 20 Dec 2023 14:00:03 -0800
> > From: John Baldwin <jhb@FreeBSD.org>
> >
> > > --- a/gdb/amd64-linux-nat.c
> > > +++ b/gdb/amd64-linux-nat.c
> > > @@ -223,7 +223,10 @@ amd64_linux_nat_target::fetch_registers (struct regcache *regcache,
> int regnum)
> > > elf_gregset_t regs;
> > >
> > > if (ptrace (PTRACE_GETREGS, tid, 0, (long) ®s) < 0)
> > > - perror_with_name (_("Couldn't get registers"));
> > > + {
> > > + std::string msg = perror_string (_("Couldn't get registers"));
> > > + throw_error (NOT_AVAILABLE_ERROR, "%s", msg.c_str ());
> > > + }
> >
> > Should other nat backends for other OS's (and other arches) also
> > make this change when failing to fetch registers?
>
> That's probably a good idea, yes.
Yes. I didn't do that, however, because I don't have a chance to do regression
testing for the other targets.
> But I also wonder whether the difference between
>
> Couldn't get registers: No such process.
>
> or
>
> Could not read registers; remote failure reply 'E01'
>
> and
>
> <PC register is not available>
>
> or
>
> Backtrace stopped: not enough registers or memory available to unwind further
>
> is such a significant improvement in terms of UX. Both the "before"
> and "after" messages allude to issues that are extremely technical and
> obscure, IMO. If anything, "No such process" sounds better to me than
> "<PC register is not available>", because the former explicitly
> explains the reason in high-level terms understood by anyone, while
> the latter alludes to the PC register, which is a GDB abstraction, and
> the fact that some register is not available doesn't necessarily tell
> me what is wrong in practical terms.
>
> IOW, it seems to me that, when we catch the error, we ought to produce
> some meaningful message, and with this change we don't yet do that.
>
> Does this make sense?
We can remember the reason in regcache for why a particular register became
unavailable and use that reason in the error messages. With such a change,
the output for the first case would become
(gdb) start
...
(gdb) info inferior
Num Description Connection Executable
* 1 process 1017286 1 (native) /tmp/a.out
(gdb) shell kill -9 1017286
(gdb) maintenance flush register-cache
Register cache flushed.
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff7d87740 (LWP 1017286) "a.out" <Couldn't get registers: No such process>
(gdb) backtrace
#0 <Couldn't get registers: No such process>
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) inferior 1
[Switching to inferior 1 [process 1017286] (/tmp/a.out)]
[Switching to thread 1 (Thread 0x7ffff7d87740 (LWP 1017286))]
#0 <Couldn't get registers: No such process>
(gdb) thread
[Current thread is 1 (Thread 0x7ffff7d87740 (LWP 1017286))]
(gdb)
Does this look acceptable?
Regards
-Baris
Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de <http://www.intel.de>
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
@@ -223,7 +223,10 @@ amd64_linux_nat_target::fetch_registers (struct regcache *regcache, int regnum)
elf_gregset_t regs;
if (ptrace (PTRACE_GETREGS, tid, 0, (long) ®s) < 0)
- perror_with_name (_("Couldn't get registers"));
+ {
+ std::string msg = perror_string (_("Couldn't get registers"));
+ throw_error (NOT_AVAILABLE_ERROR, "%s", msg.c_str ());
+ }
amd64_supply_native_gregset (regcache, ®s, -1);
if (regnum != -1)
@@ -8760,9 +8760,13 @@ remote_target::fetch_register_using_p (struct regcache *regcache,
case PACKET_UNKNOWN:
return 0;
case PACKET_ERROR:
- error (_("Could not fetch register \"%s\"; remote failure reply '%s'"),
- gdbarch_register_name (regcache->arch (), reg->regnum),
- buf);
+ {
+ const char *regname = gdbarch_register_name (regcache->arch (),
+ reg->regnum);
+ throw_error (NOT_AVAILABLE_ERROR,
+ _("Could not fetch register \"%s\"; remote failure "
+ "reply '%s'"), regname, buf);
+ }
}
/* If this register is unfetchable, tell the regcache. */
@@ -8799,8 +8803,9 @@ remote_target::send_g_packet ()
putpkt (rs->buf);
getpkt (&rs->buf);
if (packet_check_result (rs->buf) == PACKET_ERROR)
- error (_("Could not read registers; remote failure reply '%s'"),
- rs->buf.data ());
+ throw_error (NOT_AVAILABLE_ERROR,
+ _("Could not read registers; remote failure reply '%s'"),
+ rs->buf.data ());
/* We can get out of synch in various cases. If the first character
in the buffer is not a hex character, assume that has happened
@@ -1098,7 +1098,38 @@ print_frame_info (const frame_print_options &fp_opts,
the next frame is a SIGTRAMP_FRAME or a DUMMY_FRAME, then the
next frame was not entered as the result of a call, and we want
to get the line containing FRAME->pc. */
- symtab_and_line sal = find_frame_sal (frame);
+ symtab_and_line sal;
+ try
+ {
+ sal = find_frame_sal (frame);
+ }
+ catch (const gdb_exception_error &ex)
+ {
+ if (ex.error == NOT_AVAILABLE_ERROR)
+ {
+ ui_out_emit_tuple tuple_emitter (uiout, "frame");
+
+ annotate_frame_begin (print_level ? frame_relative_level (frame) : 0,
+ gdbarch, 0);
+
+ if (print_level)
+ {
+ uiout->text ("#");
+ uiout->field_fmt_signed (2, ui_left, "level",
+ frame_relative_level (frame));
+ }
+
+ std::string frame_info = "<";
+ frame_info += ex.what ();
+ frame_info += ">";
+ uiout->field_string ("func", frame_info.c_str (),
+ metadata_style.style ());
+ uiout->text ("\n");
+ annotate_frame_end ();
+ return;
+ }
+ throw;
+ }
location_print = (print_what == LOCATION
|| print_what == SRC_AND_LOC
@@ -37,17 +37,15 @@ remote_exec target "kill -9 ${testpid}"
# Give it some time to die.
sleep 2
-set regs_msg "(Couldn't get registers|Unable to fetch general registers)"
-set no_such_process_msg "$regs_msg: No such process\."
+set error_msg "PC register is not available"
set killed_msg "Program terminated with signal SIGKILL, Killed\."
set no_longer_exists_msg "The program no longer exists\."
set not_being_run_msg "The program is not being run\."
gdb_test_multiple "continue" "prompt after first continue" {
- -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt " {
+ -re "Continuing\.\r\n$error_msg\r\n$gdb_prompt " {
pass $gdb_test_name
- # Saw $no_such_process_msg. The bug condition was triggered, go
- # check for it.
+ # The bug condition was triggered, go check for it.
gdb_test_multiple "" "messages" {
-re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
pass $gdb_test_name
new file mode 100644
@@ -0,0 +1,21 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+ Copyright 2020-2023 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+int main() {
+ int a = 42;
+ return 0; /* break-here */
+}
new file mode 100644
@@ -0,0 +1,86 @@
+# Copyright 2020-2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+# This is a regression test for PR gdb/26877.
+
+tuiterm_env
+
+standard_testfile
+
+if {[use_gdb_stub]} {
+ return 0
+}
+
+if {[build_executable "failed to prepare" ${testfile} ${srcfile}] == -1} {
+ return -1
+}
+
+# Make sure TUI is supported before continuing.
+with_test_prefix "initial check" {
+ Term::clean_restart 24 80 $testfile
+ if {![Term::enter_tui]} {
+ unsupported "TUI not supported"
+ return
+ }
+}
+
+Term::clean_restart 24 80 $testfile
+
+# Create a setting with two inferiors, where both are stopped
+# at a breakpoint at the end of main. Then resume both.
+set bp [gdb_get_line_number "break-here"]
+gdb_breakpoint "$bp"
+
+with_test_prefix "inferior 1" {
+ gdb_run_cmd
+ gdb_test "" ".*reakpoint \[^\r\n\]+${srcfile}.*" "run until bp"
+}
+
+with_test_prefix "inferior 2" {
+ gdb_test "add-inferior -exec [standard_output_file $testfile]" \
+ "Added inferior 2.*" "add inferior"
+ gdb_test "inferior 2" "Switching to inferior 2.*" "switch"
+ gdb_run_cmd
+ gdb_test "" ".*reakpoint \[^\r\n\]+${srcfile}.*" "run until bp"
+}
+
+gdb_test_no_output "set schedule-multiple on"
+gdb_continue_to_end
+
+# Find out which inferior is current. It is the inferior that exited.
+set exited_inf 1
+gdb_test_multiple "inferior" "current inferior" {
+ -re -wrap "Current inferior is ($decimal) .*" {
+ set exited_inf $expect_out(1,string)
+ pass $gdb_test_name
+ }
+}
+
+# Switch to the other inferior and remove the exited one.
+# Bad GDB used to crash when this is done under TUI.
+if {![Term::enter_tui]} {
+ unsupported "TUI not supported"
+ return
+}
+
+if {$exited_inf == 1} {
+ Term::command "inferior 2"
+} else {
+ Term::command "inferior 1"
+}
+
+Term::command "remove-inferiors $exited_inf"
+Term::command "info inferior $exited_inf"
+Term::check_contents "inferior is removed" "No inferiors."