gdb/unwinders: better support for $pc not saved

Message ID 4d3be44a23e2e5853d966c081bccbeb751004310.1706366387.git.aburgess@redhat.com
State New
Series gdb/unwinders: better support for $pc not saved

Checks

Context                                         Check    Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm      success  Testing passed

Commit Message

Andrew Burgess Jan. 27, 2024, 3:26 p.m. UTC
This started with a Red Hat bug report, which can be seen here:

  https://bugzilla.redhat.com/show_bug.cgi?id=1850710

The problem reported there was that, when using GDB on GNU/Linux for
S390, the user stepped into JIT generated code.  As they entered the
JIT code GDB would report 'PC not saved', and this same message would
be reported after each step/stepi.

Additionally, the user had 'set disassemble-next-line on', and once
they entered the JIT code this output was not displayed, nor were any
'display' directives displayed.

The user is not making use of the JIT plugin API to provide debug
information.  But that's OK: they aren't expecting any source level
debug here and are happy to use 'stepi'.  The missing 'display'
directives are a problem though, as is the constant 'PC not saved'
(error) message.

What is happening here is that, as GDB fails to find any debug
information for the JIT generated code, it falls back on the S390
prologue unwinder to try and unwind frame #0.  Unfortunately, without
being able to identify the function boundaries, the S390 prologue
scanner can't help much.  In fact, it doesn't even suggest an
arbitrary previous $pc value (some targets that use a link-register
will, by default, assume the link-register contains the previous
$pc); instead the S390 unwinder just says, "sorry, I have no previous
$pc value".

The result of this is that when GDB tries to find frame #1 we end up
throwing an error from frame_unwind_pc (the 'PC not saved' error).
This error is not caught anywhere except at the top-level interpreter
loop, and so we end up skipping all the 'display' directive handling.
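
The control flow described above can be illustrated with a small,
self-contained Python model.  This is only a sketch under stated
assumptions, not GDB's implementation: PcNotSavedError, unwind_pc,
report_stop, and top_level are hypothetical names, and a frame is just
a dictionary.  It shows how an error raised while reporting a stop,
and caught only by the outermost loop, prevents the display handling
that follows from running.

```python
class PcNotSavedError(Exception):
    """Hypothetical stand-in for GDB's 'PC not saved' error."""


def unwind_pc(frame):
    """Return the caller's pc for FRAME, or raise if it was not saved."""
    if frame["prev_pc"] is None:
        raise PcNotSavedError("PC not saved")
    return frame["prev_pc"]


def report_stop(frame, displays, output):
    """Model of stop reporting: look at the caller, then run the displays."""
    unwind_pc(frame)  # May raise; nothing in this function catches it.
    output.append("stopped in " + frame["name"])
    for expr in displays:
        output.append("display " + expr)


def top_level(frame, displays):
    """Model of the top-level loop, the only place the error is caught."""
    output = []
    try:
        report_stop(frame, displays, output)
    except PcNotSavedError as exc:
        output.append("error: " + str(exc))
    return output


# Hypothetical frames: one with no saved caller pc, one with a saved pc.
jit_frame = {"name": "jit_code", "prev_pc": None}
normal_frame = {"name": "main", "prev_pc": 0x1000}

print(top_level(jit_frame, ["x/i $pc"]))
print(top_level(normal_frame, ["x/i $pc"]))
```

For the frame with no saved pc only the error line is produced and the
display entry is never reached, which mirrors the behaviour in the bug
report.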

While thinking about this, I wondered, could I trigger the same error
using the Python Unwinder API?  What happens if a Python unwinder
claims a frame, but then fails to provide a previous $pc value?

Turns out that exactly the same thing happens, which is great, as that
means we now have a way to reproduce this bug on any target.  And so
the test included with this patch does just this.  I have a Python
unwinder that claims a frame, but doesn't provide any previous
register values.

I then do two tests.  First, I stop in the claimed frame (i.e. frame #0
is the frame that can't be unwound), perform a few steps, and check
the backtrace.  Second, I stop in a child of the problem frame (i.e.
frame #1 is the frame that can't be unwound), and from there I check
the backtrace.

While all this is going on I have a 'display' directive in place, and
each time GDB stops I check that the display directive triggers.

Additionally, when checking the backtrace, I am checking that the
backtrace finishes with the message 'Backtrace stopped: frame did not
save the PC'.

As for the fix, I chose to add a call to frame_unwind_pc directly to
get_prev_frame_always_1.  Calling frame_unwind_pc will cache the
unwound $pc value, so this doesn't add much additional work:
immediately after the new frame_unwind_pc call, we call
get_prev_frame_maybe_check_cycle, which actually generates the
previous frame, and which will always (I think) require a call to
frame_unwind_pc anyway.
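
To see why the extra call is cheap, consider a minimal Python model of
a cached unwind.  This is an illustration only: the Frame class, its
unwind_pc method, and the unwinder_calls counter are hypothetical and
are not GDB's data structures.  The unwinder callback runs once; a
second request for the same frame's unwound $pc returns the stored
value.

```python
class Frame:
    """Hypothetical frame whose unwound pc is computed at most once."""

    def __init__(self, unwinder):
        self._unwinder = unwinder  # Callable standing in for real unwinding work.
        self._cached_pc = None
        self._have_cached_pc = False
        self.unwinder_calls = 0

    def unwind_pc(self):
        # Only the first call does the work; later calls reuse the result.
        if not self._have_cached_pc:
            self.unwinder_calls += 1
            self._cached_pc = self._unwinder()
            self._have_cached_pc = True
        return self._cached_pc


frame = Frame(lambda: 0x4005D0)
frame.unwind_pc()        # The early check: the value itself is discarded.
pc = frame.unwind_pc()   # The later use while creating the previous frame.
print(hex(pc), frame.unwinder_calls)
```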

The reason for adding the frame_unwind_pc call into
get_prev_frame_always_1 is that if the frame_unwind_pc call fails we
want to set the frame's 'stop_reason', and get_prev_frame_always_1
seems to be the place where this is done, so I wanted to keep the new
stop_reason setting code next to all the existing stop_reason setting
code.

Additionally, once we enter get_prev_frame_maybe_check_cycle we
actually create the previous frame; if it then turns out that the
previous frame can't be created, we need to remove the frame again.
This seemed more complex than just making the check in
get_prev_frame_always_1.
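
The shape of the fix can be sketched in Python as follows.  This is a
simplified model under stated assumptions, not the code in frame.c:
the Frame class, the saves_pc flag, and the helper functions are all
hypothetical.  The pc unwind is attempted before the caller frame is
handed back; on failure a stop reason is recorded and no previous
frame is returned, so the backtrace ends with an explanation instead
of an error escaping.

```python
class PcNotSavedError(Exception):
    """Hypothetical stand-in for the error raised when no caller pc exists."""


class Frame:
    def __init__(self, name, caller=None, saves_pc=True):
        self.name = name
        self.caller = caller      # The frame that called this one, if any.
        self.saves_pc = saves_pc  # False models an unwinder with no previous pc.
        self.stop_reason = None


def unwind_pc(frame):
    if not frame.saves_pc:
        raise PcNotSavedError("PC not saved")


def get_prev_frame(frame):
    """Check the pc can be unwound before handing back the caller."""
    if frame.caller is None:
        return None  # Outermost frame: nothing to report.
    try:
        unwind_pc(frame)
    except PcNotSavedError:
        # Record why unwinding stopped instead of letting the error escape.
        frame.stop_reason = "frame did not save the PC"
        return None
    return frame.caller


def backtrace(frame):
    lines = []
    level = 0
    while frame is not None:
        lines.append("#%d  %s ()" % (level, frame.name))
        prev = get_prev_frame(frame)
        if prev is None and frame.stop_reason is not None:
            lines.append("Backtrace stopped: " + frame.stop_reason)
        frame = prev
        level += 1
    return lines


main = Frame("main")
problem = Frame("break_bt_here", caller=main, saves_pc=False)
child = Frame("other_func", caller=problem)

print("\n".join(backtrace(problem)))
print("\n".join(backtrace(child)))
```

The two backtraces correspond to the two scenarios the test covers:
stopped in the problem frame itself, and stopped in a child of it.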

With this fix in place the original S390 bug is fixed, and the test
added in this commit, which uses the Python API, now passes.
---
 gdb/frame.c                             | 32 +++++++++++
 gdb/testsuite/gdb.base/pc-not-saved.c   | 48 +++++++++++++++++
 gdb/testsuite/gdb.base/pc-not-saved.exp | 70 +++++++++++++++++++++++++
 gdb/testsuite/gdb.base/pc-not-saved.py  | 50 ++++++++++++++++++
 4 files changed, 200 insertions(+)
 create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.c
 create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.exp
 create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.py


base-commit: 81b6f191f71fe0af5dd7b1c7c5b7737c3d249a66
  

Comments

Kevin Buettner Jan. 28, 2024, 8:56 p.m. UTC | #1
Thanks for the detailed commit log!

This approach looks reasonable to me.

Reviewed-by: Kevin Buettner <kevinb@redhat.com>
  
Keith Seitz Jan. 29, 2024, 4:49 p.m. UTC | #2
On 1/27/24 07:26, Andrew Burgess wrote:
> This started with a Red Hat bug report which can be seen here:
> 
>    https://bugzilla.redhat.com/show_bug.cgi?id=1850710
> 
> The problem reported here was using GDB on GNU/Linux for S390, the
> user stepped into JIT generated code.  As they enter the JIT code GDB
> would report 'PC not saved', and this same message would be reported
> after each step/stepi.

I really appreciate that you've looked into this. For several years
now, any interested user has seen my hack workaround (posted in 28313)!

I haven't dug into this yet, but I wanted to point out that on s390x,
the new test file comes with a FAIL (on unix, native-gdbserver, and
native-extended-gdbserver boards):


		=== gdb Summary ===

# of expected passes		6
# of unexpected failures	1

The failure from gdb.log (I'm testing through our internal beaker
test infra on RHEL9, so all tests are run (unfortunately) as root):

bt
#0  other_func () at /root/test-fsf-master/gdb/build-s390x-redhat-linux-gnu/gdb/testsuite/../../../gdb/testsuite/gdb.base/pc-not-saved.c:24
#1  0x00000000010006ae in break_bt_here () at /root/test-fsf-master/gdb/build-s390x-redhat-linux-gnu/gdb/testsuite/../../../gdb/testsuite/gdb.base/pc-not-saved.c:38
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) FAIL: gdb.base/pc-not-saved.exp: backtrace from other_func function

Keith
  

Patch

diff --git a/gdb/frame.c b/gdb/frame.c
index d95d63eb0f6..118d84b980e 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -2419,6 +2419,38 @@  get_prev_frame_always_1 (frame_info_ptr this_frame)
 	}
     }
 
+  /* Ensure we can unwind the program counter of THIS_FRAME.  */
+  try
+    {
+      /* Calling frame_unwind_pc for the sentinel frame relies on the
+	 current_frame being set, which at this point it might not be if we
+	 are in the process of setting the current_frame after a stop (see
+	 get_current_frame).
+
+	 The point of this check is to ensure that the unwinder for
+	 THIS_FRAME can actually unwind the $pc, which we assume the
+	 sentinel frame unwinder can always do (it's just a read from the
+	 machine state), so we only call frame_unwind_pc for frames other
+	 than the sentinel (level -1) frame.
+
+	 Additionally, we don't actually care about the value of the
+	 unwound $pc, just that the call completed successfully.  */
+      if (this_frame->level >= 0)
+	frame_unwind_pc (this_frame);
+    }
+  catch (const gdb_exception_error &ex)
+    {
+      if (ex.error == NOT_AVAILABLE_ERROR || ex.error == OPTIMIZED_OUT_ERROR)
+	{
+	  frame_debug_printf ("  -> nullptr // no saved PC");
+	  this_frame->stop_reason = UNWIND_NO_SAVED_PC;
+	  this_frame->prev = nullptr;
+	  return nullptr;
+	}
+
+      throw;
+    }
+
   return get_prev_frame_maybe_check_cycle (this_frame);
 }
 
diff --git a/gdb/testsuite/gdb.base/pc-not-saved.c b/gdb/testsuite/gdb.base/pc-not-saved.c
new file mode 100644
index 00000000000..bc6632a97d7
--- /dev/null
+++ b/gdb/testsuite/gdb.base/pc-not-saved.c
@@ -0,0 +1,48 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2024 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+volatile int global_var = 0;
+
+void
+other_func (void)
+{
+  /* Nothing.  */
+}
+
+void
+break_bt_here (void)
+{
+  /* This is all nonsense; just filler so this function has a body.  */
+  if (global_var != 99)
+    global_var++;
+  if (global_var != 98)
+    global_var++;
+  if (global_var != 97)
+    global_var++;
+  if (global_var != 96)
+    global_var++;
+  other_func ();
+  if (global_var != 95)
+    global_var++;
+}
+
+int
+main (void)
+{
+  break_bt_here ();
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/pc-not-saved.exp b/gdb/testsuite/gdb.base/pc-not-saved.exp
new file mode 100644
index 00000000000..3a55a22c7c1
--- /dev/null
+++ b/gdb/testsuite/gdb.base/pc-not-saved.exp
@@ -0,0 +1,70 @@ 
+# Copyright 2024 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test how GDB handles a frame in which the previous-pc value is not
+# available.  Specifically, check that the backtrace correctly reports
+# why the backtrace is truncated, and ensure that 'display' directives
+# still work when 'stepi'-ing through the frame.
+
+require allow_python_tests
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}] } {
+    return
+}
+
+set remote_python_file \
+    [gdb_remote_download host "${srcdir}/${subdir}/${testfile}.py"]
+gdb_test_no_output "source ${remote_python_file}" "load python file"
+
+if { ![runto "break_bt_here"] } {
+    return
+}
+
+gdb_test "bt" \
+    [multi_line \
+	 "^#0  break_bt_here \\(\\) at \[^\r\n\]+" \
+	 "Backtrace stopped: frame did not save the PC"] \
+    "backtrace from break_bt_here function"
+
+gdb_test "stepi" \
+    [multi_line \
+	 "^(?:$hex in )?break_bt_here \\(\\) at \[^\r\n\]+" \
+	 "$decimal\\s+\[^\r\n\]+"] \
+    "stepi without a display in place"
+
+gdb_test "display/i \$pc" \
+    [multi_line \
+	 "^1: x/i \\\$pc" \
+	 "=> $hex <break_bt_here(?:\\+$decimal)?>:\\s+\[^\r\n\]+"]
+
+gdb_test "stepi" \
+    [multi_line \
+	 "^(?:$hex in )?break_bt_here \\(\\) at \[^\r\n\]+" \
+	 "$decimal\\s+\[^\r\n\]+" \
+	 "1: x/i \\\$pc" \
+	 "=> $hex <break_bt_here(?:\\+$decimal)?>:\\s+\[^\r\n\]+"] \
+    "stepi with a display in place"
+
+gdb_breakpoint other_func
+gdb_continue_to_breakpoint "continue to other_func"
+
+gdb_test "bt" \
+    [multi_line \
+	 "#0  other_func \\(\\) at \[^\r\n\]+" \
+	 "#1  (?:$hex in )?break_bt_here \\(\\) at \[^\r\n\]+" \
+	 "Backtrace stopped: frame did not save the PC"] \
+    "backtrace from other_func function"
diff --git a/gdb/testsuite/gdb.base/pc-not-saved.py b/gdb/testsuite/gdb.base/pc-not-saved.py
new file mode 100644
index 00000000000..d2fb7b1ef1b
--- /dev/null
+++ b/gdb/testsuite/gdb.base/pc-not-saved.py
@@ -0,0 +1,50 @@ 
+# Copyright (C) 2024 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+from gdb.unwinder import Unwinder, FrameId
+
+class break_unwinding(Unwinder):
+    """An unwinder for the function 'break_bt_here'.  This unwinder will
+    claim any frame for the function in question, but doesn't provide
+    any unwound register values.  Importantly, we don't provide a
+    previous $pc value, this means that if we are stopped in
+    'break_bt_here' then we should fail to unwind beyond frame #0."""
+
+    def __init__(self):
+        Unwinder.__init__(self, "break unwinding")
+
+    def __call__(self, pending_frame):
+        pc_desc = pending_frame.architecture().registers().find("pc")
+        pc = pending_frame.read_register(pc_desc)
+
+        sp_desc = pending_frame.architecture().registers().find("sp")
+        sp = pending_frame.read_register(sp_desc)
+
+        if pc.is_optimized_out or sp.is_optimized_out:
+            return None
+
+        block = gdb.block_for_pc(pc)
+        if block is None:
+            return None
+        func = block.function
+        if func is None:
+            return None
+        if str(func) != "break_bt_here":
+            return None
+        fid = FrameId(sp, pc)
+        return pending_frame.create_unwind_info(fid)
+
+gdb.unwinder.register_unwinder(None, break_unwinding(), True)