[PATCHv2] gdb/unwinders: better support for $pc not saved

Message ID 2137e3edccdc997be04f3235718d772b5be97727.1706885900.git.aburgess@redhat.com
State New
Headers
Series [PATCHv2] gdb/unwinders: better support for $pc not saved |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed

Commit Message

Andrew Burgess Feb. 2, 2024, 3:19 p.m. UTC
  In V2 I've addressed the test failure that Keith Pointed out.  The
only changes between V1 and V2 are in the testsuite, the GDB side of
things is unchanged.

The problem is, I suspect caused by an S390 compiler bug, though I'm
not 100% sure, maybe I'm just missing something.  If there is a bug
then it's a weird little issue related to debug information
generation, not a functionality bug.

The test as originally written created a Python unwinder which claimed
any frame for the function break_bt_here.  The frame-id the unwinder
creates for the break_bt_here frame is just $pc and $sp value unwound
from the next frame.

This of course is a bad frame-id.  A frame-id should remain the same
for every address within a function, which is why GDB usually uses the
$pc for the start of a function.

Still, for the sake of this test it didn't really matter.  Or so I
thought.

The failure Keith pointed out only happened when we had a stack like
this:

  main -> break_bt_here -> other_func

My Python unwinder would claim the break_bt_here frame and create a
frame-id by unwinding the $pc and $sp values from `other_func`.

As other_func is a trivial (empty) leaf function, not every
architecture is going to need to allocate a stack frame for this
function.  For example, on S390 the return address is placed into r14
and not onto the stack.

As such, my expectation, for a stack grows down architecture, is that
the $sp value in other_func would be equal to, or less than the $sp
value in break_bt_here.

In my experience the DWARF Call Frame Address (CFA) is usually the
stack pointer value on entry to a function, so my expectation was that
the CFA of other_func would be equal to, or less than the unwound $sp
value in break_bt_here, so (I thought) I'd be fine to use the unwound
$sp value as the frame-id stack address in break_bt_here.

However, on S390, the DWARF CIA that covers other_func looks like this:

  00000000 0000000000000014 00000000 CIE
    Version:               1
    Augmentation:          "zR"
    Code alignment factor: 1
    Data alignment factor: -8
    Return address column: 14
    Augmentation data:     1b
    DW_CFA_def_cfa: r15 ofs 160
    DW_CFA_undefined: r14
    DW_CFA_nop

While the FDE for other_func is:

  00000070 0000000000000018 00000048 FDE cie=0000002c pc=00000000010011a0..00000000010011b0
    DW_CFA_advance_loc: 4 to 00000000010011a4
    DW_CFA_register: r11 in r16 (f0)
    DW_CFA_advance_loc: 4 to 00000000010011a8
    DW_CFA_def_cfa_register: r11
    DW_CFA_advance_loc: 6 to 00000000010011ae
    DW_CFA_restore: r11
    DW_CFA_def_cfa_register: r15

Notice the 'DW_CFA_def_cfa: r15 ofs 160' in the CIE.  This sets up a
default offset of 160 bytes for the CFA.

I don't believe there's anything really "wrong" with this, but it does
seem weird.  It means that the stack-address within the frame-id for
other_func will be 160 bytes higher than the $sp value on entry to the
function other_func.

Now this isn't normally a problem as break_bt_here, when using the
DWARF unwinder rather than my custom Python unwinder will also have a
160 byte offset, so the frame-id for break_bt_here will still appear
to be for an earlier stack frame.

However, my custom Python unwinder doesn't understand this "magic" 160
byte offset, and instead just uses the unwound stack value.

Which means that when I have a stack like this:

  main -> break_bt_here -> other_func

The stack address in the frame-id for break_bt_here is actually less
than the stack address in the frame-id for other_func.  This then
triggers frame_id_inner check within get_prev_frame_always_1 and GDB
things something has gone wrong with the stack unwind, leading to the
failure Keith reported.

My solution to this problem is to run up to break_bt_here without the
Python unwinder loaded.  Use 'maint print frame-id' to capture the
frame-id generated from the DWARF unwinder, and then restart GDB and
load the Python unwinder.

I can then tell the Python unwinder the contents of the DWARF generate
frame-id, and the Python unwinder will report this as its frame-id.
For x86-64 very little changes.  But for S390 I now use a stack
address that includes the "magic" 160 byte offset, and everything is
fine.

---

This started with a Red Hat bug report which can be seen here:

  https://bugzilla.redhat.com/show_bug.cgi?id=1850710

The problem reported here was using GDB on GNU/Linux for S390, the
user stepped into JIT generated code.  As they enter the JIT code GDB
would report 'PC not saved', and this same message would be reported
after each step/stepi.

Additionally, the user had 'set disassemble-next-line on', and once
they entered the JIT code this output was not displayed, nor were any
'display' directives displayed.

The user is not making use of the JIT plugin API to provide debug
information.  But that's OK, they aren't expecting any source level
debug here, they are happy to use 'stepi', but the missing 'display'
directives are a problem, as is the constant 'PC not saved' (error)
message.

What is happening here is that as GDB is failing to find any debug
information for the JIT generated code, it is falling back on to the
S390 prologue unwinder to try and unwind frame #0.  Unfortunately,
without being able to identify the function boundaries, the S390
prologue scanner can't help much, in fact, it doesn't even suggest an
arbitrary previous $pc value (some targets that use a link-register
will, by default, assume the link-register contains the previous $pc),
instead the S390 will just say, "sorry, I have no previous $pc value".

The result of this is that when GDB tries to find frame #1 we end
throwing an error from frame_unwind_pc (the 'PC not saved' error).
This error is not caught anywhere except at the top-level interpreter
loop, and so we end up skipping all the 'display' directive handling.

While thinking about this, I wondered, could I trigger the same error
using the Python Unwinder API?  What happens if a Python unwinder
claims a frame, but then fails to provide a previous $pc value?

Turns out that exactly the same thing happens, which is great, as that
means we now have a way to reproduce this bug on any target.  And so
the test included with this patch does just this.  I have a Python
unwinder that claims a frame, but doesn't provide any previous
register values.

I then do two tests, first I stop in the claimed frame (i.e. frame #0
is the frame that can't be unwound), I perform a few steps, and check
the backtrace.  And second, I stop in a child of the problem
frame (i.e. frame #1 is the frame that can't be unwound), and from
here I check the backtrace.

While all this is going on I have a 'display' directive in place, and
each time GDB stops I check that the display directive triggers.

Additionally, when checking the backtrace, I am checking that the
backtrace finishes with the message 'Backtrace stopped: frame did not
save the PC'.

As for the fix I chose to add a call to frame_unwind_pc directly to
get_prev_frame_always_1.  Calling frame_unwind_pc will cache the
unwound $pc value, so this doesn't add much additional work as
immediately after the new frame_unwind_pc call, we call
get_prev_frame_maybe_check_cycle, which actually generates the
previous frame, which will always (I think) require a call to
frame_unwind_pc anyway.

The reason for adding the frame_unwind_pc call into
get_prev_frame_always_1, is that if the frame_unwind_pc call fails we
want to set the frames 'stop_reason', and get_prev_frame_always_1
seems to be the place where this is done, so I wanted to keep the new
stop_reason setting code next to all the existing stop_reason setting
code.

Additionally, once we enter get_prev_frame_maybe_check_cycle we
actually create the previous frame, then, if it turns out that the
previous frame can't be created we need to remove the frame .. this
seemed more complex than just making the check in
get_prev_frame_always_1.

With this fix in place the original S390 bug is fixed, and also the
test added in this commit, that uses the Python API, is also fixed.

Reviewed-By: Kevin Buettner <kevinb@redhat.com>
---
 gdb/frame.c                             |  32 +++++++
 gdb/testsuite/gdb.base/pc-not-saved.c   |  48 ++++++++++
 gdb/testsuite/gdb.base/pc-not-saved.exp | 113 ++++++++++++++++++++++++
 gdb/testsuite/gdb.base/pc-not-saved.py  |  71 +++++++++++++++
 4 files changed, 264 insertions(+)
 create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.c
 create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.exp
 create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.py


base-commit: 05d1b4b4ad7d74a64cc71c53d621241fc393fcb6
  

Comments

Andrew Burgess March 5, 2024, 4:41 p.m. UTC | #1
Ping!

If there's no further feedback then I plan to push this patch in a week
or so.

Thanks,
Andrew



Andrew Burgess <aburgess@redhat.com> writes:

> In V2 I've addressed the test failure that Keith Pointed out.  The
> only changes between V1 and V2 are in the testsuite, the GDB side of
> things is unchanged.
>
> The problem is, I suspect caused by an S390 compiler bug, though I'm
> not 100% sure, maybe I'm just missing something.  If there is a bug
> then it's a weird little issue related to debug information
> generation, not a functionality bug.
>
> The test as originally written created a Python unwinder which claimed
> any frame for the function break_bt_here.  The frame-id the unwinder
> creates for the break_bt_here frame is just $pc and $sp value unwound
> from the next frame.
>
> This of course is a bad frame-id.  A frame-id should remain the same
> for every address within a function, which is why GDB usually uses the
> $pc for the start of a function.
>
> Still, for the sake of this test it didn't really matter.  Or so I
> thought.
>
> The failure Keith pointed out only happened when we had a stack like
> this:
>
>   main -> break_bt_here -> other_func
>
> My Python unwinder would claim the break_bt_here frame and create a
> frame-id by unwinding the $pc and $sp values from `other_func`.
>
> As other_func is a trivial (empty) leaf function, not every
> architecture is going to need to allocate a stack frame for this
> function.  For example, on S390 the return address is placed into r14
> and not onto the stack.
>
> As such, my expectation, for a stack grows down architecture, is that
> the $sp value in other_func would be equal to, or less than the $sp
> value in break_bt_here.
>
> In my experience the DWARF Call Frame Address (CFA) is usually the
> stack pointer value on entry to a function, so my expectation was that
> the CFA of other_func would be equal to, or less than the unwound $sp
> value in break_bt_here, so (I thought) I'd be fine to use the unwound
> $sp value as the frame-id stack address in break_bt_here.
>
> However, on S390, the DWARF CIA that covers other_func looks like this:
>
>   00000000 0000000000000014 00000000 CIE
>     Version:               1
>     Augmentation:          "zR"
>     Code alignment factor: 1
>     Data alignment factor: -8
>     Return address column: 14
>     Augmentation data:     1b
>     DW_CFA_def_cfa: r15 ofs 160
>     DW_CFA_undefined: r14
>     DW_CFA_nop
>
> While the FDE for other_func is:
>
>   00000070 0000000000000018 00000048 FDE cie=0000002c pc=00000000010011a0..00000000010011b0
>     DW_CFA_advance_loc: 4 to 00000000010011a4
>     DW_CFA_register: r11 in r16 (f0)
>     DW_CFA_advance_loc: 4 to 00000000010011a8
>     DW_CFA_def_cfa_register: r11
>     DW_CFA_advance_loc: 6 to 00000000010011ae
>     DW_CFA_restore: r11
>     DW_CFA_def_cfa_register: r15
>
> Notice the 'DW_CFA_def_cfa: r15 ofs 160' in the CIE.  This sets up a
> default offset of 160 bytes for the CFA.
>
> I don't believe there's anything really "wrong" with this, but it does
> seem weird.  It means that the stack-address within the frame-id for
> other_func will be 160 bytes higher than the $sp value on entry to the
> function other_func.
>
> Now this isn't normally a problem as break_bt_here, when using the
> DWARF unwinder rather than my custom Python unwinder will also have a
> 160 byte offset, so the frame-id for break_bt_here will still appear
> to be for an earlier stack frame.
>
> However, my custom Python unwinder doesn't understand this "magic" 160
> byte offset, and instead just uses the unwound stack value.
>
> Which means that when I have a stack like this:
>
>   main -> break_bt_here -> other_func
>
> The stack address in the frame-id for break_bt_here is actually less
> than the stack address in the frame-id for other_func.  This then
> triggers frame_id_inner check within get_prev_frame_always_1 and GDB
> things something has gone wrong with the stack unwind, leading to the
> failure Keith reported.
>
> My solution to this problem is to run up to break_bt_here without the
> Python unwinder loaded.  Use 'maint print frame-id' to capture the
> frame-id generated from the DWARF unwinder, and then restart GDB and
> load the Python unwinder.
>
> I can then tell the Python unwinder the contents of the DWARF generate
> frame-id, and the Python unwinder will report this as its frame-id.
> For x86-64 very little changes.  But for S390 I now use a stack
> address that includes the "magic" 160 byte offset, and everything is
> fine.
>
> ---
>
> This started with a Red Hat bug report which can be seen here:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=1850710
>
> The problem reported here was using GDB on GNU/Linux for S390, the
> user stepped into JIT generated code.  As they enter the JIT code GDB
> would report 'PC not saved', and this same message would be reported
> after each step/stepi.
>
> Additionally, the user had 'set disassemble-next-line on', and once
> they entered the JIT code this output was not displayed, nor were any
> 'display' directives displayed.
>
> The user is not making use of the JIT plugin API to provide debug
> information.  But that's OK, they aren't expecting any source level
> debug here, they are happy to use 'stepi', but the missing 'display'
> directives are a problem, as is the constant 'PC not saved' (error)
> message.
>
> What is happening here is that as GDB is failing to find any debug
> information for the JIT generated code, it is falling back on to the
> S390 prologue unwinder to try and unwind frame #0.  Unfortunately,
> without being able to identify the function boundaries, the S390
> prologue scanner can't help much, in fact, it doesn't even suggest an
> arbitrary previous $pc value (some targets that use a link-register
> will, by default, assume the link-register contains the previous $pc),
> instead the S390 will just say, "sorry, I have no previous $pc value".
>
> The result of this is that when GDB tries to find frame #1 we end
> throwing an error from frame_unwind_pc (the 'PC not saved' error).
> This error is not caught anywhere except at the top-level interpreter
> loop, and so we end up skipping all the 'display' directive handling.
>
> While thinking about this, I wondered, could I trigger the same error
> using the Python Unwinder API?  What happens if a Python unwinder
> claims a frame, but then fails to provide a previous $pc value?
>
> Turns out that exactly the same thing happens, which is great, as that
> means we now have a way to reproduce this bug on any target.  And so
> the test included with this patch does just this.  I have a Python
> unwinder that claims a frame, but doesn't provide any previous
> register values.
>
> I then do two tests, first I stop in the claimed frame (i.e. frame #0
> is the frame that can't be unwound), I perform a few steps, and check
> the backtrace.  And second, I stop in a child of the problem
> frame (i.e. frame #1 is the frame that can't be unwound), and from
> here I check the backtrace.
>
> While all this is going on I have a 'display' directive in place, and
> each time GDB stops I check that the display directive triggers.
>
> Additionally, when checking the backtrace, I am checking that the
> backtrace finishes with the message 'Backtrace stopped: frame did not
> save the PC'.
>
> As for the fix I chose to add a call to frame_unwind_pc directly to
> get_prev_frame_always_1.  Calling frame_unwind_pc will cache the
> unwound $pc value, so this doesn't add much additional work as
> immediately after the new frame_unwind_pc call, we call
> get_prev_frame_maybe_check_cycle, which actually generates the
> previous frame, which will always (I think) require a call to
> frame_unwind_pc anyway.
>
> The reason for adding the frame_unwind_pc call into
> get_prev_frame_always_1, is that if the frame_unwind_pc call fails we
> want to set the frames 'stop_reason', and get_prev_frame_always_1
> seems to be the place where this is done, so I wanted to keep the new
> stop_reason setting code next to all the existing stop_reason setting
> code.
>
> Additionally, once we enter get_prev_frame_maybe_check_cycle we
> actually create the previous frame, then, if it turns out that the
> previous frame can't be created we need to remove the frame .. this
> seemed more complex than just making the check in
> get_prev_frame_always_1.
>
> With this fix in place the original S390 bug is fixed, and also the
> test added in this commit, that uses the Python API, is also fixed.
>
> Reviewed-By: Kevin Buettner <kevinb@redhat.com>
> ---
>  gdb/frame.c                             |  32 +++++++
>  gdb/testsuite/gdb.base/pc-not-saved.c   |  48 ++++++++++
>  gdb/testsuite/gdb.base/pc-not-saved.exp | 113 ++++++++++++++++++++++++
>  gdb/testsuite/gdb.base/pc-not-saved.py  |  71 +++++++++++++++
>  4 files changed, 264 insertions(+)
>  create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.c
>  create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.exp
>  create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.py
>
> diff --git a/gdb/frame.c b/gdb/frame.c
> index fae89cbbc0a..9ed4515d4d5 100644
> --- a/gdb/frame.c
> +++ b/gdb/frame.c
> @@ -2419,6 +2419,38 @@ get_prev_frame_always_1 (frame_info_ptr this_frame)
>  	}
>      }
>  
> +  /* Ensure we can unwind the program counter of THIS_FRAME.  */
> +  try
> +    {
> +      /* Calling frame_unwind_pc for the sentinel frame relies on the
> +	 current_frame being set, which at this point it might not be if we
> +	 are in the process of setting the current_frame after a stop (see
> +	 get_current_frame).
> +
> +	 The point of this check is to ensure that the unwinder for
> +	 THIS_FRAME can actually unwind the $pc, which we assume the
> +	 sentinel frame unwinder can always do (it's just a read from the
> +	 machine state), so we only call frame_unwind_pc for frames other
> +	 than the sentinel (level -1) frame.
> +
> +	 Additionally, we don't actually care about the value of the
> +	 unwound $pc, just that the call completed successfully.  */
> +      if (this_frame->level >= 0)
> +	frame_unwind_pc (this_frame);
> +    }
> +  catch (const gdb_exception_error &ex)
> +    {
> +      if (ex.error == NOT_AVAILABLE_ERROR || ex.error == OPTIMIZED_OUT_ERROR)
> +	{
> +	  frame_debug_printf ("  -> nullptr // no saved PC");
> +	  this_frame->stop_reason = UNWIND_NO_SAVED_PC;
> +	  this_frame->prev = nullptr;
> +	  return nullptr;
> +	}
> +
> +      throw;
> +    }
> +
>    return get_prev_frame_maybe_check_cycle (this_frame);
>  }
>  
> diff --git a/gdb/testsuite/gdb.base/pc-not-saved.c b/gdb/testsuite/gdb.base/pc-not-saved.c
> new file mode 100644
> index 00000000000..bc6632a97d7
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/pc-not-saved.c
> @@ -0,0 +1,48 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2024 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +volatile int global_var = 0;
> +
> +void
> +other_func (void)
> +{
> +  /* Nothing.  */
> +}
> +
> +void
> +break_bt_here (void)
> +{
> +  /* This is all nonsense; just filler so this function has a body.  */
> +  if (global_var != 99)
> +    global_var++;
> +  if (global_var != 98)
> +    global_var++;
> +  if (global_var != 97)
> +    global_var++;
> +  if (global_var != 96)
> +    global_var++;
> +  other_func ();
> +  if (global_var != 95)
> +    global_var++;
> +}
> +
> +int
> +main (void)
> +{
> +  break_bt_here ();
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.base/pc-not-saved.exp b/gdb/testsuite/gdb.base/pc-not-saved.exp
> new file mode 100644
> index 00000000000..f267a269f1a
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/pc-not-saved.exp
> @@ -0,0 +1,113 @@
> +# Copyright 2024 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test how GDB handles a frame in which the previous-pc value is not
> +# available.  Specifically, check that the backtrace correctly reports
> +# why the backtrace is truncated, and ensure that 'display' directives
> +# still work when 'stepi'-ing through the frame.
> +#
> +# We do this by registering a Python unwinder which doesn't provide
> +# any previous register values.
> +
> +require allow_python_tests
> +
> +standard_testfile
> +
> +if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}] } {
> +    return
> +}
> +
> +set remote_python_file \
> +    [gdb_remote_download host "${srcdir}/${subdir}/${testfile}.py"]
> +
> +if { ![runto "break_bt_here"] } {
> +    return
> +}
> +
> +# Figuring out the correct frame-id from a Python unwinder is hard.
> +# We need to know the function's start address (not too hard), and the
> +# stack address on entry to the function, which is much harder to
> +# figure out in a cross-target way.
> +#
> +# So instead we run without any Python unwinder in place and use
> +# 'maint print frame-id' to record the frame-id.  We then restart GDB,
> +# load the Python unwinder, and tell it to use the frame-id we
> +# recorded here.
> +set pc unknown
> +set cfa unknown
> +gdb_test_multiple "maintenance print frame-id" "store break_bt_here frame-id" {
> +    -re -wrap "frame-id for frame #0: \\{stack=($hex),code=($hex),\[^\}\]+\\}" {
> +	set cfa $expect_out(1,string)
> +	set pc $expect_out(1,string)
> +    }
> +}
> +gdb_assert { ![string equal $cfa unknown] } \
> +    "check we read the frame's CFA"
> +
> +gdb_assert { ![string equal $pc unknown] } \
> +    "check we read the frame's PC"
> +
> +# Restart and load the Python unwinder script.
> +clean_restart $binfile
> +gdb_test_no_output "source ${remote_python_file}" "load python file"
> +
> +# Tell the Python unwinder to use the frame-id we cached above.
> +gdb_test_no_output "python set_break_bt_here_frame_id($pc, $cfa)"
> +
> +# Run up to the function which the unwinder will claim.
> +if { ![runto "break_bt_here"] } {
> +    return
> +}
> +
> +# Print the backtrace.  Check that the reason for stopping the
> +# backtrace is that the previous $pc is not available.
> +gdb_test "bt" \
> +    [multi_line \
> +	 "^#0  break_bt_here \\(\\) at \[^\r\n\]+" \
> +	 "Backtrace stopped: frame did not save the PC"] \
> +    "backtrace from break_bt_here function"
> +
> +# Ensure we can stepi.
> +gdb_test "stepi" \
> +    "(:?$hex\\s+)?$decimal\\s+\[^\r\n\]+" \
> +    "stepi without a display in place"
> +
> +# Setup a 'display' directive.
> +gdb_test "display/i \$pc" \
> +    [multi_line \
> +	 "^1: x/i \\\$pc" \
> +	 "=> $hex <break_bt_here(:?\\+$decimal)?>:\\s+\[^\r\n\]+"]
> +
> +# Step again, check the 'display' directive is shown.
> +gdb_test "stepi" \
> +    [multi_line \
> +	 "(:?$hex\\s+)?$decimal\\s+\[^\r\n\]+" \
> +	 "1: x/i \\\$pc" \
> +	 "=> $hex <break_bt_here(:?\\+$decimal)?>:\\s+\[^\r\n\]+"] \
> +    "stepi with a display in place"
> +
> +# Continue to a function that is called from within break_bt_here.
> +# The Python unwinder will then be claiming frame #1.
> +gdb_breakpoint other_func
> +gdb_continue_to_breakpoint "continue to other_func"
> +
> +# Print the backtrace and check that the reason for stopping the
> +# backtrace is that the previous $pc is not available.
> +gdb_test "bt" \
> +    [multi_line \
> +	 "#0  other_func \\(\\) at \[^\r\n\]+" \
> +	 "#1  (:?$hex in )?break_bt_here \\(\\) at \[^\r\n\]+" \
> +	 "Backtrace stopped: frame did not save the PC"] \
> +    "backtrace from other_func function"
> diff --git a/gdb/testsuite/gdb.base/pc-not-saved.py b/gdb/testsuite/gdb.base/pc-not-saved.py
> new file mode 100644
> index 00000000000..65a8e764885
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/pc-not-saved.py
> @@ -0,0 +1,71 @@
> +# Copyright (C) 2024 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +import gdb
> +from gdb.unwinder import Unwinder, FrameId
> +
> +# Cached FrameId.  See set_break_bt_here_frame_id for details.
> +break_bt_here_frame_id = None
> +
> +
> +def set_break_bt_here_frame_id(pc, cfa):
> +    """Call this to pre-calculate the FrameId for the frame our unwinder
> +    is going to claim, this avoids us having to actually figure out a
> +    frame-id within the unwinder, something which is going to be hard
> +    to do in a cross-target way.
> +
> +    Instead we first run the test without the Python unwinder in
> +    place, use 'maint print frame-id' to record the frame-id, then,
> +    after loading this Python script, we all this function to record
> +    the frame-id that the unwinder should use."""
> +    global break_bt_here_frame_id
> +    break_bt_here_frame_id = FrameId(cfa, pc)
> +
> +
> +class break_unwinding(Unwinder):
> +
> +    """An unwinder for the function 'break_bt_here'.  This unwinder will
> +    claim any frame for the function in question, but doesn't provide
> +    any unwound register values.  Importantly, we don't provide a
> +    previous $pc value, this means that if we are stopped in
> +    'break_bt_here' then we should fail to unwind beyond frame #0."""
> +
> +    def __init__(self):
> +        Unwinder.__init__(self, "break unwinding")
> +
> +    def __call__(self, pending_frame):
> +        pc_desc = pending_frame.architecture().registers().find("pc")
> +        pc = pending_frame.read_register(pc_desc)
> +
> +        if pc.is_optimized_out:
> +            return None
> +
> +        block = gdb.block_for_pc(pc)
> +        if block == None:
> +            return None
> +        func = block.function
> +        if func == None:
> +            return None
> +        if str(func) != "break_bt_here":
> +            return None
> +
> +        global break_bt_here_frame_id
> +        if break_bt_here_frame_id is None:
> +            return None
> +
> +        return pending_frame.create_unwind_info(break_bt_here_frame_id)
> +
> +
> +gdb.unwinder.register_unwinder(None, break_unwinding(), True)
>
> base-commit: 05d1b4b4ad7d74a64cc71c53d621241fc393fcb6
> -- 
> 2.25.4
  
Keith Seitz March 7, 2024, 2:43 p.m. UTC | #2
On 3/5/24 08:41, Andrew Burgess wrote:
> 
> Ping!
> 
> If there's no further feedback then I plan to push this patch in a week
> or so.

Thank you for digging into the s390x failure. LGTM.

Ketih
  
Andrew Burgess March 11, 2024, 10:07 a.m. UTC | #3
Keith Seitz <keiths@redhat.com> writes:

> On 3/5/24 08:41, Andrew Burgess wrote:
>> 
>> Ping!
>> 
>> If there's no further feedback then I plan to push this patch in a week
>> or so.
>
> Thank you for digging into the s390x failure. LGTM.

I pushed this patch.

Thanks,
Andrew
  

Patch

diff --git a/gdb/frame.c b/gdb/frame.c
index fae89cbbc0a..9ed4515d4d5 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -2419,6 +2419,38 @@  get_prev_frame_always_1 (frame_info_ptr this_frame)
 	}
     }
 
+  /* Ensure we can unwind the program counter of THIS_FRAME.  */
+  try
+    {
+      /* Calling frame_unwind_pc for the sentinel frame relies on the
+	 current_frame being set, which at this point it might not be if we
+	 are in the process of setting the current_frame after a stop (see
+	 get_current_frame).
+
+	 The point of this check is to ensure that the unwinder for
+	 THIS_FRAME can actually unwind the $pc, which we assume the
+	 sentinel frame unwinder can always do (it's just a read from the
+	 machine state), so we only call frame_unwind_pc for frames other
+	 than the sentinel (level -1) frame.
+
+	 Additionally, we don't actually care about the value of the
+	 unwound $pc, just that the call completed successfully.  */
+      if (this_frame->level >= 0)
+	frame_unwind_pc (this_frame);
+    }
+  catch (const gdb_exception_error &ex)
+    {
+      if (ex.error == NOT_AVAILABLE_ERROR || ex.error == OPTIMIZED_OUT_ERROR)
+	{
+	  frame_debug_printf ("  -> nullptr // no saved PC");
+	  this_frame->stop_reason = UNWIND_NO_SAVED_PC;
+	  this_frame->prev = nullptr;
+	  return nullptr;
+	}
+
+      throw;
+    }
+
   return get_prev_frame_maybe_check_cycle (this_frame);
 }
 
diff --git a/gdb/testsuite/gdb.base/pc-not-saved.c b/gdb/testsuite/gdb.base/pc-not-saved.c
new file mode 100644
index 00000000000..bc6632a97d7
--- /dev/null
+++ b/gdb/testsuite/gdb.base/pc-not-saved.c
@@ -0,0 +1,48 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2024 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+volatile int global_var = 0;
+
+void
+other_func (void)
+{
+  /* Nothing.  */
+}
+
+void
+break_bt_here (void)
+{
+  /* This is all nonsense; just filler so this function has a body.  */
+  if (global_var != 99)
+    global_var++;
+  if (global_var != 98)
+    global_var++;
+  if (global_var != 97)
+    global_var++;
+  if (global_var != 96)
+    global_var++;
+  other_func ();
+  if (global_var != 95)
+    global_var++;
+}
+
+int
+main (void)
+{
+  break_bt_here ();
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/pc-not-saved.exp b/gdb/testsuite/gdb.base/pc-not-saved.exp
new file mode 100644
index 00000000000..f267a269f1a
--- /dev/null
+++ b/gdb/testsuite/gdb.base/pc-not-saved.exp
@@ -0,0 +1,113 @@ 
+# Copyright 2024 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test how GDB handles a frame in which the previous-pc value is not
+# available.  Specifically, check that the backtrace correctly reports
+# why the backtrace is truncated, and ensure that 'display' directives
+# still work when 'stepi'-ing through the frame.
+#
+# We do this by registering a Python unwinder which doesn't provide
+# any previous register values.
+
+require allow_python_tests
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}] } {
+    return
+}
+
+set remote_python_file \
+    [gdb_remote_download host "${srcdir}/${subdir}/${testfile}.py"]
+
+if { ![runto "break_bt_here"] } {
+    return
+}
+
+# Figuring out the correct frame-id from a Python unwinder is hard.
+# We need to know the function's start address (not too hard), and the
+# stack address on entry to the function, which is much harder to
+# figure out in a cross-target way.
+#
+# So instead we run without any Python unwinder in place and use
+# 'maint print frame-id' to record the frame-id.  We then restart GDB,
+# load the Python unwinder, and tell it to use the frame-id we
+# recorded here.
+set pc unknown
+set cfa unknown
+gdb_test_multiple "maintenance print frame-id" "store break_bt_here frame-id" {
+    -re -wrap "frame-id for frame #0: \\{stack=($hex),code=($hex),\[^\}\]+\\}" {
+	set cfa $expect_out(1,string)
+	set pc $expect_out(1,string)
+    }
+}
+gdb_assert { ![string equal $cfa unknown] } \
+    "check we read the frame's CFA"
+
+gdb_assert { ![string equal $pc unknown] } \
+    "check we read the frame's PC"
+
+# Restart and load the Python unwinder script.
+clean_restart $binfile
+gdb_test_no_output "source ${remote_python_file}" "load python file"
+
+# Tell the Python unwinder to use the frame-id we cached above.
+gdb_test_no_output "python set_break_bt_here_frame_id($pc, $cfa)"
+
+# Run up to the function which the unwinder will claim.
+if { ![runto "break_bt_here"] } {
+    return
+}
+
+# Print the backtrace.  Check that the reason for stopping the
+# backtrace is that the previous $pc is not available.
+gdb_test "bt" \
+    [multi_line \
+	 "^#0  break_bt_here \\(\\) at \[^\r\n\]+" \
+	 "Backtrace stopped: frame did not save the PC"] \
+    "backtrace from break_bt_here function"
+
+# Ensure we can stepi.
+gdb_test "stepi" \
+    "(:?$hex\\s+)?$decimal\\s+\[^\r\n\]+" \
+    "stepi without a display in place"
+
+# Setup a 'display' directive.
+gdb_test "display/i \$pc" \
+    [multi_line \
+	 "^1: x/i \\\$pc" \
+	 "=> $hex <break_bt_here(:?\\+$decimal)?>:\\s+\[^\r\n\]+"]
+
+# Step again, check the 'display' directive is shown.
+gdb_test "stepi" \
+    [multi_line \
+	 "(:?$hex\\s+)?$decimal\\s+\[^\r\n\]+" \
+	 "1: x/i \\\$pc" \
+	 "=> $hex <break_bt_here(:?\\+$decimal)?>:\\s+\[^\r\n\]+"] \
+    "stepi with a display in place"
+
+# Continue to a function that is called from within break_bt_here.
+# The Python unwinder will then be claiming frame #1.
+gdb_breakpoint other_func
+gdb_continue_to_breakpoint "continue to other_func"
+
+# Print the backtrace and check that the reason for stopping the
+# backtrace is that the previous $pc is not available.
+gdb_test "bt" \
+    [multi_line \
+	 "#0  other_func \\(\\) at \[^\r\n\]+" \
+	 "#1  (:?$hex in )?break_bt_here \\(\\) at \[^\r\n\]+" \
+	 "Backtrace stopped: frame did not save the PC"] \
+    "backtrace from other_func function"
diff --git a/gdb/testsuite/gdb.base/pc-not-saved.py b/gdb/testsuite/gdb.base/pc-not-saved.py
new file mode 100644
index 00000000000..65a8e764885
--- /dev/null
+++ b/gdb/testsuite/gdb.base/pc-not-saved.py
@@ -0,0 +1,71 @@ 
+# Copyright (C) 2024 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+from gdb.unwinder import Unwinder, FrameId
+
+# Cached FrameId.  See set_break_bt_here_frame_id for details.
+break_bt_here_frame_id = None
+
+
+def set_break_bt_here_frame_id(pc, cfa):
+    """Call this to pre-calculate the FrameId for the frame our unwinder
+    is going to claim, this avoids us having to actually figure out a
+    frame-id within the unwinder, something which is going to be hard
+    to do in a cross-target way.
+
+    Instead we first run the test without the Python unwinder in
+    place, use 'maint print frame-id' to record the frame-id, then,
+    after loading this Python script, we all this function to record
+    the frame-id that the unwinder should use."""
+    global break_bt_here_frame_id
+    break_bt_here_frame_id = FrameId(cfa, pc)
+
+
+class break_unwinding(Unwinder):
+
+    """An unwinder for the function 'break_bt_here'.  This unwinder will
+    claim any frame for the function in question, but doesn't provide
+    any unwound register values.  Importantly, we don't provide a
+    previous $pc value, this means that if we are stopped in
+    'break_bt_here' then we should fail to unwind beyond frame #0."""
+
+    def __init__(self):
+        Unwinder.__init__(self, "break unwinding")
+
+    def __call__(self, pending_frame):
+        pc_desc = pending_frame.architecture().registers().find("pc")
+        pc = pending_frame.read_register(pc_desc)
+
+        if pc.is_optimized_out:
+            return None
+
+        block = gdb.block_for_pc(pc)
+        if block == None:
+            return None
+        func = block.function
+        if func == None:
+            return None
+        if str(func) != "break_bt_here":
+            return None
+
+        global break_bt_here_frame_id
+        if break_bt_here_frame_id is None:
+            return None
+
+        return pending_frame.create_unwind_info(break_bt_here_frame_id)
+
+
+gdb.unwinder.register_unwinder(None, break_unwinding(), True)