Patchwork Relax ARM prologue unwinder assumption

login
register
mail settings
Submitter Luis Machado
Date Feb. 5, 2015, 7:06 p.m.
Message ID <54D3BF52.5070709@codesourcery.com>
Download mbox | patch
Permalink /patch/4928/
State New
Headers show

Comments

Luis Machado - Feb. 5, 2015, 7:06 p.m.
Just recently i ran into a problem with a bare metal ARM target where 
GDB would not allow some registers to be changed after the PC was 
manually set to some other value.

In reality the target had just started and one of its cores, out of 
four, was running the program, but the other ones were in some random state.

The PC of one of the other 3 cores was then adjusted to point to a known 
function address.

GDB's reaction to this change is to invalidate the regcache and frame 
and build a brand new chain and cache, while trying to retain the 
previously selected frame - #0 in this case.

What i noticed is that GDB was selecting frame #1 instead of #0 due to 
unfortunate coincidences with both frames' SP's being 0. And we can't 
modify some registers on non-innermost frames for obvious reasons.

Here's a brief log of what happens:


[Switching to thread 2 (Thread 2)]
#0  0x0093ff10 in ?? ()
(gdb) set $pc=functioncore2
(gdb) bt
#0  functioncore2 () at test.c:32
#1  0x0000fc44 in ?? ()
(gdb) frame
#1  0x0000fc44 in ?? ()
(gdb) set $sp=0x2030
Attempt to assign to an unmodifiable value.

I tracked this problem down to this old 
(https://sourceware.org/ml/gdb-patches/2003-08/msg00526.html) piece of 
code in arm-tdep.c:arm_prologue_this_id:

   /* If we've hit a wall, stop.  */
   if (cache->prev_sp == 0)
     return;

Due to the SP being 0 for this specific core, GDB returns early and does 
not set the frame's PC to the new value. That means we have a frame with 
PC==0x0 and SP==0x0, which ends up matching frame #1 in our case. But 
this is obviously wrong.

I looked up the patch that introduced this chunk of code and did not 
find any reasonable explanation for this check. Though it may make sense 
for non-bare metal targets, a bare-metal board attached to a probe can 
be stopped at any random place, so we should be able to set its 
registers freely without worrying about unwinding assumptions. This is 
generic code after all. I understand a valid workaround is to make sure 
the proper frame is selected, but GDB should be smart enough not to 
confuse things. Therefore this "wall" described in the comment seems too 
strict to be in generic code.

The attached patch removes this restriction and does not cause any 
regressions for ARM bare metal, but i'd like feedback.
Pedro Alves - Feb. 6, 2015, 3:25 p.m.
On 02/05/2015 07:06 PM, Luis Machado wrote:
> Just recently i ran into a problem with a bare metal ARM target where 
> GDB would not allow some registers to be changed after the PC was 
> manually set to some other value.
> 
> In reality the target had just started and one of its cores, out of 
> four, was running the program, but the other ones were in some random state.
> 
> The PC of one of the other 3 cores was then adjusted to point to a known 
> function address.
> 
> GDB's reaction to this change is to invalidate the regcache and frame 
> and build a brand new chain and cache, while trying to retain the 
> previously selected frame - #0 in this case.
> 
> What i noticed is that GDB was selecting frame #1 instead of #0 due to 
> unfortunate coincidences with both frames' SP's being 0. And we can't 
> modify some registers on non-innermost frames for obvious reasons.
> 
> Here's a brief log of what happens:
> 
> 
> [Switching to thread 2 (Thread 2)]
> #0  0x0093ff10 in ?? ()
> (gdb) set $pc=functioncore2
> (gdb) bt
> #0  functioncore2 () at test.c:32
> #1  0x0000fc44 in ?? ()

So in this case, frame #0's unwinder must now be the dwarf
unwinder, right?

> (gdb) frame
> #1  0x0000fc44 in ?? ()
> (gdb) set $sp=0x2030
> Attempt to assign to an unmodifiable value.
> 
> I tracked this problem down to this old 
> (https://sourceware.org/ml/gdb-patches/2003-08/msg00526.html) piece of 
> code in arm-tdep.c:arm_prologue_this_id:
> 
>    /* If we've hit a wall, stop.  */
>    if (cache->prev_sp == 0)
>      return;
> 
> Due to the SP being 0 for this specific core, GDB returns early and does 
> not set the frame's PC to the new value. That means we have a frame with 
> PC==0x0 and SP==0x0, which ends up matching frame #1 in our case. But 
> this is obviously wrong.
> 
> I looked up the patch that introduced this chunk of code and did not 
> find any reasonable explanation for this check. Though it may make sense 
> for non-bare metal targets, a bare-metal board attached to a probe can 
> be stopped at any random place, so we should be able to set its 
> registers freely without worrying about unwinding assumptions. 

A debugger is supposedly useful for badly behaved programs too.  :-)
So I don't see how GNU/Linux would be any different.  The user may
attach to a program that can be stopped at any random place too.


> This is
> generic code after all. I understand a valid workaround is to make sure 
> the proper frame is selected, but GDB should be smart enough not to 
> confuse things. Therefore this "wall" described in the comment seems too 
> strict to be in generic code.

The "wall" refers to not being possible to unwind past this frame if
there's no stack to unwind.  So it sounds like that after your patch,
"bt" will start trying to unwind past that, while before, we'd stop,
because arm_prologue_this_id returning early means that the frame's
ID defaults to output_frame_id (see compute_frame_id).  Stop on SP==0
may be an ABI thing, I haven't checked.

> The attached patch removes this restriction and does not cause any 
> regressions for ARM bare metal, but i'd like feedback.

Nowadays we have the frame_unwind_stop_reason hook to make it possible
to have different outermost frame ids, but still have the frame claim
that its the outermost, instead of forcing all outermost frame ids
use the outer_frame_id id.  See i386_frame_unwind_stop_reason.

Sounds like ARM could add an arm_frame_unwind_stop_reason that
returns UNWIND_OUTERMOST when prev_sp is 0.

And it looks like this bit here:

  /* This is meant to halt the backtrace at "_start".  */
  pc = get_frame_pc (this_frame);
  if (pc <= gdbarch_tdep (get_frame_arch (this_frame))->lowest_pc)
    return;

can well cause a fail in the same way.

(It may also make sense to think through the cases where we are
trying to restore the selected frame, and change that code.  It may be
that it never makes sense to go from frame #0 to any other frame
number, for instance.)

Thanks,
Pedro Alves

Patch

2015-02-05  Luis Machado  <lgustavo@codesourcery.com>

	* arm-tdep.c (arm_prologue_this_id): Remove check for the stack
	pointer being 0.

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index 8e9552a..771cbeb 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -2042,10 +2042,6 @@  arm_prologue_this_id (struct frame_info *this_frame,
   if (pc <= gdbarch_tdep (get_frame_arch (this_frame))->lowest_pc)
     return;
 
-  /* If we've hit a wall, stop.  */
-  if (cache->prev_sp == 0)
-    return;
-
   /* Use function start address as part of the frame ID.  If we cannot
      identify the start address (due to missing symbol information),
      fall back to just using the current PC.  */