RFA bug fix -- x86-64 stabs and deprecated fp register

Message ID 3508.1416603484@usendtaylorx2l
State New, archived
Headers

Commit Message

David Taylor Nov. 21, 2014, 8:58 p.m. UTC
  Sometimes when using STABS on x86-64 GNU/Linux, GDB does not know which
register to use for the frame pointer and as a result offsets from the
frame pointer are treated as absolute addresses rather than as
offsets...

This patch provides a default for when the debug information doesn't
specify which register to use.

We have seen this problem when debugging problems with a previous
release of our software (I believe it was built with GCC 4.5.x, if that
matters).

There were no regressions on x86-64 GNU/Linux.

2014-11-21  David Taylor  <dtaylor@emc.com>

	* amd64-tdep.c (amd64_init_abi): Set default frame pointer.

      The preferred debugging format for all known AMD64 targets is
  

Comments

Joel Brobecker Nov. 30, 2014, 3:20 p.m. UTC | #1
David,

On Fri, Nov 21, 2014 at 03:58:04PM -0500, David Taylor wrote:
> Sometimes when using STABS on x86-64 GNU/Linux, GDB does not know which
> register to use for the frame pointer and as a result offsets from the
> frame pointer are treated as absolute addresses rather than as
> offsets...
> 
> This patch provides a default for when the debug information doesn't
> specify which register to use.
> 
> We have seen this problem when debugging problems with a previous
> release of our software (I believe it was built with GCC 4.5.x, if that
> matters).
> 
> There were no regressions on x86-64 GNU/Linux.
> 
> 2014-11-21  David Taylor  <dtaylor@emc.com>
> 
> 	* amd64-tdep.c (amd64_init_abi): Set default frame pointer.

I don't think your patch is correct, as it is going to affect
the outcome of...

    (gdb) print $fp

... for frameless functions compiled with DWARF2. See
std-regs.c:value_of_builtin_frame_fp_reg.

If we were to apply your patch, it would unconditionally print
the contents of the %rbp register as the result, which is not
what we've been printing so far.

I wish I could suggest another approach but I haven't had to touch
stabs support in several years, now. If you provided a bit more details
as to what actually happened in the debugger and where, I might be
able to help you better.

> diff --git a/gdb/amd64-tdep.c b/gdb/amd64-tdep.c
> index e69da01..5a68c33 100644
> --- a/gdb/amd64-tdep.c
> +++ b/gdb/amd64-tdep.c
> @@ -3006,6 +3006,8 @@ amd64_init_abi (struct gdbarch_info info, struct gdbarch *
> gdbarch)
>    set_gdbarch_ps_regnum (gdbarch, AMD64_EFLAGS_REGNUM); /* %eflags */
>    set_gdbarch_fp0_regnum (gdbarch, AMD64_ST0_REGNUM); /* %st(0) */
>  
> +  set_gdbarch_deprecated_fp_regnum (gdbarch, AMD64_RBP_REGNUM); /* %rbp */
> +
>    /* The "default" register numbering scheme for AMD64 is referred to
>       as the "DWARF Register Number Mapping" in the System V psABI.
>       The preferred debugging format for all known AMD64 targets is
  
Mark Kettenis Nov. 30, 2014, 6:59 p.m. UTC | #2
> Date: Sun, 30 Nov 2014 19:20:36 +0400
> From: Joel Brobecker <brobecker@adacore.com>
> 
> David,
> 
> On Fri, Nov 21, 2014 at 03:58:04PM -0500, David Taylor wrote:
> > Sometimes when using STABS on x86-64 GNU/Linux, GDB does not know which
> > register to use for the frame pointer and as a result offsets from the
> > frame pointer are treated as absolute addresses rather than as
> > offsets...
> > 
> > This patch provides a default for when the debug information doesn't
> > specify which register to use.
> > 
> > We have seen this problem when debugging problems with a previous
> > release of our software (I believe it was built with GCC 4.5.x, if that
> > matters).
> > 
> > There were no regressions on x86-64 GNU/Linux.
> > 
> > 2014-11-21  David Taylor  <dtaylor@emc.com>
> > 
> > 	* amd64-tdep.c (amd64_init_abi): Set default frame pointer.
> 
> I don't think your patch is correct, as it is going to affect
> the outcome of...
> 
>     (gdb) print $fp
> 
> ... for frameless functions compiled with DWARF2. See
> std-regs.c:value_of_builtin_frame_fp_reg.
> 
> If we were to apply your patch, it would unconditionally print
> the contents of the %rbp register as the result, which is not
> what we've been printing so far.
> 
> I wish I could suggest another approach but I haven't had to touch
> stabs support in several years, now. If you provided a bit more details
> as to what actually happened in the debugger and where, I might be
> able to help you better.

But any effort is much better spent on switching to DWARF2.  STABS
support ws already considered obsolete when AMD64 support was added to
GDB more than a decade ago.
  
David Taylor Dec. 1, 2014, 4:30 p.m. UTC | #3
Joel Brobecker <brobecker@adacore.com> wrote:

> David,
> 
> On Fri, Nov 21, 2014 at 03:58:04PM -0500, David Taylor wrote:
> > Sometimes when using STABS on x86-64 GNU/Linux, GDB does not know which
> > register to use for the frame pointer and as a result offsets from the
> > frame pointer are treated as absolute addresses rather than as
> > offsets...
> > 
> > This patch provides a default for when the debug information doesn't
> > specify which register to use.
> > 
> > We have seen this problem when debugging problems with a previous
> > release of our software (I believe it was built with GCC 4.5.x, if that
> > matters).
> > 
> > There were no regressions on x86-64 GNU/Linux.
> > 
> > 2014-11-21  David Taylor  <dtaylor@emc.com>
> > 
> > 	* amd64-tdep.c (amd64_init_abi): Set default frame pointer.
> 
> I don't think your patch is correct, as it is going to affect
> the outcome of...
> 
>     (gdb) print $fp
> 
> ... for frameless functions compiled with DWARF2. See
> std-regs.c:value_of_builtin_frame_fp_reg.
> 
> If we were to apply your patch, it would unconditionally print
> the contents of the %rbp register as the result, which is not
> what we've been printing so far.

For a frameless function, I would expect that $fp would be either
<unavailable> or would be the frame pointer for the most recent function
that had a frame or would just be %rbp.

Since a frameless function has no frame, I would consider any of the
three to be reasonable.  I would also argue that it is an error to ask
for the frame pointer of a frameless function, but that an error should
not be thrown as the user shouldn't be expected to know that the
function is frameless.

But, my case is different from

    print $fp

.  It's more like:

    print foo

where the debug information says foo lives at $fp - 24.

There are functions where the debug information (STABS) specifies an
offset from the frame pointer, but does not specify which register is
the frame pointer.  GDB currently decides to not use any register.  It
treats the offset as an offset from zero -- put another way, it treats
it as an absolute address.

Trying to find a local function variable at address -32 or -28 or -24
does not work.  It is not mapped and even if it was, it wouldn't be the
right address.  Now, $rbp - 32 or $rbp - 28 is another matter entirely.

GDB is currently saying ``I've been asked for the value of $fp, but I
don't know which register is the frame pointer, so I don't have a value
for $fp, and I'll just use 0''.

The one line fix fixed real problems we had with STABS on our legacy
systems.  And has caused no problems with DWARF on our current systems.

> I wish I could suggest another approach but I haven't had to touch
> stabs support in several years, now. If you provided a bit more details
> as to what actually happened in the debugger and where, I might be
> able to help you better.

When I first posted this bug report and fix months ago, it drew zero
responses.  (I noticed recently that it wasn't in so I sent another
message.)  At that time I could have told you more.

I remember the symptom but not the details.  For selected functions if
you typed:

    print <name_local_variable>

you would get an error because GDB thought that the variable had an
address of -32 or -28 or -24 or similar (depending upon which local
variable you selected).

The ELF file was using STABS and when I debugged it, GDB said that the
variable's address was a small offset from the frame pointer.  It didn't
know what to use for the frame pointer and ended up using zero.

The fix merely decides what to do if you need a frame pointer and
haven't been able to otherwise figure out what to use for the frame
pointer -- use the traditional default frame pointer register %rbp.

If you already have a frame pointer or do not need a frame pointer,
nothing is changed.

> > diff --git a/gdb/amd64-tdep.c b/gdb/amd64-tdep.c
> > index e69da01..5a68c33 100644
> > --- a/gdb/amd64-tdep.c
> > +++ b/gdb/amd64-tdep.c
> > @@ -3006,6 +3006,8 @@ amd64_init_abi (struct gdbarch_info info, struct gdbarch *
> > gdbarch)
> >    set_gdbarch_ps_regnum (gdbarch, AMD64_EFLAGS_REGNUM); /* %eflags */
> >    set_gdbarch_fp0_regnum (gdbarch, AMD64_ST0_REGNUM); /* %st(0) */
> >  
> > +  set_gdbarch_deprecated_fp_regnum (gdbarch, AMD64_RBP_REGNUM); /* %rbp */
> > +
> >    /* The "default" register numbering scheme for AMD64 is referred to
> >       as the "DWARF Register Number Mapping" in the System V psABI.
> >       The preferred debugging format for all known AMD64 targets is
> 
> -- 
> Joel
  
David Taylor Dec. 1, 2014, 4:41 p.m. UTC | #4
Mark Kettenis <mark.kettenis@xs4all.nl> wrote:

> But any effort is much better spent on switching to DWARF2.  STABS
> support ws already considered obsolete when AMD64 support was added to
> GDB more than a decade ago.

For current systems we have earlier this year switched to DWARF, but we
will need to support STABS for legacy systems for a few years.

As far as it being obsolete, I asked about its status in January 2013 on
the gdb list.  My take away from that discussion was that it was not
deprecated nor obsolete and that it was being maintained; but, that new
development was focused on DWARF.

[As far as DWARF goes, we use DWARF 3+/4 and GCC 4.7+ as we use -g3 to
get macro debug information and want the strings to be consolidated.]
  
Joel Brobecker Dec. 3, 2014, 3:18 a.m. UTC | #5
> For current systems we have earlier this year switched to DWARF, but we
> will need to support STABS for legacy systems for a few years.

Can you say which systems? I am just curious to see what systems
out there still use stabs. Also, do you have an idea of long
you might have to continue supporting stabs (it's ok if you do not
know or do not want to say).

> As far as it being obsolete, I asked about its status in January 2013 on
> the gdb list.  My take away from that discussion was that it was not
> deprecated nor obsolete and that it was being maintained; but, that new
> development was focused on DWARF.

I think that's a fair understanding, although I don't know of any
maintainer that still uses DWARF, so maintenance is likely to
remain limited to integrating patch submissions such as yours.
  
Doug Evans Dec. 3, 2014, 3:43 a.m. UTC | #6
On Fri, Nov 21, 2014 at 12:58 PM, David Taylor <dtaylor@emc.com> wrote:
> Sometimes when using STABS on x86-64 GNU/Linux, GDB does not know which
> register to use for the frame pointer and as a result offsets from the
> frame pointer are treated as absolute addresses rather than as
> offsets...
>
> This patch provides a default for when the debug information doesn't
> specify which register to use.
>
> We have seen this problem when debugging problems with a previous
> release of our software (I believe it was built with GCC 4.5.x, if that
> matters).
>
> There were no regressions on x86-64 GNU/Linux.
>
> 2014-11-21  David Taylor  <dtaylor@emc.com>
>
>         * amd64-tdep.c (amd64_init_abi): Set default frame pointer.
>
> diff --git a/gdb/amd64-tdep.c b/gdb/amd64-tdep.c
> index e69da01..5a68c33 100644
> --- a/gdb/amd64-tdep.c
> +++ b/gdb/amd64-tdep.c
> @@ -3006,6 +3006,8 @@ amd64_init_abi (struct gdbarch_info info, struct gdbarch *
> gdbarch)
>    set_gdbarch_ps_regnum (gdbarch, AMD64_EFLAGS_REGNUM); /* %eflags */
>    set_gdbarch_fp0_regnum (gdbarch, AMD64_ST0_REGNUM); /* %st(0) */
>
> +  set_gdbarch_deprecated_fp_regnum (gdbarch, AMD64_RBP_REGNUM); /* %rbp */
> +
>    /* The "default" register numbering scheme for AMD64 is referred to
>       as the "DWARF Register Number Mapping" in the System V psABI.
>       The preferred debugging format for all known AMD64 targets is

Hi.
I haven't dug too deep to understand whether this is a good fix or not,
but I have a request.
Given the discussion, I think it's reasonable to assume a future
reader of the code will ask "Why is this code here?"
One of my pet peeves is having to spend too much time
answering that question, and while I would much rather have
such questions answered in the code, I also don't want to have
to read more than a few sentences (in general - certainly there
are cases where a paragraph or two in the code can be
invaluable).  At any rate ...
I'll let you decide whether to add something to the code or
to the commit message (or both), but at a minimum please add
a full description to the commit message that explains things.

Thanks!
  
Joel Brobecker Dec. 3, 2014, 3:50 a.m. UTC | #7
> For a frameless function, I would expect that $fp would be either
> <unavailable> or would be the frame pointer for the most recent function
> that had a frame or would just be %rbp.
> 
> Since a frameless function has no frame, I would consider any of the
> three to be reasonable.  I would also argue that it is an error to ask
> for the frame pointer of a frameless function, but that an error should
> not be thrown as the user shouldn't be expected to know that the
> function is frameless.

I don't know if it's me who used the wrong technical term, or if
the terminology is misleading, but for me, a frameless function
was meant to mean a function which does not use the frame pointer.
But, to me, functions always have a frame, even if maybe in some
case perhaps it is possible for their frame to have zero data in it.
But in any case, there is always a frame address, and that's what
$fp means to me.

> When I first posted this bug report and fix months ago, it drew zero
> responses.  (I noticed recently that it wasn't in so I sent another
> message.)  At that time I could have told you more.

Sorry about that, but I personally have been the busiest I have ever
been at work for the last few months. And I have a feeling that others
have been too. The only tip I can offer to prevent this from happening
is to ping us. Typical protocol we recommend is 2 weeks after submission
and every week thereafter. It's sad that it has to be this way, but
we have to remember that we are all volunteers trying to do their best.
At the moment, I am taking time away from work just to research further
and answer you.

> There are functions where the debug information (STABS) specifies an
> offset from the frame pointer, but does not specify which register is
> the frame pointer.  GDB currently decides to not use any register.  It
> treats the offset as an offset from zero -- put another way, it treats
> it as an absolute address.
[...]
> I remember the symptom but not the details.  For selected functions if
> you typed:
> 
>     print <name_local_variable>
> 
> you would get an error because GDB thought that the variable had an
> address of -32 or -28 or -24 or similar (depending upon which local
> variable you selected).
> 
> The ELF file was using STABS and when I debugged it, GDB said that the
> variable's address was a small offset from the frame pointer.  It didn't
> know what to use for the frame pointer and ended up using zero.
> 
> The fix merely decides what to do if you need a frame pointer and
> haven't been able to otherwise figure out what to use for the frame
> pointer -- use the traditional default frame pointer register %rbp.
> 
> If you already have a frame pointer or do not need a frame pointer,
> nothing is changed.

It's actually not a question of having a register or no register.
We were able to make this work without your patch before, and I just
tested again today and it worked for me.

I invite you to first take a look at findvar.c:default_read_var_value,
which eventually calls frame.c:get_frame_locals_address. This function
then "sniff"es the code to determine which base unwinder to use. And
normally in your case, it should probably find amd64-tdep.c:amd64_frame_base.
That's the vector which is going to use to compute the "frame_id"
object corresponding to your function's frame. And of particular
interest in your case is the "this_local" field which is expected
to query the frame to return the frame's base address. In your case,
it should probably be amd64_frame_base_address which, slightly
simplified, returns the frame->cache->base. And going back further,
you'll find that the frame cache is computed by amd64_frame_cache,
which is a wrapper around amd64_frame_cache_1, which uses prologue
parsing to determine the frame's base address. I am guessing that
this is where you need to look.
  
Doug Evans Dec. 3, 2014, 5:31 p.m. UTC | #8
On Tue, Dec 2, 2014 at 7:50 PM, Joel Brobecker <brobecker@adacore.com> wrote:
>> When I first posted this bug report and fix months ago, it drew zero
>> responses.  (I noticed recently that it wasn't in so I sent another
>> message.)  At that time I could have told you more.
>
> Sorry about that, but I personally have been the busiest I have ever
> been at work for the last few months. And I have a feeling that others
> have been too.

Yep! :-)

> The only tip I can offer to prevent this from happening
> is to ping us. Typical protocol we recommend is 2 weeks after submission

s/2 weeks/1 week/, right?
The ContributionChecklist page in the wiki says
"Please ping and keep pinging the patch on a weekly basis."
https://sourceware.org/gdb/wiki/ContributionChecklist

Also, pings can get buried, at least in my inbox, as all I scan is the
subject line.
We could add to the protocol the inclusion of "[PING]" in the subject
line or some such.
That will raise the visibility to me anyway.

> and every week thereafter. It's sad that it has to be this way, but
> we have to remember that we are all volunteers trying to do their best.
> At the moment, I am taking time away from work just to research further
> and answer you.

Yeah.  Anyone should feel free to ping once per week.
That still doesn't guarantee a timely response, but at least it'll
raise visibility.
  
Joel Brobecker Dec. 4, 2014, 3:20 a.m. UTC | #9
> > The only tip I can offer to prevent this from happening
> > is to ping us. Typical protocol we recommend is 2 weeks after submission
> 
> s/2 weeks/1 week/, right?
> The ContributionChecklist page in the wiki says
> "Please ping and keep pinging the patch on a weekly basis."
> https://sourceware.org/gdb/wiki/ContributionChecklist

I personally feel that 1 week after patch submission is a bit fast
for the first ping, but if that's the consensus, then I stand
corrected. Thanks for pointing this out!
  
Mark Kettenis Dec. 5, 2014, 1:38 p.m. UTC | #10
> Date: Mon, 01 Dec 2014 11:30:30 -0500
> From: David Taylor <dtaylor@usendtaylorx2l.lss.emc.com>
> 
> Joel Brobecker <brobecker@adacore.com> wrote:
> 
> > David,
> > 
> > On Fri, Nov 21, 2014 at 03:58:04PM -0500, David Taylor wrote:
> > > Sometimes when using STABS on x86-64 GNU/Linux, GDB does not know which
> > > register to use for the frame pointer and as a result offsets from the
> > > frame pointer are treated as absolute addresses rather than as
> > > offsets...
> > > 
> > > This patch provides a default for when the debug information doesn't
> > > specify which register to use.
> > > 
> > > We have seen this problem when debugging problems with a previous
> > > release of our software (I believe it was built with GCC 4.5.x, if that
> > > matters).
> > > 
> > > There were no regressions on x86-64 GNU/Linux.
> > > 
> > > 2014-11-21  David Taylor  <dtaylor@emc.com>
> > > 
> > > 	* amd64-tdep.c (amd64_init_abi): Set default frame pointer.
> > 
> > I don't think your patch is correct, as it is going to affect
> > the outcome of...
> > 
> >     (gdb) print $fp
> > 
> > ... for frameless functions compiled with DWARF2. See
> > std-regs.c:value_of_builtin_frame_fp_reg.
> > 
> > If we were to apply your patch, it would unconditionally print
> > the contents of the %rbp register as the result, which is not
> > what we've been printing so far.
> 
> For a frameless function, I would expect that $fp would be either
> <unavailable> or would be the frame pointer for the most recent function
> that had a frame or would just be %rbp.

We deprecated the concept of a frame pointer "register" more than a
decade ago.  There really isn't such a register on many architectures.
On amd64 %rbp is available for the compiler to use for whatever
purpose it sees fit.  Reintroducing a real frame pointer register in
the amd64-tdep.c code would be a mistake.  The
set_gdbarch_deprecated_fp_regnum() function has "deprecated" in its
name for a reason.

> Since a frameless function has no frame, I would consider any of the
> three to be reasonable.  I would also argue that it is an error to ask
> for the frame pointer of a frameless function, but that an error should
> not be thrown as the user shouldn't be expected to know that the
> function is frameless.

In GDB there always is a frame.  A frameless function is a function
that doesn't set up its own frame but re-uses the frame set up by its
parent.  So that matches "would be the frame pointer for the most
recent function that had a frame".

> But, my case is different from
> 
>     print $fp
> 
> .  It's more like:
> 
>     print foo
> 
> where the debug information says foo lives at $fp - 24.

The question here is how your debug format actually defines $fp here.
For DWARF2 this is reasonably well defined.  It has the concept of a
call frame address CFA and provides enough information to reconstruct
that address.  For STABS this isn't well defined.  Hopefully it
matches GDB's idea of "frame base", which is what $fp represents.

Few of us have really looked at the STABS code in GDB in the last
decae.  The mainstream Open Source world moved to DWARF2 long ago, and
there is a substantial amount of bitrot in the STABS-related code,
both on the producer (GCC) and consumer (GDB) side.  But judging from
what you write here:

> There are functions where the debug information (STABS) specifies an
> offset from the frame pointer, but does not specify which register is
> the frame pointer.  GDB currently decides to not use any register.  It
> treats the offset as an offset from zero -- put another way, it treats
> it as an absolute address.

This seems to suggest that the real bug is in GDB's STABS code.  It
should probably encode the address of your variable as being
LOC_LOCAL, in which case the address would be calculated as relative
to the "frame locals" address which for most architectures (including
amd64) is the same as "frame base".

...

> When I first posted this bug report and fix months ago, it drew zero
> responses.  (I noticed recently that it wasn't in so I sent another
> message.)  At that time I could have told you more.

The problem here is that by choosing to use STABS on amd64 for your
platform you're now so far behind the curve that there's basically
nobody around with enough knowledge to help you out.  All I can say to
you is that your suggested fix isn't the right approach.
  

Patch

diff --git a/gdb/amd64-tdep.c b/gdb/amd64-tdep.c
index e69da01..5a68c33 100644
--- a/gdb/amd64-tdep.c
+++ b/gdb/amd64-tdep.c
@@ -3006,6 +3006,8 @@  amd64_init_abi (struct gdbarch_info info, struct gdbarch *
gdbarch)
   set_gdbarch_ps_regnum (gdbarch, AMD64_EFLAGS_REGNUM); /* %eflags */
   set_gdbarch_fp0_regnum (gdbarch, AMD64_ST0_REGNUM); /* %st(0) */
 
+  set_gdbarch_deprecated_fp_regnum (gdbarch, AMD64_RBP_REGNUM); /* %rbp */
+
   /* The "default" register numbering scheme for AMD64 is referred to
      as the "DWARF Register Number Mapping" in the System V psABI.