Improving GDB's mechanism to check if function is GC'ed

Message ID 556DB1BB.50601@codesourcery.com
State New, archived
Headers

Commit Message

Taimoor Mirza June 2, 2015, 1:38 p.m. UTC
  Hi,

GDB currently uses following mechanism to check if function is GC'ed by 
the linker:
For any function whose address is 0x0, if 'textlow' field of partial 
symbol table is not zero, function is considered to be GC'ed by the 
linker. Below is the code doing this:

case DW_LNE_set_address:
   address = read_address (abfd, line_ptr, cu, &bytes_read);

   /* If address < lowpc then it's not a usable value, it's
      outside the pc range of the CU.  However, we restrict
      the test to only address values of zero to preserve
      GDB's previous behaviour which is to handle the specific
      case of a function being GC'd by the linker.  */
   if (address == 0 && address < lowpc)
     {
       /* This line table is for a function which has been
      GCd by the linker.  Ignore it.  PR gdb/12528 */


This change was done in 
https://sourceware.org/ml/gdb-patches/2014-08/msg00468.html

This does not work for cases where symbols are manually loaded using 
add-symbol-file command. For any incrementally loaded objfile whose 
symbols are added using add-symbol-file command can have function at 0x0 
in debug info and can have its lowpc non-zero because of add-symbol-file 
command that allows user to provide section addresses.

Current Problem
===============

We are currently using GDB to debug Nucleus based bare-metal system that 
also allows to dynamically load and unload Nucleus process modules 
during system execution.
We currently load symbols of a modules using add-symbol-file whenever a 
module is loaded at runtime. It is very common to have functions at 
address 0x0 in debug information and then lowpc in symbol table to be 
non-zero as it depends on section addresses given in add-symbol-file 
command.

With above mentioned GC'ing mechanism, GDB assumes that all these 
functions are GC'ed by linker. Because of this breakpoints do not work 
properly in debug session.

Possible Solution
=================

* Modify GC checking mechanism to mark any function GC'ed using above 
mentioned mechanism only if objfile is not dynamically loaded. So, for 
any function with address 0x0, it'll be marked GC'ed only if lowpc is 
not zero and objfile is main symbol file.

For this I have made following modifications in if condition:

if (address == 0 && address < lowpc
     && (objfile->flags & OBJF_MAINLINE))
     {

I have regression tested this change and it seems to work fine.
Only downside that it is possible (though not common) to load main 
symbol file using add-symbol-file command. In that case, GDB will not 
check for GC'ed functions.

Attached is patch to better highlight this solution.

I am open to any other suggestions to improve this GC'ing mechanism and 
solving this problem.

Thanks,
Taimoor
  

Comments

Taimoor Mirza June 9, 2015, 6:16 a.m. UTC | #1
ping.


Yao,

As you implemented current GC'ing mechanism, Can you kindly take a look 
at below mentioned problem and proposed solution.

Thanks,
Taimoor

On 06/02/2015 06:38 PM, Taimoor wrote:
> Hi,
>
> GDB currently uses following mechanism to check if function is GC'ed by
> the linker:
> For any function whose address is 0x0, if 'textlow' field of partial
> symbol table is not zero, function is considered to be GC'ed by the
> linker. Below is the code doing this:
>
> case DW_LNE_set_address:
>    address = read_address (abfd, line_ptr, cu, &bytes_read);
>
>    /* If address < lowpc then it's not a usable value, it's
>       outside the pc range of the CU.  However, we restrict
>       the test to only address values of zero to preserve
>       GDB's previous behaviour which is to handle the specific
>       case of a function being GC'd by the linker.  */
>    if (address == 0 && address < lowpc)
>      {
>        /* This line table is for a function which has been
>       GCd by the linker.  Ignore it.  PR gdb/12528 */
>
>
> This change was done in
> https://sourceware.org/ml/gdb-patches/2014-08/msg00468.html
>
> This does not work for cases where symbols are manually loaded using
> add-symbol-file command. For any incrementally loaded objfile whose
> symbols are added using add-symbol-file command can have function at 0x0
> in debug info and can have its lowpc non-zero because of add-symbol-file
> command that allows user to provide section addresses.
>
> Current Problem
> ===============
>
> We are currently using GDB to debug Nucleus based bare-metal system that
> also allows to dynamically load and unload Nucleus process modules
> during system execution.
> We currently load symbols of a modules using add-symbol-file whenever a
> module is loaded at runtime. It is very common to have functions at
> address 0x0 in debug information and then lowpc in symbol table to be
> non-zero as it depends on section addresses given in add-symbol-file
> command.
>
> With above mentioned GC'ing mechanism, GDB assumes that all these
> functions are GC'ed by linker. Because of this breakpoints do not work
> properly in debug session.
>
> Possible Solution
> =================
>
> * Modify GC checking mechanism to mark any function GC'ed using above
> mentioned mechanism only if objfile is not dynamically loaded. So, for
> any function with address 0x0, it'll be marked GC'ed only if lowpc is
> not zero and objfile is main symbol file.
>
> For this I have made following modifications in if condition:
>
> if (address == 0 && address < lowpc
>      && (objfile->flags & OBJF_MAINLINE))
>      {
>
> I have regression tested this change and it seems to work fine.
> Only downside that it is possible (though not common) to load main
> symbol file using add-symbol-file command. In that case, GDB will not
> check for GC'ed functions.
>
> Attached is patch to better highlight this solution.
>
> I am open to any other suggestions to improve this GC'ing mechanism and
> solving this problem.
>
> Thanks,
> Taimoor
  
Yao Qi June 9, 2015, 1:36 p.m. UTC | #2
Hi Taimoor,
Sorry for the delayed reply.

On 09/06/15 07:16, Taimoor wrote:
> As you implemented current GC'ing mechanism, Can you kindly take a look
> at below mentioned problem and proposed solution.

I'll take a look next week when I am back to office, if nobody
review this patch in this period.  I am in a travel for Linaro Toolchain
Sprint this week.
  
Yao Qi June 10, 2015, 8:53 a.m. UTC | #3
Taimoor <tmirza@codesourcery.com> writes:

Hi Taimoor,
I happen to have to some time today to read your patch, here are my
comments below,

>
> Current Problem
> ===============
>
> We are currently using GDB to debug Nucleus based bare-metal system
> that also allows to dynamically load and unload Nucleus process
> modules during system execution.
> We currently load symbols of a modules using add-symbol-file whenever
> a module is loaded at runtime. It is very common to have functions at
> address 0x0 in debug information and then lowpc in symbol table to be
> non-zero as it depends on section addresses given in add-symbol-file
> command.

GDB just uses some heuristics to determine whether the function is GC'ed
by linker, so they may not be perfect.  However, GDB doesn't support
Nucleus, so it isn't a valid case to me.  Do we have other cases that we
add-symbol-file in which function address is at 0x0 on platforms GDB
supports?

If the problem only exists on Nucleus, I am afraid I don't agree with
accepting this change, because GDB doesn't support Nucleus.  Sorry.

>
> Possible Solution
> =================
>
> * Modify GC checking mechanism to mark any function GC'ed using above
> mentioned mechanism only if objfile is not dynamically loaded. So, for
> any function with address 0x0, it'll be marked GC'ed only if lowpc is
> not zero and objfile is main symbol file.
>
> For this I have made following modifications in if condition:
>
> if (address == 0 && address < lowpc
>     && (objfile->flags & OBJF_MAINLINE))
>     {
>
> I have regression tested this change and it seems to work fine.
> Only downside that it is possible (though not common) to load main
> symbol file using add-symbol-file command. In that case, GDB will not
> check for GC'ed functions.

In this way, GDB doesn't ignore the GC'ed function from the dynamically
loaded objects, which is a regression to me.
  
Pedro Alves June 10, 2015, 10:17 a.m. UTC | #4
On 06/10/2015 09:53 AM, Yao Qi wrote:

> If the problem only exists on Nucleus, I am afraid I don't agree with
> accepting this change, because GDB doesn't support Nucleus.  Sorry.

Hmm, does it really need to, though?  We expose mechanisms like
add-symbol-file, xml library list with qXfer:libraries:read (the default
solib provider), xml target descriptions, "info os", etc., exactly so
that GDB doesn't have to learn about the myriad of random RTOS's out there.

That said, I don't really understand the patch.  How can you have
real code at address 0, but then _not_ have address 0 covered
by a section?

Thanks,
Pedro Alves
  
Luis Machado June 10, 2015, 11:18 a.m. UTC | #5
On 06/10/2015 05:53 AM, Yao Qi wrote:
> Taimoor <tmirza@codesourcery.com> writes:
>
> Hi Taimoor,
> I happen to have to some time today to read your patch, here are my
> comments below,
>
>>
>> Current Problem
>> ===============
>>
>> We are currently using GDB to debug Nucleus based bare-metal system
>> that also allows to dynamically load and unload Nucleus process
>> modules during system execution.
>> We currently load symbols of a modules using add-symbol-file whenever
>> a module is loaded at runtime. It is very common to have functions at
>> address 0x0 in debug information and then lowpc in symbol table to be
>> non-zero as it depends on section addresses given in add-symbol-file
>> command.
>
> GDB just uses some heuristics to determine whether the function is GC'ed
> by linker, so they may not be perfect.  However, GDB doesn't support
> Nucleus, so it isn't a valid case to me.  Do we have other cases that we
> add-symbol-file in which function address is at 0x0 on platforms GDB
> supports?
>
> If the problem only exists on Nucleus, I am afraid I don't agree with
> accepting this change, because GDB doesn't support Nucleus.  Sorry.

add-symbol-file can cause things to get weird with addresses given the 
user can specify the base address as it pleases. I don't think this is 
Nucleus-specific at all, but more generally bare-metal-specific.

I take it DWARF says 0x0 and GDB relocates the symbol file/addresses 
based on the provided base address? Taimoor?
  
Taimoor Mirza June 10, 2015, 11:45 a.m. UTC | #6
On 06/10/2015 04:18 PM, Luis Machado wrote:
> On 06/10/2015 05:53 AM, Yao Qi wrote:
>> Taimoor <tmirza@codesourcery.com> writes:
>>
>> Hi Taimoor,
>> I happen to have to some time today to read your patch, here are my
>> comments below,
>>
>>>
>>> Current Problem
>>> ===============
>>>
>>> We are currently using GDB to debug Nucleus based bare-metal system
>>> that also allows to dynamically load and unload Nucleus process
>>> modules during system execution.
>>> We currently load symbols of a modules using add-symbol-file whenever
>>> a module is loaded at runtime. It is very common to have functions at
>>> address 0x0 in debug information and then lowpc in symbol table to be
>>> non-zero as it depends on section addresses given in add-symbol-file
>>> command.
>>
>> GDB just uses some heuristics to determine whether the function is GC'ed
>> by linker, so they may not be perfect.  However, GDB doesn't support
>> Nucleus, so it isn't a valid case to me.  Do we have other cases that we
>> add-symbol-file in which function address is at 0x0 on platforms GDB
>> supports?
>>
>> If the problem only exists on Nucleus, I am afraid I don't agree with
>> accepting this change, because GDB doesn't support Nucleus.  Sorry.
>
> add-symbol-file can cause things to get weird with addresses given the
> user can specify the base address as it pleases. I don't think this is
> Nucleus-specific at all, but more generally bare-metal-specific.
>
> I take it DWARF says 0x0 and GDB relocates the symbol file/addresses
> based on the provided base address? Taimoor?

Yes. Its not something specific to Nucleus. Only issue is that function 
address in DWARF is 0x0 as this object file is loaded at runtime and its 
symbols are added using "add-symbol-file" command on the basis of 
address space where it is loaded.

IMO, for any dynamic relocatable object, GDB provides a mechanism to 
load its symbols using add-symbol-file. But if that object file has 
function at 0x0, current mechanism consider it GC'ed.

Thanks,
Taimoor
  
Taimoor Mirza June 10, 2015, 12:04 p.m. UTC | #7
On 06/10/2015 03:17 PM, Pedro Alves wrote:
> Hmm, does it really need to, though?  We expose mechanisms like
> add-symbol-file, xml library list with qXfer:libraries:read (the default
> solib provider), xml target descriptions, "info os", etc., exactly so
> that GDB doesn't have to learn about the myriad of random RTOS's out there.
>
> That said, I don't really understand the patch.  How can you have
> real code at address 0, but then_not_  have address 0 covered
> by a section?

This code is a relocatable object that is dynamically loaded by dynamic 
loader that is part of loaded image. So you load your main kernel image 
but you can load/unload relocatable objects to add further enhancements 
to your main kernel image. This allows RTOS application developers to 
add new features or provide bug fixes in their applications without 
rebuilding and reflashing everything.

We currently use add-symbol-file for these relocatable objects. Previous 
GC'ing mechanism of GDB was working fine for us as it was using flag 
has_section_at_zero to determine whether
address zero in debug info means the corresponding code has been
GC'ed. But this mechanism was changed in 
https://sourceware.org/ml/gdb-patches/2014-08/msg00468.html and now 
functions at address 0x0 in dwarf of relocatable object are marked GC'ed 
by GDB.

Thanks,
Taimoor
  
Yao Qi June 10, 2015, 12:06 p.m. UTC | #8
On 10/06/15 11:17, Pedro Alves wrote:
> Hmm, does it really need to, though?  We expose mechanisms like
> add-symbol-file, xml library list with qXfer:libraries:read (the default
> solib provider), xml target descriptions, "info os", etc., exactly so
> that GDB doesn't have to learn about the myriad of random RTOS's out there.

If these stuffs (add-symbol-file, xml library list, etc) works well for
the given RTOS, and GDB doesn't need to know about the RTOS, that is
fine.  However, in this case, Nucleus needs some changes in GDB c code
while GDB doesn't support Nucleus.  I can't see how this patch benefits
any targets GDB supported.  This is the reason I think this patch is
not acceptable.
  
Yao Qi June 10, 2015, 12:24 p.m. UTC | #9
On 10/06/15 12:45, Taimoor wrote:
>
> Yes. Its not something specific to Nucleus. Only issue is that function
> address in DWARF is 0x0 as this object file is loaded at runtime and its
> symbols are added using "add-symbol-file" command on the basis of
> address space where it is loaded.
>
> IMO, for any dynamic relocatable object, GDB provides a mechanism to
> load its symbols using add-symbol-file. But if that object file has
> function at 0x0, current mechanism consider it GC'ed.

Could you help me to understand how to reproduce this problem on any GDB
supported targets?  Your fix may not be Nucleus specific, but I think
the problem does only exist on Nucleus, please correct me if I am wrong.
  
Pedro Alves June 10, 2015, 2:43 p.m. UTC | #10
On 06/10/2015 01:06 PM, Yao Qi wrote:
> On 10/06/15 11:17, Pedro Alves wrote:
>> Hmm, does it really need to, though?  We expose mechanisms like
>> add-symbol-file, xml library list with qXfer:libraries:read (the default
>> solib provider), xml target descriptions, "info os", etc., exactly so
>> that GDB doesn't have to learn about the myriad of random RTOS's out there.
> 
> If these stuffs (add-symbol-file, xml library list, etc) works well for
> the given RTOS, and GDB doesn't need to know about the RTOS, that is
> fine.  However, in this case, Nucleus needs some changes in GDB c code
> while GDB doesn't support Nucleus.  I can't see how this patch benefits
> any targets GDB supported.  This is the reason I think this patch is
> not acceptable.

This strikes me as an odd position, given the whole point of those commands
and features is letting GDB support the target without builtin knowledge of
them, so it's natural that GDB didn't have built-in support for the target
thus far.  But now we broke one of the mechanisms for some use case.
Put another way, if we added support for --target=$cpu-nucleus, just as a
configure alias for --target=$cpu-elf, so that we could say we supported
Nucleus, how would we go about fixing this?

I think we need to look and understand _why_ Nucleous' binaries trigger
the problem.  If they're standard elf, it just looks to me that Nucleous
is a red herring.

IIUC, this triggers on use of add-symbol-file with relocatable
objects, when there's real code at address 0.  I think that's the angle
we should look at things.

I'd suspect that things are broken too when targets list relocatable objects
in the qXfer:libraries:read list (in which case the target will list "section"
elements instead of "segment" elements for each "library" element), and that's
definitely meant to work.  ("If the target supports dynamic linking of a
relocatable object file, its library XML element should instead include
a list of allocated sections.")

Thanks,
Pedro Alves
  
Yao Qi June 11, 2015, 8:30 a.m. UTC | #11
On 10/06/15 15:43, Pedro Alves wrote:
> This strikes me as an odd position, given the whole point of those commands
> and features is letting GDB support the target without builtin knowledge of
> them, so it's natural that GDB didn't have built-in support for the target
> thus far.  But now we broke one of the mechanisms for some use case.
> Put another way, if we added support for --target=$cpu-nucleus, just as a
> configure alias for --target=$cpu-elf, so that we could say we supported
> Nucleus, how would we go about fixing this?

Even Nucleus is supported in GDB, I don't know how to fix this problem
either.  If this problem only exist on Nucleus, do we need to fix this
problem?

>
> I think we need to look and understand_why_  Nucleous' binaries trigger
> the problem.  If they're standard elf, it just looks to me that Nucleous
> is a red herring.

I agree that we need look deep and understand why Nucleus binaries
trigger this problem, to see whether it is Nucleus specific or not.

>
> IIUC, this triggers on use of add-symbol-file with relocatable
> objects, when there's real code at address 0.  I think that's the angle
> we should look at things.
>

Yes, that is the problem.  GDB thinks the function is GC'ed by linker
but in fact it isn't.
  
Pedro Alves June 11, 2015, 8:54 a.m. UTC | #12
On 06/11/2015 09:30 AM, Yao Qi wrote:
>> >
>> > IIUC, this triggers on use of add-symbol-file with relocatable
>> > objects, when there's real code at address 0.  I think that's the angle
>> > we should look at things.
>> >
> Yes, that is the problem.  GDB thinks the function is GC'ed by linker
> but in fact it isn't.

The key point here is _relocatable_ objects.  These are special in
that they aren't fully linked, so they have relocations in the debug info,
and their sections on file start at address zero.  See default_symfile_offsets.

I still don't understand how lowpc doesn't end up as 0 in the case
at hand, meaning there's really code at 0.

Or does lowpc really end up as 0, but the real issue is that "lowpc==0"
is ambiguous, and we need to track a separate "have pc range" flag?

Taimoor, I'd expect you're passing a "0" to add-symbol-file, like:

  (gdb) add-symbol-file ... -s .some_section 0 ...

I think it'd help to see an example invocation of the command.

I'd think it possible to expand gdb.base/relocate.exp (or create
a new test based on it) to cover this scenario everywhere.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index f6b0c01..4f84b40 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -17665,7 +17665,8 @@  dwarf_decode_lines_1 (struct line_header *lh, struct dwarf2_cu *cu,
 		     the test to only address values of zero to preserve
 		     GDB's previous behaviour which is to handle the specific
 		     case of a function being GC'd by the linker.  */
-		  if (address == 0 && address < lowpc)
+		  if (address == 0 && address < lowpc
+		      && (objfile->flags & OBJF_MAINLINE))
 		    {
 		      /* This line table is for a function which has been
 			 GCd by the linker.  Ignore it.  PR gdb/12528 */