[RFC] fix gdb segv when objfile can't be opened

Message ID 8c08307a-94ad-92b8-9c8b-c713cad541fd@mathworks.com
State New, archived
Headers

Commit Message

Mike Gulick Oct. 19, 2017, 3:59 p.m. UTC
  I apologize for the improperly formatted patch -- I'm really struggling
to get thunderbird to behave as I want.

Here is an updated patch.  I would have sent it with git send-email, but
I could not figure out the proper way to add this preface before the
patch (without it looking like part of the commit message).

---
From 5dee04076518554e4baae864569d6f4faee9b685 Mon Sep 17 00:00:00 2001
From: Mike Gulick <mgulick@mathworks.com>
Date: Wed, 18 Oct 2017 16:04:27 -0400
Subject: [PATCH] fix gdb segv when objfile can't be opened

This fixes PR 16577.

This patch changes gdb_bfd_map_section to issue a warning rather than an
error if it is unable to read the object file, and sets the size of the
section/frame that it attempted to read to 0 on error.

The description of gdb_bfd_map_section states that it will try to read
or map the contents of the section SECT, and if successful, the section
data is returned and *SIZE is set to the size of the section data.  This
function was throwing an error and leaving *size as-is.  Setting the
section size to 0 indicates to dwarf2_build_frame_info that there is no
data to read, otherwise it will try to read from an invalid frame
pointer.

Changing the error to a warning allows this to be handled gracefully.
Additionally, the error was clobbering the breakpoint output indicating
the current frame (function name, arguments, source file, and line number).
E.g.

Thread 3 "foo" hit Breakpoint 1, BFD: reopening /tmp/jna-1013829440/jna2973250704389291330.tmp: No such file or directory

BFD: reopening /tmp/jna-1013829440/jna2973250704389291330.tmp: No such file or directory

(gdb)

While the "BFD: reopening ..." messages will still appear interspersed in the
breakpoint output, the current frame info is now displayed:

Thread 3 "foo" hit Breakpoint 1, BFD: reopening /tmp/jna-1013829440/jna1875755897659885075.tmp: No such file or directory

BFD: reopening /tmp/jna-1013829440/jna1875755897659885075.tmp: No such file or directory

warning: Can't read data for section '.eh_frame' in file '/tmp/jna-1013829440/jna1875755897659885075.tmp'
do_something () at file.cpp:80
80	{
(gdb)
---
 gdb/gdb_bfd.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
  

Comments

Simon Marchi Oct. 19, 2017, 5:54 p.m. UTC | #1
On 2017-10-19 11:59, Mike Gulick wrote:
> I apologize for the improperly formatted patch -- I'm really struggling
> to get thunderbird to behave as I want.
> 
> Here is an updated patch.  I would have sent it with git send-email, 
> but
> I could not figure out the proper way to add this preface before the
> patch (without it looking like part of the commit message).

Hi Mike,

Thanks, I was able to apply this version correctly.

If I have a short comment that's not meant to be in the commit message, 
I usually
include it in brackets like this:

   [Re-sending this patch because the first try was not formatted 
correctly.]

If it's longer you can always end it with a line "Actual commit 
message:".  Either way, it's not really a big deal, as long as it's 
clear.  You can use the --annotate option of git-send-email to edit the 
message before sending it.

> ---
> From 5dee04076518554e4baae864569d6f4faee9b685 Mon Sep 17 00:00:00 2001
> From: Mike Gulick <mgulick@mathworks.com>
> Date: Wed, 18 Oct 2017 16:04:27 -0400
> Subject: [PATCH] fix gdb segv when objfile can't be opened
> 
> This fixes PR 16577.
> 
> This patch changes gdb_bfd_map_section to issue a warning rather than 
> an
> error if it is unable to read the object file, and sets the size of the
> section/frame that it attempted to read to 0 on error.
> 
> The description of gdb_bfd_map_section states that it will try to read
> or map the contents of the section SECT, and if successful, the section
> data is returned and *SIZE is set to the size of the section data.  
> This
> function was throwing an error and leaving *size as-is.  Setting the
> section size to 0 indicates to dwarf2_build_frame_info that there is no
> data to read, otherwise it will try to read from an invalid frame
> pointer.
> 
> Changing the error to a warning allows this to be handled gracefully.
> Additionally, the error was clobbering the breakpoint output indicating
> the current frame (function name, arguments, source file, and line 
> number).
> E.g.
> 
> Thread 3 "foo" hit Breakpoint 1, BFD: reopening
> /tmp/jna-1013829440/jna2973250704389291330.tmp: No such file or
> directory
> 
> BFD: reopening /tmp/jna-1013829440/jna2973250704389291330.tmp: No such
> file or directory

For some reason, I am not able to reproduce the crash using the 
instructions in the bug report, and gdb master.

(gdb) up
#1  0x00007ffff78d525a in sleep () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)
BFD: reopening ./badlib.so: No such file or directory

BFD: reopening ./badlib.so: No such file or directory

Can't read data for section '.eh_frame' in file './badlib.so'
(gdb)
Initial frame selected; you cannot go up.
(gdb)
Initial frame selected; you cannot go up.
(gdb)
Initial frame selected; you cannot go up.
(gdb) bt
#0  0x00007ffff78d52f0 in nanosleep () from 
/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff78d525a in sleep () from /lib/x86_64-linux-gnu/libc.so.6


Would you be able to create a test case to reproduce it?  We would need 
one to go in with the fix in the end anyway, and it's easier for 
reviewers if they can just run a test file rather than try to reproduce 
by hand.  You can start by copying an existing solib test, like 
gdb.base/solib-display.exp.  See here for more details about tests:

http://sourceware.org/gdb/wiki/TestingGDB
http://sourceware.org/gdb/wiki/GDBTestcaseCookbook

Don't hesitate to ask here or on IRC if you need assistance.

> (gdb)
> 
> While the "BFD: reopening ..." messages will still appear interspersed 
> in the
> breakpoint output, the current frame info is now displayed:
> 
> Thread 3 "foo" hit Breakpoint 1, BFD: reopening
> /tmp/jna-1013829440/jna1875755897659885075.tmp: No such file or
> directory
> 
> BFD: reopening /tmp/jna-1013829440/jna1875755897659885075.tmp: No such
> file or directory
> 
> warning: Can't read data for section '.eh_frame' in file
> '/tmp/jna-1013829440/jna1875755897659885075.tmp'
> do_something () at file.cpp:80
> 80	{
> (gdb)
> ---
>  gdb/gdb_bfd.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/gdb/gdb_bfd.c b/gdb/gdb_bfd.c
> index 29080b8..229f5ae 100644
> --- a/gdb/gdb_bfd.c
> +++ b/gdb/gdb_bfd.c
> @@ -705,9 +705,15 @@ gdb_bfd_map_section (asection *sectp, 
> bfd_size_type *size)
> 
>    data = NULL;
>    if (!bfd_get_full_section_contents (abfd, sectp, &data))
> -    error (_("Can't read data for section '%s' in file '%s'"),
> -	   bfd_get_section_name (abfd, sectp),
> -	   bfd_get_filename (abfd));
> +    {
> +      warning (_("Can't read data for section '%s' in file '%s'"),
> +	       bfd_get_section_name (abfd, sectp),
> +	       bfd_get_filename (abfd));
> +      /* Section is invalid -- set size to 0 and return NULL */
> +      descriptor->size = 0;
> +      *size = descriptor->size;
> +      return (const gdb_byte *) NULL;
> +    }
>    descriptor->data = data;
> 
>   done:

I don't know if it is really this function's responsibility to clear 
*size in case of error, or it would be the callers responsibility to 
properly check for errors.  But if the function doesn't throw anymore, 
the comment in gdb_bfd.h should be updated accordingly.

Thanks,

Simon
  
Mike Gulick Oct. 19, 2017, 7:39 p.m. UTC | #2
On 10/19/2017 01:54 PM, Simon Marchi wrote:
> On 2017-10-19 11:59, Mike Gulick wrote:
>> I apologize for the improperly formatted patch -- I'm really struggling
>> to get thunderbird to behave as I want.
>>
>> Here is an updated patch.  I would have sent it with git send-email, but
>> I could not figure out the proper way to add this preface before the
>> patch (without it looking like part of the commit message).
> 
> Hi Mike,
> 
> Thanks, I was able to apply this version correctly.
> 
> If I have a short comment that's not meant to be in the commit message, I usually
> include it in brackets like this:
> 
>   [Re-sending this patch because the first try was not formatted correctly.]
> 
> If it's longer you can always end it with a line "Actual commit message:".  Either way, it's not really a big deal, as long as it's clear.  You can use the --annotate option of git-send-email to edit the message before sending it.
> 
Thanks.

>> ---
>> From 5dee04076518554e4baae864569d6f4faee9b685 Mon Sep 17 00:00:00 2001
>> From: Mike Gulick <mgulick@mathworks.com>
>> Date: Wed, 18 Oct 2017 16:04:27 -0400
>> Subject: [PATCH] fix gdb segv when objfile can't be opened
>>
>> This fixes PR 16577.
>>
>> This patch changes gdb_bfd_map_section to issue a warning rather than an
>> error if it is unable to read the object file, and sets the size of the
>> section/frame that it attempted to read to 0 on error.
>>
>> The description of gdb_bfd_map_section states that it will try to read
>> or map the contents of the section SECT, and if successful, the section
>> data is returned and *SIZE is set to the size of the section data.  This
>> function was throwing an error and leaving *size as-is.  Setting the
>> section size to 0 indicates to dwarf2_build_frame_info that there is no
>> data to read, otherwise it will try to read from an invalid frame
>> pointer.
>>
>> Changing the error to a warning allows this to be handled gracefully.
>> Additionally, the error was clobbering the breakpoint output indicating
>> the current frame (function name, arguments, source file, and line number).
>> E.g.
>>
>> Thread 3 "foo" hit Breakpoint 1, BFD: reopening
>> /tmp/jna-1013829440/jna2973250704389291330.tmp: No such file or
>> directory
>>
>> BFD: reopening /tmp/jna-1013829440/jna2973250704389291330.tmp: No such
>> file or directory
> 
> For some reason, I am not able to reproduce the crash using the instructions in the bug report, and gdb master.
> 
> (gdb) up
> #1  0x00007ffff78d525a in sleep () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb)
> BFD: reopening ./badlib.so: No such file or directory
> 
> BFD: reopening ./badlib.so: No such file or directory
> 
> Can't read data for section '.eh_frame' in file './badlib.so'
> (gdb)
> Initial frame selected; you cannot go up.
> (gdb)
> Initial frame selected; you cannot go up.
> (gdb)
> Initial frame selected; you cannot go up.
> (gdb) bt
> #0  0x00007ffff78d52f0 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007ffff78d525a in sleep () from /lib/x86_64-linux-gnu/libc.so.6
> 
> 
> Would you be able to create a test case to reproduce it?  We would need one to go in with the fix in the end anyway, and it's easier for reviewers if they can just run a test file rather than try to reproduce by hand.  You can start by copying an existing solib test, like gdb.base/solib-display.exp.  See here for more details about tests:
> 
> http://sourceware.org/gdb/wiki/TestingGDB
> http://sourceware.org/gdb/wiki/GDBTestcaseCookbook
> 
> Don't hesitate to ask here or on IRC if you need assistance.
> 

I attached a new reproducer in the bug report that reproduces this
problem in the current git master (and more closely follows the problem
I was seeing in our large C++/Java application).  This causes the error
messages to be emitted when the breakpoint is hit, and the stepping
forward triggers the segfault.

I'll take a look at building a test case.

>> (gdb)
>>
>> While the "BFD: reopening ..." messages will still appear interspersed in the
>> breakpoint output, the current frame info is now displayed:
>>
>> Thread 3 "foo" hit Breakpoint 1, BFD: reopening
>> /tmp/jna-1013829440/jna1875755897659885075.tmp: No such file or
>> directory
>>
>> BFD: reopening /tmp/jna-1013829440/jna1875755897659885075.tmp: No such
>> file or directory
>>
>> warning: Can't read data for section '.eh_frame' in file
>> '/tmp/jna-1013829440/jna1875755897659885075.tmp'
>> do_something () at file.cpp:80
>> 80    {
>> (gdb)
>> ---
>>  gdb/gdb_bfd.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/gdb/gdb_bfd.c b/gdb/gdb_bfd.c
>> index 29080b8..229f5ae 100644
>> --- a/gdb/gdb_bfd.c
>> +++ b/gdb/gdb_bfd.c
>> @@ -705,9 +705,15 @@ gdb_bfd_map_section (asection *sectp, bfd_size_type *size)
>>
>>    data = NULL;
>>    if (!bfd_get_full_section_contents (abfd, sectp, &data))
>> -    error (_("Can't read data for section '%s' in file '%s'"),
>> -       bfd_get_section_name (abfd, sectp),
>> -       bfd_get_filename (abfd));
>> +    {
>> +      warning (_("Can't read data for section '%s' in file '%s'"),
>> +           bfd_get_section_name (abfd, sectp),
>> +           bfd_get_filename (abfd));
>> +      /* Section is invalid -- set size to 0 and return NULL */
>> +      descriptor->size = 0;
>> +      *size = descriptor->size;
>> +      return (const gdb_byte *) NULL;
>> +    }
>>    descriptor->data = data;
>>
>>   done:
> 
> I don't know if it is really this function's responsibility to clear *size in case of error, or it would be the callers responsibility to properly check for errors.  But if the function doesn't throw anymore, the comment in gdb_bfd.h should be updated accordingly.

I had trouble figuring out what the 'error' function actually does (I
couldn't find where it was defined).  When I'm stepping through GDB in
the debugger, the lines past 'error' never seem to get called.  It's
like 'error' throws an exception that is caught elsewhere.  I was also
unable to figure out why the error message isn't displayed.  The new
reproducer shows this issue.  I wasn't sure if setting *size or even
descriptor->size was the right thing to do, but it seemed reasonable to
me since the comment in gdb_bfd.h states that this function updates
*size.  There's currently only one caller for 'gdb_bfd_map_section', so
I have no problem updating *size there if that is preferred.

Thanks,
Mike
  
Simon Marchi Oct. 19, 2017, 8:09 p.m. UTC | #3
On 2017-10-19 15:39, Mike Gulick wrote:
> I had trouble figuring out what the 'error' function actually does (I
> couldn't find where it was defined).  When I'm stepping through GDB in
> the debugger, the lines past 'error' never seem to get called.  It's
> like 'error' throws an exception that is caught elsewhere.

Indeed, "error" throws an exception.  You should be able at least to 
step into the error function (although it's not particularly useful nor 
interesting).  It is defined in common/errors.c.

> I was also
> unable to figure out why the error message isn't displayed.  The new
> reproducer shows this issue.  I wasn't sure if setting *size or even
> descriptor->size was the right thing to do, but it seemed reasonable to
> me since the comment in gdb_bfd.h states that this function updates
> *size.  There's currently only one caller for 'gdb_bfd_map_section', so
> I have no problem updating *size there if that is preferred.

Actually it says "If successful, the section data is returned and *SIZE 
is set to the size of the section data;".  And this is what I would 
generally expect from functions.  Unless stated otherwise, the value of 
output parameters should be considered undefined if the function failed. 
  So I would lean towards blaming the caller for not taking enough 
precautions.  It trusts that gdb_bfd_map_section won't fail.

Simon
  
Mike Gulick Oct. 19, 2017, 10:13 p.m. UTC | #4
On 10/19/2017 04:09 PM, Simon Marchi wrote:
> Indeed, "error" throws an exception.  You should be able at least to step into the error function (although it's not particularly useful nor interesting).  It is defined in common/errors.c.
>

Thanks.  I looked at the stack when stopped on 'error', and I see that
there is a TRY/CATCH in 'frame_unwind_try_unwinder'.  That doesn't
handle the exception, and just re-throws it.  It looks like the
exception is ultimately caught in 'print_stack_frame', which explains
why the stack frame is never printed when the exception is thrown
(although it's still unclear why the message passed to 'error' isn't
printed).

Here is the top half of the stack:

#0  gdb_bfd_map_section (sectp=0x31e4eb8, size=0x31ddc58) at gdb_bfd.c:708
#1  0x0000000000628b15 in dwarf2_read_section (objfile=0x31c0be0, info=0x31ddc48) at dwarf2read.c:2553
#2  0x0000000000628e8b in dwarf2_get_section_info (objfile=0x31c0be0, sect=DWARF2_EH_FRAME, sectp=0x31ddf20, bufp=0x31ddf10, sizep=0x31ddf18) at dwarf2read.c:2634
#3  0x0000000000611616 in dwarf2_build_frame_info (objfile=0x31c0be0) at dwarf2-frame.c:2222
#4  0x0000000000610262 in dwarf2_frame_find_fde (pc=0x7ffd1fc37ad0, out_offset=0x0) at dwarf2-frame.c:1707
#5  0x000000000060f874 in dwarf2_frame_sniffer (self=0xa2c480 <dwarf2_frame_unwind>, this_frame=0x2fceb60, this_cache=0x2fceb78) at dwarf2-frame.c:1337
#6  0x00000000006913cc in frame_unwind_try_unwinder (this_frame=0x2fceb60, this_cache=0x2fceb78, unwinder=0xa2c480 <dwarf2_frame_unwind>) at frame-unwind.c:106
#7  0x0000000000691598 in frame_unwind_find_by_frame (this_frame=0x2fceb60, this_cache=0x2fceb78) at frame-unwind.c:164
#8  0x000000000068fe3c in get_frame_type (frame=0x2fceb60) at frame.c:2625
#9  0x000000000077404e in print_frame_info (frame=0x2fceb60, print_level=0, print_what=SRC_AND_LOC, print_args=1, set_current_sal=1) at stack.c:795
#10 0x00000000007729b0 in print_stack_frame (frame=0x2fceb60, print_level=0, print_what=SRC_AND_LOC, set_current_sal=1) at stack.c:177
#11 0x00000000006d3ee5 in print_stop_location (ws=0x7ffd1fc37db0) at infrun.c:8041
#12 0x00000000006d3f5b in print_stop_event (uiout=0x30f1900) at infrun.c:8058
...

>> I was also
>> unable to figure out why the error message isn't displayed.  The new
>> reproducer shows this issue.  I wasn't sure if setting *size or even
>> descriptor->size was the right thing to do, but it seemed reasonable to
>> me since the comment in gdb_bfd.h states that this function updates
>> *size.  There's currently only one caller for 'gdb_bfd_map_section', so
>> I have no problem updating *size there if that is preferred.
> 
> Actually it says "If successful, the section data is returned and *SIZE is set to the size of the section data;".  And this is what I would generally expect from functions.  Unless stated otherwise, the value of output parameters should be considered undefined if the function failed.  So I would lean towards blaming the caller for not taking enough precautions.  It trusts that gdb_bfd_map_section won't fail.
>

I'm not sure how the gdb_bfd_map_section caller can pre-determine that
it will fail.  It looks like there may be situations where
gdb_bfd_map_section doesn't actually need to read the file, so that
would mean that simply checking if the object file is readable before
calling gdb_bfd_map_section might not be a good way to address this.
Outside of that, I don't see what pre-conditions I can check to
determine if gdb_bfd_map_section is going to fail in this case.  Do you
have any suggestions?

To add a little more info, the segfault happens after trying to run
'next' in the debugger after this error is thrown.  Here is a snippet of
the stack:

#0  0x000000000083b3e1 in bfd_getl32 (p=0x0) at libbfd.c:557
#1  0x000000000060fb3e in read_initial_length (abfd=0x31805e0, buf=0x0, bytes_read_ptr=0x7ffd1fc3797c) at dwarf2-frame.c:1482
#2  0x0000000000610592 in decode_frame_entry_1 (unit=0x31ddf40, start=0x0, eh_frame_p=1, cie_table=0x7ffd1fc37b00, fde_table=0x7ffd1fc37af0, entry_type=EH_CIE_OR_FDE_TYPE_ID) at dwarf2-frame.c:1792
#3  0x00000000006111b5 in decode_frame_entry (unit=0x31ddf40, start=0x0, eh_frame_p=1, cie_table=0x7ffd1fc37b00, fde_table=0x7ffd1fc37af0, entry_type=EH_CIE_OR_FDE_TYPE_ID) at dwarf2-frame.c:2090
#4  0x00000000006116d1 in dwarf2_build_frame_info (objfile=0x31c0be0) at dwarf2-frame.c:2247
#5  0x0000000000610262 in dwarf2_frame_find_fde (pc=0x7ffd1fc37d30, out_offset=0x2fcec58) at dwarf2-frame.c:1707
#6  0x000000000060ead3 in dwarf2_frame_cache (this_frame=0x2fceb60, this_cache=0x2fceb78) at dwarf2-frame.c:1000
#7  0x000000000060f2d7 in dwarf2_frame_this_id (this_frame=0x2fceb60, this_cache=0x2fceb78, this_id=0x2fcebc0) at dwarf2-frame.c:1198
#8  0x000000000068baab in compute_frame_id (fi=0x2fceb60) at frame.c:505
#9  0x000000000068bbf7 in get_frame_id (fi=0x2fceb60) at frame.c:537
#10 0x00000000006bee4e in step_command_fsm_prepare (sm=0x3175e10, skip_subroutines=1, single_inst=0, count=1, thread=0x3162ac0) at infcmd.c:1056
#11 0x00000000006bef73 in step_1 (skip_subroutines=1, single_inst=0, count_string=0x0) at infcmd.c:1096
#12 0x00000000006bed3f in next_command (count_string=0x0, from_tty=1) at infcmd.c:969
...

dwarf2_build_frame_info first checks if unit->dwarf_frame_size is
non-zero (it is), then it calls decode_frame_entry on
unit->dwarf_frame_buffer.  unit->dwarf_frame_buffer is null, which
triggers the segfault.  Adding checks in dwarf2_build_frame_info to
check that dwarf_frame_buffer is non-zero (which needs to be done in two
places) also prevents the crash.  However, this doesn't address the
issue that the stack frame info isn't printed when the initial
breakpoint is hit (because gdb_bfd_map_section throws an error).  It
would be nice if this could be gracefully handled so that the current
frame and source code line are printed instead of just dumping you back
to a (gdb) prompt without knowing where you are.  Since my patch changes
the error to a warning, and updates the size pointer, it fixes both of
these issues.

I can imagine two alternatives:

- Change the error to a warning, and make the caller of
  gdb_bfd_map_section check if the returned pointer is NULL, and if so
  set *size to 0.  This requires the ugly-ish early return that I had to
  add in gdb_bfd_map_section.
  
- Add a TRY/CATCH around dwarf2_read_section in dwarf2_get_section_info,
  that sets *sectp, *bufp, and *sizep to NULL.  I'm not sure if I should
  then re-throw the exception.

Thanks,
Mike
  

Patch

diff --git a/gdb/gdb_bfd.c b/gdb/gdb_bfd.c
index 29080b8..229f5ae 100644
--- a/gdb/gdb_bfd.c
+++ b/gdb/gdb_bfd.c
@@ -705,9 +705,15 @@  gdb_bfd_map_section (asection *sectp, bfd_size_type *size)
 
   data = NULL;
   if (!bfd_get_full_section_contents (abfd, sectp, &data))
-    error (_("Can't read data for section '%s' in file '%s'"),
-	   bfd_get_section_name (abfd, sectp),
-	   bfd_get_filename (abfd));
+    {
+      warning (_("Can't read data for section '%s' in file '%s'"),
+	       bfd_get_section_name (abfd, sectp),
+	       bfd_get_filename (abfd));
+      /* Section is invalid -- set size to 0 and return NULL */
+      descriptor->size = 0;
+      *size = descriptor->size;
+      return (const gdb_byte *) NULL;
+    }
   descriptor->data = data;
 
  done: