[PR,gdb/22736,aarch64] gdb crashes on a conditional breakpoint with cast return type

Message ID F0AF0417-7A6F-48C6-B6E5-6999F9FEE3A7@arm.com
State New, archived
Headers

Commit Message

Alan Hayward March 1, 2018, 5:03 p.m. UTC
  On aarch64, the (int) casting in the following causes a gdb segfault:
$ ./gdb ./gdb
(gdb) b dwarf2_physname if (int)strcmp (name, "another_thread_local") == 0
(gdb) run a.out         // use any a.out

This is due to getting a null pointer from TYPE_TARGET_TYPE, and then
using it for language_pass_by_reference().

Fixed by adding a null check, similar to other occurrences in gdb.

Tested on aarch64 with make check using unix, native_gdbserver.

Alan.


2018-03-01  Alan Hayward  <alan.hayward@arm.com>

	* aarch64-tdep.c (aarch64_push_dummy_call): Check for null
	return_type.
  

Comments

Joel Brobecker March 2, 2018, 3:32 a.m. UTC | #1
On Thu, Mar 01, 2018 at 05:03:44PM +0000, Alan Hayward wrote:
> On aarch64, the (int) casting in the following causes a gdb segfault:
> $ ./gdb ./gdb
> (gdb) b dwarf2_physname if (int)strcmp (name, "another_thread_local") == 0
> (gdb) run a.out         // use any a.out
> 
> This is due to getting a null pointer from TYPE_TARGET_TYPE, and then
> using it for language_pass_by_reference().
> 
> Fixed by adding a null check, similar to other occurrences in gdb.
> 
> Tested on aarch64 with make check using unix, native_gdbserver.
> 
> Alan.
> 
> 
> 2018-03-01  Alan Hayward  <alan.hayward@arm.com>
> 
> 	* aarch64-tdep.c (aarch64_push_dummy_call): Check for null
> 	return_type.

The patch looks good to me, but do you think you could add a test
for it? Intuitively, I think this should be fairly easily doable,
but can you confirm?

> diff --git a/gdb/aarch64-tdep.c b/gdb/aarch64-tdep.c
> index f08945ea07101e1cd7906ca640c023ac7d189dd9..ef982c78fe64ceef3c7c378fd22d76604bf81c31 100644
> --- a/gdb/aarch64-tdep.c
> +++ b/gdb/aarch64-tdep.c
> @@ -1382,7 +1382,7 @@ aarch64_push_dummy_call (struct gdbarch *gdbarch, struct value *function,
>    struct aarch64_call_info info;
>    struct type *func_type;
>    struct type *return_type;
> -  int lang_struct_return;
> +  int lang_struct_return = 0;
> 
>    memset (&info, 0, sizeof (info));
> 
> @@ -1424,7 +1424,8 @@ aarch64_push_dummy_call (struct gdbarch *gdbarch, struct value *function,
>       given an additional initial argument, a hidden pointer to the
>       return slot in memory.  */
>    return_type = TYPE_TARGET_TYPE (func_type);
> -  lang_struct_return = language_pass_by_reference (return_type);
> +  if (return_type != nullptr)
> +    lang_struct_return = language_pass_by_reference (return_type);
> 
>    /* Set the return address.  For the AArch64, the return breakpoint
>       is always at BP_ADDR.  */
>
  
Yao Qi March 2, 2018, 10:07 a.m. UTC | #2
On Thu, Mar 1, 2018 at 5:03 PM, Alan Hayward <Alan.Hayward@arm.com> wrote:
> 2018-03-01  Alan Hayward  <alan.hayward@arm.com>
>
>         * aarch64-tdep.c (aarch64_push_dummy_call): Check for null
>         return_type.

Add "PR gdb/22736" in ChangeLog entry.

Any idea on why TYPE_TARGET_TYPE (func_type) is NULL? (because
there is no strcmp debug info?)
  
Alan Hayward March 2, 2018, 12:09 p.m. UTC | #3
> On 2 Mar 2018, at 03:32, Joel Brobecker <brobecker@adacore.com> wrote:

> 

> On Thu, Mar 01, 2018 at 05:03:44PM +0000, Alan Hayward wrote:

>> On aarch64, the (int) casting in the following causes a gdb segfault:

>> $ ./gdb ./gdb

>> (gdb) b dwarf2_physname if (int)strcmp (name, "another_thread_local") == 0

>> (gdb) run a.out         // use any a.out

>> 

>> This is due to getting a null pointer from TYPE_TARGET_TYPE, and then

>> using it for language_pass_by_reference().

>> 

>> Fixed by adding a null check, similar to other occurrences in gdb.

>> 

>> Tested on aarch64 with make check using unix, native_gdbserver.

>> 

>> Alan.

>> 

>> 

>> 2018-03-01  Alan Hayward  <alan.hayward@arm.com>

>> 

>> 	* aarch64-tdep.c (aarch64_push_dummy_call): Check for null

>> 	return_type.

> 

> The patch looks good to me, but do you think you could add a test

> for it? Intuitively, I think this should be fairly easily doable,

> but can you confirm?


Agreed, should be easy enough.
I’ve not added anything to the .exp files yet, so this is a good excuse for me to
look into them a bit more :)

Thanks for the review.


> On 2 Mar 2018, at 10:07, Yao Qi <qiyaoltc@gmail.com> wrote:

> 

> On Thu, Mar 1, 2018 at 5:03 PM, Alan Hayward <Alan.Hayward@arm.com> wrote:

>> 2018-03-01  Alan Hayward  <alan.hayward@arm.com>

>> 

>>        * aarch64-tdep.c (aarch64_push_dummy_call): Check for null

>>        return_type.

> 

> Add "PR gdb/22736" in ChangeLog entry.

> 


Will add.

> Any idea on why TYPE_TARGET_TYPE (func_type) is NULL? (because

> there is no strcmp debug info?)

> 


The cast to (int) is causing this - remove the cast and it finds the type.
I’m assuming that’s causing it to drop the debug info.


Also, thanks for the review.


Alan.
  
Yao Qi March 2, 2018, 12:21 p.m. UTC | #4
On Fri, Mar 2, 2018 at 12:09 PM, Alan Hayward <Alan.Hayward@arm.com> wrote:
>> Any idea on why TYPE_TARGET_TYPE (func_type) is NULL? (because
>> there is no strcmp debug info?)
>>
>
> The cast to (int) is causing this - remove the cast and it finds the type.
> I’m assuming that’s causing it to drop the debug info.

It sounds like a bug to me.  If this bug "TYPE_TARGET_TYPE (func_type) becomes
NULL caused by cast" is fixed, the GDB segfault will go away accordingly.

To be clear, your patch here is fine to me.  My suggestion is that we'd better
dig it deeper.
  
Alan Hayward March 2, 2018, 2:05 p.m. UTC | #5
> On 2 Mar 2018, at 12:21, Yao Qi <qiyaoltc@gmail.com> wrote:

> 

> On Fri, Mar 2, 2018 at 12:09 PM, Alan Hayward <Alan.Hayward@arm.com> wrote:

>>> Any idea on why TYPE_TARGET_TYPE (func_type) is NULL? (because

>>> there is no strcmp debug info?)

>>> 

>> 

>> The cast to (int) is causing this - remove the cast and it finds the type.

>> I’m assuming that’s causing it to drop the debug info.

> 

> It sounds like a bug to me.  If this bug "TYPE_TARGET_TYPE (func_type) becomes

> NULL caused by cast" is fixed, the GDB segfault will go away accordingly.

> 

> To be clear, your patch here is fine to me.  My suggestion is that we'd better

> dig it deeper.

> 


Agreed. I’ll raise a new bug, and have a look into it too whilst I’m in the area.


Alan.
  
Joel Brobecker March 2, 2018, 3:18 p.m. UTC | #6
> >> The cast to (int) is causing this - remove the cast and it finds the type.
> >> I’m assuming that’s causing it to drop the debug info.
> > 
> > It sounds like a bug to me.  If this bug "TYPE_TARGET_TYPE
> > (func_type) becomes NULL caused by cast" is fixed, the GDB segfault
> > will go away accordingly.
> > 
> > To be clear, your patch here is fine to me.  My suggestion is that
> > we'd better dig it deeper.
> 
> Agreed. I’ll raise a new bug, and have a look into it too whilst I’m
> in the area.

Having read what Yao said, and looking at the example you gave,
I'm now thinking that one could very well be the cause of the other;
it seems like the cast might indeed be returning a value with
a struct type that's missing the return type. Even a subprogram
which returns nothing has a TYPE_TARGET_TYPE set to a TYPE_CODE_VOID,
right?
  
Alan Hayward March 5, 2018, 3:57 p.m. UTC | #7
> On 2 Mar 2018, at 15:18, Joel Brobecker <brobecker@adacore.com> wrote:

> 

>>>> The cast to (int) is causing this - remove the cast and it finds the type.

>>>> I’m assuming that’s causing it to drop the debug info.

>>> 

>>> It sounds like a bug to me.  If this bug "TYPE_TARGET_TYPE

>>> (func_type) becomes NULL caused by cast" is fixed, the GDB segfault

>>> will go away accordingly.

>>> 

>>> To be clear, your patch here is fine to me.  My suggestion is that

>>> we'd better dig it deeper.

>> 

>> Agreed. I’ll raise a new bug, and have a look into it too whilst I’m

>> in the area.

> 

> Having read what Yao said, and looking at the example you gave,

> I'm now thinking that one could very well be the cause of the other;

> it seems like the cast might indeed be returning a value with

> a struct type that's missing the return type. Even a subprogram

> which returns nothing has a TYPE_TARGET_TYPE set to a TYPE_CODE_VOID,

> right?


Been debugging this a little more (and learnt quite a few things about gdb along the way!)
Not sure if this reply is the best place to discuss, or if it should go onto the bug.
But….

On x86, for strcmp the type and it's target types, as received in amd64_push_dummy_call are:
TYPE_CODE_FUNC -> TYPE_CODE_INT -> 0x0
Whereas on aarch64 in aarch64_push_dummy_call we get:
TYPE_CODE_FUNC -> 0x0

It turns out this occurs because strcmp has IFUNC vs FUNC differences:

X86:
$ readelf -s /lib/x86_64-linux-gnu/libc-2.19.so | grep strcmp
  2085: 0000000000087380    60 IFUNC   GLOBAL DEFAULT   12 strcmp@@GLIBC_2.2.5

Aarch64:
$ readelf -s /lib/aarch64-linux-gnu/libc-2.23.so | grep strcmp
  2043: 0000000000076380   164 FUNC    GLOBAL DEFAULT   12 strcmp@@GLIBC_2.17


I decided to test this on X86 using a FUNC….

$ readelf -s /lib/x86_64-linux-gnu/libc-2.19.so | grep strlen
   769: 0000000000088dd0   436 FUNC    GLOBAL DEFAULT   12 strlen@@GLIBC_2.2.5

Changed my test to do:
(gdb) b foo if (int)strlen(name) == 3

...Now the type received by the target code is TYPE_CODE_FUNC -> 0, the same as aarch64.

However, x86 does not segfault. This is because amd64_push_dummy_call doesn’t bother to
check the function parameter.


Looking into the code, when reading a FUNC, readlf.c : elf_symtab_read() hits:

		  if (sym->flags & BSF_GNU_INDIRECT_FUNCTION)
		    ms_type = mst_text_gnu_ifunc;
		  else
		    ms_type = mst_text;

Setting the type to mst_text, which eventually causes find_minsym_type_and_address() to
set the type to objfile_type (objfile)->nodebug_text_symbol
Whereas an IFUNC gets set to objfile_type (objfile)->nodebug_text_gnu_ifunc_symbol.

So, I think probably either:
* Everything is correct
* mst_text needs handling differently
* FUNCs need a new minimal symbol type (mst_text_gnu_func?)
(I’m not familiar enough with this part of the code to give a definitive answer).


Alan.
  
Pedro Alves March 5, 2018, 4:45 p.m. UTC | #8
On 03/05/2018 03:57 PM, Alan Hayward wrote:
> 
> 
>> On 2 Mar 2018, at 15:18, Joel Brobecker <brobecker@adacore.com> wrote:
>>
>>>>> The cast to (int) is causing this - remove the cast and it finds the type.
>>>>> I’m assuming that’s causing it to drop the debug info.
>>>>
>>>> It sounds like a bug to me.  If this bug "TYPE_TARGET_TYPE
>>>> (func_type) becomes NULL caused by cast" is fixed, the GDB segfault
>>>> will go away accordingly.
>>>>
>>>> To be clear, your patch here is fine to me.  My suggestion is that
>>>> we'd better dig it deeper.
>>>
>>> Agreed. I’ll raise a new bug, and have a look into it too whilst I’m
>>> in the area.
>>
>> Having read what Yao said, and looking at the example you gave,
>> I'm now thinking that one could very well be the cause of the other;
>> it seems like the cast might indeed be returning a value with
>> a struct type that's missing the return type. Even a subprogram
>> which returns nothing has a TYPE_TARGET_TYPE set to a TYPE_CODE_VOID,
>> right?
> 
> Been debugging this a little more (and learnt quite a few things about gdb along the way!)
> Not sure if this reply is the best place to discuss, or if it should go onto the bug.
> But….
> 
> On x86, for strcmp the type and it's target types, as received in amd64_push_dummy_call are:
> TYPE_CODE_FUNC -> TYPE_CODE_INT -> 0x0
> Whereas on aarch64 in aarch64_push_dummy_call we get:
> TYPE_CODE_FUNC -> 0x0
> 
> It turns out this occurs because strcmp has IFUNC vs FUNC differences:
> 
> X86:
> $ readelf -s /lib/x86_64-linux-gnu/libc-2.19.so | grep strcmp
>   2085: 0000000000087380    60 IFUNC   GLOBAL DEFAULT   12 strcmp@@GLIBC_2.2.5
> 
> Aarch64:
> $ readelf -s /lib/aarch64-linux-gnu/libc-2.23.so | grep strcmp
>   2043: 0000000000076380   164 FUNC    GLOBAL DEFAULT   12 strcmp@@GLIBC_2.17
> 
> 
> I decided to test this on X86 using a FUNC….
> 
> $ readelf -s /lib/x86_64-linux-gnu/libc-2.19.so | grep strlen
>    769: 0000000000088dd0   436 FUNC    GLOBAL DEFAULT   12 strlen@@GLIBC_2.2.5
> 
> Changed my test to do:
> (gdb) b foo if (int)strlen(name) == 3
> 
> ...Now the type received by the target code is TYPE_CODE_FUNC -> 0, the same as aarch64.
> 
> However, x86 does not segfault. This is because amd64_push_dummy_call doesn’t bother to
> check the function parameter.
> 
> 
> Looking into the code, when reading a FUNC, readlf.c : elf_symtab_read() hits:
> 
> 		  if (sym->flags & BSF_GNU_INDIRECT_FUNCTION)
> 		    ms_type = mst_text_gnu_ifunc;
> 		  else
> 		    ms_type = mst_text;
> 
> Setting the type to mst_text, which eventually causes find_minsym_type_and_address() to
> set the type to objfile_type (objfile)->nodebug_text_symbol
> Whereas an IFUNC gets set to objfile_type (objfile)->nodebug_text_gnu_ifunc_symbol.
> 
> So, I think probably either:
> * Everything is correct
> * mst_text needs handling differently
> * FUNCs need a new minimal symbol type (mst_text_gnu_func?)
> (I’m not familiar enough with this part of the code to give a definitive answer).

Funny enough, I started working on fixing ifunc-related problems early
last week, but then got sidetracked into other high priority things.  I've not
run into a GDB crash, but just in case, maybe this branch helps:

 https://github.com/palves/gdb/commits/palves/ifunc

The gnu-ifunc.exp test doesn't pass cleanly on that branch
because I'm midway through augmenting that testcase to cover
different scenarios and evaluating whether it's the test that
needs fixing, or gdb.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/aarch64-tdep.c b/gdb/aarch64-tdep.c
index f08945ea07101e1cd7906ca640c023ac7d189dd9..ef982c78fe64ceef3c7c378fd22d76604bf81c31 100644
--- a/gdb/aarch64-tdep.c
+++ b/gdb/aarch64-tdep.c
@@ -1382,7 +1382,7 @@  aarch64_push_dummy_call (struct gdbarch *gdbarch, struct value *function,
   struct aarch64_call_info info;
   struct type *func_type;
   struct type *return_type;
-  int lang_struct_return;
+  int lang_struct_return = 0;

   memset (&info, 0, sizeof (info));

@@ -1424,7 +1424,8 @@  aarch64_push_dummy_call (struct gdbarch *gdbarch, struct value *function,
      given an additional initial argument, a hidden pointer to the
      return slot in memory.  */
   return_type = TYPE_TARGET_TYPE (func_type);
-  lang_struct_return = language_pass_by_reference (return_type);
+  if (return_type != nullptr)
+    lang_struct_return = language_pass_by_reference (return_type);

   /* Set the return address.  For the AArch64, the return breakpoint
      is always at BP_ADDR.  */