gdb: change type of `general_symbol_info::m_section` to int

Message ID 20240912122744.8603-1-simon.marchi@efficios.com
State New
Headers
Series gdb: change type of `general_symbol_info::m_section` to int |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 fail Patch failed to apply
linaro-tcwg-bot/tcwg_gdb_check--master-arm fail Test failed

Commit Message

Simon Marchi Sept. 12, 2024, 12:27 p.m. UTC
  The binary provided with bug 32165 [1] has 36139 ELF sections.  GDB
crashes on it with (note that my GDB is build with -D_GLIBCXX_DEBUG=1:

    $ ./gdb  -nx -q --data-directory=data-directory ./vmlinux
    Reading symbols from ./vmlinux...
    (No debugging symbols found in ./vmlinux)
    (gdb) info func
    /usr/include/c++/14.2.1/debug/vector:508:
    In function:
        std::debug::vector<_Tp, _Allocator>::reference std::debug::vector<_Tp,
        _Allocator>::operator[](size_type) [with _Tp = long unsigned int;
        _Allocator = std::allocator<long unsigned int>; reference = long
        unsigned int&; size_type = long unsigned int]

    Error: attempt to subscript container with out-of-bounds index -29445, but
    container only holds 36110 elements.

    Objects involved in the operation:
        sequence "this" @ 0x514000007340 {
          type = std::debug::vector<unsigned long, std::allocator<unsigned long> >;
        }

The crash occurs here:

    #3  0x00007ffff5e334c3 in __GI_abort () at abort.c:79
    #4  0x00007ffff689afc4 in __gnu_debug::_Error_formatter::_M_error (this=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:1320
    #5  0x0000555561119a16 in std::__debug::vector<unsigned long, std::allocator<unsigned long> >::operator[] (this=0x514000007340, __n=18446744073709522171)
        at /usr/include/c++/14.2.1/debug/vector:508
    #6  0x0000555562e288e8 in minimal_symbol::value_address (this=0x5190000bb698, objfile=0x514000007240) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:517
    #7  0x0000555562e5a131 in global_symbol_searcher::expand_symtabs (this=0x7ffff0f5c340, objfile=0x514000007240, preg=std::optional [no contained value])
        at /home/smarchi/src/binutils-gdb/gdb/symtab.c:4983
    #8  0x0000555562e5d2ed in global_symbol_searcher::search (this=0x7ffff0f5c340) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5189
    #9  0x0000555562e5ffa4 in symtab_symbol_info (quiet=false, exclude_minsyms=false, regexp=0x0, kind=FUNCTION_DOMAIN, t_regexp=0x0, from_tty=1)
        at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5361
    #10 0x0000555562e6131b in info_functions_command (args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5525

That is, at this line of `minimal_symbol::value_address`, where
`objfile->section_offsets` is an `std::vector`:

    return (CORE_ADDR (this->unrelocated_address ())
	    + objfile->section_offsets[this->section_index ()]);

A section index of -29445 is suspicious.  The minimal_symbol at play
here is:

    (top-gdb) p m_name
    $1 = 0x521001de10af "_sinittext"

So I restarted debugging, breaking on:

   (top-gdb) b general_symbol_info::set_section_index if $_streq("_sinittext", m_name)

And I see that weird -29445 value:

    (top-gdb) frame
    #0  general_symbol_info::set_section_index (this=0x525000082390, idx=-29445) at /home/smarchi/src/binutils-gdb/gdb/symtab.h:611
    611       { m_section = idx; }

But going up one frame, the section index is 36091:

    (top-gdb) frame
    #1  0x0000555562426526 in minimal_symbol_reader::record_full (this=0x7ffff0ead560, name="_sinittext", copy_name=false,
        address=-2111475712, ms_type=mst_text, section=36091) at /home/smarchi/src/binutils-gdb/gdb/minsyms.c:1228
    1228      msymbol->set_section_index (section);

It seems like the problem is just that the type used for the section
index (short) is not big enough.  Change from short to int.  If somebody
insists, we could even go long long / int64_t, but I doubt it's
necessary.

With that fixed, I get:

    (gdb) info func
    All defined functions:

    Non-debugging symbols:
    0xffffffff81000000  _stext
    0xffffffff82257000  _sinittext
    0xffffffff822b4ebb  _einittext

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=32165

Change-Id: Icb1c3de9474ff5adef7e0bbbf5e0b67b279dee04
---
 gdb/symtab.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


base-commit: 4290b2c07e2d9bc1e1661a4ad5e343e3eb307770
  

Comments

Tom de Vries Sept. 12, 2024, 1:28 p.m. UTC | #1
On 9/12/24 14:27, Simon Marchi wrote:
> The binary provided with bug 32165 [1] has 36139 ELF sections.  GDB
> crashes on it with (note that my GDB is build with -D_GLIBCXX_DEBUG=1:
> 
>      $ ./gdb  -nx -q --data-directory=data-directory ./vmlinux
>      Reading symbols from ./vmlinux...
>      (No debugging symbols found in ./vmlinux)
>      (gdb) info func
>      /usr/include/c++/14.2.1/debug/vector:508:
>      In function:
>          std::debug::vector<_Tp, _Allocator>::reference std::debug::vector<_Tp,
>          _Allocator>::operator[](size_type) [with _Tp = long unsigned int;
>          _Allocator = std::allocator<long unsigned int>; reference = long
>          unsigned int&; size_type = long unsigned int]
> 
>      Error: attempt to subscript container with out-of-bounds index -29445, but
>      container only holds 36110 elements.
> 
>      Objects involved in the operation:
>          sequence "this" @ 0x514000007340 {
>            type = std::debug::vector<unsigned long, std::allocator<unsigned long> >;
>          }
> 
> The crash occurs here:
> 
>      #3  0x00007ffff5e334c3 in __GI_abort () at abort.c:79
>      #4  0x00007ffff689afc4 in __gnu_debug::_Error_formatter::_M_error (this=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:1320
>      #5  0x0000555561119a16 in std::__debug::vector<unsigned long, std::allocator<unsigned long> >::operator[] (this=0x514000007340, __n=18446744073709522171)
>          at /usr/include/c++/14.2.1/debug/vector:508
>      #6  0x0000555562e288e8 in minimal_symbol::value_address (this=0x5190000bb698, objfile=0x514000007240) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:517
>      #7  0x0000555562e5a131 in global_symbol_searcher::expand_symtabs (this=0x7ffff0f5c340, objfile=0x514000007240, preg=std::optional [no contained value])
>          at /home/smarchi/src/binutils-gdb/gdb/symtab.c:4983
>      #8  0x0000555562e5d2ed in global_symbol_searcher::search (this=0x7ffff0f5c340) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5189
>      #9  0x0000555562e5ffa4 in symtab_symbol_info (quiet=false, exclude_minsyms=false, regexp=0x0, kind=FUNCTION_DOMAIN, t_regexp=0x0, from_tty=1)
>          at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5361
>      #10 0x0000555562e6131b in info_functions_command (args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5525
> 
> That is, at this line of `minimal_symbol::value_address`, where
> `objfile->section_offsets` is an `std::vector`:
> 
>      return (CORE_ADDR (this->unrelocated_address ())
> 	    + objfile->section_offsets[this->section_index ()]);
> 
> A section index of -29445 is suspicious.  The minimal_symbol at play
> here is:
> 
>      (top-gdb) p m_name
>      $1 = 0x521001de10af "_sinittext"
> 
> So I restarted debugging, breaking on:
> 
>     (top-gdb) b general_symbol_info::set_section_index if $_streq("_sinittext", m_name)
> 
> And I see that weird -29445 value:
> 
>      (top-gdb) frame
>      #0  general_symbol_info::set_section_index (this=0x525000082390, idx=-29445) at /home/smarchi/src/binutils-gdb/gdb/symtab.h:611
>      611       { m_section = idx; }
> 
> But going up one frame, the section index is 36091:
> 
>      (top-gdb) frame
>      #1  0x0000555562426526 in minimal_symbol_reader::record_full (this=0x7ffff0ead560, name="_sinittext", copy_name=false,
>          address=-2111475712, ms_type=mst_text, section=36091) at /home/smarchi/src/binutils-gdb/gdb/minsyms.c:1228
>      1228      msymbol->set_section_index (section);
> 
> It seems like the problem is just that the type used for the section
> index (short) is not big enough.  Change from short to int.  If somebody
> insists, we could even go long long / int64_t, but I doubt it's
> necessary.
> 
> With that fixed, I get:
> 
>      (gdb) info func
>      All defined functions:
> 
>      Non-debugging symbols:
>      0xffffffff81000000  _stext
>      0xffffffff82257000  _sinittext
>      0xffffffff822b4ebb  _einittext
> 

Hi Simon,

LGTM.

I wonder if using auto instead of int in set_section_index and 
section_index would be cleaner.

Reviewed-By: Tom de Vries <tdevries@suse.de>

Thanks,
- Tom

> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=32165
> 
> Change-Id: Icb1c3de9474ff5adef7e0bbbf5e0b67b279dee04
> ---
>   gdb/symtab.h | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gdb/symtab.h b/gdb/symtab.h
> index d615fdf1e52b..a273f32ddd0d 100644
> --- a/gdb/symtab.h
> +++ b/gdb/symtab.h
> @@ -601,20 +601,20 @@ struct general_symbol_info
>        section_offsets for this objfile.  Negative means that the symbol
>        does not get relocated relative to a section.  */
>   
> -  short m_section;
> +  int m_section;
>   
>     /* Set the index into the obj_section list (within the containing
>        objfile) for the section that contains this symbol.  See M_SECTION
>        for more details.  */
>   
> -  void set_section_index (short idx)
> +  void set_section_index (int idx)
>     { m_section = idx; }
>   
>     /* Return the index into the obj_section list (within the containing
>        objfile) for the section that contains this symbol.  See M_SECTION
>        for more details.  */
>   
> -  short section_index () const
> +  int section_index () const
>     { return m_section; }
>   
>     /* Return the obj_section from OBJFILE for this symbol.  The symbol
> 
> base-commit: 4290b2c07e2d9bc1e1661a4ad5e343e3eb307770
  
Keith Seitz Sept. 12, 2024, 1:28 p.m. UTC | #2
On 9/12/24 5:27 AM, Simon Marchi wrote:
> It seems like the problem is just that the type used for the section
> index (short) is not big enough.  Change from short to int.  If somebody
> insists, we could even go long long / int64_t, but I doubt it's
> necessary.

Amazingly, everywhere I looked where set_section_index was used ALREADY
used/assumed `int'! Nice catch!

> With that fixed, I get:
> 
>      (gdb) info func
>      All defined functions:
> 
>      Non-debugging symbols:
>      0xffffffff81000000  _stext
>      0xffffffff82257000  _sinittext
>      0xffffffff822b4ebb  _einittext
> 
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=32165

Reviewed-by: Keith Seitz <keiths@redhat.com>

Keith
  
Simon Marchi Sept. 12, 2024, 2:41 p.m. UTC | #3
On 9/12/24 9:28 AM, Tom de Vries wrote:
> Hi Simon,
> 
> LGTM.
> 
> I wonder if using auto instead of int in set_section_index and section_index would be cleaner.

For section_index, I can see, so that the return type of that method
will follow the type of the m_section field.  I will make that change.
But for set_section_index, I don't really see how I can use auto.

Simon
  
Simon Marchi Sept. 12, 2024, 2:42 p.m. UTC | #4
On 9/12/24 9:28 AM, Keith Seitz wrote:
> On 9/12/24 5:27 AM, Simon Marchi wrote:
>> It seems like the problem is just that the type used for the section
>> index (short) is not big enough.  Change from short to int.  If somebody
>> insists, we could even go long long / int64_t, but I doubt it's
>> necessary.
> 
> Amazingly, everywhere I looked where set_section_index was used ALREADY
> used/assumed `int'! Nice catch!

Cool, I hadn't checked them all, thanks for doing it.

>> With that fixed, I get:
>>
>>      (gdb) info func
>>      All defined functions:
>>
>>      Non-debugging symbols:
>>      0xffffffff81000000  _stext
>>      0xffffffff82257000  _sinittext
>>      0xffffffff822b4ebb  _einittext
>>
>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=32165
> 
> Reviewed-by: Keith Seitz <keiths@redhat.com>

Thanks,

Simon
  
Tom de Vries Sept. 12, 2024, 2:59 p.m. UTC | #5
On 9/12/24 16:41, Simon Marchi wrote:
> On 9/12/24 9:28 AM, Tom de Vries wrote:
>> Hi Simon,
>>
>> LGTM.
>>
>> I wonder if using auto instead of int in set_section_index and section_index would be cleaner.
> 
> For section_index, I can see, so that the return type of that method
> will follow the type of the m_section field.  I will make that change.
> But for set_section_index, I don't really see how I can use auto.

I tried, and it seems to build:
...
diff --git a/gdb/symtab.h b/gdb/symtab.h
index d615fdf1e52..8f2e2af2d8b 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -607,14 +607,14 @@ struct general_symbol_info
       objfile) for the section that contains this symbol.  See M_SECTION
       for more details.  */

-  void set_section_index (short idx)
+  void set_section_index (auto idx)
    { m_section = idx; }

    /* Return the index into the obj_section list (within the containing
       objfile) for the section that contains this symbol.  See M_SECTION
       for more details.  */

-  short section_index () const
+  auto section_index () const
    { return m_section; }

    /* Return the obj_section from OBJFILE for this symbol.  The symbol
...

Thanks,
- Tom
  
Simon Marchi Sept. 12, 2024, 3:02 p.m. UTC | #6
On 9/12/24 10:59 AM, Tom de Vries wrote:
> On 9/12/24 16:41, Simon Marchi wrote:
>> On 9/12/24 9:28 AM, Tom de Vries wrote:
>>> Hi Simon,
>>>
>>> LGTM.
>>>
>>> I wonder if using auto instead of int in set_section_index and section_index would be cleaner.
>>
>> For section_index, I can see, so that the return type of that method
>> will follow the type of the m_section field.  I will make that change.
>> But for set_section_index, I don't really see how I can use auto.
> 
> I tried, and it seems to build:
> ...
> diff --git a/gdb/symtab.h b/gdb/symtab.h
> index d615fdf1e52..8f2e2af2d8b 100644
> --- a/gdb/symtab.h
> +++ b/gdb/symtab.h
> @@ -607,14 +607,14 @@ struct general_symbol_info
>       objfile) for the section that contains this symbol.  See M_SECTION
>       for more details.  */
> 
> -  void set_section_index (short idx)
> +  void set_section_index (auto idx)
>    { m_section = idx; }

Wow, I didn't know you could do that.  Unfortunately:

/home/smarchi/src/binutils-gdb/gdb/symtab.h:610:27: error: use of ‘auto’ in parameter declaration only available with ‘-std=c++20’ or ‘-fconcepts’ [-Werror]

So we can't use it right now.

> 
>    /* Return the index into the obj_section list (within the containing
>       objfile) for the section that contains this symbol.  See M_SECTION
>       for more details.  */
> 
> -  short section_index () const
> +  auto section_index () const
>    { return m_section; }

I will push the patch with that "auto".  Thanks!

Simon
  
Tom de Vries Sept. 12, 2024, 4:03 p.m. UTC | #7
On 9/12/24 17:02, Simon Marchi wrote:
> On 9/12/24 10:59 AM, Tom de Vries wrote:
>> On 9/12/24 16:41, Simon Marchi wrote:
>>> On 9/12/24 9:28 AM, Tom de Vries wrote:
>>>> Hi Simon,
>>>>
>>>> LGTM.
>>>>
>>>> I wonder if using auto instead of int in set_section_index and section_index would be cleaner.
>>>
>>> For section_index, I can see, so that the return type of that method
>>> will follow the type of the m_section field.  I will make that change.
>>> But for set_section_index, I don't really see how I can use auto.
>>
>> I tried, and it seems to build:
>> ...
>> diff --git a/gdb/symtab.h b/gdb/symtab.h
>> index d615fdf1e52..8f2e2af2d8b 100644
>> --- a/gdb/symtab.h
>> +++ b/gdb/symtab.h
>> @@ -607,14 +607,14 @@ struct general_symbol_info
>>        objfile) for the section that contains this symbol.  See M_SECTION
>>        for more details.  */
>>
>> -  void set_section_index (short idx)
>> +  void set_section_index (auto idx)
>>     { m_section = idx; }
> 
> Wow, I didn't know you could do that.  Unfortunately:
> 
> /home/smarchi/src/binutils-gdb/gdb/symtab.h:610:27: error: use of ‘auto’ in parameter declaration only available with ‘-std=c++20’ or ‘-fconcepts’ [-Werror]
> 
> So we can't use it right now.
> 

Interesting.  What seems to work with a minimal example is:
- gcc 7 -std=c++14.
- gcc 8-13 -std=c++14 -fconcepts,
- gcc 10-13 -std=c++20

So maybe it being available in gcc 7 is a bug, not sure.

Thanks,
- Tom

>>
>>     /* Return the index into the obj_section list (within the containing
>>        objfile) for the section that contains this symbol.  See M_SECTION
>>        for more details.  */
>>
>> -  short section_index () const
>> +  auto section_index () const
>>     { return m_section; }
> 
> I will push the patch with that "auto".  Thanks!
> 
> Simon
  
Tom de Vries Sept. 12, 2024, 4:08 p.m. UTC | #8
On 9/12/24 17:02, Simon Marchi wrote:
>> -  void set_section_index (short idx)
>> +  void set_section_index (auto idx)
>>     { m_section = idx; }
> Wow, I didn't know you could do that.  Unfortunately:
> 
> /home/smarchi/src/binutils-gdb/gdb/symtab.h:610:27: error: use of ‘auto’ in parameter declaration only available with ‘-std=c++20’ or ‘-fconcepts’ [-Werror]
> 
> So we can't use it right now.


FWIW, we could use typeof:
...
-  void set_section_index (short idx)
+  void set_section_index (typeof (m_section) idx)
...
but I'm not sure that's an improvement.

Thanks,
- Tom
  
Simon Marchi Sept. 12, 2024, 8:09 p.m. UTC | #9
On 9/12/24 12:08 PM, Tom de Vries wrote:
> On 9/12/24 17:02, Simon Marchi wrote:
>>> -  void set_section_index (short idx)
>>> +  void set_section_index (auto idx)
>>>     { m_section = idx; }
>> Wow, I didn't know you could do that.  Unfortunately:
>>
>> /home/smarchi/src/binutils-gdb/gdb/symtab.h:610:27: error: use of ‘auto’ in parameter declaration only available with ‘-std=c++20’ or ‘-fconcepts’ [-Werror]
>>
>> So we can't use it right now.
> 
> 
> FWIW, we could use typeof:
> ...
> -  void set_section_index (short idx)
> +  void set_section_index (typeof (m_section) idx)

Or decltype, which would be more standard C++.

I read a bit on that use of "auto", it's actually a shorthand way to
define a template function

https://en.cppreference.com/w/cpp/language/function_template#Abbreviated_function_template

It would be the same as defining it as:

template <typename T>
void set_section_index(T idx)
{
  ...
}

I think that using decltype could make sense, to bind the type of the
setter parameter to the type of the member, but I also think it's not
really a big problem as it is.

Simon
  
Tom Tromey Sept. 13, 2024, 2:10 a.m. UTC | #10
>>>>> "Simon" == Simon Marchi <simon.marchi@efficios.com> writes:

>> FWIW, we could use typeof:
>> ...
>> -  void set_section_index (short idx)
>> +  void set_section_index (typeof (m_section) idx)

Simon> Or decltype, which would be more standard C++.

I think using "auto" just for the getter method is not really that
great.  My reason is that suppose we actually do need to change the
type: in this case, this one method will be ok, but essentially every
other user in the program will still need to be updated.

Better would be to introduce a new typedef and then use it consistently.
C++ doesn't really provide a whole lot of type safety this way -- like,
unless we use some kind of enum-based newtype, one could still just
write "short section = mumble" and it would compile.  Still, this would
be a little better; and in important cases we could use a newtype, since
after all we did this for DWARF section offsets, which we were
frequently confusing with other offsets.

Just my $.02 on this, I will forget about this "auto" relatively soon.

Speaking of the type laxity of C++, I found some spots still using
"short" here.  I will send a patch momentarily.

thanks,
Tom
  
Simon Marchi Sept. 13, 2024, 9:50 a.m. UTC | #11
On 9/12/24 10:10 PM, Tom Tromey wrote:
>>>>>> "Simon" == Simon Marchi <simon.marchi@efficios.com> writes:
> 
>>> FWIW, we could use typeof:
>>> ...
>>> -  void set_section_index (short idx)
>>> +  void set_section_index (typeof (m_section) idx)
> 
> Simon> Or decltype, which would be more standard C++.
> 
> I think using "auto" just for the getter method is not really that
> great.  My reason is that suppose we actually do need to change the
> type: in this case, this one method will be ok, but essentially every
> other user in the program will still need to be updated.

Agreed, it sounds nice in theory, but in practice I doubt we'd really
forget to change that one in particular.  But I don't think it's harmful
either.

> Better would be to introduce a new typedef and then use it consistently.
> C++ doesn't really provide a whole lot of type safety this way -- like,
> unless we use some kind of enum-based newtype, one could still just
> write "short section = mumble" and it would compile.  Still, this would
> be a little better; and in important cases we could use a newtype, since
> after all we did this for DWARF section offsets, which we were
> frequently confusing with other offsets.

Even if not perfect, a typedef can help make things more readable.

> Just my $.02 on this, I will forget about this "auto" relatively soon.
> 
> Speaking of the type laxity of C++, I found some spots still using
> "short" here.  I will send a patch momentarily.

Thanks for checking.

Simon
  

Patch

diff --git a/gdb/symtab.h b/gdb/symtab.h
index d615fdf1e52b..a273f32ddd0d 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -601,20 +601,20 @@  struct general_symbol_info
      section_offsets for this objfile.  Negative means that the symbol
      does not get relocated relative to a section.  */
 
-  short m_section;
+  int m_section;
 
   /* Set the index into the obj_section list (within the containing
      objfile) for the section that contains this symbol.  See M_SECTION
      for more details.  */
 
-  void set_section_index (short idx)
+  void set_section_index (int idx)
   { m_section = idx; }
 
   /* Return the index into the obj_section list (within the containing
      objfile) for the section that contains this symbol.  See M_SECTION
      for more details.  */
 
-  short section_index () const
+  int section_index () const
   { return m_section; }
 
   /* Return the obj_section from OBJFILE for this symbol.  The symbol