Don't return "(null)" from bfd_elf_sym_name

Message ID Zv4dM8MuxF8EnrVh@squeak.grove.modra.org
State New
Headers
Series Don't return "(null)" from bfd_elf_sym_name |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm warning Patch is already merged
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 warning Patch is already merged

Commit Message

Alan Modra Oct. 3, 2024, 4:27 a.m. UTC
  A NULL return from bfd_elf_string_from_elf_section indicates an error.
That shouldn't be masked by bfd_elf_sym_name but rather passed up to
callers such as group_signature.  If we want to print "(null)" then
that should be done at a higher level.  That's what this patch does,
except that I chose to print "<null>" instead, like readelf.  If we
see "(null)" we're probably passing a NULL to printf.  I haven't
changed aoutx.h or pdp11.c print_symbol functions because they already
handle NULL names by omitting the name.  I also haven't changed
mach-o.c, mmo.c, som.c, srec.c, tekhex.c, vms-alpha.c and
wasm-module.c print_symbol function because it looks like they will
never have NULL symbol names.

bfd/
	* elf.c (bfd_elf_sym_name): Don't turn a NULL name into a
	pointer to "(null)".
	(bfd_elf_print_symbol): Print "<null>" for NULL symbol names.
	* coffgen.c (coff_print_symbol): Likewise.
	* ecoff.c (_bfd_ecoff_print_symbol): Likewise.
	* pef.c (bfd_pef_print_symbol): Likewise.
	* syms.c (bfd_symbol_info): Return "<null>" in symbol_info.name
	if symbol name is NULL.
ld/
	* ldlang.c (ld_is_local_symbol): Don't check for "(null)"
	symbol name.
  

Comments

Thiago Jung Bauermann Oct. 4, 2024, 12:45 a.m. UTC | #1
Hello,

Alan Modra <amodra@gmail.com> writes:

> A NULL return from bfd_elf_string_from_elf_section indicates an error.
> That shouldn't be masked by bfd_elf_sym_name but rather passed up to
> callers such as group_signature.  If we want to print "(null)" then
> that should be done at a higher level.  That's what this patch does,
> except that I chose to print "<null>" instead, like readelf.  If we
> see "(null)" we're probably passing a NULL to printf.  I haven't
> changed aoutx.h or pdp11.c print_symbol functions because they already
> handle NULL names by omitting the name.  I also haven't changed
> mach-o.c, mmo.c, som.c, srec.c, tekhex.c, vms-alpha.c and
> wasm-module.c print_symbol function because it looks like they will
> never have NULL symbol names.

This patch causes GDB to segfault in a test with a mangled symbol table:

Running gdb.base/bfd-errors.exp ...
ERROR: GDB process no longer exists
UNRESOLVED: gdb.base/bfd-errors.exp: load library with add-symbol-file

This is how the test corrupts the symbol table:

  # [...] a shared object with at least four symbols is
  # created.  The .dynsym section is extracted and offsets which should
  # refer to strings in the .dynstr section are changed to be
  # larger than the size of the .dynstr section.

GDB crashed because it wasn't prepared to get a NULL result from
bfd_asymbol_name () while building the dynamic symbols table.  I did the
obvious fix:

diff --git a/gdb/elfread.c b/gdb/elfread.c
index e959d3a2f9d3..6907d4b7e3e3 100644
--- a/gdb/elfread.c
+++ b/gdb/elfread.c
@@ -612,6 +612,8 @@ elf_rel_plt_read (minimal_symbol_reader &reader,
       const size_t got_suffix_len = strlen (SYMBOL_GOT_PLT_SUFFIX);

       name = bfd_asymbol_name (*relplt->relocation[reloc].sym_ptr_ptr);
+      if (name == nullptr)
+	name = "(null)";
       address = relplt->relocation[reloc].address;

       asection *msym_section;

But then I get a crash within BFD itself, when GDB passes the dynamic
symbols table it built to bfd_get_synthetic_symtab (). The relevant part
of the backtrace is:

#6  <signal handler called>
#7  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex-base.S:81
#8  0x00005802042ce8d1 in _bfd_x86_elf_get_synthetic_symtab
      (abfd=0x58020840f8e0, count=2, relsize=72, got_addr=0,
       plts=0x7ffcefa985e0, dynsyms=0x580208466f60, ret=0x7ffcefa98718)
      at /home/bauermann/src/binutils-gdb-wt-2/bfd/elfxx-x86.c:3713
#9  0x00005802042c4c41 in elf_x86_64_get_synthetic_symtab
      (abfd=0x58020840f8e0, symcount=28, syms=0x580208457480,
       dynsymcount=9, dynsyms=0x580208466f60, ret=0x7ffcefa98718)
      at /home/bauermann/src/binutils-gdb-wt-2/bfd/elf64-x86-64.c:5427
#10 0x0000580203989601 in elf_read_minimal_symbols
      (objfile=0x58020840bfd0, symfile_flags=2, ei=0x7ffcefa98810)
      at /home/bauermann/src/binutils-gdb-wt-2/gdb/elfread.c:1160

Frame 8's line is:

3713	      size += strlen ((*p->sym_ptr_ptr)->name) + sizeof ("@plt");

Where:

(top-gdb) p **p->sym_ptr_ptr
$8 = {
  the_bfd = 0x58020840f8e0,
  name = 0x0,
  value = 0,
  flags = 4227073,
  section = 0x580205cfcad8 <_bfd_std_section+280>,
  udata = {
    p = 0x0,
    i = 0
  }
}

So it looks like bfd_get_synthetic_symtab () isn't prepared to deal with
the existance of symbols without a name. Should it? Or should GDB fix up
the section contents so that they'd point to a "(null)" string?
  
Alan Modra Oct. 4, 2024, 2:48 a.m. UTC | #2
On Thu, Oct 03, 2024 at 09:45:56PM -0300, Thiago Jung Bauermann wrote:
> Hello,
> 
> Alan Modra <amodra@gmail.com> writes:
> 
> > A NULL return from bfd_elf_string_from_elf_section indicates an error.
> > That shouldn't be masked by bfd_elf_sym_name but rather passed up to
> > callers such as group_signature.  If we want to print "(null)" then
> > that should be done at a higher level.  That's what this patch does,
> > except that I chose to print "<null>" instead, like readelf.  If we
> > see "(null)" we're probably passing a NULL to printf.  I haven't
> > changed aoutx.h or pdp11.c print_symbol functions because they already
> > handle NULL names by omitting the name.  I also haven't changed
> > mach-o.c, mmo.c, som.c, srec.c, tekhex.c, vms-alpha.c and
> > wasm-module.c print_symbol function because it looks like they will
> > never have NULL symbol names.
> 
> This patch causes GDB to segfault in a test with a mangled symbol table:
> 
> Running gdb.base/bfd-errors.exp ...
> ERROR: GDB process no longer exists
> UNRESOLVED: gdb.base/bfd-errors.exp: load library with add-symbol-file

Yes, I was just looking at this and had written the following:

From a88c23dda6360e247fa758f3482ee7cf661a29cf Mon Sep 17 00:00:00 2001
From: Alan Modra <amodra@gmail.com>
Date: Fri, 4 Oct 2024 07:47:05 +0930
Subject: gdb segv in elfread.c:elf_rel_plt_read

After commit 68bbe1183379, ELF symbols read via bfd_canonicalize_symtab
and similar functions which have bad st_name fields will have NULL in
the name rather than "(null)".  gdb.base/bfd-errors.exp deliberately
creates a faulty shared library with st_name pointing outside of
.dynsym for two symbols, and thus now results in NULL symbol names.
This triggers a segv on string_buffer.assign(name).  Fix that.

diff --git a/gdb/elfread.c b/gdb/elfread.c
index e959d3a2f9d..2e68b0dba1a 100644
--- a/gdb/elfread.c
+++ b/gdb/elfread.c
@@ -612,6 +612,8 @@ elf_rel_plt_read (minimal_symbol_reader &reader,
       const size_t got_suffix_len = strlen (SYMBOL_GOT_PLT_SUFFIX);
 
       name = bfd_asymbol_name (*relplt->relocation[reloc].sym_ptr_ptr);
+      if (!name)
+	continue;
       address = relplt->relocation[reloc].address;
 
       asection *msym_section;

I think it's better to ignore symbols with NULL names rather than turn
the name into "(null)" as you do in your patch, since you can't really
do anything sensible when looking up "(null)@got.plt" later.

> This is how the test corrupts the symbol table:
> 
>   # [...] a shared object with at least four symbols is
>   # created.  The .dynsym section is extracted and offsets which should
>   # refer to strings in the .dynstr section are changed to be
>   # larger than the size of the .dynstr section.
> 
> GDB crashed because it wasn't prepared to get a NULL result from
> bfd_asymbol_name () while building the dynamic symbols table.  I did the
> obvious fix:
> 
> diff --git a/gdb/elfread.c b/gdb/elfread.c
> index e959d3a2f9d3..6907d4b7e3e3 100644
> --- a/gdb/elfread.c
> +++ b/gdb/elfread.c
> @@ -612,6 +612,8 @@ elf_rel_plt_read (minimal_symbol_reader &reader,
>        const size_t got_suffix_len = strlen (SYMBOL_GOT_PLT_SUFFIX);
> 
>        name = bfd_asymbol_name (*relplt->relocation[reloc].sym_ptr_ptr);
> +      if (name == nullptr)
> +	name = "(null)";
>        address = relplt->relocation[reloc].address;
> 
>        asection *msym_section;
> 
> But then I get a crash within BFD itself, when GDB passes the dynamic
> symbols table it built to bfd_get_synthetic_symtab (). The relevant part
> of the backtrace is:
> 
> #6  <signal handler called>
> #7  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex-base.S:81
> #8  0x00005802042ce8d1 in _bfd_x86_elf_get_synthetic_symtab
>       (abfd=0x58020840f8e0, count=2, relsize=72, got_addr=0,
>        plts=0x7ffcefa985e0, dynsyms=0x580208466f60, ret=0x7ffcefa98718)
>       at /home/bauermann/src/binutils-gdb-wt-2/bfd/elfxx-x86.c:3713
> #9  0x00005802042c4c41 in elf_x86_64_get_synthetic_symtab
>       (abfd=0x58020840f8e0, symcount=28, syms=0x580208457480,
>        dynsymcount=9, dynsyms=0x580208466f60, ret=0x7ffcefa98718)
>       at /home/bauermann/src/binutils-gdb-wt-2/bfd/elf64-x86-64.c:5427
> #10 0x0000580203989601 in elf_read_minimal_symbols
>       (objfile=0x58020840bfd0, symfile_flags=2, ei=0x7ffcefa98810)
>       at /home/bauermann/src/binutils-gdb-wt-2/gdb/elfread.c:1160
> 
> Frame 8's line is:
> 
> 3713	      size += strlen ((*p->sym_ptr_ptr)->name) + sizeof ("@plt");
> 
> Where:
> 
> (top-gdb) p **p->sym_ptr_ptr
> $8 = {
>   the_bfd = 0x58020840f8e0,
>   name = 0x0,
>   value = 0,
>   flags = 4227073,
>   section = 0x580205cfcad8 <_bfd_std_section+280>,
>   udata = {
>     p = 0x0,
>     i = 0
>   }
> }
> 
> So it looks like bfd_get_synthetic_symtab () isn't prepared to deal with
> the existance of symbols without a name. Should it? Or should GDB fix up
> the section contents so that they'd point to a "(null)" string?

gdb definitely shouldn't be trying to fix anything here.  The fix
belongs in bfd.
  
Thiago Jung Bauermann Oct. 4, 2024, 5 a.m. UTC | #3
Alan Modra <amodra@gmail.com> writes:

> On Thu, Oct 03, 2024 at 09:45:56PM -0300, Thiago Jung Bauermann wrote:
>> Hello,
>> 
>> Alan Modra <amodra@gmail.com> writes:
>> 
>> > A NULL return from bfd_elf_string_from_elf_section indicates an error.
>> > That shouldn't be masked by bfd_elf_sym_name but rather passed up to
>> > callers such as group_signature.  If we want to print "(null)" then
>> > that should be done at a higher level.  That's what this patch does,
>> > except that I chose to print "<null>" instead, like readelf.  If we
>> > see "(null)" we're probably passing a NULL to printf.  I haven't
>> > changed aoutx.h or pdp11.c print_symbol functions because they already
>> > handle NULL names by omitting the name.  I also haven't changed
>> > mach-o.c, mmo.c, som.c, srec.c, tekhex.c, vms-alpha.c and
>> > wasm-module.c print_symbol function because it looks like they will
>> > never have NULL symbol names.
>> 
>> This patch causes GDB to segfault in a test with a mangled symbol table:
>> 
>> Running gdb.base/bfd-errors.exp ...
>> ERROR: GDB process no longer exists
>> UNRESOLVED: gdb.base/bfd-errors.exp: load library with add-symbol-file
>
> Yes, I was just looking at this and had written the following:
>
> From a88c23dda6360e247fa758f3482ee7cf661a29cf Mon Sep 17 00:00:00 2001
> From: Alan Modra <amodra@gmail.com>
> Date: Fri, 4 Oct 2024 07:47:05 +0930
> Subject: gdb segv in elfread.c:elf_rel_plt_read
>
> After commit 68bbe1183379, ELF symbols read via bfd_canonicalize_symtab
> and similar functions which have bad st_name fields will have NULL in
> the name rather than "(null)".  gdb.base/bfd-errors.exp deliberately
> creates a faulty shared library with st_name pointing outside of
> .dynsym for two symbols, and thus now results in NULL symbol names.
> This triggers a segv on string_buffer.assign(name).  Fix that.
>
> diff --git a/gdb/elfread.c b/gdb/elfread.c
> index e959d3a2f9d..2e68b0dba1a 100644
> --- a/gdb/elfread.c
> +++ b/gdb/elfread.c
> @@ -612,6 +612,8 @@ elf_rel_plt_read (minimal_symbol_reader &reader,
>        const size_t got_suffix_len = strlen (SYMBOL_GOT_PLT_SUFFIX);
>  
>        name = bfd_asymbol_name (*relplt->relocation[reloc].sym_ptr_ptr);
> +      if (!name)
> +	continue;
>        address = relplt->relocation[reloc].address;
>  
>        asection *msym_section;
>
> I think it's better to ignore symbols with NULL names rather than turn
> the name into "(null)" as you do in your patch, since you can't really
> do anything sensible when looking up "(null)@got.plt" later.

I agree that makes more sense. Thank you for investigating and fixing
the problem. Will you submit the patch to GDB?
  

Patch

diff --git a/bfd/coffgen.c b/bfd/coffgen.c
index cc1c655738b..ff382a7e9c9 100644
--- a/bfd/coffgen.c
+++ b/bfd/coffgen.c
@@ -2161,11 +2161,12 @@  coff_print_symbol (bfd *abfd,
 		   bfd_print_symbol_type how)
 {
   FILE * file = (FILE *) filep;
+  const char *symname = symbol->name ? symbol->name : "<null>";
 
   switch (how)
     {
     case bfd_print_symbol_name:
-      fprintf (file, "%s", symbol->name);
+      fprintf (file, "%s", symname);
       break;
 
     case bfd_print_symbol_more:
@@ -2189,7 +2190,7 @@  coff_print_symbol (bfd *abfd,
 	  if (combined < obj_raw_syments (abfd)
 	      || combined >= obj_raw_syments (abfd) + obj_raw_syment_count (abfd))
 	    {
-	      fprintf (file, _("<corrupt info> %s"), symbol->name);
+	      fprintf (file, _("<corrupt info> %s"), symname);
 	      break;
 	    }
 
@@ -2207,7 +2208,7 @@  coff_print_symbol (bfd *abfd,
 		   combined->u.syment.n_sclass,
 		   combined->u.syment.n_numaux);
 	  bfd_fprintf_vma (abfd, file, val);
-	  fprintf (file, " %s", symbol->name);
+	  fprintf (file, " %s", symname);
 
 	  for (aux = 0; aux < combined->u.syment.n_numaux; aux++)
 	    {
@@ -2297,7 +2298,8 @@  coff_print_symbol (bfd *abfd,
 
 	  if (l)
 	    {
-	      fprintf (file, "\n%s :", l->u.sym->name);
+	      fprintf (file, "\n%s :",
+		       l->u.sym->name ? l->u.sym->name : "<null>");
 	      l++;
 	      while (l->line_number)
 		{
@@ -2317,7 +2319,7 @@  coff_print_symbol (bfd *abfd,
 		   symbol->section->name,
 		   coffsymbol (symbol)->native ? "n" : "g",
 		   coffsymbol (symbol)->lineno ? "l" : " ",
-		   symbol->name);
+		   symname);
 	}
     }
 }
diff --git a/bfd/ecoff.c b/bfd/ecoff.c
index 5ee7ffaf489..93b93f39be1 100644
--- a/bfd/ecoff.c
+++ b/bfd/ecoff.c
@@ -1452,11 +1452,12 @@  _bfd_ecoff_print_symbol (bfd *abfd,
   const struct ecoff_debug_swap * const debug_swap
     = &ecoff_backend (abfd)->debug_swap;
   FILE *file = (FILE *)filep;
+  const char *symname = symbol->name ? symbol->name : "<null>";
 
   switch (how)
     {
     case bfd_print_symbol_name:
-      fprintf (file, "%s", symbol->name);
+      fprintf (file, "%s", symname);
       break;
     case bfd_print_symbol_more:
       if (ecoffsymbol (symbol)->local)
@@ -1526,7 +1527,7 @@  _bfd_ecoff_print_symbol (bfd *abfd,
 		 (unsigned) ecoff_ext.asym.sc,
 		 (unsigned) ecoff_ext.asym.index,
 		 jmptbl, cobol_main, weakext,
-		 symbol->name);
+		 symname);
 
 	if (ecoffsymbol (symbol)->fdr != NULL
 	    && ecoff_ext.asym.index != indexNil)
diff --git a/bfd/elf.c b/bfd/elf.c
index c882a66ab5c..7d3d2063130 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -549,9 +549,7 @@  bfd_elf_sym_name (bfd *abfd,
     }
 
   name = bfd_elf_string_from_elf_section (abfd, shindex, iname);
-  if (name == NULL)
-    name = "(null)";
-  else if (sym_sec && *name == '\0')
+  if (sym_sec && name && *name == '\0')
     name = bfd_section_name (sym_sec);
 
   return name;
@@ -2314,10 +2312,12 @@  bfd_elf_print_symbol (bfd *abfd,
 		      bfd_print_symbol_type how)
 {
   FILE *file = (FILE *) filep;
+  const char *symname = symbol->name ? symbol->name : "<null>";
+
   switch (how)
     {
     case bfd_print_symbol_name:
-      fprintf (file, "%s", symbol->name);
+      fprintf (file, "%s", symname);
       break;
     case bfd_print_symbol_more:
       fprintf (file, "elf ");
@@ -2340,11 +2340,10 @@  bfd_elf_print_symbol (bfd *abfd,
 	if (bed->elf_backend_print_symbol_all)
 	  name = (*bed->elf_backend_print_symbol_all) (abfd, filep, symbol);
 
-	if (name == NULL)
-	  {
-	    name = symbol->name;
-	    bfd_print_symbol_vandf (abfd, file, symbol);
-	  }
+	if (name != NULL)
+	  symname = name;
+	else
+	  bfd_print_symbol_vandf (abfd, file, symbol);
 
 	fprintf (file, " %s\t", section_name);
 	/* Print the "other" value for a symbol.  For common symbols,
@@ -2391,7 +2390,7 @@  bfd_elf_print_symbol (bfd *abfd,
 	    fprintf (file, " 0x%02x", (unsigned int) st_other);
 	  }
 
-	fprintf (file, " %s", name);
+	fprintf (file, " %s", symname);
       }
       break;
     }
diff --git a/bfd/pef.c b/bfd/pef.c
index f330b92e821..324adb33d69 100644
--- a/bfd/pef.c
+++ b/bfd/pef.c
@@ -210,16 +210,17 @@  bfd_pef_print_symbol (bfd *abfd,
 		      bfd_print_symbol_type how)
 {
   FILE *file = (FILE *) afile;
+  const char *symname = symbol->name ? symbol->name : "<null>";
 
   switch (how)
     {
     case bfd_print_symbol_name:
-      fprintf (file, "%s", symbol->name);
+      fprintf (file, "%s", symname);
       break;
     default:
       bfd_print_symbol_vandf (abfd, (void *) file, symbol);
-      fprintf (file, " %-5s %s", symbol->section->name, symbol->name);
-      if (startswith (symbol->name, "__traceback_"))
+      fprintf (file, " %-5s %s", symbol->section->name, symname);
+      if (startswith (symname, "__traceback_"))
 	{
 	  unsigned char *buf;
 	  size_t offset = symbol->value + 4;
diff --git a/bfd/syms.c b/bfd/syms.c
index b370a3375d9..816296b68a3 100644
--- a/bfd/syms.c
+++ b/bfd/syms.c
@@ -777,7 +777,7 @@  bfd_symbol_info (asymbol *symbol, symbol_info *ret)
   else
     ret->value = symbol->value + symbol->section->vma;
 
-  ret->name = symbol->name;
+  ret->name = symbol->name ? symbol->name : "<null>";
 }
 
 /*
diff --git a/ld/ldlang.c b/ld/ldlang.c
index 7f9e3d2b119..343c4de53f4 100644
--- a/ld/ldlang.c
+++ b/ld/ldlang.c
@@ -4895,9 +4895,6 @@  ld_is_local_symbol (asymbol * sym)
   if (name == NULL || *name == 0)
     return false;
 
-  if (strcmp (name, "(null)") == 0)
-    return false;
-
   /* Skip .Lxxx and such like.  */
   if (bfd_is_local_label (link_info.output_bfd, sym))
     return false;