elf: Remove fallback to the start of DT_STRTAB for dladdr

Message ID 20220501215049.2143788-1-maskray@google.com
State Committed
Series elf: Remove fallback to the start of DT_STRTAB for dladdr

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Fangrui Song May 1, 2022, 9:50 p.m. UTC
When neither DT_HASH nor DT_GNU_HASH is present, the code scans
[DT_SYMTAB, DT_STRTAB). However, there is no guarantee that .dynstr
immediately follows .dynsym (e.g. lld typically places .gnu.version
after .dynsym).

In the absence of a hash table, symbol lookup will always fail
(map->l_nbuckets == 0), so it seems appropriate to just bail out.
---
 elf/dl-addr.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)
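
To illustrate why the [DT_SYMTAB, DT_STRTAB) range is not a reliable bound, here is a minimal sketch in C. It is not glibc code: the helper name fallback_symbol_count and the addresses in the note below are made up for illustration, and the 24-byte entry size assumes a 64-bit Elf64_Sym.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a glibc function): the number of symbol
   entries a scan would visit if it treated the start of the string
   table as the end of the dynamic symbol table.  */
size_t
fallback_symbol_count (uintptr_t symtab_addr, uintptr_t strtab_addr,
                       size_t sym_size)
{
  assert (sym_size != 0);

  /* If the string table precedes the symbol table, the range is
     empty and nothing would be scanned.  */
  if (strtab_addr <= symtab_addr)
    return 0;

  /* Everything between the two addresses is interpreted as symbol
     entries, whether or not it really belongs to .dynsym.  */
  return (strtab_addr - symtab_addr) / sym_size;
}
```

With a hypothetical .dynsym of 5 entries at 0x200 (ending at 0x278), calling fallback_symbol_count (0x200, 0x278, 24) gives 5 when .dynstr follows immediately. If a 10-byte .gnu.version plus padding pushes .dynstr to 0x2a0, the same computation gives 6, so the scan would read one entry's worth of unrelated bytes as a symbol.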
  

Comments

Florian Weimer May 2, 2022, 6:56 a.m. UTC | #1
* Fangrui Song via Libc-alpha:

> When neither DT_HASH nor DT_GNU_HASH is present, the code scans
> [DT_SYMTAB, DT_STRTAB). However, there is no guarantee that .dynstr
> immediately follows .dynsym (e.g. lld typically places .gnu.version
> after .dynsym).

The code is compatible with lld because it always generates a hash
table.  Maybe it was added to support old binaries without a hash table.
So we would have to check if such binaries exist from the early
libc.so.6 days (or check if binutils ever generated ELF binaries
without a hash table).  The glibc comment isn't clear if this was added
because it was required at the time, or just because it seemed a good
idea.

I couldn't find any binaries with DT_SYMTAB but without DT_HASH or
DT_GNU_HASH in my collection, but that doesn't mean that they don't exist.

Thanks,
Florian
  
Fangrui Song May 2, 2022, 7:04 a.m. UTC | #2
On 2022-05-02, Florian Weimer wrote:
>The code is compatible with lld because it always generates a hash
>table.  Maybe it was added to support old binaries without a hash table.
>So we would have to check if such binaries exist from the early
>libc.so.6 days (or check if binutils ever generated ELF binaries
>without a hash table).  The glibc comment isn't clear if this was added
>because it was required at the time, or just because it seemed a good
>idea.
>
>I couldn't find any binaries with DT_SYMTAB but without DT_HASH or
>DT_GNU_HASH in my collection, but that doesn't mean that they don't exist.
>
>Thanks,
>Florian

With a linker script, .hash and .gnu.hash can be removed.
But such an object behaves as if it has no symbols: symbol search will fail.
It makes sense for dladdr not to return a symbol for it.

% bmake
cc -pipe -g   -fuse-ld=bfd -g -fpic -shared -Wl,--version-script=b.ver b.c -o b.so
cc -pipe -g   -fuse-ld=bfd -g a.c -Wl,--no-as-needed -fno-pie -no-pie -Wl,-rpath=/tmp/d b.so -ldl -o a
% ./a
42

% cat b.lds   # GNU ld doesn't have this yet: https://sourceware.org/bugzilla/show_bug.cgi?id=26404
OVERWRITE_SECTIONS {
   /DISCARD/ : { *(.hash) *(.gnu.hash) }
}
% clang -fpic -fuse-ld=lld -shared b.c -Wl,b.lds -o b.so
% ./a
./a: symbol lookup error: ./a: undefined symbol: var

---

GNU ld doesn't seem to allow discarding both .gnu.hash and .hash:
  /DISCARD/ : { *(.hash) *(.gnu.hash) *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) }

/usr/bin/ld.bfd: could not find section .hash
  
Florian Weimer May 2, 2022, 7:21 a.m. UTC | #3
* Fangrui Song:

> With a linker script, .hash and .gnu.hash can be removed.
> But such an object behaves as if it has no symbols: symbol search will fail.
> It makes sense for dladdr not to return a symbol for it.
>
> % bmake
> cc -pipe -g   -fuse-ld=bfd -g -fpic -shared -Wl,--version-script=b.ver b.c -o b.so
> cc -pipe -g   -fuse-ld=bfd -g a.c -Wl,--no-as-needed -fno-pie -no-pie -Wl,-rpath=/tmp/d b.so -ldl -o a
> % ./a
> 42
>
> % cat b.lds   # GNU ld doesn't have this yet: https://sourceware.org/bugzilla/show_bug.cgi?id=26404
> OVERWRITE_SECTIONS {
>   /DISCARD/ : { *(.hash) *(.gnu.hash) }
> }
> % clang -fpic -fuse-ld=lld -shared b.c -Wl,b.lds -o b.so
> % ./a
> ./a: symbol lookup error: ./a: undefined symbol: var

This looks like it might be an lld bug.  DT_HASH is mandatory in the ELF
specification.  We ignore that requirement in the GNU ABI and use
DT_GNU_HASH instead, even for static PIE binaries.

Do you want to drop symbol tables from static PIE binaries?

Thanks,
Florian
  
Fangrui Song May 2, 2022, 7:30 a.m. UTC | #4
On 2022-05-02, Florian Weimer wrote:
>This looks like it might be an lld bug.  DT_HASH is mandatory in the ELF
>specification.  We ignore that requirement in the GNU ABI and use
>DT_GNU_HASH instead, even for static PIE binaries.

I am not sure supporting discarding .hash/.gnu.hash is a bug.
The linker can even discard .dynsym/.dynstr/.dynamic .

In the Linux kernel, there is something like

         /DISCARD/ : {
                 *(.interp .dynamic)
                 *(.dynsym .dynstr .hash .gnu.hash)
         }

If we argue that DT_HASH is mandatory, then --hash-style=gnu breaks the ELF
specification as well.

My point is that a normal dynamic linking object always has either .hash or .gnu.hash .
If neither is present, it seems pretty fair for dladdr to not give a symbol,
just like symbol search (including relocation resolving and dl(v)sym) won't give any result.
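
As a rough illustration of the "either .hash or .gnu.hash" condition, the sketch below walks a toy dynamic array and reports whether a hash table tag is present. The struct and function names (toy_dyn, toy_has_hash_table) are hypothetical and this is not how glibc stores the information (it caches entries in l_info); only the tag values are the standard ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Dynamic tag values from the ELF gABI and the GNU extension.  */
#define TOY_DT_NULL     0
#define TOY_DT_HASH     4
#define TOY_DT_STRTAB   5
#define TOY_DT_SYMTAB   6
#define TOY_DT_GNU_HASH 0x6ffffef5

/* Simplified stand-in for Elf64_Dyn.  */
struct toy_dyn
{
  int64_t d_tag;
  uint64_t d_val;
};

/* Return true if the DT_NULL-terminated array contains DT_HASH or
   DT_GNU_HASH, i.e. if symbol lookup in the object can work at all.  */
bool
toy_has_hash_table (const struct toy_dyn *dyn)
{
  assert (dyn != NULL);

  for (; dyn->d_tag != TOY_DT_NULL; ++dyn)
    if (dyn->d_tag == TOY_DT_HASH || dyn->d_tag == TOY_DT_GNU_HASH)
      return true;
  return false;
}
```

An object linked with the b.lds script above would correspond to an array with DT_SYMTAB and DT_STRTAB but neither hash tag, for which this returns false.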

>Do you want to drop symbol tables from static PIE binaries?

I have no such plan.
  
Florian Weimer May 2, 2022, 8:26 a.m. UTC | #5
* Fangrui Song:

> If we argue that DT_HASH is mandatory, then --hash-style=gnu breaks the ELF
> specification as well.

For the GNU ABI, one of them must be available for dynamic linking.

> My point is that a normal dynamic linking object always has either
> .hash or .gnu.hash .  If neither is present, it seems pretty fair for
> dladdr to not give a symbol, just like symbol search (including
> relocation resolving and dl(v)sym) won't give any result.

Okay, let me have another look at the patch then.

Thanks,
Florian
  
Florian Weimer May 2, 2022, 8:34 a.m. UTC | #6
* Fangrui Song via Libc-alpha:

> +  /* In the absence of a hash table, as if the object has no symbol.  */

Could you rephrase that a bit, maybe “treat the object as if it has no
symbol”?

Rest looks okay to me.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian
  
Fangrui Song May 2, 2022, 4:06 p.m. UTC | #7
On 2022-05-02, Florian Weimer wrote:
>* Fangrui Song via Libc-alpha:
>
>> +  /* In the absence of a hash table, as if the object has no symbol.  */
>
>Could you rephrase that a bit, maybe “treat the object as if it has no
>symbol”?
>
>Rest looks okay to me.
>
>Reviewed-by: Florian Weimer <fweimer@redhat.com>
>
>Thanks,
>Florian
>

Thanks for the suggestion!
  

Patch

diff --git a/elf/dl-addr.c b/elf/dl-addr.c
index e3c5598e1a..6f5f8eac5c 100644
--- a/elf/dl-addr.c
+++ b/elf/dl-addr.c
@@ -71,18 +71,10 @@  determine_info (const ElfW(Addr) addr, struct link_map *match, Dl_info *info,
 	    }
 	}
     }
-  else
+  else if (match->l_info[DT_HASH] != NULL)
     {
-      const ElfW(Sym) *symtabend;
-      if (match->l_info[DT_HASH] != NULL)
-	symtabend = (symtab
-		     + ((Elf_Symndx *) D_PTR (match, l_info[DT_HASH]))[1]);
-      else
-	/* There is no direct way to determine the number of symbols in the
-	   dynamic symbol table and no hash table is present.  The ELF
-	   binary is ill-formed but what shall we do?  Use the beginning of
-	   the string table which generally follows the symbol table.  */
-	symtabend = (const ElfW(Sym) *) strtab;
+      const ElfW (Sym) *symtabend
+	  = (symtab + ((Elf_Symndx *) D_PTR (match, l_info[DT_HASH]))[1]);
 
       for (; (void *) symtab < (void *) symtabend; ++symtab)
 	if ((ELFW(ST_BIND) (symtab->st_info) == STB_GLOBAL
@@ -96,6 +88,7 @@  determine_info (const ElfW(Addr) addr, struct link_map *match, Dl_info *info,
 	    && symtab->st_name < strtabsize)
 	  matchsym = (ElfW(Sym) *) symtab;
     }
+  /* In the absence of a hash table, as if the object has no symbol.  */
 
   if (mapp)
     *mapp = match;
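
The DT_HASH branch that remains after the patch can be sketched as follows. This is a simplified illustration, not the glibc implementation: toy_sym and toy_lookup_by_addr are hypothetical names, the DT_GNU_HASH branch is omitted, and the real code additionally checks symbol binding, type, section index and st_name. The one detail taken from the patch is that word 1 of a SysV hash table (nchain) gives the number of dynamic symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for ElfW(Sym).  */
struct toy_sym
{
  uint64_t value;    /* Start address of the symbol.  */
  uint64_t size;     /* Size in bytes.  */
  const char *name;
};

/* Return the symbol whose [value, value + size) range contains ADDR,
   or NULL.  HASH points to a SysV hash table (word 0 is nbucket,
   word 1 is nchain) or is NULL if the object has no DT_HASH.  */
const struct toy_sym *
toy_lookup_by_addr (const struct toy_sym *symtab, const uint32_t *hash,
                    uint64_t addr)
{
  assert (symtab != NULL);

  /* In the absence of a hash table, treat the object as if it has
     no symbol.  */
  if (hash == NULL)
    return NULL;

  /* nchain equals the number of entries in the symbol table.  */
  const struct toy_sym *end = symtab + hash[1];
  const struct toy_sym *best = NULL;

  for (const struct toy_sym *s = symtab; s < end; ++s)
    if (s->value <= addr && addr < s->value + s->size
        && (best == NULL || best->value < s->value))
      best = s;
  return best;
}
```

Passing NULL for the hash table mirrors the new behaviour: no scan is attempted and no symbol is reported, instead of guessing the table end from the start of the string table.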