elf: Fix dlclose of an empty namespace in auditing mode (bug 26076)

Message ID 87zh9kuuw1.fsf@oldenburg2.str.redhat.com
State Superseded
Headers
Series elf: Fix dlclose of an empty namespace in auditing mode (bug 26076) |

Commit Message

Florian Weimer June 3, 2020, 1:43 p.m. UTC
  ns->_ns_loaded is NULL if nothing has been loaded into the namespace.

It seems difficult to hit this bug reliably, so this change does not
come with a test case.  It was trigger by accident, due to TLS
exhaustion.

---
 elf/dl-close.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Carlos O'Donell June 3, 2020, 8:28 p.m. UTC | #1
On 6/3/20 9:43 AM, Florian Weimer via Libc-alpha wrote:
> ns->_ns_loaded is NULL if nothing has been loaded into the namespace.
> 
> It seems difficult to hit this bug reliably, so this change does not
> come with a test case.  It was trigger by accident, due to TLS
> exhaustion.

I think this should fail catastrophically and quickly.

> ---
>  elf/dl-close.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/elf/dl-close.c b/elf/dl-close.c
> index 73b2817bbf..896e59e42e 100644
> --- a/elf/dl-close.c
> +++ b/elf/dl-close.c
> @@ -782,7 +782,7 @@ _dl_close_worker (struct link_map *map, bool force)
>      {
>        struct link_map *head = ns->_ns_loaded;
>        /* Do not call the functions for any auditing object.  */
> -      if (head->l_auditing == 0)
> +      if (head != NULL && head->l_auditing == 0)
>  	{
>  	  struct audit_ifaces *afct = GLRO(dl_audit);
>  	  for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
> 

Use _dl_signal_error to indicate an internal error?
  
Florian Weimer June 3, 2020, 9 p.m. UTC | #2
* Carlos O'Donell:

> On 6/3/20 9:43 AM, Florian Weimer via Libc-alpha wrote:
>> ns->_ns_loaded is NULL if nothing has been loaded into the namespace.
>> 
>> It seems difficult to hit this bug reliably, so this change does not
>> come with a test case.  It was trigger by accident, due to TLS
>> exhaustion.
>
> I think this should fail catastrophically and quickly.

Why should this be fatal to the process?

>>  elf/dl-close.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/elf/dl-close.c b/elf/dl-close.c
>> index 73b2817bbf..896e59e42e 100644
>> --- a/elf/dl-close.c
>> +++ b/elf/dl-close.c
>> @@ -782,7 +782,7 @@ _dl_close_worker (struct link_map *map, bool force)
>>      {
>>        struct link_map *head = ns->_ns_loaded;
>>        /* Do not call the functions for any auditing object.  */
>> -      if (head->l_auditing == 0)
>> +      if (head != NULL && head->l_auditing == 0)
>>  	{
>>  	  struct audit_ifaces *afct = GLRO(dl_audit);
>>  	  for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
>> 
>
> Use _dl_signal_error to indicate an internal error?

This *is* on the cleanup path after an error already happened.  This is
the backtrace I saw:

(gdb) bt
#0  _dl_close_worker (map=<optimized out>, force=force@entry=true)
    at dl-close.c:785
#1  0x00007fbe468d3eb3 in _dl_open (file=0x7ffde8ed3f30 "\253\003\aE\276\177", 
    mode=mode@entry=-1946157055, 
    caller_dlopen=caller_dlopen@entry=0x7fbe468c2330 <dl_main>, nsid=9, 
    nsid@entry=-1, argc=1, argv=<optimized out>, env=0x7ffde8ed4550)
    at dl-open.c:906
#2  0x00007fbe468c120e in dlmopen_doit (a=a@entry=0x7ffde8ed41e0) at rtld.c:660
#3  0x00007fbe468dbd5e in _dl_catch_exception (
    exception=exception@entry=0x7ffde8ed4120, 
    operate=operate@entry=0x7fbe468c11d0 <dlmopen_doit>, 
    args=args@entry=0x7ffde8ed41e0) at dl-error-skeleton.c:208
#4  0x00007fbe468dbe03 in _dl_catch_error (objname=objname@entry=0x7ffde8ed41d0, 
    errstring=errstring@entry=0x7ffde8ed41d8, 
    mallocedp=mallocedp@entry=0x7ffde8ed41cf, 
    operate=operate@entry=0x7fbe468c11d0 <dlmopen_doit>, 
    args=args@entry=0x7ffde8ed41e0) at dl-error-skeleton.c:227
#5  0x00007fbe468c48e4 in load_audit_module (last_audit=<synthetic pointer>, 
    name=0x7ffde8ed42b8 "tst-auditmanymod9.so") at rtld.c:973
#6  load_audit_modules (audit_list=0x7ffde8ed4220, main_map=<optimized out>)
    at rtld.c:1114
#7  dl_main (phdr=<optimized out>, phnum=<optimized out>, 
    user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1679
#8  0x00007fbe468da84b in _dl_sysdep_start (
    start_argptr=start_argptr@entry=0x7ffde8ed4520, 
    dl_main=dl_main@entry=0x7fbe468c2330 <dl_main>) at ../elf/dl-sysdep.c:252
#9  0x00007fbe468c1e81 in _dl_start_final (arg=0x7ffde8ed4520) at rtld.c:489
#10 _dl_start (arg=0x7ffde8ed4520) at rtld.c:582
#11 0x00007fbe468c1098 in _start ()
   from /home/fweimer/src/gnu/glibc/build/elf/ld-linux-x86-64.so.2

(gdb) print exception 
$1 = {objname = 0x7fbe450703ab "/home/fweimer/src/gnu/glibc/build/libc.so.6", 
  errstring = 0x7fbe45070380 "cannot allocate memory in static TLS block", 
  message_buffer = 0x0}

We cannot report another error at this point.

Thanks,
Florian
  
Carlos O'Donell June 3, 2020, 10:06 p.m. UTC | #3
On 6/3/20 5:00 PM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> On 6/3/20 9:43 AM, Florian Weimer via Libc-alpha wrote:
>>> ns->_ns_loaded is NULL if nothing has been loaded into the namespace.
>>>
>>> It seems difficult to hit this bug reliably, so this change does not
>>> come with a test case.  It was trigger by accident, due to TLS
>>> exhaustion.
>>
>> I think this should fail catastrophically and quickly.
> 
> Why should this be fatal to the process?

Your backtrace does a good job explaining how you got to this point.
I appreciate it, it is very illuminating.

>>>  elf/dl-close.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/elf/dl-close.c b/elf/dl-close.c
>>> index 73b2817bbf..896e59e42e 100644
>>> --- a/elf/dl-close.c
>>> +++ b/elf/dl-close.c
>>> @@ -782,7 +782,7 @@ _dl_close_worker (struct link_map *map, bool force)
>>>      {
>>>        struct link_map *head = ns->_ns_loaded;
>>>        /* Do not call the functions for any auditing object.  */
>>> -      if (head->l_auditing == 0)
>>> +      if (head != NULL && head->l_auditing == 0)
>>>  	{
>>>  	  struct audit_ifaces *afct = GLRO(dl_audit);
>>>  	  for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
>>>
>>
>> Use _dl_signal_error to indicate an internal error?
> 
> This *is* on the cleanup path after an error already happened.  This is
> the backtrace I saw:

I'm going to ignore the oddity of auditors auditing themselves, I guess it's
logically consistent that the first loaded auditor won't see any of this, but
that subsequently loaded auditors will.

Once auditor loading fails in your case we begin unloading that object. What I
don't understand though is why is GLRO(dl_naudit) non-zero?

If this is the first loaded auditor we don't increment GLRO(dl_naudit) until
after we return from load_audit_module, so do_audit is false, and we never
get here

If this is the second loaded auditor we do set do_audit to true, but this
means we already dereferenced ns->_ns_loaded->l_auditing already once to
compute do_audit. What has subsequently happened to ns->_ns_loaded? For it
to be null means we removed *every* object from the namespace, but we just
established we had at least one already loaded successfully?

Why would we remove more than just the current object that failed to load?

Why would the current audit modules failure result in ns->_ns_loaded to be null?

I *do* see other code that checks for nulls, for example:

elf/dl-object.c:

 28 /* Add the new link_map NEW to the end of the namespace list.  */
 29 void
 30 _dl_add_to_namespace_list (struct link_map *new, Lmid_t nsid)
 31 {
 32   /* We modify the list of loaded objects.  */
 33   __rtld_lock_lock_recursive (GL(dl_load_write_lock));
 34 
 35   if (GL(dl_ns)[nsid]._ns_loaded != NULL)
 36     {
 37       struct link_map *l = GL(dl_ns)[nsid]._ns_loaded;
 38       while (l->l_next != NULL)
 39         l = l->l_next;
 40       new->l_prev = l;
 41       /* new->l_next = NULL;   Would be necessary but we use calloc.  */
 42       l->l_next = new;
 43     }
 44   else
 45     GL(dl_ns)[nsid]._ns_loaded = new;

Here we check to see if _ns_loaded is non-null and so something different,
but here we might expect that it's the first link map being added to this
namespace.

I'd like to hear what you think might be going wrong here.

 
> (gdb) bt
> #0  _dl_close_worker (map=<optimized out>, force=force@entry=true)
>     at dl-close.c:785
> #1  0x00007fbe468d3eb3 in _dl_open (file=0x7ffde8ed3f30 "\253\003\aE\276\177", 
>     mode=mode@entry=-1946157055, 
>     caller_dlopen=caller_dlopen@entry=0x7fbe468c2330 <dl_main>, nsid=9, 
>     nsid@entry=-1, argc=1, argv=<optimized out>, env=0x7ffde8ed4550)
>     at dl-open.c:906
> #2  0x00007fbe468c120e in dlmopen_doit (a=a@entry=0x7ffde8ed41e0) at rtld.c:660
> #3  0x00007fbe468dbd5e in _dl_catch_exception (
>     exception=exception@entry=0x7ffde8ed4120, 
>     operate=operate@entry=0x7fbe468c11d0 <dlmopen_doit>, 
>     args=args@entry=0x7ffde8ed41e0) at dl-error-skeleton.c:208
> #4  0x00007fbe468dbe03 in _dl_catch_error (objname=objname@entry=0x7ffde8ed41d0, 
>     errstring=errstring@entry=0x7ffde8ed41d8, 
>     mallocedp=mallocedp@entry=0x7ffde8ed41cf, 
>     operate=operate@entry=0x7fbe468c11d0 <dlmopen_doit>, 
>     args=args@entry=0x7ffde8ed41e0) at dl-error-skeleton.c:227
> #5  0x00007fbe468c48e4 in load_audit_module (last_audit=<synthetic pointer>, 
>     name=0x7ffde8ed42b8 "tst-auditmanymod9.so") at rtld.c:973
> #6  load_audit_modules (audit_list=0x7ffde8ed4220, main_map=<optimized out>)
>     at rtld.c:1114
> #7  dl_main (phdr=<optimized out>, phnum=<optimized out>, 
>     user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1679
> #8  0x00007fbe468da84b in _dl_sysdep_start (
>     start_argptr=start_argptr@entry=0x7ffde8ed4520, 
>     dl_main=dl_main@entry=0x7fbe468c2330 <dl_main>) at ../elf/dl-sysdep.c:252
> #9  0x00007fbe468c1e81 in _dl_start_final (arg=0x7ffde8ed4520) at rtld.c:489
> #10 _dl_start (arg=0x7ffde8ed4520) at rtld.c:582
> #11 0x00007fbe468c1098 in _start ()
>    from /home/fweimer/src/gnu/glibc/build/elf/ld-linux-x86-64.so.2
> 
> (gdb) print exception 
> $1 = {objname = 0x7fbe450703ab "/home/fweimer/src/gnu/glibc/build/libc.so.6", 
>   errstring = 0x7fbe45070380 "cannot allocate memory in static TLS block", 
>   message_buffer = 0x0}
> 
> We cannot report another error at this point.

Agreed. Thanks.
  
Florian Weimer June 4, 2020, 12:57 p.m. UTC | #4
* Carlos O'Donell:

>>>>  elf/dl-close.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/elf/dl-close.c b/elf/dl-close.c
>>>> index 73b2817bbf..896e59e42e 100644
>>>> --- a/elf/dl-close.c
>>>> +++ b/elf/dl-close.c
>>>> @@ -782,7 +782,7 @@ _dl_close_worker (struct link_map *map, bool force)
>>>>      {
>>>>        struct link_map *head = ns->_ns_loaded;
>>>>        /* Do not call the functions for any auditing object.  */
>>>> -      if (head->l_auditing == 0)
>>>> +      if (head != NULL && head->l_auditing == 0)
>>>>  	{
>>>>  	  struct audit_ifaces *afct = GLRO(dl_audit);
>>>>  	  for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
>>>>
>>>
>>> Use _dl_signal_error to indicate an internal error?
>> 
>> This *is* on the cleanup path after an error already happened.  This is
>> the backtrace I saw:
>
> I'm going to ignore the oddity of auditors auditing themselves, I guess it's
> logically consistent that the first loaded auditor won't see any of this, but
> that subsequently loaded auditors will.
>
> Once auditor loading fails in your case we begin unloading that object. What I
> don't understand though is why is GLRO(dl_naudit) non-zero?
>
> If this is the first loaded auditor we don't increment GLRO(dl_naudit) until
> after we return from load_audit_module, so do_audit is false, and we never
> get here

Agreed.

> If this is the second loaded auditor we do set do_audit to true, but this
> means we already dereferenced ns->_ns_loaded->l_auditing already once to
> compute do_audit. What has subsequently happened to ns->_ns_loaded? For it
> to be null means we removed *every* object from the namespace, but we just
> established we had at least one already loaded successfully?

It looks like the namespace is not actually empty initially:

(gdb) print _rtld_global._dl_ns[9]
$10 = {_ns_loaded = 0x7ffff7ee7590, _ns_nloaded = 3, 
  _ns_main_searchlist = 0x0, _ns_global_scope_alloc = 0, 
  _ns_global_scope_pending_adds = 0, libc_map = 0x0, _ns_unique_sym_table = {
    lock = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, 
          __nusers = 0, __kind = 1, __spins = 0, __elision = 0, __list = {
            __prev = 0x0, __next = 0x0}}, 
        __size = '\000' <repeats 16 times>, "\001", '\000' <repeats 22 times>, 
        __align = 0}}, entries = 0x0, size = 0, n_elements = 0, free = 0x0}, 
  _ns_debug = {r_version = 1, r_map = 0x7ffff7ee7590, r_brk = 140737351924736, 
    r_state = RT_CONSISTENT, r_ldbase = 140737351860224}}

(gdb) print _rtld_global._dl_ns[9]._ns_loaded->l_name
$13 = 0x7ffff7ee7570 "./elf/tst-auditmanymod9.so"
(gdb) print _rtld_global._dl_ns[9]._ns_loaded->l_next->l_name
$14 = 0x7ffff7ee7b70 "./libc.so.6"
(gdb) print _rtld_global._dl_ns[9]._ns_loaded->l_next->l_next->l_name
$15 = 0x7ffff7ee8140 "./elf/ld-linux-x86-64.so.2"

So the commit message is wrong: the namespace doesn't start out as
empty, it becomes empty during dlclose.  It should be fairly easy to
write a test for this.  I'll post a new patch shortly.

Thanks,
Florian
  

Patch

diff --git a/elf/dl-close.c b/elf/dl-close.c
index 73b2817bbf..896e59e42e 100644
--- a/elf/dl-close.c
+++ b/elf/dl-close.c
@@ -782,7 +782,7 @@  _dl_close_worker (struct link_map *map, bool force)
     {
       struct link_map *head = ns->_ns_loaded;
       /* Do not call the functions for any auditing object.  */
-      if (head->l_auditing == 0)
+      if (head != NULL && head->l_auditing == 0)
 	{
 	  struct audit_ifaces *afct = GLRO(dl_audit);
 	  for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)