[PATCHv2] Increase robustness of internal dlopen() by using RTLD_NOW [BZ #22766]

Message ID 20180425192637.5977-1-tuliom@linux.ibm.com
State Superseded
Delegated to: Carlos O'Donell
Headers

Commit Message

Tulio Magno Quites Machado Filho April 25, 2018, 7:26 p.m. UTC
  Carlos O'Donell <carlos@redhat.com> writes:

> On 04/25/2018 07:42 AM, Tulio Magno Quites Machado Filho wrote:
>> Prevent random runtime crashes due to missing symbols caused by mixed
>> libnss_* versions.
>
> You are arguing that this should fail at the first call into the NSS
> service, which causes the initial dlopen? That is OK with me.

Yes.  I'm agreeing with Szabolcs' proposal [1].

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=22766

> An even further patch would be to load all NSS modules early, and that's
> something Florian and I have discussed. However, I'm not recommending
> you do that now, but I wanted to be clear about a direction that this
> code might take.

Ack.  Makes sense.

> We should cleanup more.
>
> sysdeps/gnu/unwind-resume.c:  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
> sysdeps/nptl/unwind-forcedunwind.c:  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);

Done.

> Since we are now using RTLD_NOW:
>
> * Switch back to __libc_dlopen (effectively reverting part of 
>   Florian's 08c6e95234c, but leave the comments)

Done.

> * Add a big comment in dlfcn.h to explain why RTLD_NOW is needed:
>   * Same comment as in commit 08c6e95234c, plus NSS case.

Done.

--- 8< ---

Prevent random runtime crashes due to missing symbols caused by mixed
libnss_* versions.

2018-04-25  Tulio Magno Quites Machado Filho  <tuliom@linux.ibm.com>

	[BZ #22766]
	* include/dlfcn.h [__libc_dl_open]: Replace RTLD_LAZY with RTLD_NOW.
	* sysdeps/gnu/unwind-resume.c (__lib_gcc_s_init): Replace
	__libc_dlopen_mode() using RTLD_NOW with __libc_dlopen.
	* sysdeps/nptl/unwind-forcedunwind.c: Likewise.

Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
---
 include/dlfcn.h                    | 7 ++++++-
 sysdeps/gnu/unwind-resume.c        | 2 +-
 sysdeps/nptl/unwind-forcedunwind.c | 3 ++-
 3 files changed, 9 insertions(+), 3 deletions(-)
  

Comments

Carlos O'Donell April 26, 2018, 1:01 a.m. UTC | #1
On 04/25/2018 03:26 PM, Tulio Magno Quites Machado Filho wrote:
> Carlos O'Donell <carlos@redhat.com> writes:
> 
>> On 04/25/2018 07:42 AM, Tulio Magno Quites Machado Filho wrote:
>>> Prevent random runtime crashes due to missing symbols caused by mixed
>>> libnss_* versions.
>>
>> You are arguing that this should fail at the first call into the NSS
>> service, which causes the initial dlopen? That is OK with me.
> 
> Yes.  I'm agreeing with Szabolcs' proposal [1].
> 
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=22766
> 
>> An even further patch would be to load all NSS modules early, and that's
>> something Florian and I have discussed. However, I'm not recommending
>> you do that now, but I wanted to be clear about a direction that this
>> code might take.
> 
> Ack.  Makes sense.
> 
>> We should cleanup more.
>>
>> sysdeps/gnu/unwind-resume.c:  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
>> sysdeps/nptl/unwind-forcedunwind.c:  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
> 
> Done.
> 
>> Since we are now using RTLD_NOW:
>>
>> * Switch back to __libc_dlopen (effectively reverting part of 
>>   Florian's 08c6e95234c, but leave the comments)
> 
> Done.
> 
>> * Add a big comment in dlfcn.h to explain why RTLD_NOW is needed:
>>   * Same comment as in commit 08c6e95234c, plus NSS case.
> 
> Done.
> 
> --- 8< ---
> 
> Prevent random runtime crashes due to missing symbols caused by mixed
> libnss_* versions.
> 
> 2018-04-25  Tulio Magno Quites Machado Filho  <tuliom@linux.ibm.com>
> 
> 	[BZ #22766]
> 	* include/dlfcn.h [__libc_dl_open]: Replace RTLD_LAZY with RTLD_NOW.
> 	* sysdeps/gnu/unwind-resume.c (__lib_gcc_s_init): Replace
> 	__libc_dlopen_mode() using RTLD_NOW with __libc_dlopen.
> 	* sysdeps/nptl/unwind-forcedunwind.c: Likewise.
> 
> Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>

OK with the changes below.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ---
>  include/dlfcn.h                    | 7 ++++++-
>  sysdeps/gnu/unwind-resume.c        | 2 +-
>  sysdeps/nptl/unwind-forcedunwind.c | 3 ++-
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/include/dlfcn.h b/include/dlfcn.h
> index 12ef913e19..558c9bfe17 100644
> --- a/include/dlfcn.h
> +++ b/include/dlfcn.h
> @@ -31,8 +31,13 @@ extern char **__libc_argv attribute_hidden;
>  
>  /* Now define the internal interfaces.  */
>  
> +/* Use RTLD_NOW here because:
> +   1. For consistency between __libgcc_s_init and pthread_cancel_init;
> +   2. It allows libnss_* to provide a fallback code at dlopen() time in order
> +      to prevent missing symbols caused by mixed glibc versions.  Using
> +      RTLD_LAZY here could cause random failures at runtime.  */

Suggest:

/* Use RTLD_NOW here because:
   1. In pthread_cancel_init we want to use RTLD_NOW to reduce the stack usage
      of future cancellation operations, particularly when the target thread
      is running with a small stack. Likewise for consistency we do the same
      thing in __libgcc_s_init. RTLD_NOW will rarely make a difference for
      __libgcc_s_init because unwinding is already in progress, so libgcc_s.so
      has already been loaded if its unwinder is used (Bug 22636).
   2. It allows us to provide robust fallback code at dlopen time for incorrectly 
      configured systems that mix old libnss_* modules with newly installed
      libraries e.g. old libnss_nis.so.2 with new libnsl.so.1.  Using RTLD_LAZY
      here causes a failure at the time the symbol is called and at that point
      it is much harder to safely return an error (Bug 22766).

   The use of RTLD_NOW also impacts gconv module loading, backtracing (where the
   unwinder form libgcc_s.so is used), and IDNA functions (which load libidn), all
   of which load their respective DSOs on demand, and so should not impact program
   startup.  That is to say that the DSOs are loaded as part of an API call and
   therefore we will be calling that family of API functions shortly so RTLD_NOW or
   RTLD_LAZY is not a big difference in performance, but RTLD_NOW has better error
   handling semantics for the library.  */

>  #define __libc_dlopen(name) \
> -  __libc_dlopen_mode (name, RTLD_LAZY | __RTLD_DLOPEN)
> +  __libc_dlopen_mode (name, RTLD_NOW | __RTLD_DLOPEN)
>  extern void *__libc_dlopen_mode  (const char *__name, int __mode);
>  extern void *__libc_dlsym   (void *__map, const char *__name);
>  extern void *__libc_dlvsym (void *map, const char *name, const char *version);
> diff --git a/sysdeps/gnu/unwind-resume.c b/sysdeps/gnu/unwind-resume.c
> index 7f9a1bf2c7..72a04c7969 100644
> --- a/sysdeps/gnu/unwind-resume.c
> +++ b/sysdeps/gnu/unwind-resume.c
> @@ -39,7 +39,7 @@ __libgcc_s_init (void)
>       RTLD_NOW will rarely make a difference here because unwinding is
>       already in progress, so libgcc_s.so has already been loaded if
>       its unwinder is used.  */

^^^ This should be adjusted to:

/* See include/dlfcn.h. Use of __libc_dlopen requires RTLD_NOW.  */

> -  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
> +  handle = __libc_dlopen (LIBGCC_S_SO);
>  
>    if (handle == NULL
>        || (resume = __libc_dlsym (handle, "_Unwind_Resume")) == NULL
> diff --git a/sysdeps/nptl/unwind-forcedunwind.c b/sysdeps/nptl/unwind-forcedunwind.c
> index 67b8e74b53..dbd8d66cda 100644
> --- a/sysdeps/nptl/unwind-forcedunwind.c
> +++ b/sysdeps/nptl/unwind-forcedunwind.c
> @@ -49,7 +49,8 @@ pthread_cancel_init (void)
>        return;
>      }
>  
> -  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
> +  /* __libgcc_s_init depends on RTLD_NOW being used here.  */

Suggest:

/* See include/dlfcn.h. Use of __libc_dlopen requires RTLD_NOW.  */

> +  handle = __libc_dlopen (LIBGCC_S_SO);
>  
>    if (handle == NULL
>        || (resume = __libc_dlsym (handle, "_Unwind_Resume")) == NULL
>
  
Tulio Magno Quites Machado Filho April 26, 2018, 4:57 p.m. UTC | #2
Carlos O'Donell <carlos@redhat.com> writes:

> On 04/25/2018 03:26 PM, Tulio Magno Quites Machado Filho wrote:
>> 
>> Prevent random runtime crashes due to missing symbols caused by mixed
>> libnss_* versions.
>> 
>> 2018-04-25  Tulio Magno Quites Machado Filho  <tuliom@linux.ibm.com>
>> 
>> 	[BZ #22766]
>> 	* include/dlfcn.h [__libc_dl_open]: Replace RTLD_LAZY with RTLD_NOW.
>> 	* sysdeps/gnu/unwind-resume.c (__lib_gcc_s_init): Replace
>> 	__libc_dlopen_mode() using RTLD_NOW with __libc_dlopen.
>> 	* sysdeps/nptl/unwind-forcedunwind.c: Likewise.
>> 
>> Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
>
> OK with the changes below.
>
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>

Ack.

>> diff --git a/include/dlfcn.h b/include/dlfcn.h
>> index 12ef913e19..558c9bfe17 100644
>> --- a/include/dlfcn.h
>> +++ b/include/dlfcn.h
>> @@ -31,8 +31,13 @@ extern char **__libc_argv attribute_hidden;
>>  
>>  /* Now define the internal interfaces.  */
>>  
>> +/* Use RTLD_NOW here because:
>> +   1. For consistency between __libgcc_s_init and pthread_cancel_init;
>> +   2. It allows libnss_* to provide a fallback code at dlopen() time in order
>> +      to prevent missing symbols caused by mixed glibc versions.  Using
>> +      RTLD_LAZY here could cause random failures at runtime.  */
>
> Suggest:
>
> /* Use RTLD_NOW here because:
>    1. In pthread_cancel_init we want to use RTLD_NOW to reduce the stack usage
>       of future cancellation operations, particularly when the target thread
>       is running with a small stack. Likewise for consistency we do the same
>       thing in __libgcc_s_init. RTLD_NOW will rarely make a difference for
>       __libgcc_s_init because unwinding is already in progress, so libgcc_s.so
>       has already been loaded if its unwinder is used (Bug 22636).
>    2. It allows us to provide robust fallback code at dlopen time for incorrectly 
>       configured systems that mix old libnss_* modules with newly installed
>       libraries e.g. old libnss_nis.so.2 with new libnsl.so.1.  Using RTLD_LAZY
>       here causes a failure at the time the symbol is called and at that point
>       it is much harder to safely return an error (Bug 22766).
>
>    The use of RTLD_NOW also impacts gconv module loading, backtracing (where the
>    unwinder form libgcc_s.so is used), and IDNA functions (which load libidn), all
>    of which load their respective DSOs on demand, and so should not impact program
>    startup.  That is to say that the DSOs are loaded as part of an API call and
>    therefore we will be calling that family of API functions shortly so RTLD_NOW or
>    RTLD_LAZY is not a big difference in performance, but RTLD_NOW has better error
>    handling semantics for the library.  */

Much better!  :-D

>>  #define __libc_dlopen(name) \
>> -  __libc_dlopen_mode (name, RTLD_LAZY | __RTLD_DLOPEN)
>> +  __libc_dlopen_mode (name, RTLD_NOW | __RTLD_DLOPEN)
>>  extern void *__libc_dlopen_mode  (const char *__name, int __mode);
>>  extern void *__libc_dlsym   (void *__map, const char *__name);
>>  extern void *__libc_dlvsym (void *map, const char *name, const char *version);
>> diff --git a/sysdeps/gnu/unwind-resume.c b/sysdeps/gnu/unwind-resume.c
>> index 7f9a1bf2c7..72a04c7969 100644
>> --- a/sysdeps/gnu/unwind-resume.c
>> +++ b/sysdeps/gnu/unwind-resume.c
>> @@ -39,7 +39,7 @@ __libgcc_s_init (void)
>>       RTLD_NOW will rarely make a difference here because unwinding is
>>       already in progress, so libgcc_s.so has already been loaded if
>>       its unwinder is used.  */
>
> ^^^ This should be adjusted to:
>
> /* See include/dlfcn.h. Use of __libc_dlopen requires RTLD_NOW.  */

Fixed.

>> -  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
>> +  handle = __libc_dlopen (LIBGCC_S_SO);
>>  
>>    if (handle == NULL
>>        || (resume = __libc_dlsym (handle, "_Unwind_Resume")) == NULL
>> diff --git a/sysdeps/nptl/unwind-forcedunwind.c b/sysdeps/nptl/unwind-forcedunwind.c
>> index 67b8e74b53..dbd8d66cda 100644
>> --- a/sysdeps/nptl/unwind-forcedunwind.c
>> +++ b/sysdeps/nptl/unwind-forcedunwind.c
>> @@ -49,7 +49,8 @@ pthread_cancel_init (void)
>>        return;
>>      }
>>  
>> -  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
>> +  /* __libgcc_s_init depends on RTLD_NOW being used here.  */
>
> Suggest:
>
> /* See include/dlfcn.h. Use of __libc_dlopen requires RTLD_NOW.  */

Fixed.

I'm pushing it.

Thanks!
  

Patch

diff --git a/include/dlfcn.h b/include/dlfcn.h
index 12ef913e19..558c9bfe17 100644
--- a/include/dlfcn.h
+++ b/include/dlfcn.h
@@ -31,8 +31,13 @@  extern char **__libc_argv attribute_hidden;
 
 /* Now define the internal interfaces.  */
 
+/* Use RTLD_NOW here because:
+   1. For consistency between __libgcc_s_init and pthread_cancel_init;
+   2. It allows libnss_* to provide a fallback code at dlopen() time in order
+      to prevent missing symbols caused by mixed glibc versions.  Using
+      RTLD_LAZY here could cause random failures at runtime.  */
 #define __libc_dlopen(name) \
-  __libc_dlopen_mode (name, RTLD_LAZY | __RTLD_DLOPEN)
+  __libc_dlopen_mode (name, RTLD_NOW | __RTLD_DLOPEN)
 extern void *__libc_dlopen_mode  (const char *__name, int __mode);
 extern void *__libc_dlsym   (void *__map, const char *__name);
 extern void *__libc_dlvsym (void *map, const char *name, const char *version);
diff --git a/sysdeps/gnu/unwind-resume.c b/sysdeps/gnu/unwind-resume.c
index 7f9a1bf2c7..72a04c7969 100644
--- a/sysdeps/gnu/unwind-resume.c
+++ b/sysdeps/gnu/unwind-resume.c
@@ -39,7 +39,7 @@  __libgcc_s_init (void)
      RTLD_NOW will rarely make a difference here because unwinding is
      already in progress, so libgcc_s.so has already been loaded if
      its unwinder is used.  */
-  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
+  handle = __libc_dlopen (LIBGCC_S_SO);
 
   if (handle == NULL
       || (resume = __libc_dlsym (handle, "_Unwind_Resume")) == NULL
diff --git a/sysdeps/nptl/unwind-forcedunwind.c b/sysdeps/nptl/unwind-forcedunwind.c
index 67b8e74b53..dbd8d66cda 100644
--- a/sysdeps/nptl/unwind-forcedunwind.c
+++ b/sysdeps/nptl/unwind-forcedunwind.c
@@ -49,7 +49,8 @@  pthread_cancel_init (void)
       return;
     }
 
-  handle = __libc_dlopen_mode (LIBGCC_S_SO, RTLD_NOW | __RTLD_DLOPEN);
+  /* __libgcc_s_init depends on RTLD_NOW being used here.  */
+  handle = __libc_dlopen (LIBGCC_S_SO);
 
   if (handle == NULL
       || (resume = __libc_dlsym (handle, "_Unwind_Resume")) == NULL