[1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]

Message ID 20200912134441.2407884-2-hjl.tools@gmail.com
State Superseded
Headers
Series ld.so: Add --list-tunables to print tunable values |

Commit Message

H.J. Lu Sept. 12, 2020, 1:44 p.m. UTC
  X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called.  Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Instead, we should initialize x86 CPU
features and cache info by initializing dummy function pointers via
IFUNC relocation.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
---
 sysdeps/i386/dl-machine.h          |  3 +--
 sysdeps/x86/cacheinfo.c            | 10 +++++++++-
 sysdeps/x86/dl-get-cpu-features.c  | 25 ++++++++++++++++++++++++-
 sysdeps/x86/include/cpu-features.h |  1 +
 sysdeps/x86_64/dl-machine.h        |  3 +--
 5 files changed, 36 insertions(+), 6 deletions(-)
  

Comments

Florian Weimer Sept. 18, 2020, 8:06 a.m. UTC | #1
* H. J. Lu via Libc-alpha:

> X86 CPU features in ld.so are initialized by init_cpu_features, which is
> invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> loaded by static executable, DL_PLATFORM_INIT is never called.

I don't think that's accurate.  It's called from elf/dl-support.c in the
static case.  But I agree that it's too late in this case.

> Also x86 cache info in libc.o and libc.a is initialized by a
> constructor which may be called too late.  Instead, we should
> initialize x86 CPU features and cache info by initializing dummy
> function pointers via IFUNC relocation.

Why would ld put this special IRELATIVE relocation first?  I'm sorry,
but this looks very brittle.

I think there's an existing mechanism for this corner case,
ARCH_SETUP_IREL.  Could you use that?

Thanks,
Florian
  
H.J. Lu Sept. 18, 2020, 1:55 p.m. UTC | #2
On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> > loaded by static executable, DL_PLATFORM_INIT is never called.
>
> I don't think that's accurate.  It's called from elf/dl-support.c in the
> static case.  But I agree that it's too late in this case.

I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.

> > Also x86 cache info in libc.o and libc.a is initialized by a
> > constructor which may be called too late.  Instead, we should
> > initialize x86 CPU features and cache info by initializing dummy
> > function pointers via IFUNC relocation.
>
> Why would ld put this special IRELATIVE relocation first?  I'm sorry,
> but this looks very brittle.

This is for ld.so.  ld.so doesn't assume that IFUNC is usable before everything
is setted up properly.   This IRELATIVE relocation is not different from other
dynamic relocations,  IRELATIVE or not.  The key here is that we want to call
init_cacheinfo as early as possible.  Calling from _dl_relocate_object is as
early as we can get.

> I think there's an existing mechanism for this corner case,
> ARCH_SETUP_IREL.  Could you use that?
>

ARCH_SETUP_IREL is for static executable, not ld.so.
  
Florian Weimer Sept. 18, 2020, 1:59 p.m. UTC | #3
* H. J. Lu:

> On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
>> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
>> > loaded by static executable, DL_PLATFORM_INIT is never called.
>>
>> I don't think that's accurate.  It's called from elf/dl-support.c in the
>> static case.  But I agree that it's too late in this case.
>
> I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.

I missed that.

You cannot use ld.so in a static executable because it is
uninitialized/dormant.

If you need functionality after static dlopen, it has to reside in
libc.so.6, not the dynamic loader.

Thanks,
Florian
  
H.J. Lu Sept. 18, 2020, 2:18 p.m. UTC | #4
On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu via Libc-alpha:
> >>
> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
> >>
> >> I don't think that's accurate.  It's called from elf/dl-support.c in the
> >> static case.  But I agree that it's too late in this case.
> >
> > I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.
>
> I missed that.
>
> You cannot use ld.so in a static executable because it is
> uninitialized/dormant.
>
> If you need functionality after static dlopen, it has to reside in
> libc.so.6, not the dynamic loader.

Here is the problem.  static dlopen uses libc.so.6 which brings in ld.so
with _rtld_global and _rtld_global_ro via DT_NEEDED.  My patch is
trying to deal with it to initialize CPU info in _rtld_global_ro in this case.
  
Florian Weimer Sept. 18, 2020, 2:22 p.m. UTC | #5
* H. J. Lu:

> On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>> >>
>> >> * H. J. Lu via Libc-alpha:
>> >>
>> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
>> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
>> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
>> >>
>> >> I don't think that's accurate.  It's called from elf/dl-support.c in the
>> >> static case.  But I agree that it's too late in this case.
>> >
>> > I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.
>>
>> I missed that.
>>
>> You cannot use ld.so in a static executable because it is
>> uninitialized/dormant.
>>
>> If you need functionality after static dlopen, it has to reside in
>> libc.so.6, not the dynamic loader.
>
> Here is the problem.  static dlopen uses libc.so.6 which brings in ld.so
> with _rtld_global and _rtld_global_ro via DT_NEEDED.  My patch is
> trying to deal with it to initialize CPU info in _rtld_global_ro in this case.

Putting this data into _rtld_global_ro is not correct if it is required
after static dlopen.  The current static dlopen approach simply does not
support that.  You can only assume that libc.so has been initialized
after static dlopen.

Thanks,
Florian
  
H.J. Lu Sept. 18, 2020, 2:30 p.m. UTC | #6
On Fri, Sep 18, 2020 at 7:23 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu:
> >>
> >> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> >>
> >> >> * H. J. Lu via Libc-alpha:
> >> >>
> >> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> >> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> >> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
> >> >>
> >> >> I don't think that's accurate.  It's called from elf/dl-support.c in the
> >> >> static case.  But I agree that it's too late in this case.
> >> >
> >> > I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.
> >>
> >> I missed that.
> >>
> >> You cannot use ld.so in a static executable because it is
> >> uninitialized/dormant.
> >>
> >> If you need functionality after static dlopen, it has to reside in
> >> libc.so.6, not the dynamic loader.
> >
> > Here is the problem.  static dlopen uses libc.so.6 which brings in ld.so
> > with _rtld_global and _rtld_global_ro via DT_NEEDED.  My patch is
> > trying to deal with it to initialize CPU info in _rtld_global_ro in this case.
>
> Putting this data into _rtld_global_ro is not correct if it is required
> after static dlopen.  The current static dlopen approach simply does not
> support that.  You can only assume that libc.so has been initialized
> after static dlopen.
>

Unless it is removed,  _rtld_global_ro is the right place.  _rtld_global_ro
is initialized by dynamic relocation.  My patch does it.
  
Florian Weimer Sept. 21, 2020, 8:21 a.m. UTC | #7
* H. J. Lu:

>> Putting this data into _rtld_global_ro is not correct if it is required
>> after static dlopen.  The current static dlopen approach simply does not
>> support that.  You can only assume that libc.so has been initialized
>> after static dlopen.

> Unless it is removed,  _rtld_global_ro is the right place.  _rtld_global_ro
> is initialized by dynamic relocation.  My patch does it.

It initializes only part of it.  I find this rather confusing.

Why is this data in ld.so in the first place?

Thanks,
Florian
  
H.J. Lu Sept. 21, 2020, 11:31 a.m. UTC | #8
On Mon, Sep 21, 2020 at 1:21 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> >> Putting this data into _rtld_global_ro is not correct if it is required
> >> after static dlopen.  The current static dlopen approach simply does not
> >> support that.  You can only assume that libc.so has been initialized
> >> after static dlopen.
>
> > Unless it is removed,  _rtld_global_ro is the right place.  _rtld_global_ro
> > is initialized by dynamic relocation.  My patch does it.
>
> It initializes only part of it.  I find this rather confusing.
>
> Why is this data in ld.so in the first place?

I put it in ld.so with

commit e2e4f56056adddc3c1efe676b40a4b4f2453103b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Aug 13 03:37:47 2015 -0700

    Add _dl_x86_cpu_features to rtld_global

    This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
    and initializes it early before __libc_start_main is called so that
    cpu_features is always available when it is used and we can avoid
    calling __init_cpu_features in IFUNC selectors.

which made many things possible, including CET support and

commit bea3f92405f705684275bffee954cafe84ffb09d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Oct 22 08:24:00 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]

    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
    than saving and restoring vector and bound registers individually.
  
Florian Weimer Sept. 25, 2020, 12:55 p.m. UTC | #9
* H. J. Lu:

> On Mon, Sep 21, 2020 at 1:21 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> >> Putting this data into _rtld_global_ro is not correct if it is required
>> >> after static dlopen.  The current static dlopen approach simply does not
>> >> support that.  You can only assume that libc.so has been initialized
>> >> after static dlopen.
>>
>> > Unless it is removed,  _rtld_global_ro is the right place.  _rtld_global_ro
>> > is initialized by dynamic relocation.  My patch does it.
>>
>> It initializes only part of it.  I find this rather confusing.
>>
>> Why is this data in ld.so in the first place?
>
> I put it in ld.so with
>
> commit e2e4f56056adddc3c1efe676b40a4b4f2453103b
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date:   Thu Aug 13 03:37:47 2015 -0700
>
>     Add _dl_x86_cpu_features to rtld_global
>
>     This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
>     and initializes it early before __libc_start_main is called so that
>     cpu_features is always available when it is used and we can avoid
>     calling __init_cpu_features in IFUNC selectors.
>
> which made many things possible, including CET support and
>
> commit bea3f92405f705684275bffee954cafe84ffb09d
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date:   Sun Oct 22 08:24:00 2017 -0700
>
>     x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
>
>     In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
>     mask and bound registers.  It simplifies _dl_runtime_resolve and supports
>     different calling conventions.  ld.so code size is reduced by more than
>     1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
>     than saving and restoring vector and bound registers individually.

Hmm.  I think the current setup is not really ideal, but I guess it's
what we've got now.  I have not reviewed your entire patch series, but I
won't object to this aspect.  (I hope to review it soon, but if anyone
else wants to step in, that's okay, too.)

Thanks,
Florian
  

Patch

diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index 0f08079e48..5e22e795cc 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -25,7 +25,6 @@ 
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -250,7 +249,7 @@  dl_platform_init (void)
 #if IS_IN (rtld)
   /* init_cpu_features has been called early from __libc_start_main in
      static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 217c21c34f..7a325ab70e 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -756,7 +756,6 @@  intel_bug_no_cache_info:
 
 
 static void
-__attribute__((constructor))
 init_cacheinfo (void)
 {
   /* Find out what brand of processor.  */
@@ -770,6 +769,8 @@  init_cacheinfo (void)
   unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
+  assert (cpu_features->basic.kind != arch_kind_unknown);
+
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
@@ -894,4 +895,11 @@  init_cacheinfo (void)
 # endif
 }
 
+/* NB: Call init_cacheinfo by initializing a dummy function pointer via
+   IFUNC relocation.  */
+extern void __x86_cacheinfo (void) attribute_hidden;
+const void (*__x86_cacheinfo_p) (void) attribute_hidden
+  = __x86_cacheinfo;
+
+__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
 #endif
diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
index 5f9e46b0c6..da4697b895 100644
--- a/sysdeps/x86/dl-get-cpu-features.c
+++ b/sysdeps/x86/dl-get-cpu-features.c
@@ -1,4 +1,4 @@ 
-/* This file is part of the GNU C Library.
+/* Initialize CPU feature data via IFUNC relocation.
    Copyright (C) 2015-2020 Free Software Foundation, Inc.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -18,6 +18,29 @@ 
 
 #include <ldsodefs.h>
 
+#ifdef SHARED
+# include <cpu-features.c>
+
+/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
+   CPU features.  But when loading ld.so inside of static executable,
+   DL_PLATFORM_INIT isn't called.  Call init_cpu_features by initializing
+   a dummy function pointer via IFUNC relocation for ld.so.  */
+extern void __x86_cpu_features (void) attribute_hidden;
+const void (*__x86_cpu_features_p) (void) attribute_hidden
+  = __x86_cpu_features;
+
+void
+_dl_x86_init_cpu_features (void)
+{
+  struct cpu_features *cpu_features = __get_cpu_features ();
+  if (cpu_features->basic.kind == arch_kind_unknown)
+    init_cpu_features (cpu_features);
+}
+
+__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
+	 _dl_x86_init_cpu_features);
+#endif
+
 #undef __x86_get_cpu_features
 
 const struct cpu_features *
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index dcf29b6fe8..f62be0b9b3 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -159,6 +159,7 @@  struct cpu_features
 /* Unused for x86.  */
 #  define INIT_ARCH()
 #  define __x86_get_cpu_features(max) (&GLRO(dl_x86_cpu_features))
+extern void _dl_x86_init_cpu_features (void) attribute_hidden;
 # endif
 
 # ifdef __x86_64__
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index ca73d8fef9..773e94c8bb 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -26,7 +26,6 @@ 
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -225,7 +224,7 @@  dl_platform_init (void)
 #if IS_IN (rtld)
   /* init_cpu_features has been called early from __libc_start_main in
      static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */