[1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
Commit Message
X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called. Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late. Instead, we should initialize x86 CPU
features and cache info by initializing dummy function pointers via
IFUNC relocation.
Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
---
sysdeps/i386/dl-machine.h | 3 +--
sysdeps/x86/cacheinfo.c | 10 +++++++++-
sysdeps/x86/dl-get-cpu-features.c | 25 ++++++++++++++++++++++++-
sysdeps/x86/include/cpu-features.h | 1 +
sysdeps/x86_64/dl-machine.h | 3 +--
5 files changed, 36 insertions(+), 6 deletions(-)
Comments
* H. J. Lu via Libc-alpha:
> X86 CPU features in ld.so are initialized by init_cpu_features, which is
> invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
> loaded by static executable, DL_PLATFORM_INIT is never called.
I don't think that's accurate. It's called from elf/dl-support.c in the
static case. But I agree that it's too late in this case.
> Also x86 cache info in libc.o and libc.a is initialized by a
> constructor which may be called too late. Instead, we should
> initialize x86 CPU features and cache info by initializing dummy
> function pointers via IFUNC relocation.
Why would ld put this special IRELATIVE relocation first? I'm sorry,
but this looks very brittle.
I think there's an existing mechanism for this corner case,
ARCH_SETUP_IREL. Could you use that?
Thanks,
Florian
On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
> > loaded by static executable, DL_PLATFORM_INIT is never called.
>
> I don't think that's accurate. It's called from elf/dl-support.c in the
> static case. But I agree that it's too late in this case.
I was referring to the ld.so case. elf/dl-support.c isn't used for ld.so.
> > Also x86 cache info in libc.o and libc.a is initialized by a
> > constructor which may be called too late. Instead, we should
> > initialize x86 CPU features and cache info by initializing dummy
> > function pointers via IFUNC relocation.
>
> Why would ld put this special IRELATIVE relocation first? I'm sorry,
> but this looks very brittle.
This is for ld.so. ld.so doesn't assume that IFUNC is usable before everything
is setted up properly. This IRELATIVE relocation is not different from other
dynamic relocations, IRELATIVE or not. The key here is that we want to call
init_cacheinfo as early as possible. Calling from _dl_relocate_object is as
early as we can get.
> I think there's an existing mechanism for this corner case,
> ARCH_SETUP_IREL. Could you use that?
>
ARCH_SETUP_IREL is for static executable, not ld.so.
* H. J. Lu:
> On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
>> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
>> > loaded by static executable, DL_PLATFORM_INIT is never called.
>>
>> I don't think that's accurate. It's called from elf/dl-support.c in the
>> static case. But I agree that it's too late in this case.
>
> I was referring to the ld.so case. elf/dl-support.c isn't used for ld.so.
I missed that.
You cannot use ld.so in a static executable because it is
uninitialized/dormant.
If you need functionality after static dlopen, it has to reside in
libc.so.6, not the dynamic loader.
Thanks,
Florian
On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu via Libc-alpha:
> >>
> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
> >>
> >> I don't think that's accurate. It's called from elf/dl-support.c in the
> >> static case. But I agree that it's too late in this case.
> >
> > I was referring to the ld.so case. elf/dl-support.c isn't used for ld.so.
>
> I missed that.
>
> You cannot use ld.so in a static executable because it is
> uninitialized/dormant.
>
> If you need functionality after static dlopen, it has to reside in
> libc.so.6, not the dynamic loader.
Here is the problem. static dlopen uses libc.so.6 which brings in ld.so
with _rtld_global and _rtld_global_ro via DT_NEEDED. My patch is
trying to deal with it to initialize CPU info in _rtld_global_ro in this case.
* H. J. Lu:
> On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>> >>
>> >> * H. J. Lu via Libc-alpha:
>> >>
>> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
>> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
>> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
>> >>
>> >> I don't think that's accurate. It's called from elf/dl-support.c in the
>> >> static case. But I agree that it's too late in this case.
>> >
>> > I was referring to the ld.so case. elf/dl-support.c isn't used for ld.so.
>>
>> I missed that.
>>
>> You cannot use ld.so in a static executable because it is
>> uninitialized/dormant.
>>
>> If you need functionality after static dlopen, it has to reside in
>> libc.so.6, not the dynamic loader.
>
> Here is the problem. static dlopen uses libc.so.6 which brings in ld.so
> with _rtld_global and _rtld_global_ro via DT_NEEDED. My patch is
> trying to deal with it to initialize CPU info in _rtld_global_ro in this case.
Putting this data into _rtld_global_ro is not correct if it is required
after static dlopen. The current static dlopen approach simply does not
support that. You can only assume that libc.so has been initialized
after static dlopen.
Thanks,
Florian
On Fri, Sep 18, 2020 at 7:23 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu:
> >>
> >> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> >>
> >> >> * H. J. Lu via Libc-alpha:
> >> >>
> >> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> >> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
> >> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
> >> >>
> >> >> I don't think that's accurate. It's called from elf/dl-support.c in the
> >> >> static case. But I agree that it's too late in this case.
> >> >
> >> > I was referring to the ld.so case. elf/dl-support.c isn't used for ld.so.
> >>
> >> I missed that.
> >>
> >> You cannot use ld.so in a static executable because it is
> >> uninitialized/dormant.
> >>
> >> If you need functionality after static dlopen, it has to reside in
> >> libc.so.6, not the dynamic loader.
> >
> > Here is the problem. static dlopen uses libc.so.6 which brings in ld.so
> > with _rtld_global and _rtld_global_ro via DT_NEEDED. My patch is
> > trying to deal with it to initialize CPU info in _rtld_global_ro in this case.
>
> Putting this data into _rtld_global_ro is not correct if it is required
> after static dlopen. The current static dlopen approach simply does not
> support that. You can only assume that libc.so has been initialized
> after static dlopen.
>
Unless it is removed, _rtld_global_ro is the right place. _rtld_global_ro
is initialized by dynamic relocation. My patch does it.
* H. J. Lu:
>> Putting this data into _rtld_global_ro is not correct if it is required
>> after static dlopen. The current static dlopen approach simply does not
>> support that. You can only assume that libc.so has been initialized
>> after static dlopen.
> Unless it is removed, _rtld_global_ro is the right place. _rtld_global_ro
> is initialized by dynamic relocation. My patch does it.
It initializes only part of it. I find this rather confusing.
Why is this data in ld.so in the first place?
Thanks,
Florian
On Mon, Sep 21, 2020 at 1:21 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> >> Putting this data into _rtld_global_ro is not correct if it is required
> >> after static dlopen. The current static dlopen approach simply does not
> >> support that. You can only assume that libc.so has been initialized
> >> after static dlopen.
>
> > Unless it is removed, _rtld_global_ro is the right place. _rtld_global_ro
> > is initialized by dynamic relocation. My patch does it.
>
> It initializes only part of it. I find this rather confusing.
>
> Why is this data in ld.so in the first place?
I put it in ld.so with
commit e2e4f56056adddc3c1efe676b40a4b4f2453103b
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Aug 13 03:37:47 2015 -0700
Add _dl_x86_cpu_features to rtld_global
This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
and initializes it early before __libc_start_main is called so that
cpu_features is always available when it is used and we can avoid
calling __init_cpu_features in IFUNC selectors.
which made many things possible, including CET support and
commit bea3f92405f705684275bffee954cafe84ffb09d
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Sun Oct 22 08:24:00 2017 -0700
x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers. It simplifies _dl_runtime_resolve and supports
different calling conventions. ld.so code size is reduced by more than
1 KB. However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.
* H. J. Lu:
> On Mon, Sep 21, 2020 at 1:21 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> >> Putting this data into _rtld_global_ro is not correct if it is required
>> >> after static dlopen. The current static dlopen approach simply does not
>> >> support that. You can only assume that libc.so has been initialized
>> >> after static dlopen.
>>
>> > Unless it is removed, _rtld_global_ro is the right place. _rtld_global_ro
>> > is initialized by dynamic relocation. My patch does it.
>>
>> It initializes only part of it. I find this rather confusing.
>>
>> Why is this data in ld.so in the first place?
>
> I put it in ld.so with
>
> commit e2e4f56056adddc3c1efe676b40a4b4f2453103b
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date: Thu Aug 13 03:37:47 2015 -0700
>
> Add _dl_x86_cpu_features to rtld_global
>
> This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
> and initializes it early before __libc_start_main is called so that
> cpu_features is always available when it is used and we can avoid
> calling __init_cpu_features in IFUNC selectors.
>
> which made many things possible, including CET support and
>
> commit bea3f92405f705684275bffee954cafe84ffb09d
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date: Sun Oct 22 08:24:00 2017 -0700
>
> x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
>
> In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
> mask and bound registers. It simplifies _dl_runtime_resolve and supports
> different calling conventions. ld.so code size is reduced by more than
> 1 KB. However, use fxsave/xsave/xsavec takes a little bit more cycles
> than saving and restoring vector and bound registers individually.
Hmm. I think the current setup is not really ideal, but I guess it's
what we've got now. I have not reviewed your entire patch series, but I
won't object to this aspect. (I hope to review it soon, but if anyone
else wants to step in, that's okay, too.)
Thanks,
Florian
@@ -25,7 +25,6 @@
#include <sysdep.h>
#include <tls.h>
#include <dl-tlsdesc.h>
-#include <cpu-features.c>
/* Return nonzero iff ELF header is compatible with the running host. */
static inline int __attribute__ ((unused))
@@ -250,7 +249,7 @@ dl_platform_init (void)
#if IS_IN (rtld)
/* init_cpu_features has been called early from __libc_start_main in
static executable. */
- init_cpu_features (&GLRO(dl_x86_cpu_features));
+ _dl_x86_init_cpu_features ();
#else
if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
/* Avoid an empty string which would disturb us. */
@@ -756,7 +756,6 @@ intel_bug_no_cache_info:
static void
-__attribute__((constructor))
init_cacheinfo (void)
{
/* Find out what brand of processor. */
@@ -770,6 +769,8 @@ init_cacheinfo (void)
unsigned int threads = 0;
const struct cpu_features *cpu_features = __get_cpu_features ();
+ assert (cpu_features->basic.kind != arch_kind_unknown);
+
if (cpu_features->basic.kind == arch_kind_intel)
{
data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
@@ -894,4 +895,11 @@ init_cacheinfo (void)
# endif
}
+/* NB: Call init_cacheinfo by initializing a dummy function pointer via
+ IFUNC relocation. */
+extern void __x86_cacheinfo (void) attribute_hidden;
+const void (*__x86_cacheinfo_p) (void) attribute_hidden
+ = __x86_cacheinfo;
+
+__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
#endif
@@ -1,4 +1,4 @@
-/* This file is part of the GNU C Library.
+/* Initialize CPU feature data via IFUNC relocation.
Copyright (C) 2015-2020 Free Software Foundation, Inc.
The GNU C Library is free software; you can redistribute it and/or
@@ -18,6 +18,29 @@
#include <ldsodefs.h>
+#ifdef SHARED
+# include <cpu-features.c>
+
+/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
+ CPU features. But when loading ld.so inside of static executable,
+ DL_PLATFORM_INIT isn't called. Call init_cpu_features by initializing
+ a dummy function pointer via IFUNC relocation for ld.so. */
+extern void __x86_cpu_features (void) attribute_hidden;
+const void (*__x86_cpu_features_p) (void) attribute_hidden
+ = __x86_cpu_features;
+
+void
+_dl_x86_init_cpu_features (void)
+{
+ struct cpu_features *cpu_features = __get_cpu_features ();
+ if (cpu_features->basic.kind == arch_kind_unknown)
+ init_cpu_features (cpu_features);
+}
+
+__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
+ _dl_x86_init_cpu_features);
+#endif
+
#undef __x86_get_cpu_features
const struct cpu_features *
@@ -159,6 +159,7 @@ struct cpu_features
/* Unused for x86. */
# define INIT_ARCH()
# define __x86_get_cpu_features(max) (&GLRO(dl_x86_cpu_features))
+extern void _dl_x86_init_cpu_features (void) attribute_hidden;
# endif
# ifdef __x86_64__
@@ -26,7 +26,6 @@
#include <sysdep.h>
#include <tls.h>
#include <dl-tlsdesc.h>
-#include <cpu-features.c>
/* Return nonzero iff ELF header is compatible with the running host. */
static inline int __attribute__ ((unused))
@@ -225,7 +224,7 @@ dl_platform_init (void)
#if IS_IN (rtld)
/* init_cpu_features has been called early from __libc_start_main in
static executable. */
- init_cpu_features (&GLRO(dl_x86_cpu_features));
+ _dl_x86_init_cpu_features ();
#else
if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
/* Avoid an empty string which would disturb us. */