[v2,0/8] Extensible rseq integration

Message ID cover.1638880888.git.fweimer@redhat.com

Message

Florian Weimer Dec. 7, 2021, 12:59 p.m. UTC
This series integrates the previously posted v2 for <thread_pointer.h>.

It incorporates Mathieu's and Paul E. McKenney's suggestion to use a
volatile read for rseq_abi.cpu_id access, using a new
THREAD_GETMEM_VOLATILE macro.
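
As an illustration of what a volatile member read amounts to, here is a
minimal sketch. It is not taken from the patches: the struct and the
DEMO_GETMEM_VOLATILE macro are hypothetical stand-ins, and the real
THREAD_GETMEM_VOLATILE definitions are per-architecture and may look quite
different. The idea is only that each evaluation performs one fresh load,
because the kernel can update cpu_id behind the compiler's back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread area; NOT the kernel's struct rseq layout.  */
struct demo_rseq_area
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
};

/* Illustrative macro: access the member through a volatile-qualified
   lvalue so the compiler cannot cache or drop the load.  Uses the GNU C
   __typeof__ extension.  */
#define DEMO_GETMEM_VOLATILE(ptr, member) \
  (*(volatile __typeof__ ((ptr)->member) *) &(ptr)->member)

/* Read cpu_id with a single volatile load.  */
static inline uint32_t
demo_read_cpu_id (struct demo_rseq_area *area)
{
  assert (area != NULL);
  return DEMO_GETMEM_VOLATILE (area, cpu_id);
}
```

A caller would invoke demo_read_cpu_id once per use and keep the result in
a local variable, rather than re-reading the field and assuming both reads
agree.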

The last patch in the series makes rseq registration consistent across
threads.

Florian Weimer (8):
  nptl: Add <thread_pointer.h> for defining __thread_pointer
  nptl: Introduce <tcb-access.h> for THREAD_* accessors
  nptl: Introduce THREAD_GETMEM_VOLATILE
  nptl: Add rseq registration
  Linux: Use rseq to accelerate sched_getcpu
  nptl: Add glibc.pthread.rseq tunable to control rseq registration
  nptl: Add public rseq symbols and <sys/rseq.h>
  nptl: rseq failure after registration on main thread is fatal

 NEWS                                          |  11 +
 manual/threads.texi                           |  81 ++++++
 manual/tunables.texi                          |  10 +
 nptl/descr.h                                  |   4 +
 nptl/pthread_create.c                         |  22 ++
 sysdeps/aarch64/nptl/tls.h                    |  10 +-
 sysdeps/alpha/nptl/tls.h                      |  10 +-
 sysdeps/arc/nptl/tls.h                        |  10 +-
 sysdeps/arm/nptl/tls.h                        |  10 +-
 sysdeps/csky/nptl/tls.h                       |  10 +-
 sysdeps/hppa/nptl/tls.h                       |  10 +-
 sysdeps/i386/nptl/tcb-access.h                | 125 +++++++++
 sysdeps/i386/nptl/tls.h                       | 108 +-------
 sysdeps/ia64/nptl/tls.h                       |  10 +-
 sysdeps/m68k/nptl/tls.h                       |  10 +-
 sysdeps/microblaze/nptl/tls.h                 |  15 +-
 sysdeps/mips/nptl/tls.h                       |   9 +-
 sysdeps/nios2/nptl/tls.h                      |  10 +-
 sysdeps/nptl/dl-tls_init_tp.c                 |  38 ++-
 sysdeps/nptl/dl-tunables.list                 |   6 +
 sysdeps/nptl/internaltypes.h                  |   1 +
 sysdeps/nptl/tcb-access.h                     |  32 +++
 sysdeps/nptl/thread_pointer.h                 |  28 ++
 sysdeps/powerpc/nptl/thread_pointer.h         |  33 +++
 sysdeps/powerpc/nptl/tls.h                    |  15 +-
 sysdeps/riscv/nptl/tls.h                      |   9 +-
 sysdeps/s390/nptl/tls.h                       |  10 +-
 sysdeps/sh/nptl/tls.h                         |  14 +-
 sysdeps/sparc/nptl/tls.h                      |  10 +-
 sysdeps/unix/sysv/linux/Makefile              |  20 +-
 sysdeps/unix/sysv/linux/Versions              |   5 +
 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h   |  43 +++
 sysdeps/unix/sysv/linux/aarch64/ld.abilist    |   3 +
 sysdeps/unix/sysv/linux/alpha/ld.abilist      |   3 +
 sysdeps/unix/sysv/linux/arc/ld.abilist        |   3 +
 sysdeps/unix/sysv/linux/arm/be/ld.abilist     |   3 +
 sysdeps/unix/sysv/linux/arm/bits/rseq.h       |  83 ++++++
 sysdeps/unix/sysv/linux/arm/le/ld.abilist     |   3 +
 sysdeps/unix/sysv/linux/bits/rseq.h           |  29 ++
 sysdeps/unix/sysv/linux/csky/ld.abilist       |   3 +
 sysdeps/unix/sysv/linux/hppa/ld.abilist       |   3 +
 sysdeps/unix/sysv/linux/i386/ld.abilist       |   3 +
 sysdeps/unix/sysv/linux/ia64/ld.abilist       |   3 +
 .../unix/sysv/linux/m68k/coldfire/ld.abilist  |   3 +
 .../unix/sysv/linux/m68k/m680x0/ld.abilist    |   3 +
 sysdeps/unix/sysv/linux/microblaze/ld.abilist |   3 +
 sysdeps/unix/sysv/linux/mips/bits/rseq.h      |  62 +++++
 .../unix/sysv/linux/mips/mips32/ld.abilist    |   3 +
 .../sysv/linux/mips/mips64/n32/ld.abilist     |   3 +
 .../sysv/linux/mips/mips64/n64/ld.abilist     |   3 +
 sysdeps/unix/sysv/linux/nios2/ld.abilist      |   3 +
 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h   |  37 +++
 .../sysv/linux/powerpc/powerpc32/ld.abilist   |   3 +
 .../linux/powerpc/powerpc64/be/ld.abilist     |   3 +
 .../linux/powerpc/powerpc64/le/ld.abilist     |   3 +
 sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist |   3 +
 sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist |   3 +
 sysdeps/unix/sysv/linux/rseq-internal.h       |  52 ++++
 sysdeps/unix/sysv/linux/s390/bits/rseq.h      |  37 +++
 .../unix/sysv/linux/s390/s390-32/ld.abilist   |   3 +
 .../unix/sysv/linux/s390/s390-64/ld.abilist   |   3 +
 sysdeps/unix/sysv/linux/sched_getcpu.c        |  19 +-
 sysdeps/unix/sysv/linux/sh/be/ld.abilist      |   3 +
 sysdeps/unix/sysv/linux/sh/le/ld.abilist      |   3 +
 .../unix/sysv/linux/sparc/sparc32/ld.abilist  |   3 +
 .../unix/sysv/linux/sparc/sparc64/ld.abilist  |   3 +
 sysdeps/unix/sysv/linux/sys/rseq.h            | 184 +++++++++++++
 sysdeps/unix/sysv/linux/tst-rseq-disable.c    |  95 +++++++
 sysdeps/unix/sysv/linux/tst-rseq-nptl.c       | 260 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-rseq.c            |  72 +++++
 sysdeps/unix/sysv/linux/tst-rseq.h            |  57 ++++
 sysdeps/unix/sysv/linux/x86/bits/rseq.h       |  30 ++
 sysdeps/unix/sysv/linux/x86_64/64/ld.abilist  |   3 +
 sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist |   3 +
 sysdeps/x86/nptl/thread_pointer.h             |  38 +++
 sysdeps/x86_64/nptl/tcb-access.h              | 132 +++++++++
 sysdeps/x86_64/nptl/tls.h                     | 114 +-------
 77 files changed, 1745 insertions(+), 382 deletions(-)
 create mode 100644 sysdeps/i386/nptl/tcb-access.h
 create mode 100644 sysdeps/nptl/tcb-access.h
 create mode 100644 sysdeps/nptl/thread_pointer.h
 create mode 100644 sysdeps/powerpc/nptl/thread_pointer.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h
 create mode 100644 sysdeps/x86/nptl/thread_pointer.h
 create mode 100644 sysdeps/x86_64/nptl/tcb-access.h


base-commit: 68007900beef12000ed90f38c251eaf32fbc0490

Comments

Rich Felker Feb. 1, 2022, 3:21 p.m. UTC | #1
On Tue, Dec 07, 2021 at 01:59:26PM +0100, Florian Weimer via Libc-alpha wrote:
> This series integrates the previously posted v2 for <thread_pointer.h>.
> 
> It incorporates Mathieu's and Paul E. McKenney's suggestion to use a
> volatile read for rseq_abi.cpu_id access, using a new
> THREAD_GETMEM_VOLATILE macro.
> 
> The last patch in the series makes rseq registration consistent across
> threads.
> 
> Florian Weimer (8):
>   nptl: Add <thread_pointer.h> for defining __thread_pointer
>   nptl: Introduce <tcb-access.h> for THREAD_* accessors
>   nptl: Introduce THREAD_GETMEM_VOLATILE
>   nptl: Add rseq registration
>   Linux: Use rseq to accelerate sched_getcpu
>   nptl: Add glibc.pthread.rseq tunable to control rseq registration
>   nptl: Add public rseq symbols and <sys/rseq.h>
>   nptl: rseq failure after registration on main thread is fatal

I'm sorry for bringing this up so late; I wasn't aware that redesign
of the rseq ABI was taking place. I wish this had been discussed in a
cross-libc venue, since, in its current form, I don't think the ABI is
suitable for inclusion in, or use as a third-party library with, musl.

The most pressing issue I see is that it does not admit lazy
registration, which precludes it being implemented outside of libc
(because it has to hook into pthread_create) and imposes runtime cost
on programs which do not use it. RSEQ_CPU_ID_UNINITIALIZED exists to
inform the application about an uninitialized state, but the
application has no way to request an attempt at registration upon
seeing it. I think that would be easy to add. Basically it's just
making the syscall, which a consumer of the ABI could in theory do
itself, but it's probably best not to have it do that and instead have
registration mediated through the ABI/through libc.
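
To make the limitation concrete, here is a minimal sketch of what a
consumer can do today when it sees the uninitialized state. The names are
hypothetical; the two enum values mirror the kernel UAPI constants
RSEQ_CPU_ID_UNINITIALIZED (-1) and RSEQ_CPU_ID_REGISTRATION_FAILED (-2),
and the field is modelled as a signed 32-bit integer for simplicity. The
only option on a negative value is to take a slow path; there is no call
that asks libc to attempt registration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the kernel UAPI values (-1 and -2).  */
enum
{
  DEMO_CPU_ID_UNINITIALIZED = -1,
  DEMO_CPU_ID_REGISTRATION_FAILED = -2
};

/* Returns 1 and stores the CPU number if the field holds a valid value.
   Returns 0 if the caller must use a slow path such as sched_getcpu.  */
static int
demo_fast_cpu (const volatile int32_t *cpu_id_field, int *cpu_out)
{
  assert (cpu_id_field != NULL && cpu_out != NULL);
  int32_t cpu = *cpu_id_field;  /* One volatile load.  */
  if (cpu < 0)
    return 0;  /* Uninitialized or failed: fall back, nothing to request.  */
  *cpu_out = cpu;
  return 1;
}
```

With lazy registration, the `cpu < 0` branch is where a hypothetical
"please register this thread" entry point would be called before falling
back.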

Related to this, if rseq is implemented outside of libc, I'm not sure
if there's a safe way to ensure it's unregistered prior to thread
exit. It may already be possible but I haven't sufficiently convinced
myself.

On another issue, while this isn't entirely a show-stopper, I'm not a
fan of requiring constant __rseq_offset. This comes across as an
instance-specific hack to make up for GD TLS being slow, when we
already have a fully general solution to that which isn't being
deployed: TLSDESC. As it stands in the current ABI, whatever library
is providing rseq must be present at application startup; it can't be
dlopened. And due to the ABI this applies *even if* we just wanted to
make rseq always-fail in that case. The ABI simply doesn't admit not
having memory pre-reserved for every thread (note: the size is
something like a +30% increase to musl's per-thread memory usage and
will surely increase over time, which is a lot for something we don't
expect the vast majority of applications to use).

One minor and hopefully non-controversial declared-ABI issue I see is
that the __rseq_offset etc. objects are declared const, with a
pre-relro access hack used to modify them at runtime. This is
incompatible with LTO and static linking. If protecting them is
desired, they should be declared non-const but live in non-modifiable
memory, like string literals do. Otherwise a static linking LTO
compiler is free to copy the initial values directly into code.

I'm not sure what the right thing to do on the verge of release is. If
it were my choice, I would hold it back and wait until it was better
reviewed and these issues worked out before making it public API/ABI,
but I don't know what glibc's constraints here are and how to best
weigh them against the ability to revise this ABI after release. Most
of these things I think *are* of the sort that can be fixed in
non-breaking ways, except that applications written to the current
version might need to adjust before they can use a version of the
API/ABI we'd be willing to adopt in musl.

Rich

Florian Weimer Feb. 1, 2022, 4:36 p.m. UTC | #2
* Rich Felker:

> On Tue, Dec 07, 2021 at 01:59:26PM +0100, Florian Weimer via Libc-alpha wrote:
>> This series integrates the previously posted v2 for <thread_pointer.h>.
>> 
>> It incorporates Mathieu's and Paul E. McKenney's suggestion to use a
>> volatile read for rseq_abi.cpu_id access, using a new
>> THREAD_GETMEM_VOLATILE macro.
>> 
>> The last patch in the series makes rseq registration consistent across
>> threads.
>> 
>> Florian Weimer (8):
>>   nptl: Add <thread_pointer.h> for defining __thread_pointer
>>   nptl: Introduce <tcb-access.h> for THREAD_* accessors
>>   nptl: Introduce THREAD_GETMEM_VOLATILE
>>   nptl: Add rseq registration
>>   Linux: Use rseq to accelerate sched_getcpu
>>   nptl: Add glibc.pthread.rseq tunable to control rseq registration
>>   nptl: Add public rseq symbols and <sys/rseq.h>
>>   nptl: rseq failure after registration on main thread is fatal
>
> I'm sorry for bringing this up so late; I wasn't aware that redesign
> of the rseq ABI was taking place. I wish this had been discussed in a
> cross-libc venue, since, in its current form, I don't think the ABI is
> suitable for inclusion in, or use as a third-party library with, musl.

Well, I Cc:ed you on the original proposal in November, and cross-posted
it to linux-api as well.

> The most pressing issue I see is that it does not admit lazy
> registration, which precludes it being implemented outside of libc
> (because it has to hook into pthread_create) and imposes runtime cost
> on programs which do not use it. RSEQ_CPU_ID_UNINITIALIZED exists to
> inform the application about an uninitialized state, but the
> application has no way to request an attempt at registration upon
> seeing it. I think that would be easy to add. Basically it's just
> making the syscall, which a consumer of the ABI could in theory do
> itself, but it's probably best not to have it do that and instead have
> registration mediated through the ABI/through libc.

I rejected that because the programming model is too complex: In the
extreme, a library that observes rseq support on the main thread may be
called again from another thread where rseq is not yet enabled, and
cannot be enabled.

I think it is also necessary to enable it unconditionally to force
people to actually implement support for it in their tools (e.g., CRIU).
Otherwise we'll never get to the point where it is reliable.  I doubt
we'd have learned about the CRIU issue by now unless we took that step.

> Related to this, if rseq is implemented outside of libc, I'm not sure
> if there's a safe way to ensure it's unregistered prior to thread
> exit. It may already be possible but I haven't sufficiently convinced
> myself.

I expect that asking for rseq to be implemented outside of libc is like
asking for robust mutexes to be implemented outside libc: it's really
pushing what can be done in a library.

> On another issue, while this isn't entirely a show-stopper, I'm not a
> fan of requiring constant __rseq_offset. This comes across as an
> instance-specific hack to make up for GD TLS being slow, when we
> already have a fully general solution to that which isn't being
> deployed: TLSDESC. As it stands in the current ABI, whatever library
> is providing rseq must be present at application startup; it can't be
> dlopened. And due to the ABI this applies *even if* we just wanted to
> make rseq always-fail in that case. The ABI simply doesn't admit not
> having memory pre-reserved for every thread (note: the size is
> something like a +30% increase to musl's per-thread memory usage and
> will surely increase over time, which is a lot for something we don't
> expect the vast majority of applications to use).

If the memory is not allocated, __rseq_size can be set to 0.
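
A minimal sketch of how a consumer might honour that convention, assuming
the rseq area lives at thread pointer plus offset whenever the size is
nonzero. The struct and function below are hypothetical; they bundle
stand-ins for __rseq_offset and __rseq_size so the example does not depend
on any particular libc version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the two values a libc would export.  */
struct demo_rseq_abi
{
  ptrdiff_t offset;    /* Stand-in for __rseq_offset.  */
  unsigned int size;   /* Stand-in for __rseq_size; 0 means no area.  */
};

/* Returns the address of the rseq area for the given thread pointer, or
   NULL if the libc reports that no memory was reserved.  */
static void *
demo_rseq_area (void *thread_pointer, const struct demo_rseq_abi *abi)
{
  assert (abi != NULL);
  if (abi->size == 0)
    return NULL;  /* Not allocated: the caller must not use rseq.  */
  return (char *) thread_pointer + abi->offset;
}
```

A caller that gets NULL back simply treats rseq as unavailable for the
lifetime of the process.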

> One minor and hopefully non-controversial declared-ABI issue I see is
> that the __rseq_offset etc. objects are declared const, with a
> pre-relro access hack used to modify them at runtime. This is
> incompatible with LTO and static linking. If protecting them is
> desired, they should be declared non-const but live in non-modifiable
> memory, like string literals do. Otherwise a static linking LTO
> compiler is free to copy the initial values directly into code.

Yes, you'll need a compiler barrier with LTO.  It's not different from
other types of relocations.
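
For readers unfamiliar with the technique, one common form of such a
compiler barrier is sketched below. This is an illustration under stated
assumptions, not what glibc does: demo_offset is a hypothetical stand-in
for a const-declared object that startup code fills in, and the empty
asm statement is a GNU C extension that makes the pointer value opaque to
the optimizer so the initializer cannot be folded into the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object that a public header declares
   const but that is written once before it becomes read-only.  */
const ptrdiff_t demo_offset = 0;

/* Load through a pointer the optimizer cannot trace back to its
   origin.  */
static inline ptrdiff_t
demo_load_hidden (const ptrdiff_t *p)
{
  assert (p != NULL);
  /* Empty asm that claims to modify p: no instructions are emitted, but
     the compiler must assume p may now point anywhere.  */
  __asm__ ("" : "+r" (p));
  return *p;
}

/* Reads demo_offset from memory instead of using its initializer.  */
static inline ptrdiff_t
demo_load_offset (void)
{
  return demo_load_hidden (&demo_offset);
}
```

In this demo nothing patches demo_offset, so the load returns the
initializer anyway; the point is only that the value comes from memory at
run time.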

> I'm not sure what the right thing to do on the verge of release is. If
> it were my choice, I would hold it back and wait until it was better
> reviewed and these issues worked out before making it public API/ABI,
> but I don't know what glibc's constraints here are and how to best
> weigh them against the ability to revise this ABI after release. Most
> of these things I think *are* of the sort that can be fixed in
> non-breaking ways, except that applications written to the current
> version might need to adjust before they can use a version of the
> API/ABI we'd be willing to adopt in musl.

Quoting for Mathieu's benefit.  Also Cc:ing Carlos as the release
manager.

Thanks,
Florian

Carlos O'Donell Feb. 3, 2022, 12:37 a.m. UTC | #3
On 2/1/22 11:36, Florian Weimer wrote:
> * Rich Felker:
> 
>> On Tue, Dec 07, 2021 at 01:59:26PM +0100, Florian Weimer via Libc-alpha wrote:
>>> This series integrates the previously posted v2 for <thread_pointer.h>.
>>>
>>> It incorporates Mathieu's and Paul E. McKenney's suggestion to use a
>>> volatile read for rseq_abi.cpu_id access, using a new
>>> THREAD_GETMEM_VOLATILE macro.
>>>
>>> The last patch in the series makes rseq registration consistent across
>>> threads.
>>>
>>> Florian Weimer (8):
>>>   nptl: Add <thread_pointer.h> for defining __thread_pointer
>>>   nptl: Introduce <tcb-access.h> for THREAD_* accessors
>>>   nptl: Introduce THREAD_GETMEM_VOLATILE
>>>   nptl: Add rseq registration
>>>   Linux: Use rseq to accelerate sched_getcpu
>>>   nptl: Add glibc.pthread.rseq tunable to control rseq registration
>>>   nptl: Add public rseq symbols and <sys/rseq.h>
>>>   nptl: rseq failure after registration on main thread is fatal
>>
>> I'm sorry for bringing this up so late; I wasn't aware that redesign
>> of the rseq ABI was taking place. I wish this had been discussed in a
>> cross-libc venue, since, in its current form, I don't think the ABI is
>> suitable for inclusion in, or use as a third-party library with, musl.
> 
> Well, I Cc:ed you on the original proposal in November, and cross-posted
> it to linux-api as well.
> 
>> The most pressing issue I see is that it does not admit lazy
>> registration, which precludes it being implemented outside of libc
>> (because it has to hook into pthread_create) and imposes runtime cost
>> on programs which do not use it. RSEQ_CPU_ID_UNINITIALIZED exists to
>> inform the application about an uninitialized state, but the
>> application has no way to request an attempt at registration upon
>> seeing it. I think that would be easy to add. Basically it's just
>> making the syscall, which a consumer of the ABI could in theory do
>> itself, but it's probably best not to have it do that and instead have
>> registration mediated through the ABI/through libc.
> 
> I rejected that because the programming model is too complex: In the
> extreme, a library that observes rseq support on the main thread may be
> called again from another thread where rseq is not yet enabled, and
> cannot be enabled.
> 
> I think it is also necessary to enable it unconditionally to force
> people to actually implement support for it in their tools (e.g., CRIU).
> Otherwise we'll never get to the point where it is reliable.  I doubt
> we'd have learned about the CRIU issue by now unless we took that step.

Agreed.

>> Related to this, if rseq is implemented outside of libc, I'm not sure
>> if there's a safe way to ensure it's unregistered prior to thread
>> exit. It may already be possible but I haven't sufficiently convinced
>> myself.
> 
> I expect that asking for rseq to be implemented outside of libc is like
> asking for robust mutexes to be implemented outside libc: it's really
> pushing what can be done in a library.

This is a design decision that we made in glibc.

>> On another issue, while this isn't entirely a show-stopper, I'm not a
>> fan of requiring constant __rseq_offset. This comes across as an
>> instance-specific hack to make up for GD TLS being slow, when we
>> already have a fully general solution to that which isn't being
>> deployed: TLSDESC. As it stands in the current ABI, whatever library
>> is providing rseq must be present at application startup; it can't be
>> dlopened. And due to the ABI this applies *even if* we just wanted to
>> make rseq always-fail in that case. The ABI simply doesn't admit not
>> having memory pre-reserved for every thread (note: the size is
>> something like a +30% increase to musl's per-thread memory usage and
>> will surely increase over time, which is a lot for something we don't
>> expect the vast majority of applications to use).
> 
> If the memory is not allocated, __rseq_size can be set to 0.
> 
>> One minor and hopefully non-controversial declared-ABI issue I see is
>> that the __rseq_offset etc. objects are declared const, with a
>> pre-relro access hack used to modify them at runtime. This is
>> incompatible with LTO and static linking. If protecting them is
>> desired, they should be declared non-const but live in non-modifiable
>> memory, like string literals do. Otherwise a static linking LTO
>> compiler is free to copy the initial values directly into code.
> 
> Yes, you'll need a compiler barrier with LTO.  It's not different from
> other types of relocations.

Agreed.

At the language level the offset is constant.

Of the two choices, I think that making __rseq_offset non-const is pessimistic.

LTO and static linking must be aware of details outside of the language level
and may need to handle those details in an implementation-defined manner.

>> I'm not sure what the right thing to do on the verge of release is. If
>> it were my choice, I would hold it back and wait until it was better
>> reviewed and these issues worked out before making it public API/ABI,
>> but I don't know what glibc's constraints here are and how to best
>> weigh them against the ability to revise this ABI after release. Most
>> of these things I think *are* of the sort that can be fixed in
>> non-breaking ways, except that applications written to the current
>> version might need to adjust before they can use a version of the
>> API/ABI we'd be willing to adopt in musl.
> 
> Quoting for Mathieu's benefit.  Also Cc:ing Carlos as the release
> manager.

We have spent ~1.5 years correcting rseq integration since the initial attempt
in July 2020. The inclusion of __rseq is ready for glibc 2.35.

I plan to make the release with the ABI included.

I think we can continue to work with the musl community on integration issues.