[v7,0/8] Extend rseq support

Message ID 20240201193648.584917-1-mjeanson@efficios.com

Message

Michael Jeanson Feb. 1, 2024, 7:36 p.m. UTC
  This series rebases the standalone "Add rseq extensible ABI" patch on
current master and adds an accelerated getcpu() implementation using the
rseq extensible ABI with initial support for aarch64 and x86_64.

On an aarch64 system (Snapdragon 8cx Gen 3) which lacks a vDSO for
getcpu() we measured an improvement from 130 ns to 1 ns while on x86_64
(i7-8550U) which has a vDSO we measured a more modest improvement from
10 ns to 2 ns.
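
For readers who want to reproduce this kind of measurement, the following
is a minimal sketch of a per-call timing loop. It is not the benchmark
that produced the numbers above: measure_getcpu_ns is a hypothetical
helper name, the iteration count is arbitrary, and the result includes
loop overhead, so treat the output as a rough estimate only. It relies
only on sched_getcpu() and clock_gettime(), and would need to be run
against a glibc built with and without this series to compare.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <time.h>

/* Hypothetical helper, not part of the series: returns the average cost
   of one sched_getcpu() call in nanoseconds over `iterations` calls.
   The figure includes the overhead of the loop itself.  */
double
measure_getcpu_ns (long iterations)
{
  struct timespec start, end;
  volatile int sink = 0;  /* Keeps the calls from being optimized away.  */

  assert (iterations > 0);

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (long i = 0; i < iterations; i++)
    sink += sched_getcpu ();
  clock_gettime (CLOCK_MONOTONIC, &end);
  (void) sink;

  /* Convert the elapsed interval to nanoseconds before averaging.  */
  double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
  return elapsed_ns / iterations;
}
```

Calling measure_getcpu_ns (10000000) from a small main() and printing the
result gives a per-call estimate. Pinning the process to one CPU and
repeating the run several times should reduce noise.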

Tested on i386, aarch64 and x86_64.

Cc: Florian Weimer <fweimer@redhat.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Mathieu Desnoyers (2):
  x86-64: Add rseq_load32_load32_relaxed
  aarch64: Add rseq_load32_load32_relaxed

Michael Jeanson (6):
  nptl: fix potential merge of __rseq_* relro symbols
  Add rseq extensible ABI support
  nptl: Add public __rseq_feature_size symbol
  nptl: Add features to internal 'struct rseq_area'
  nptl: Add rseq internal utils
  Linux: Use rseq to accelerate getcpu

 csu/Makefile                                  |   2 +-
 csu/libc-tls.c                                |  66 ++++++-
 csu/rseq-sizes.sym                            |  11 ++
 elf/Makefile                                  |   1 +
 elf/dl-rseq-symbols.S                         |  72 ++++++++
 elf/dl-tls.c                                  |  62 +++++++
 elf/rtld_static_init.c                        |  12 ++
 manual/threads.texi                           |   8 +
 nptl/descr.h                                  |  20 +-
 nptl/pthread_create.c                         |   2 +-
 sysdeps/generic/dl-rseq.h                     |  26 +++
 sysdeps/generic/ldsodefs.h                    |  12 ++
 sysdeps/i386/nptl/tcb-access.h                |  56 ++++++
 sysdeps/nptl/dl-tls_init_tp.c                 |  16 +-
 sysdeps/nptl/tcb-access.h                     |   5 +
 sysdeps/unix/sysv/linux/Makefile              |  10 +
 sysdeps/unix/sysv/linux/Versions              |   3 +
 sysdeps/unix/sysv/linux/aarch64/ld.abilist    |   1 +
 .../unix/sysv/linux/aarch64/rseq-internal.h   | 173 ++++++++++++++++++
 sysdeps/unix/sysv/linux/alpha/ld.abilist      |   1 +
 sysdeps/unix/sysv/linux/arc/ld.abilist        |   1 +
 sysdeps/unix/sysv/linux/arm/be/ld.abilist     |   1 +
 sysdeps/unix/sysv/linux/arm/le/ld.abilist     |   1 +
 sysdeps/unix/sysv/linux/csky/ld.abilist       |   1 +
 sysdeps/unix/sysv/linux/dl-parse_auxv.h       |   6 +
 sysdeps/unix/sysv/linux/getcpu.c              |  32 +++-
 sysdeps/unix/sysv/linux/hppa/ld.abilist       |   1 +
 sysdeps/unix/sysv/linux/i386/ld.abilist       |   1 +
 .../unix/sysv/linux/loongarch/lp64/ld.abilist |   1 +
 .../unix/sysv/linux/m68k/coldfire/ld.abilist  |   1 +
 .../unix/sysv/linux/m68k/m680x0/ld.abilist    |   1 +
 sysdeps/unix/sysv/linux/microblaze/ld.abilist |   1 +
 .../unix/sysv/linux/mips/mips32/ld.abilist    |   1 +
 .../sysv/linux/mips/mips64/n32/ld.abilist     |   1 +
 .../sysv/linux/mips/mips64/n64/ld.abilist     |   1 +
 sysdeps/unix/sysv/linux/nios2/ld.abilist      |   1 +
 sysdeps/unix/sysv/linux/or1k/ld.abilist       |   1 +
 .../sysv/linux/powerpc/powerpc32/ld.abilist   |   1 +
 .../linux/powerpc/powerpc64/be/ld.abilist     |   1 +
 .../linux/powerpc/powerpc64/le/ld.abilist     |   1 +
 sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/rseq-internal.h       |  89 ++++++++-
 .../unix/sysv/linux/s390/s390-32/ld.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-64/ld.abilist   |   1 +
 sysdeps/unix/sysv/linux/sched_getcpu.c        |   3 +-
 sysdeps/unix/sysv/linux/sh/be/ld.abilist      |   1 +
 sysdeps/unix/sysv/linux/sh/le/ld.abilist      |   1 +
 .../unix/sysv/linux/sparc/sparc32/ld.abilist  |   1 +
 .../unix/sysv/linux/sparc/sparc64/ld.abilist  |   1 +
 sysdeps/unix/sysv/linux/sys/rseq.h            |   4 +
 .../unix/sysv/linux/tst-rseq-disable-static.c |   1 +
 sysdeps/unix/sysv/linux/tst-rseq-disable.c    |  20 +-
 .../unix/sysv/linux/tst-rseq-nptl-static.c    |   1 +
 sysdeps/unix/sysv/linux/tst-rseq-static.c     |   1 +
 sysdeps/unix/sysv/linux/tst-rseq.c            |  24 ++-
 sysdeps/unix/sysv/linux/tst-rseq.h            |   9 +-
 sysdeps/unix/sysv/linux/x86_64/64/ld.abilist  |   1 +
 .../unix/sysv/linux/x86_64/rseq-internal.h    | 109 +++++++++++
 sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist |   1 +
 sysdeps/x86_64/nptl/tcb-access.h              |  56 ++++++
 61 files changed, 886 insertions(+), 56 deletions(-)
 create mode 100644 csu/rseq-sizes.sym
 create mode 100644 elf/dl-rseq-symbols.S
 create mode 100644 sysdeps/generic/dl-rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-static.c
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/rseq-internal.h
  

Comments

Michael Jeanson Feb. 2, 2024, 3:40 p.m. UTC | #1
On 2024-02-01 14:36, Michael Jeanson wrote:
> This series rebases the standalone "Add rseq extensible ABI" patch on
> current master and adds an accelerated getcpu() implementation using the
> rseq extensible ABI with initial support for aarch64 and x86_64.
> 
> On an aarch64 system (Snapdragon 8cx Gen 3) which lacks a vDSO for
> getcpu() we measured an improvement from 130 ns to 1 ns while on x86_64
> (i7-8550U) which has a vDSO we measured a more modest improvement from
> 10 ns to 2 ns.
> 
> Tested on i386, aarch64 and x86_64.

The failures reported by the Linaro-TCWG-CI on arm seem to be only an issue of 
symbol sorting in the abilist files. Is there tooling to regenerate those for 
all architectures?

Thanks,

Michael
  
Adhemerval Zanella Feb. 2, 2024, 4:18 p.m. UTC | #2
On 02/02/24 12:40, Michael Jeanson wrote:
> On 2024-02-01 14:36, Michael Jeanson wrote:
>> This series rebases the standalone "Add rseq extensible ABI" patch on
>> current master and adds an accelerated getcpu() implementation using the
>> rseq extensible ABI with initial support for aarch64 and x86_64.
>>
>> On an aarch64 system (Snapdragon 8cx Gen 3) which lacks a vDSO for
>> getcpu() we measured an improvement from 130 ns to 1 ns while on x86_64
>> (i7-8550U) which has a vDSO we measured a more modest improvement from
>> 10 ns to 2 ns.
>>
>> Tested on i386, aarch64 and x86_64.
> 
> The failures reported by the Linaro-TCWG-CI on arm seem to be only an issue of symbol sorting in the abilist files. Is there tooling to regenerate those for all architectures?

Running 'make update-abi' in the build directory will sort this out.