From patchwork Wed Dec  6 19:10:14 2023
X-Patchwork-Submitter: Michael Jeanson
X-Patchwork-Id: 81555
From: Michael Jeanson
To: libc-alpha@sourceware.org
Cc: Michael Jeanson, Mathieu Desnoyers, Florian Weimer, Carlos O'Donell
Subject: [PATCH v5] Add rseq extensible ABI support
Date: Wed, 6 Dec 2023 14:10:14 -0500
Message-Id: <20231206191014.201286-1-mjeanson@efficios.com>
X-Mailer: git-send-email 2.34.1

The rseq extensible ABI [1], introduced in Linux v6.3, allows adding
rseq features past the initial 32 bytes of the original ABI.

While the rseq features in the latest kernel still fit within the
original ABI size, there are currently only 4 bytes left.  It is thus
a good time to add support for the extensible ABI, so that new
features are immediately available to GNU libc users when they are
added.

We use the ELF auxiliary vectors to query the kernel for the size and
alignment of the rseq area; if this fails, we default to the original
fixed size and alignment of '32', which the kernel accepts as a
compatibility mode with the original ABI.

This makes the size of the rseq area variable and thus requires
relocating it out of 'struct pthread'.  We chose to move it after (in
block allocation order) the last TLS block, since this required a
fairly small modification to the TLS block allocator and does not
interfere with the main executable's TLS block, which must always come
first.

[1] https://lore.kernel.org/all/20221122203932.231377-4-mathieu.desnoyers@efficios.com/

Signed-off-by: Michael Jeanson
Co-authored-by: Mathieu Desnoyers
Signed-off-by: Mathieu Desnoyers
Cc: Florian Weimer
Cc: Carlos O'Donell
---
Changes since RFC v1:
- Insert the rseq area after the last TLS block
- Add proper support for the TLS_TCB_AT_TP variant
Changes since RFC v2:
- Set __rseq_size even when the registration fails
- Adjust rseq tests to the new ABI
- Add support for statically linked executables
Changes since RFC v3:
- Fix RSEQ_SETMEM for rseq disabled
- Replace sys/auxv.h usage with dl-parse_auxv.h
- Fix offset for TLS_TCB_AT_TP with statically linked executables
- Zero the rseq area before registration
Changes since RFC v4:
- Move dynamic linker defines to a header file
- Fix alignment when the TLS block alignment is smaller than the rseq
  alignment with statically linked executables
- Add statically linked rseq tests
- Revert: Set __rseq_size even when the registration fails
- Use the minimum size when rseq is disabled by tunable
---
 csu/libc-tls.c                                | 64 ++++++++++++++++--
 elf/dl-tls.c                                  | 66 +++++++++++++++++++
 elf/rtld_static_init.c                        | 12 ++++
 nptl/descr.h                                  | 20 +-----
 nptl/pthread_create.c                         |  2 +-
 sysdeps/generic/dl-rseq.h                     | 23 +++++++
 sysdeps/generic/ldsodefs.h                    | 12 ++++
 sysdeps/i386/nptl/tcb-access.h                | 56 ++++++++++++++++
 sysdeps/nptl/dl-tls_init_tp.c                 |  4 +-
 sysdeps/nptl/tcb-access.h                     |  5 ++
 sysdeps/unix/sysv/linux/Makefile              | 10 +++
 sysdeps/unix/sysv/linux/dl-parse_auxv.h       |  3 +
 sysdeps/unix/sysv/linux/rseq-internal.h       | 29 ++++++--
 sysdeps/unix/sysv/linux/sched_getcpu.c        |  3 +-
 .../unix/sysv/linux/tst-rseq-disable-static.c |  1 +
 sysdeps/unix/sysv/linux/tst-rseq-disable.c    | 19 +++---
 .../unix/sysv/linux/tst-rseq-nptl-static.c    |  1 +
 sysdeps/unix/sysv/linux/tst-rseq-static.c     |  1 +
 sysdeps/unix/sysv/linux/tst-rseq.c            | 22 +++++-
 sysdeps/unix/sysv/linux/tst-rseq.h            |  8 ++-
 sysdeps/x86_64/nptl/tcb-access.h              | 56 ++++++++++++++++
 21 files changed, 373 insertions(+), 44 deletions(-)
 create mode 100644 sysdeps/generic/dl-rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-static.c

diff --git a/csu/libc-tls.c b/csu/libc-tls.c
index cdf6442c02..61edd20efc 100644
--- a/csu/libc-tls.c
+++ b/csu/libc-tls.c
@@ -26,6 +26,8 @@
 #include
 #include
 #include
+#include
+#include

 #ifdef SHARED
 #error makefile bug, this file is for static only
@@ -62,6 +64,18 @@ size_t _dl_tls_static_surplus;
    dynamic TLS access (e.g. with TLSDESC).  */
 size_t _dl_tls_static_optional;

+/* Size of the features present in the rseq area.  */
+size_t _dl_tls_rseq_feature_size;
+
+/* Alignment requirement of the rseq area.  */
+size_t _dl_tls_rseq_align;
+
+/* Size of the rseq area in the static TLS block.  */
+size_t _dl_tls_rseq_size;
+
+/* Offset of the rseq area from the thread pointer.  */
+ptrdiff_t _dl_tls_rseq_offset;
+
 /* Generation counter for the dtv.  */
 size_t _dl_tls_generation;
@@ -110,6 +124,7 @@ __libc_setup_tls (void)
   size_t filesz = 0;
   void *initimage = NULL;
   size_t align = 0;
+  size_t tls_blocks_size = 0;
   size_t max_align = TCB_ALIGNMENT;
   size_t tcb_offset;
   const ElfW(Phdr) *phdr;
@@ -135,22 +150,61 @@ __libc_setup_tls (void)
   /* Calculate the size of the static TLS surplus, with 0 auditors.  */
   _dl_tls_static_surplus_init (0);

+  /* Get the rseq auxiliary vectors; 0 is returned when not implemented,
+     in which case we default to the rseq ABI minimums.  When rseq is
+     disabled, just use the minimums.  */
+  size_t rseq_size = TLS_DL_RSEQ_MIN_SIZE;
+  size_t rseq_align = TLS_DL_RSEQ_MIN_ALIGN;
+  bool do_rseq = true;
+  do_rseq = TUNABLE_GET_FULL (glibc, pthread, rseq, int, NULL);
+  if (do_rseq)
+    {
+      rseq_size = MAX (GLRO(dl_tls_rseq_feature_size), TLS_DL_RSEQ_MIN_SIZE);
+      rseq_align = MAX (GLRO(dl_tls_rseq_align), TLS_DL_RSEQ_MIN_ALIGN);
+    }
+
+  /* Make sure the rseq area size is a multiple of the requested
+     alignment.  */
+  rseq_size = roundup (rseq_size, rseq_align);
+
+  /* Increase max_align if necessary.  */
+  max_align = MAX (max_align, rseq_align);
+
+  /* Record the rseq_area block size.  */
+  GLRO (dl_tls_rseq_size) = rseq_size;
+
   /* We have to set up the TCB block which also (possibly) contains
      'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
      Instead we use 'sbrk' which would only uses 'errno' if it fails.
      In this case we are right away out of memory and the user gets
      what she/he deserves.  */
 #if TLS_TCB_AT_TP
+  /* Before the thread pointer, add the aligned tls block size and then
+     align the rseq area block on top.  */
+  tls_blocks_size = roundup (roundup (memsz, align ?: 1) + rseq_size, rseq_align);
+
+  /* Record the rseq_area offset.  The offset is negative with TLS_TCB_AT_TP
+     because the TLS blocks are located before the thread pointer.  */
+  GLRO (dl_tls_rseq_offset) = -tls_blocks_size;
+
   /* Align the TCB offset to the maximum alignment, as
      _dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
      and dl_tls_static_align.  */
-  tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);
+  tcb_offset = roundup (tls_blocks_size + GLRO(dl_tls_static_surplus), max_align);
   tlsblock = _dl_early_allocate (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
   if (tlsblock == NULL)
     _startup_fatal_tls_error ();
 #elif TLS_DTV_AT_TP
+  /* After the thread pointer, add the non-aligned tls block size and then
+     align the rseq area block on top.  */
+  tls_blocks_size = roundup (memsz + rseq_size, rseq_align);
+
+  /* Record the rseq_area offset.  The offset is positive with TLS_DTV_AT_TP
+     because the TLS blocks are located after the thread pointer.  */
+  GLRO (dl_tls_rseq_offset) = tls_blocks_size - rseq_size;
+
   tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
-  tlsblock = _dl_early_allocate (tcb_offset + memsz + max_align
+  tlsblock = _dl_early_allocate (tcb_offset + tls_blocks_size + max_align
                                  + TLS_PRE_TCB_SIZE
                                  + GLRO(dl_tls_static_surplus));
   if (tlsblock == NULL)
@@ -209,11 +263,9 @@ __libc_setup_tls (void)
   /* static_slotinfo.slotinfo[1].gen = 0; -- Already zero.  */
   static_slotinfo.slotinfo[1].map = main_map;

-  memsz = roundup (memsz, align ?: 1);
-
 #if TLS_DTV_AT_TP
-  memsz += tcb_offset;
+  tls_blocks_size += tcb_offset;
 #endif

-  init_static_tls (memsz, MAX (TCB_ALIGNMENT, max_align));
+  init_static_tls (tls_blocks_size, MAX (TCB_ALIGNMENT, max_align));
 }
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 70446e71a8..4c8d002f2b 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -27,6 +27,7 @@
 #include
 #include
+#include
 #include

 #if PTHREAD_IN_LIBC
@@ -298,6 +299,37 @@ _dl_determine_tlsoffset (void)
          slotinfo[cnt].map->l_tls_offset = off;
        }

+      /* Insert the rseq area block after the last TLS block.  */
+
+      /* Get the rseq auxiliary vectors; 0 is returned when not implemented,
+         in which case we default to the rseq ABI minimums.  When rseq is
+         disabled, just use the minimums.  */
+      size_t rseq_size = TLS_DL_RSEQ_MIN_SIZE;
+      size_t rseq_align = TLS_DL_RSEQ_MIN_ALIGN;
+      bool do_rseq = true;
+      do_rseq = TUNABLE_GET_FULL (glibc, pthread, rseq, int, NULL);
+      if (do_rseq)
+        {
+          rseq_size = MAX (GLRO(dl_tls_rseq_feature_size), TLS_DL_RSEQ_MIN_SIZE);
+          rseq_align = MAX (GLRO(dl_tls_rseq_align), TLS_DL_RSEQ_MIN_ALIGN);
+        }
+
+      /* Make sure the rseq area size is a multiple of the requested
+         alignment.  */
+      rseq_size = roundup (rseq_size, rseq_align);
+
+      /* Add the rseq area block to the global offset.  */
+      offset = roundup (offset, rseq_align) + rseq_size;
+
+      /* Increase max_align if necessary.  */
+      max_align = MAX (max_align, rseq_align);
+
+      /* Record the rseq_area block size and offset.  The offset is negative
+         with TLS_TCB_AT_TP because the TLS blocks are located before the
+         thread pointer.  */
+      GLRO (dl_tls_rseq_offset) = -offset;
+      GLRO (dl_tls_rseq_size) = rseq_size;
+
       GL(dl_tls_static_used) = offset;
       GLRO (dl_tls_static_size) = (roundup (offset + GLRO(dl_tls_static_surplus),
                                             max_align)
@@ -343,6 +375,40 @@ _dl_determine_tlsoffset (void)
          offset = off + slotinfo[cnt].map->l_tls_blocksize - firstbyte;
        }

+      /* Insert the rseq area block after the last TLS block.  */
+
+      /* Get the rseq auxiliary vectors; 0 is returned when not implemented,
+         in which case we default to the rseq ABI minimums.  When rseq is
+         disabled, just use the minimums.  */
+      size_t rseq_size = TLS_DL_RSEQ_MIN_SIZE;
+      size_t rseq_align = TLS_DL_RSEQ_MIN_ALIGN;
+      bool do_rseq = true;
+      do_rseq = TUNABLE_GET_FULL (glibc, pthread, rseq, int, NULL);
+      if (do_rseq)
+        {
+          rseq_size = MAX (GLRO(dl_tls_rseq_feature_size), TLS_DL_RSEQ_MIN_SIZE);
+          rseq_align = MAX (GLRO(dl_tls_rseq_align), TLS_DL_RSEQ_MIN_ALIGN);
+        }
+
+      /* Make sure the rseq area size is a multiple of the requested
+         alignment.  */
+      rseq_size = roundup (rseq_size, rseq_align);
+
+      /* Align the global offset to the beginning of the rseq area.  */
+      offset = roundup (offset, rseq_align);
+
+      /* Record the rseq_area block size and offset.  The offset is positive
+         with TLS_DTV_AT_TP because the TLS blocks are located after the
+         thread pointer.  */
+      GLRO (dl_tls_rseq_size) = rseq_size;
+      GLRO (dl_tls_rseq_offset) = offset;
+
+      /* Add the rseq area block to the global offset.  */
+      offset += rseq_size;
+
+      /* Increase max_align if necessary.  */
+      max_align = MAX (max_align, rseq_align);
+
       GL(dl_tls_static_used) = offset;
       GLRO (dl_tls_static_size) = roundup (offset + GLRO(dl_tls_static_surplus),
                                            TCB_ALIGNMENT);
diff --git a/elf/rtld_static_init.c b/elf/rtld_static_init.c
index aec8cc056b..5d7ee37d67 100644
--- a/elf/rtld_static_init.c
+++ b/elf/rtld_static_init.c
@@ -78,6 +78,18 @@ __rtld_static_init (struct link_map *map)
   extern __typeof (dl->_dl_tls_static_size) _dl_tls_static_size
     attribute_hidden;
   dl->_dl_tls_static_size = _dl_tls_static_size;
+  extern __typeof (dl->_dl_tls_rseq_feature_size) _dl_tls_rseq_feature_size
+    attribute_hidden;
+  dl->_dl_tls_rseq_feature_size = _dl_tls_rseq_feature_size;
+  extern __typeof (dl->_dl_tls_rseq_align) _dl_tls_rseq_align
+    attribute_hidden;
+  dl->_dl_tls_rseq_align = _dl_tls_rseq_align;
+  extern __typeof (dl->_dl_tls_rseq_size) _dl_tls_rseq_size
+    attribute_hidden;
+  dl->_dl_tls_rseq_size = _dl_tls_rseq_size;
+  extern __typeof (dl->_dl_tls_rseq_offset) _dl_tls_rseq_offset
+    attribute_hidden;
+  dl->_dl_tls_rseq_offset = _dl_tls_rseq_offset;
   dl->_dl_find_object = _dl_find_object;
   __rtld_static_init_arch (map, dl);
diff --git a/nptl/descr.h b/nptl/descr.h
index 0171576c23..fadba1f619 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -404,25 +404,11 @@ struct pthread
   /* Used on strsignal.  */
   struct tls_internal_t tls_state;

-  /* rseq area registered with the kernel.  Use a custom definition
-     here to isolate from kernel struct rseq changes.  The
-     implementation of sched_getcpu needs acccess to the cpu_id field;
-     the other fields are unused and not included here.  */
-  union
-  {
-    struct
-    {
-      uint32_t cpu_id_start;
-      uint32_t cpu_id;
-    };
-    char pad[32];  /* Original rseq area size.  */
-  } rseq_area __attribute__ ((aligned (32)));
-
   /* Amount of end padding, if any, in this structure.
-     This definition relies on rseq_area being last.  */
+     This definition relies on tls_state being last.  */
 #define PTHREAD_STRUCT_END_PADDING \
-  (sizeof (struct pthread) - offsetof (struct pthread, rseq_area) \
-   + sizeof ((struct pthread) {}.rseq_area))
+  (sizeof (struct pthread) - offsetof (struct pthread, tls_state) \
+   + sizeof ((struct pthread) {}.tls_state))
 } __attribute ((aligned (TCB_ALIGNMENT)));

 static inline bool
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 63cb684f04..4ee0f24741 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -691,7 +691,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
   /* Inherit rseq registration state.  Without seccomp filters, rseq
      registration will either always fail or always succeed.  */
-  if ((int) THREAD_GETMEM_VOLATILE (self, rseq_area.cpu_id) >= 0)
+  if ((int) RSEQ_GETMEM_VOLATILE (rseq_get_area (), cpu_id) >= 0)
     pd->flags |= ATTR_FLAG_DO_RSEQ;

   /* Initialize the field for the ID of the thread which is waiting
diff --git a/sysdeps/generic/dl-rseq.h b/sysdeps/generic/dl-rseq.h
new file mode 100644
index 0000000000..c327dff56e
--- /dev/null
+++ b/sysdeps/generic/dl-rseq.h
@@ -0,0 +1,23 @@
+/* RSEQ defines for the dynamic linker.  Generic version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Minimum size of the rseq area.  */
+#define TLS_DL_RSEQ_MIN_SIZE 32
+
+/* Minimum alignment of the rseq area.  */
+#define TLS_DL_RSEQ_MIN_ALIGN 32
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 9b50ddd09f..14ed361d94 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -610,6 +610,18 @@ struct rtld_global_ro
      See comments in elf/dl-tls.c where it is initialized.  */
   EXTERN size_t _dl_tls_static_surplus;

+  /* Size of the features present in the rseq area.  */
+  EXTERN size_t _dl_tls_rseq_feature_size;
+
+  /* Alignment requirement of the rseq area.  */
+  EXTERN size_t _dl_tls_rseq_align;
+
+  /* Size of the rseq area in the static TLS block.  */
+  EXTERN size_t _dl_tls_rseq_size;
+
+  /* Offset of the rseq area from the thread pointer.  */
+  EXTERN ptrdiff_t _dl_tls_rseq_offset;
+
   /* Name of the shared object to be profiled (if any).  */
   EXTERN const char *_dl_profile;
   /* Filename of the output file.  */
diff --git a/sysdeps/i386/nptl/tcb-access.h b/sysdeps/i386/nptl/tcb-access.h
index 28f0a5625f..e73802f5f5 100644
--- a/sysdeps/i386/nptl/tcb-access.h
+++ b/sysdeps/i386/nptl/tcb-access.h
@@ -123,3 +123,59 @@
                     "i" (offsetof (struct pthread, member)), \
                     "r" (idx)); \
     }})
+
+
+/* Read member of the RSEQ area directly.  */
+#define RSEQ_GETMEM_VOLATILE(descr, member) \
+  ({ __typeof (descr->member) __value; \
+     ptrdiff_t _rseq_offset = GLRO (dl_tls_rseq_offset); \
+     _Static_assert (sizeof (__value) == 1 \
+                     || sizeof (__value) == 4 \
+                     || sizeof (__value) == 8, \
+                     "size of per-thread data"); \
+     if (sizeof (__value) == 1) \
+       asm volatile ("movb %%gs:%P2(%3),%b0" \
+                     : "=q" (__value) \
+                     : "0" (0), "i" (offsetof (struct rseq_area, member)), \
+                       "r" (_rseq_offset)); \
+     else if (sizeof (__value) == 4) \
+       asm volatile ("movl %%gs:%P1(%2),%0" \
+                     : "=r" (__value) \
+                     : "i" (offsetof (struct rseq_area, member)), \
+                       "r" (_rseq_offset)); \
+     else /* 8 */ \
+       { \
+         asm volatile ("movl %%gs:%P1(%2),%%eax\n\t" \
+                       "movl %%gs:4+%P1(%2),%%edx" \
+                       : "=&A" (__value) \
+                       : "i" (offsetof (struct rseq_area, member)), \
+                         "r" (_rseq_offset)); \
+       } \
+     __value; })
+
+/* Set member of the RSEQ area directly.  */
+#define RSEQ_SETMEM(descr, member, value) \
+  ({ \
+     ptrdiff_t _rseq_offset = GLRO (dl_tls_rseq_offset); \
+     _Static_assert (sizeof (descr->member) == 1 \
+                     || sizeof (descr->member) == 4 \
+                     || sizeof (descr->member) == 8, \
+                     "size of per-thread data"); \
+     if (sizeof (descr->member) == 1) \
+       asm volatile ("movb %b0,%%gs:%P1(%2)" : \
+                     : "iq" (value), \
+                       "i" (offsetof (struct rseq_area, member)), \
+                       "r" (_rseq_offset)); \
+     else if (sizeof (descr->member) == 4) \
+       asm volatile ("movl %0,%%gs:%P1(%2)" : \
+                     : "ir" (value), \
+                       "i" (offsetof (struct rseq_area, member)), \
+                       "r" (_rseq_offset)); \
+     else /* 8 */ \
+       { \
+         asm volatile ("movl %%eax,%%gs:%P1(%2)\n\t" \
+                       "movl %%edx,%%gs:4+%P1(%2)" : \
+                       : "A" ((uint64_t) cast_to_integer (value)), \
+                         "i" (offsetof (struct rseq_area, member)), \
+                         "r" (_rseq_offset)); \
+       }})
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index 2ed98c5a31..406f6ba716 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -108,7 +108,7 @@ __tls_init_tp (void)
       /* We need a writable view of the variables.  They are in .data.relro
          and are not yet write-protected.  */
       extern unsigned int size __asm__ ("__rseq_size");
-      size = sizeof (pd->rseq_area);
+      size = GLRO (dl_tls_rseq_size);
     }

 #ifdef RSEQ_SIG
@@ -118,7 +118,7 @@ __tls_init_tp (void)
      if the rseq registration may have happened because RSEQ_SIG is
      defined.  */
   extern ptrdiff_t offset __asm__ ("__rseq_offset");
-  offset = (char *) &pd->rseq_area - (char *) __thread_pointer ();
+  offset = GLRO (dl_tls_rseq_offset);
 #endif
 }
diff --git a/sysdeps/nptl/tcb-access.h b/sysdeps/nptl/tcb-access.h
index d6dfd41391..7092428d38 100644
--- a/sysdeps/nptl/tcb-access.h
+++ b/sysdeps/nptl/tcb-access.h
@@ -30,3 +30,8 @@
   descr->member = (value)
 #define THREAD_SETMEM_NC(descr, member, idx, value) \
   descr->member[idx] = (value)
+
+#define RSEQ_GETMEM_VOLATILE(descr, member) \
+  THREAD_GETMEM_VOLATILE (descr, member)
+#define RSEQ_SETMEM(descr, member, value) \
+  THREAD_SETMEM (descr, member, value)
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 415aa1f14d..6bcf81461b 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -261,6 +261,11 @@ tests-internal += \
   tst-rseq-disable \
   # tests-internal

+tests-static += \
+  tst-rseq-disable-static \
+  tst-rseq-static \
+  # tests-static
+
 tests-time64 += \
   tst-adjtimex-time64 \
   tst-clock_adjtime-time64 \
@@ -394,6 +399,7 @@ $(objpfx)tst-mount-compile.out: ../sysdeps/unix/sysv/linux/tst-mount-compile.py
 $(objpfx)tst-mount-compile.out: $(sysdeps-linux-python-deps)

 tst-rseq-disable-ENV = GLIBC_TUNABLES=glibc.pthread.rseq=0
+tst-rseq-disable-static-ENV = GLIBC_TUNABLES=glibc.pthread.rseq=0

 endif # $(subdir) == misc
@@ -655,4 +661,8 @@ tests += \
 tests-internal += \
   tst-rseq-nptl \
   # tests-internal
+
+tests-static += \
+  tst-rseq-nptl-static \
+  # tests-static
 endif
diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
index cf5e81bf2c..fa6e6db398 100644
--- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
+++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
@@ -57,5 +57,8 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
   GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
 #endif

+  GLRO(dl_tls_rseq_feature_size) = auxv_values[AT_RSEQ_FEATURE_SIZE];
+  GLRO(dl_tls_rseq_align) = auxv_values[AT_RSEQ_ALIGN];
+
   DL_PLATFORM_AUXV
 }
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index 294880c04e..d372f95315 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -24,6 +24,24 @@
 #include
 #include
 #include
+#include
+#include
+
+/* rseq area registered with the kernel.  Use a custom definition
+   here to isolate from kernel struct rseq changes.  The
+   implementation of sched_getcpu needs access to the cpu_id field;
+   the other fields are unused and not included here.  */
+struct rseq_area
+{
+  uint32_t cpu_id_start;
+  uint32_t cpu_id;
+};
+
+static inline struct rseq_area *
+rseq_get_area (void)
+{
+  return (struct rseq_area *) ((char *) __thread_pointer ()
+                               + GLRO (dl_tls_rseq_offset));
+}

 #ifdef RSEQ_SIG
 static inline bool
@@ -31,20 +49,23 @@ rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
   if (do_rseq)
     {
-      int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area,
-                                       sizeof (self->rseq_area),
+      /* The kernel expects 'rseq_area->rseq_cs == NULL' on registration,
+         so zero the whole rseq area.  */
+      memset (rseq_get_area (), 0, GLRO (dl_tls_rseq_size));
+      int ret = INTERNAL_SYSCALL_CALL (rseq, rseq_get_area (),
+                                       GLRO (dl_tls_rseq_size),
                                        0, RSEQ_SIG);
       if (!INTERNAL_SYSCALL_ERROR_P (ret))
         return true;
     }

-  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+  RSEQ_SETMEM (rseq_get_area (), cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
   return false;
 }
 #else /* RSEQ_SIG */
 static inline bool
 rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
-  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+  RSEQ_SETMEM (rseq_get_area (), cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
   return false;
 }
 #endif /* RSEQ_SIG */
diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
index 4457d714bc..6109c68625 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include

 static int
 vsyscall_sched_getcpu (void)
@@ -37,7 +38,7 @@ vsyscall_sched_getcpu (void)
 int
 sched_getcpu (void)
 {
-  int cpu_id = THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id);
+  int cpu_id = RSEQ_GETMEM_VOLATILE (rseq_get_area (), cpu_id);
   return __glibc_likely (cpu_id >= 0) ? cpu_id : vsyscall_sched_getcpu ();
 }
 #else /* RSEQ_SIG */
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c b/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
new file mode 100644
index 0000000000..2687d13d3d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
@@ -0,0 +1 @@
+#include "tst-rseq-disable.c"
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable.c b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
index cc7e94b3fe..5aae20fbe4 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq-disable.c
+++ b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
@@ -26,27 +26,30 @@
 #include

 #ifdef RSEQ_SIG
+# include
+# include "tst-rseq.h"
+
+static __thread struct rseq local_rseq;

 /* Check that rseq can be registered and has not been taken by glibc.  */
 static void
 check_rseq_disabled (void)
 {
-  struct pthread *pd = THREAD_SELF;
+  struct rseq *rseq_area
+    = (struct rseq *) ((char *) __thread_pointer () + __rseq_offset);

   TEST_COMPARE (__rseq_flags, 0);
-  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
-               == (char *) &pd->rseq_area);
   TEST_COMPARE (__rseq_size, 0);
-  TEST_COMPARE ((int) pd->rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+  TEST_COMPARE ((int) rseq_area->cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+
+  TEST_COMPARE (sizeof (local_rseq), RSEQ_TEST_MIN_SIZE);

-  int ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
-                     0, RSEQ_SIG);
+  int ret = syscall (__NR_rseq, &local_rseq, RSEQ_TEST_MIN_SIZE, 0, RSEQ_SIG);
   if (ret == 0)
     {
-      ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
+      ret = syscall (__NR_rseq, &local_rseq, RSEQ_TEST_MIN_SIZE,
                      RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
       TEST_COMPARE (ret, 0);
-      pd->rseq_area.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
+      rseq_area->cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
     }
   else
     {
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c b/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
new file mode 100644
index 0000000000..6e2c923bb9
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
@@ -0,0 +1 @@
+#include "tst-rseq-nptl.c"
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-static.c b/sysdeps/unix/sysv/linux/tst-rseq-static.c
new file mode 100644
index 0000000000..1d97f3bd3d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-static.c
@@ -0,0 +1 @@
+#include "tst-rseq.c"
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
index 16983503b1..6677d3ce30 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq.c
+++ b/sysdeps/unix/sysv/linux/tst-rseq.c
@@ -31,18 +31,32 @@
 # include
 # include
 # include
+# include
 # include "tst-rseq.h"

 static void
 do_rseq_main_test (void)
 {
-  struct pthread *pd = THREAD_SELF;
+  size_t rseq_size
+    = MAX (getauxval (AT_RSEQ_FEATURE_SIZE), RSEQ_TEST_MIN_SIZE);
+  size_t rseq_align = MAX (getauxval (AT_RSEQ_ALIGN), RSEQ_TEST_MIN_ALIGN);
+  struct rseq *rseq = __thread_pointer () + __rseq_offset;

   TEST_VERIFY_EXIT (rseq_thread_registered ());
   TEST_COMPARE (__rseq_flags, 0);
-  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
-               == (char *) &pd->rseq_area);
-  TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area));
+  TEST_COMPARE (__rseq_size, rseq_size);
+  /* The size of the rseq area must be a multiple of the alignment.  */
+  TEST_VERIFY ((__rseq_size % rseq_align) == 0);
+  /* The rseq area address must be aligned.  */
+  TEST_VERIFY (((unsigned long) rseq % rseq_align) == 0);
+#if TLS_TCB_AT_TP
+  /* The rseq area block should come before the thread pointer and be at
+     least 32 bytes.  */
+  TEST_VERIFY (__rseq_offset <= -RSEQ_TEST_MIN_SIZE);
+#elif TLS_DTV_AT_TP
+  /* The rseq area block should come after the thread pointer.  */
+  TEST_VERIFY (__rseq_offset >= 0);
+#else
+# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
+#endif
 }

 static void
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.h b/sysdeps/unix/sysv/linux/tst-rseq.h
index 95d12048df..cb621b76dd 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq.h
+++ b/sysdeps/unix/sysv/linux/tst-rseq.h
@@ -23,11 +23,17 @@
 #include
 #include
 #include
+#include
+
+#define RSEQ_TEST_MIN_SIZE 32
+#define RSEQ_TEST_MIN_ALIGN 32

 static inline bool
 rseq_thread_registered (void)
 {
-  return THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id) >= 0;
+  struct rseq_area *rseq
+    = (struct rseq_area *) ((char *) __thread_pointer () + __rseq_offset);
+
+  return __atomic_load_n (&rseq->cpu_id, __ATOMIC_RELAXED) >= 0;
 }

 static inline int
diff --git a/sysdeps/x86_64/nptl/tcb-access.h b/sysdeps/x86_64/nptl/tcb-access.h
index 110b1be44d..3a943b2fba 100644
--- a/sysdeps/x86_64/nptl/tcb-access.h
+++ b/sysdeps/x86_64/nptl/tcb-access.h
@@ -130,3 +130,59 @@
                     "i" (offsetof (struct pthread, member[0])), \
                     "r" (idx)); \
     }})
+
+/* Read member of the RSEQ area directly.  */
+# define RSEQ_GETMEM_VOLATILE(descr, member) \
+  ({ __typeof (descr->member) __value; \
+     ptrdiff_t _rseq_offset = GLRO (dl_tls_rseq_offset); \
+     _Static_assert (sizeof (__value) == 1 \
+                     || sizeof (__value) == 4 \
+                     || sizeof (__value) == 8, \
+                     "size of per-thread data"); \
+     if (sizeof (__value) == 1) \
+       asm volatile ("movb %%fs:%P2(%q3),%b0" \
+                     : "=q" (__value) \
+                     : "0" (0), "i" (offsetof (struct rseq_area, member)), \
+                       "r" (_rseq_offset)); \
+     else if (sizeof (__value) == 4) \
+       asm volatile ("movl %%fs:%P1(%q2),%0" \
+                     : "=r" (__value) \
+                     : "i" (offsetof (struct rseq_area, member)), \
+                       "r" (_rseq_offset)); \
+     else /* 8 */ \
+       { \
+         asm volatile ("movq %%fs:%P1(%q2),%q0" \
+                       : "=r" (__value) \
+                       : "i" (offsetof (struct rseq_area, member)), \
+                         "r" (_rseq_offset)); \
+       } \
+     __value; })
+
+/* Set member of the RSEQ area directly.  */
+# define RSEQ_SETMEM(descr, member, value) \
+  ({ \
+     ptrdiff_t _rseq_offset = GLRO (dl_tls_rseq_offset); \
+     _Static_assert (sizeof (descr->member) == 1 \
+                     || sizeof (descr->member) == 4 \
+                     || sizeof (descr->member) == 8, \
+                     "size of per-thread data"); \
+     if (sizeof (descr->member) == 1) \
+       asm volatile ("movb %b0,%%fs:%P1(%q2)" : \
+                     : "iq" (value), \
+                       "i" (offsetof (struct rseq_area, member)), \
+                       "r" (_rseq_offset)); \
+     else if (sizeof (descr->member) == 4) \
+       asm volatile ("movl %0,%%fs:%P1(%q2)" : \
+                     : IMM_MODE (value), \
+                       "i" (offsetof (struct rseq_area, member)), \
+                       "r" (_rseq_offset)); \
+     else /* 8 */ \
+       { \
+         /* Since movq takes a signed 32-bit immediate or a register source \
+            operand, use the "er" constraint for a 32-bit signed integer \
+            constant or register.  */ \
+         asm volatile ("movq %q0,%%fs:%P1(%q2)" : \
+                       : "er" ((uint64_t) cast_to_integer (value)), \
+                         "i" (offsetof (struct rseq_area, member)), \
+                         "r" (_rseq_offset)); \
+       }})
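
Note for reviewers: the application-facing contract is unchanged by this
patch — user code keeps locating the rseq area through the __rseq_offset,
__rseq_size and __rseq_flags symbols declared in <sys/rseq.h> (glibc 2.35
and later); only where __rseq_offset points changes.  The following is a
minimal standalone sketch of that usage, not part of the patch.  It assumes
a compiler providing __builtin_thread_pointer (recent GCC/Clang on common
targets) and kernel headers supplying struct rseq via <sys/rseq.h>.

/* Usage sketch: locate this thread's rseq area through the glibc ABI
   symbols.  Not part of the patch.  */
#include <stdio.h>
#include <sys/rseq.h>   /* __rseq_offset, __rseq_size, __rseq_flags.  */

int
main (void)
{
  if (__rseq_size == 0)
    {
      /* Registration failed, rseq is unsupported by the kernel, or it
         was disabled with the glibc.pthread.rseq tunable.  */
      puts ("rseq not registered by glibc");
      return 1;
    }

  /* The rseq area sits at a fixed offset from the thread pointer.  With
     this patch the offset points past the last TLS block rather than
     into struct pthread, but the computation below is identical either
     way, which is the point of the __rseq_offset indirection.  */
  struct rseq *r = (struct rseq *) ((char *) __builtin_thread_pointer ()
                                    + __rseq_offset);

  printf ("rseq area %p, size %u, flags %u, cpu_id %d\n",
          (void *) r, __rseq_size, __rseq_flags, (int) r->cpu_id);
  return 0;
}

With the extensible ABI, __rseq_size also tells such code how many bytes
of kernel-provided features are actually present, so feature fields past
the original 32 bytes should only be read after checking it.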