Message ID | 20241121190924.837446-1-mjeanson@efficios.com (mailing list archive) |
---|---|
Headers |
Return-Path: <libc-alpha-bounces~patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 478793857BBA for <patchwork@sourceware.org>; Thu, 21 Nov 2024 19:18:03 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 478793857BBA Authentication-Results: sourceware.org; dkim=pass (2048-bit key, unprotected) header.d=efficios.com header.i=@efficios.com header.a=rsa-sha256 header.s=smtpout1 header.b=Sl7aOLrO X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from smtpout.efficios.com (smtpout.efficios.com [IPv6:2607:5300:203:b2ee::31e5]) by sourceware.org (Postfix) with ESMTPS id 4D0483857C7A for <libc-alpha@sourceware.org>; Thu, 21 Nov 2024 19:09:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4D0483857C7A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=efficios.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4D0483857C7A Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:5300:203:b2ee::31e5 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1732216176; cv=none; b=I0VHGq+KiqekLD7q0m03dSAeWJ2UWiDyvmnIrDqhkcA9Ceu1OV3IiTscYNCKqIXsKy1YS9yQtAKvuneXXXNz0qCN9MR3XCh04EhAlLYKspDqFtpb07ztrwPcvpIiEbjz13NtELgWrIhdU3qaXfYQtBZyIwV3soE48BY6cuFpVDM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1732216176; c=relaxed/simple; bh=2nilNZRFESX3PNhBecTZ0F/LjCOur8VpdV66DkSyQ3E=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=nkSjV78N0/lJWT2MNnxd+o/e6qHrGi+YvUT2qyOghO9G1sgwzrii+zRcbNC0KbMTNTU3CbNclT/MnQOy1F3hcCA8dq8pUa9AOgZ9BwFhM3E5oiyv3z5st/LYceENa/B+jzuo6cz0qEwwSZcGaYFVxm0yyg81o9rK6092XyAn3aA= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4D0483857C7A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1732216174; bh=2nilNZRFESX3PNhBecTZ0F/LjCOur8VpdV66DkSyQ3E=; h=From:To:Cc:Subject:Date:From; b=Sl7aOLrOUmWTyS4aWVIrCjs148a6MYVnhqXmRGueRxM/gr/5ApRnC6AV3bbAB0O3d 1edzE3rgTbH3QgRGX8Cf7p3l4tOxWUVij6guDdENRZjFLKhKvIK2BfY0PYSVG9/lDe q/qh3OQgh3vO1E84khzJVu4R0CB0Lcb8uu6nzkTj5uj4twxKuONLsvohGSxTp7iWHk 01OUS1eCxV+ntjqgKusIh9SMy1gqzch0VObUhkG8/l6qoIYVtdUpGQ3Lgp9995Dnww bxcS1jaQf5puAKq7tBG7IVw2W+kV5tNXwOEqLldTFbJ7pP54H/ucxe4uF5xzzjuxIC +BXuagh+X4oFg== Received: from laptop-mjeanson.internal.efficios.com (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XvSTV5T1pz13nc; Thu, 21 Nov 2024 14:09:34 -0500 (EST) From: Michael Jeanson <mjeanson@efficios.com> To: libc-alpha@sourceware.org Cc: Michael Jeanson <mjeanson@efficios.com>, Florian Weimer <fweimer@redhat.com>, Carlos O'Donell <carlos@redhat.com>, DJ Delorie <dj@redhat.com>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Subject: [PATCH v14 0/9] Add rseq extensible ABI support Date: Thu, 21 Nov 2024 14:08:50 -0500 Message-ID: <20241121190924.837446-1-mjeanson@efficios.com> X-Mailer: git-send-email 2.43.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-6.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org |
Series |
Add rseq extensible ABI support
|
|
Message
Michael Jeanson
Nov. 21, 2024, 7:08 p.m. UTC
Introduced in Linux v6.3 the rseq extensible ABI [1] will allow adding rseq features past the initial 32 bytes of the original ABI. While the rseq features in the latest kernel still fit within the original ABI size, there are currently only 4 bytes left. It would thus be a good time to add support for the extensible ABI so that when new features are added, they are immediately available to GNU libc users. We use the ELF auxiliary vector to query the kernel for the size and alignment of the rseq area, if this fails we default to the original fixed size and alignment of '32' which the kernel will accept as a compatibility mode with the original ABI. This makes the size of the rseq area variable and thus requires to relocate it out of 'struct pthread'. We chose to move it after (in block allocation order) the last TLS block inside the static TLS block allocation. It required a fairly small modification to the TLS block allocator and did not interfere with the main executable TLS block which must always be the first block relative to the thread pointer. [1] https://lore.kernel.org/all/20221122203932.231377-4-mathieu.desnoyers@efficios.com/ Cc: Florian Weimer <fweimer@redhat.com> Cc: Carlos O'Donell <carlos@redhat.com> Cc: DJ Delorie <dj@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> --- Changes since v13: - Ensure that the VOLATILE variants of the RSEQ_ accessors static asserts on 64bit types on 32bit architectures - Move rtld_hidden_proto rseq symbols to a separate patch Changes since v12: - Set _rseq_size to 0 on registration failure - Split RSEQ_SET/GETMEM from THREAD_SET/GETMEM - Rename rseq_get_area() to RSEQ_SELF() - Add rtld_hidden_proto to __rseq_size and __rseq_offset - Add comment and variable array member to 'struct rseq_area' - Style nits Changes since v11: - Removed _dl_rseq_feature_size, use __rseq_size instead - Replace GLRO(dl_rseq_align) with a hidden global variable _rseq_align - __rseq_size is now set directly in _dl_parse_auxv, set it to 0 when the main thread registration fails or is disabled by tunable Changes since v10: - Split the patchset in smaller patches - Rebased on 'Make __rseq_size useful for feature detection' - Remove 'rseq' from the generic TLS code, add 'extra TLS' - Add thread_pointer.h for all Linux architectures - Fix build on the Hurd Changes since v8: - Fix copyright year in sysdeps/generic/dl-rseq.h - Clarify the the tcb math comments - Add a comment to clarify what enforces the aligment requirements of a pointer calculated from the rseq_offset - Remove nonsensical test in tst-rseq-disable - Add comments to clarify why the rseq size is 0 when registration fails or is disabled - Add comments to explain why whe allocate and rseq area block even when the registration is disabled by tunable - Rename 'rseq_size' -> 'rseq_alloc_size' and 'dl_tls_rseq_size' -> 'dl_tls_rseq_alloc_size' to clarify the distinction between the allocated rseq size and the size reported to application code in '__rseq_size' Changes since v6: - Fix tst-rseq for feature size over 32 bytes - Rebased on 'nptl: fix potential merge of __rseq_* relro symbols' Changes since v5: - Fix TLS_DTV_AT_TP rseq offset with statically linked executables Changes since RFC v4: - Move dynamic linker defines to a header file - Fix alignment when tls block align is smaller than rseq align with statically linked executables - Add statically linked rseq tests - Revert: Set __rseq_size even when the registration fails - Use minimum size when rseq is disabled by tunable Changes since RFC v3: - Fix RSEQ_SETMEM for rseq disabled - Replace sys/auxv.h usage with dl-parse_auxv.h - Fix offset for TLS_TCB_AT_TP with statically linked executables - Zero the rseq area before registration Changes since RFC v2: - Set __rseq_size even when the registration fails - Adjust rseq tests to the new ABI - Added support for statically linked executables Changes since RFC v1: - Insert the rseq area after the last TLS block - Add proper support for TLS_TCB_AT_TP variant --- Michael Jeanson (9): nptl: initialize cpu_id_start prior to rseq registration nptl: Add rseq auxvals Add generic 'extra TLS' Add Linux 'extra TLS' nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset nptl: Introduce <rseq-access.h> for RSEQ_* accessors nptl: Move the rseq area to the 'extra TLS' block nptl: Remove the rseq area from 'struct pthread' Linux: Update internal copy of '<sys/rseq.h>' csu/libc-tls.c | 59 +++++++++-- elf/dl-tls.c | 59 +++++++++++ nptl/descr.h | 22 +---- nptl/pthread_create.c | 2 +- sysdeps/generic/dl-extra_tls.h | 45 +++++++++ sysdeps/i386/nptl/rseq-access.h | 98 +++++++++++++++++++ sysdeps/nptl/dl-tls_init_tp.c | 23 ++--- sysdeps/nptl/rseq-access.h | 58 +++++++++++ sysdeps/unix/sysv/linux/Makefile | 10 ++ sysdeps/unix/sysv/linux/dl-extra_tls.h | 70 +++++++++++++ sysdeps/unix/sysv/linux/dl-parse_auxv.h | 7 ++ sysdeps/unix/sysv/linux/dl-rseq-symbols.S | 27 +++-- sysdeps/unix/sysv/linux/rseq-internal.h | 94 ++++++++++++++---- sysdeps/unix/sysv/linux/sched_getcpu.c | 3 +- sysdeps/unix/sysv/linux/sys/rseq.h | 11 +++ .../unix/sysv/linux/tst-rseq-disable-static.c | 1 + sysdeps/unix/sysv/linux/tst-rseq-disable.c | 64 ++++++++++-- .../unix/sysv/linux/tst-rseq-nptl-static.c | 1 + sysdeps/unix/sysv/linux/tst-rseq-static.c | 1 + sysdeps/unix/sysv/linux/tst-rseq.c | 84 +++++++++++++--- sysdeps/unix/sysv/linux/tst-rseq.h | 3 +- sysdeps/x86_64/nptl/rseq-access.h | 79 +++++++++++++++ 22 files changed, 725 insertions(+), 96 deletions(-) create mode 100644 sysdeps/generic/dl-extra_tls.h create mode 100644 sysdeps/i386/nptl/rseq-access.h create mode 100644 sysdeps/nptl/rseq-access.h create mode 100644 sysdeps/unix/sysv/linux/dl-extra_tls.h create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable-static.c create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-static.c create mode 100644 sysdeps/x86_64/nptl/rseq-access.h
Comments
On 2024-11-21 14:08, Michael Jeanson wrote: > Introduced in Linux v6.3 the rseq extensible ABI [1] will allow adding > rseq features past the initial 32 bytes of the original ABI. > > While the rseq features in the latest kernel still fit within the > original ABI size, there are currently only 4 bytes left. It would thus > be a good time to add support for the extensible ABI so that when new > features are added, they are immediately available to GNU libc users. > > We use the ELF auxiliary vector to query the kernel for the size and > alignment of the rseq area, if this fails we default to the original > fixed size and alignment of '32' which the kernel will accept as a > compatibility mode with the original ABI. > > This makes the size of the rseq area variable and thus requires to > relocate it out of 'struct pthread'. We chose to move it after (in block > allocation order) the last TLS block inside the static TLS block > allocation. It required a fairly small modification to the TLS block > allocator and did not interfere with the main executable TLS block which > must always be the first block relative to the thread pointer. > > [1] https://lore.kernel.org/all/20221122203932.231377-4-mathieu.desnoyers@efficios.com/ Ping? I see the release date for 2.41 and I'm getting a bit nervous. Michael
On 2024-12-06 10:50, Michael Jeanson wrote: > On 2024-11-21 14:08, Michael Jeanson wrote: >> Introduced in Linux v6.3 the rseq extensible ABI [1] will allow adding >> rseq features past the initial 32 bytes of the original ABI. >> >> While the rseq features in the latest kernel still fit within the >> original ABI size, there are currently only 4 bytes left. It would thus >> be a good time to add support for the extensible ABI so that when new >> features are added, they are immediately available to GNU libc users. >> >> We use the ELF auxiliary vector to query the kernel for the size and >> alignment of the rseq area, if this fails we default to the original >> fixed size and alignment of '32' which the kernel will accept as a >> compatibility mode with the original ABI. >> >> This makes the size of the rseq area variable and thus requires to >> relocate it out of 'struct pthread'. We chose to move it after (in block >> allocation order) the last TLS block inside the static TLS block >> allocation. It required a fairly small modification to the TLS block >> allocator and did not interfere with the main executable TLS block which >> must always be the first block relative to the thread pointer. >> >> [1] https://lore.kernel.org/all/20221122203932.231377-4-mathieu.desnoyers@efficios.com/ > > Ping? I see the release date for 2.41 and I'm getting a bit nervous. Florian, Carlos, DJ, what is missing to get this into 2.41 ? Thanks, Mathieu > > Michael >
On 2024-11-21 14:08, Michael Jeanson wrote: > Introduced in Linux v6.3 the rseq extensible ABI [1] will allow adding > rseq features past the initial 32 bytes of the original ABI. Hi Florian, I am currently implementing a new rseq feature which requires adding an 8-byte field, which goes beyond the 32 initial bytes (it reaches 36 bytes). I need to disable the glibc rseq integration with the tunable for my development testing, but I am reaching a point where we will shortly need the extensible rseq integration in glibc. How is review progressing ? How can we help ? Thanks, Mathieu > > While the rseq features in the latest kernel still fit within the > original ABI size, there are currently only 4 bytes left. It would thus > be a good time to add support for the extensible ABI so that when new > features are added, they are immediately available to GNU libc users. > > We use the ELF auxiliary vector to query the kernel for the size and > alignment of the rseq area, if this fails we default to the original > fixed size and alignment of '32' which the kernel will accept as a > compatibility mode with the original ABI. > > This makes the size of the rseq area variable and thus requires to > relocate it out of 'struct pthread'. We chose to move it after (in block > allocation order) the last TLS block inside the static TLS block > allocation. It required a fairly small modification to the TLS block > allocator and did not interfere with the main executable TLS block which > must always be the first block relative to the thread pointer. > > [1] https://lore.kernel.org/all/20221122203932.231377-4-mathieu.desnoyers@efficios.com/ > > Cc: Florian Weimer <fweimer@redhat.com> > Cc: Carlos O'Donell <carlos@redhat.com> > Cc: DJ Delorie <dj@redhat.com> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > --- > Changes since v13: > - Ensure that the VOLATILE variants of the RSEQ_ accessors static > asserts on 64bit types on 32bit architectures > - Move rtld_hidden_proto rseq symbols to a separate patch > Changes since v12: > - Set _rseq_size to 0 on registration failure > - Split RSEQ_SET/GETMEM from THREAD_SET/GETMEM > - Rename rseq_get_area() to RSEQ_SELF() > - Add rtld_hidden_proto to __rseq_size and __rseq_offset > - Add comment and variable array member to 'struct rseq_area' > - Style nits > Changes since v11: > - Removed _dl_rseq_feature_size, use __rseq_size instead > - Replace GLRO(dl_rseq_align) with a hidden global variable _rseq_align > - __rseq_size is now set directly in _dl_parse_auxv, set it to 0 when > the main thread registration fails or is disabled by tunable > Changes since v10: > - Split the patchset in smaller patches > - Rebased on 'Make __rseq_size useful for feature detection' > - Remove 'rseq' from the generic TLS code, add 'extra TLS' > - Add thread_pointer.h for all Linux architectures > - Fix build on the Hurd > Changes since v8: > - Fix copyright year in sysdeps/generic/dl-rseq.h > - Clarify the the tcb math comments > - Add a comment to clarify what enforces the aligment requirements of a > pointer calculated from the rseq_offset > - Remove nonsensical test in tst-rseq-disable > - Add comments to clarify why the rseq size is 0 when registration fails > or is disabled > - Add comments to explain why whe allocate and rseq area block even when > the registration is disabled by tunable > - Rename 'rseq_size' -> 'rseq_alloc_size' and 'dl_tls_rseq_size' -> > 'dl_tls_rseq_alloc_size' to clarify the distinction between the > allocated rseq size and the size reported to application code in > '__rseq_size' > Changes since v6: > - Fix tst-rseq for feature size over 32 bytes > - Rebased on 'nptl: fix potential merge of __rseq_* relro symbols' > Changes since v5: > - Fix TLS_DTV_AT_TP rseq offset with statically linked executables > Changes since RFC v4: > - Move dynamic linker defines to a header file > - Fix alignment when tls block align is smaller than rseq align with > statically linked executables > - Add statically linked rseq tests > - Revert: Set __rseq_size even when the registration fails > - Use minimum size when rseq is disabled by tunable > Changes since RFC v3: > - Fix RSEQ_SETMEM for rseq disabled > - Replace sys/auxv.h usage with dl-parse_auxv.h > - Fix offset for TLS_TCB_AT_TP with statically linked executables > - Zero the rseq area before registration > Changes since RFC v2: > - Set __rseq_size even when the registration fails > - Adjust rseq tests to the new ABI > - Added support for statically linked executables > Changes since RFC v1: > - Insert the rseq area after the last TLS block > - Add proper support for TLS_TCB_AT_TP variant > > --- > Michael Jeanson (9): > nptl: initialize cpu_id_start prior to rseq registration > nptl: Add rseq auxvals > Add generic 'extra TLS' > Add Linux 'extra TLS' > nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset > nptl: Introduce <rseq-access.h> for RSEQ_* accessors > nptl: Move the rseq area to the 'extra TLS' block > nptl: Remove the rseq area from 'struct pthread' > Linux: Update internal copy of '<sys/rseq.h>' > > csu/libc-tls.c | 59 +++++++++-- > elf/dl-tls.c | 59 +++++++++++ > nptl/descr.h | 22 +---- > nptl/pthread_create.c | 2 +- > sysdeps/generic/dl-extra_tls.h | 45 +++++++++ > sysdeps/i386/nptl/rseq-access.h | 98 +++++++++++++++++++ > sysdeps/nptl/dl-tls_init_tp.c | 23 ++--- > sysdeps/nptl/rseq-access.h | 58 +++++++++++ > sysdeps/unix/sysv/linux/Makefile | 10 ++ > sysdeps/unix/sysv/linux/dl-extra_tls.h | 70 +++++++++++++ > sysdeps/unix/sysv/linux/dl-parse_auxv.h | 7 ++ > sysdeps/unix/sysv/linux/dl-rseq-symbols.S | 27 +++-- > sysdeps/unix/sysv/linux/rseq-internal.h | 94 ++++++++++++++---- > sysdeps/unix/sysv/linux/sched_getcpu.c | 3 +- > sysdeps/unix/sysv/linux/sys/rseq.h | 11 +++ > .../unix/sysv/linux/tst-rseq-disable-static.c | 1 + > sysdeps/unix/sysv/linux/tst-rseq-disable.c | 64 ++++++++++-- > .../unix/sysv/linux/tst-rseq-nptl-static.c | 1 + > sysdeps/unix/sysv/linux/tst-rseq-static.c | 1 + > sysdeps/unix/sysv/linux/tst-rseq.c | 84 +++++++++++++--- > sysdeps/unix/sysv/linux/tst-rseq.h | 3 +- > sysdeps/x86_64/nptl/rseq-access.h | 79 +++++++++++++++ > 22 files changed, 725 insertions(+), 96 deletions(-) > create mode 100644 sysdeps/generic/dl-extra_tls.h > create mode 100644 sysdeps/i386/nptl/rseq-access.h > create mode 100644 sysdeps/nptl/rseq-access.h > create mode 100644 sysdeps/unix/sysv/linux/dl-extra_tls.h > create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable-static.c > create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c > create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-static.c > create mode 100644 sysdeps/x86_64/nptl/rseq-access.h >
* Mathieu Desnoyers: > On 2024-11-21 14:08, Michael Jeanson wrote: >> Introduced in Linux v6.3 the rseq extensible ABI [1] will allow adding >> rseq features past the initial 32 bytes of the original ABI. > > Hi Florian, > > I am currently implementing a new rseq feature which requires adding > an 8-byte field, which goes beyond the 32 initial bytes (it reaches 36 > bytes). > > I need to disable the glibc rseq integration with the tunable for > my development testing, but I am reaching a point where we will shortly > need the extensible rseq integration in glibc. > > How is review progressing ? How can we help ? There doesn't seem to be a __thread_pointer patch for C-SKY. There's a simple to resolve conflict in nptl/descr.h. TLS is currently broken due to the botched GET_ADDR_ARGS removal. I'm testing the rseq series after reverting 5e249192cac7354af02a7347a0d8c. Thanks, Florian
* Florian Weimer: > * Mathieu Desnoyers: > >> On 2024-11-21 14:08, Michael Jeanson wrote: >>> Introduced in Linux v6.3 the rseq extensible ABI [1] will allow adding >>> rseq features past the initial 32 bytes of the original ABI. >> >> Hi Florian, >> >> I am currently implementing a new rseq feature which requires adding >> an 8-byte field, which goes beyond the 32 initial bytes (it reaches 36 >> bytes). >> >> I need to disable the glibc rseq integration with the tunable for >> my development testing, but I am reaching a point where we will shortly >> need the extensible rseq integration in glibc. >> >> How is review progressing ? How can we help ? > > There doesn't seem to be a __thread_pointer patch for C-SKY. > > There's a simple to resolve conflict in nptl/descr.h. > > TLS is currently broken due to the botched GET_ADDR_ARGS removal. I'm > testing the rseq series after reverting 5e249192cac7354af02a7347a0d8c. So it looks like that TLS_DTV_AT_TP part doesn't work, unfortunately. I see many new failures (after the mentioned revert) on powerpc64le, which is one of those targets. I haven't tried to reproduce them yet on AArch64. The GCC compile farm has a powerpc64le test machine (gcc120.fsffrance.org). I should be able to look into this in more detail on Monday. Thanks, Florian
On 2024-12-27 13:40, Florian Weimer wrote: > * Florian Weimer: > >> * Mathieu Desnoyers: >> >>> On 2024-11-21 14:08, Michael Jeanson wrote: >>>> Introduced in Linux v6.3 the rseq extensible ABI [1] will allow adding >>>> rseq features past the initial 32 bytes of the original ABI. >>> >>> Hi Florian, >>> >>> I am currently implementing a new rseq feature which requires adding >>> an 8-byte field, which goes beyond the 32 initial bytes (it reaches 36 >>> bytes). >>> >>> I need to disable the glibc rseq integration with the tunable for >>> my development testing, but I am reaching a point where we will shortly >>> need the extensible rseq integration in glibc. >>> >>> How is review progressing ? How can we help ? >> >> There doesn't seem to be a __thread_pointer patch for C-SKY. >> >> There's a simple to resolve conflict in nptl/descr.h. >> >> TLS is currently broken due to the botched GET_ADDR_ARGS removal. I'm >> testing the rseq series after reverting 5e249192cac7354af02a7347a0d8c. > > So it looks like that TLS_DTV_AT_TP part doesn't work, unfortunately. I > see many new failures (after the mentioned revert) on powerpc64le, which > is one of those targets. I haven't tried to reproduce them yet on > AArch64. The GCC compile farm has a powerpc64le test machine > (gcc120.fsffrance.org). I recall that Michael did test on at least one architecture with TLS_DTV_AT_TP (aarch64), and one with TLS_TCB_AT_TP (x86-64). The issue may be specific to powerpc. > > I should be able to look into this in more detail on Monday. Michael is currently on vacation for the holiday break, he should be back on January 6. Thanks! Mathieu > > Thanks, > Florian >
* Mathieu Desnoyers: >> So it looks like that TLS_DTV_AT_TP part doesn't work, unfortunately. >> I see many new failures (after the mentioned revert) on powerpc64le, >> which is one of those targets. I haven't tried to reproduce them yet >> on AArch64. The GCC compile farm has a powerpc64le test machine >> (gcc120.fsffrance.org). > > I recall that Michael did test on at least one architecture > with TLS_DTV_AT_TP (aarch64), and one with TLS_TCB_AT_TP (x86-64). > > The issue may be specific to powerpc. Maybe it's caused by a non-zero TLS_TP_OFFSET. We use a shifted thread pointer on POWER, presumably to increase the reach of signed displacements in TLS-accessing instructions. This is alluded to in this comment: #ifdef RSEQ_SIG /* This should be a compile-time constant, but the current infrastructure makes it difficult to determine its value. Not all targets support __thread_pointer, so set __rseq_offset only if the rseq registration may have happened because RSEQ_SIG is defined. */ _rseq_offset = (char *) &pd->rseq_area - (char *) __thread_pointer (); #endif If this is indeed the cause, we should define TLS_TP_OFFSET on all architectures and use it to initialize __rseq_offset. Similar to what I started here: [PATCH 1/4] elf: Introduce generic <dl-tls.h> <https://inbox.sourceware.org/libc-alpha/1b470b147d0bfe51af6256b10cd38c14285a61b2.1735313702.git.fweimer@redhat.com/> Thanks, Florian
* Florian Weimer: > * Mathieu Desnoyers: > >>> So it looks like that TLS_DTV_AT_TP part doesn't work, unfortunately. >>> I see many new failures (after the mentioned revert) on powerpc64le, >>> which is one of those targets. I haven't tried to reproduce them yet >>> on AArch64. The GCC compile farm has a powerpc64le test machine >>> (gcc120.fsffrance.org). >> >> I recall that Michael did test on at least one architecture >> with TLS_DTV_AT_TP (aarch64), and one with TLS_TCB_AT_TP (x86-64). >> >> The issue may be specific to powerpc. > > Maybe it's caused by a non-zero TLS_TP_OFFSET. We use a shifted thread > pointer on POWER, presumably to increase the reach of signed > displacements in TLS-accessing instructions. > > This is alluded to in this comment: > > #ifdef RSEQ_SIG > /* This should be a compile-time constant, but the current > infrastructure makes it difficult to determine its value. Not > all targets support __thread_pointer, so set __rseq_offset only > if the rseq registration may have happened because RSEQ_SIG is > defined. */ > _rseq_offset = (char *) &pd->rseq_area - (char *) __thread_pointer (); > #endif > > If this is indeed the cause, we should define TLS_TP_OFFSET on all > architectures and use it to initialize __rseq_offset. Similar to what I > started here: > > [PATCH 1/4] elf: Introduce generic <dl-tls.h> > <https://inbox.sourceware.org/libc-alpha/1b470b147d0bfe51af6256b10cd38c14285a61b2.1735313702.git.fweimer@redhat.com/> With diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 57e72be4..1055032a 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -419,7 +419,7 @@ _dl_determine_tlsoffset (void) The alignment requirements of the pointer resulting from this offset and the thread pointer are enforced by 'max_align' which is used to align the tcb_offset. */ - _dl_extra_tls_set_offset(offset); + _dl_extra_tls_set_offset(offset - TLS_TP_OFFSET); /* Add the extra TLS block to the global offset. */ offset += extra_tls_size; the results look better, but I'm not sure if this is the correct fix. I see crashes on the scv 0 instruction in statically linked binaries. Looks like the TCB placement is wrong and the flag that indicates support for the new system call instruction (scv 0) is not loaded correctly. Thanks, Florian
* Florian Weimer: > * Mathieu Desnoyers: > >> On 2024-11-21 14:08, Michael Jeanson wrote: >>> Introduced in Linux v6.3 the rseq extensible ABI [1] will allow adding >>> rseq features past the initial 32 bytes of the original ABI. >> >> Hi Florian, >> >> I am currently implementing a new rseq feature which requires adding >> an 8-byte field, which goes beyond the 32 initial bytes (it reaches 36 >> bytes). >> >> I need to disable the glibc rseq integration with the tunable for >> my development testing, but I am reaching a point where we will shortly >> need the extensible rseq integration in glibc. >> >> How is review progressing ? How can we help ? > > There doesn't seem to be a __thread_pointer patch for C-SKY. No worries, found it and reviewed it. Thanks, Florian
On 2024-12-30 05:06, Florian Weimer wrote: > * Florian Weimer: > >> * Mathieu Desnoyers: >> >>>> So it looks like that TLS_DTV_AT_TP part doesn't work, unfortunately. >>>> I see many new failures (after the mentioned revert) on powerpc64le, >>>> which is one of those targets. I haven't tried to reproduce them yet >>>> on AArch64. The GCC compile farm has a powerpc64le test machine >>>> (gcc120.fsffrance.org). >>> >>> I recall that Michael did test on at least one architecture >>> with TLS_DTV_AT_TP (aarch64), and one with TLS_TCB_AT_TP (x86-64). >>> >>> The issue may be specific to powerpc. >> >> Maybe it's caused by a non-zero TLS_TP_OFFSET. We use a shifted thread >> pointer on POWER, presumably to increase the reach of signed >> displacements in TLS-accessing instructions. >> >> This is alluded to in this comment: >> >> #ifdef RSEQ_SIG >> /* This should be a compile-time constant, but the current >> infrastructure makes it difficult to determine its value. Not >> all targets support __thread_pointer, so set __rseq_offset only >> if the rseq registration may have happened because RSEQ_SIG is >> defined. */ >> _rseq_offset = (char *) &pd->rseq_area - (char *) __thread_pointer (); >> #endif >> >> If this is indeed the cause, we should define TLS_TP_OFFSET on all >> architectures and use it to initialize __rseq_offset. Similar to what I >> started here: >> >> [PATCH 1/4] elf: Introduce generic <dl-tls.h> >> <https://inbox.sourceware.org/libc-alpha/1b470b147d0bfe51af6256b10cd38c14285a61b2.1735313702.git.fweimer@redhat.com/> > > With > > diff --git a/elf/dl-tls.c b/elf/dl-tls.c > index 57e72be4..1055032a 100644 > --- a/elf/dl-tls.c > +++ b/elf/dl-tls.c > @@ -419,7 +419,7 @@ _dl_determine_tlsoffset (void) > The alignment requirements of the pointer resulting from this offset and > the thread pointer are enforced by 'max_align' which is used to align the > tcb_offset. */ > - _dl_extra_tls_set_offset(offset); > + _dl_extra_tls_set_offset(offset - TLS_TP_OFFSET); > > /* Add the extra TLS block to the global offset. */ > offset += extra_tls_size; > > the results look better, but I'm not sure if this is the correct fix. > > I see crashes on the scv 0 instruction in statically linked binaries. > Looks like the TCB placement is wrong and the flag that indicates > support for the new system call instruction (scv 0) is not loaded > correctly. > > Thanks, > Florian > Hi, Sorry for the delay, I'm now back from holidays. I'm setting up a ppc64el test VM to investigate this, I had only tested the TLS_DTV_AT_TP implementation on aarch64 with the assumption that the other architectures would behave similarly, that might have been too optimistic. Thanks, Michael
On 2025-01-06 13:54, Michael Jeanson wrote: > On 2024-12-30 05:06, Florian Weimer wrote: >> * Florian Weimer: >> >>> * Mathieu Desnoyers: >>> >>>>> So it looks like that TLS_DTV_AT_TP part doesn't work, unfortunately. >>>>> I see many new failures (after the mentioned revert) on powerpc64le, >>>>> which is one of those targets. I haven't tried to reproduce them yet >>>>> on AArch64. The GCC compile farm has a powerpc64le test machine >>>>> (gcc120.fsffrance.org). >>>> >>>> I recall that Michael did test on at least one architecture >>>> with TLS_DTV_AT_TP (aarch64), and one with TLS_TCB_AT_TP (x86-64). >>>> >>>> The issue may be specific to powerpc. >>> >>> Maybe it's caused by a non-zero TLS_TP_OFFSET. We use a shifted thread >>> pointer on POWER, presumably to increase the reach of signed >>> displacements in TLS-accessing instructions. >>> >>> This is alluded to in this comment: >>> >>> #ifdef RSEQ_SIG >>> /* This should be a compile-time constant, but the current >>> infrastructure makes it difficult to determine its value. Not >>> all targets support __thread_pointer, so set __rseq_offset only >>> if the rseq registration may have happened because RSEQ_SIG is >>> defined. */ >>> _rseq_offset = (char *) &pd->rseq_area - (char *) __thread_pointer (); >>> #endif >>> >>> If this is indeed the cause, we should define TLS_TP_OFFSET on all >>> architectures and use it to initialize __rseq_offset. Similar to what I >>> started here: >>> >>> [PATCH 1/4] elf: Introduce generic <dl-tls.h> >>> <https://inbox.sourceware.org/libc-alpha/1b470b147d0bfe51af6256b10cd38c14285a61b2.1735313702.git.fweimer@redhat.com/> >> >> With >> >> diff --git a/elf/dl-tls.c b/elf/dl-tls.c >> index 57e72be4..1055032a 100644 >> --- a/elf/dl-tls.c >> +++ b/elf/dl-tls.c >> @@ -419,7 +419,7 @@ _dl_determine_tlsoffset (void) >> The alignment requirements of the pointer resulting from this offset and >> the thread pointer are enforced by 'max_align' which is used to align the >> tcb_offset. */ >> - _dl_extra_tls_set_offset(offset); >> + _dl_extra_tls_set_offset(offset - TLS_TP_OFFSET); >> >> /* Add the extra TLS block to the global offset. */ >> offset += extra_tls_size; >> >> the results look better, but I'm not sure if this is the correct fix. >> >> I see crashes on the scv 0 instruction in statically linked binaries. >> Looks like the TCB placement is wrong and the flag that indicates >> support for the new system call instruction (scv 0) is not loaded >> correctly. Your patch looks good to me but it's missing the same change for the statically linked binary code, like this : diff --git a/csu/libc-tls.c b/csu/libc-tls.c index 20d6b48a4d..65ce9ddb3d 100644 --- a/csu/libc-tls.c +++ b/csu/libc-tls.c @@ -20,6 +20,7 @@ #include <errno.h> #include <ldsodefs.h> #include <tls.h> +#include <dl-tls.h> #include <unistd.h> #include <stdio.h> #include <sys/param.h> @@ -195,7 +196,7 @@ __libc_setup_tls (void) The alignment requirements of the pointer resulting from this offset and the thread pointer are enforced by 'max_align' which is used to align the tcb_offset. */ - _dl_extra_tls_set_offset(tls_blocks_size - extra_tls_size); + _dl_extra_tls_set_offset(tls_blocks_size - extra_tls_size - TLS_TP_OFFSET); tlsblock = _dl_early_allocate (tls_blocks_size + max_align + TLS_PRE_TCB_SIZE There are also minor adjustments required in the rseq tests since we can't assume the rseq_offset to be greater than zero on architectures that have a non-zero TLS_TP_OFFSET. From what I can see, all the architectures that define TLS_TP_OFFSET are of the TLS_DTV_AT_TP type so we don't need to change the offset calculation in the TLS_TCB_AT_TP code paths. I ran the test suite on a ppc64el VM with the current master and then with the RSEQ patchset and your patch plus my changes, the tests results are the same. === Summary of results === 18 FAIL 5877 PASS 21 UNSUPPORTED 16 XFAIL 2 XPASS Do you want me to send an updated patchset or should I wait for further review? Michael
* Michael Jeanson: > Your patch looks good to me but it's missing the same change for the > statically linked binary code, like this : > > diff --git a/csu/libc-tls.c b/csu/libc-tls.c index 20d6b48a4d..65ce9ddb3d 100644 > --- a/csu/libc-tls.c > +++ b/csu/libc-tls.c > @@ -20,6 +20,7 @@ > #include <errno.h> > #include <ldsodefs.h> > #include <tls.h> > +#include <dl-tls.h> > #include <unistd.h> > #include <stdio.h> > #include <sys/param.h> > @@ -195,7 +196,7 @@ __libc_setup_tls (void) > The alignment requirements of the pointer resulting from this offset and > the thread pointer are enforced by 'max_align' which is used to align the > tcb_offset. */ > - _dl_extra_tls_set_offset(tls_blocks_size - extra_tls_size); > + _dl_extra_tls_set_offset(tls_blocks_size - extra_tls_size - TLS_TP_OFFSET); > > tlsblock = _dl_early_allocate (tls_blocks_size + max_align > + TLS_PRE_TCB_SIZE > Ah, makes sense. I've worked on posting a patch for exposing TLS_TP_OFFSET unconditionally, but the test case for it hits a snag: we still don't have <thread_pointer.h> on all architectures. I see build failures on: alpha arc arm mips s390 s390x sh Of these, only mips and sh do not yet assume that GCC's __builtin_thread_pointer work. We could disable the new test, but perhaps we should complete the <thread_pointer.h> implementation? Not sure where this puts us in terms of inclusion in the upcoming release. I'll try to add a generic <thread_pointer.h> and implementations for mips and sh. I also looked at an alternative implementation: make the size of struct pthread dynamic. This minimizes changes on architectures such as x86-64, at least if we assume that 32-byte alignment of the rseq area is enough. It's more complicated on architectures such as AArch64, where THREAD_SELF would need to perform variable adjustment. It does not look like an overall win in terms of complexity. Michael, before reposting your changes, you could fix some style nits? Mostly missing space before '(' in function calls, missing two spaces after '.' at the end of the comment. Copyright years will need updating, too. grep -E '^\+.*(Contributed by|http://|Copyright.*(19[0-9][0-9]|20[01][0-9]|202[0-4])|XXX|FIXME|TODO|\?\?\?|internal_function|\bu_(char|int)|the\s+the|[^ *$({~]\(|\b(if|for|while|do)\b.*\{|[[:space:]]$|\($|\. \*/)' Thanks, Florian
On 2025-01-07 06:15, Florian Weimer wrote: > * Michael Jeanson: > >> Your patch looks good to me but it's missing the same change for the >> statically linked binary code, like this : >> >> diff --git a/csu/libc-tls.c b/csu/libc-tls.c index 20d6b48a4d..65ce9ddb3d 100644 >> --- a/csu/libc-tls.c >> +++ b/csu/libc-tls.c >> @@ -20,6 +20,7 @@ >> #include <errno.h> >> #include <ldsodefs.h> >> #include <tls.h> >> +#include <dl-tls.h> >> #include <unistd.h> >> #include <stdio.h> >> #include <sys/param.h> >> @@ -195,7 +196,7 @@ __libc_setup_tls (void) >> The alignment requirements of the pointer resulting from this offset and >> the thread pointer are enforced by 'max_align' which is used to align the >> tcb_offset. */ >> - _dl_extra_tls_set_offset(tls_blocks_size - extra_tls_size); >> + _dl_extra_tls_set_offset(tls_blocks_size - extra_tls_size - TLS_TP_OFFSET); >> >> tlsblock = _dl_early_allocate (tls_blocks_size + max_align >> + TLS_PRE_TCB_SIZE >> > > Ah, makes sense. > > I've worked on posting a patch for exposing TLS_TP_OFFSET > unconditionally, but the test case for it hits a snag: we still don't > have <thread_pointer.h> on all architectures. I see build failures on: > > alpha > arc > arm > mips > s390 > s390x > sh > > Of these, only mips and sh do not yet assume that GCC's > __builtin_thread_pointer work. > > We could disable the new test, but perhaps we should complete the > <thread_pointer.h> implementation? Not sure where this puts us in terms > of inclusion in the upcoming release. I'll try to add a generic > <thread_pointer.h> and implementations for mips and sh. I agree we should complete the <thread_pointer.h> implementation, I was under the impression it was mostly done with my previous patches. I'll have a look at the architectures you listed. I can do mips and sh if you haven't already started and it saves you some time? > > I also looked at an alternative implementation: make the size of struct > pthread dynamic. This minimizes changes on architectures such as > x86-64, at least if we assume that 32-byte alignment of the rseq area is > enough. It's more complicated on architectures such as AArch64, where > THREAD_SELF would need to perform variable adjustment. It does not look > like an overall win in terms of complexity. I don't think we can assume 32-byte alignment will always be enough and as you say it doesn't seem to reduce complexity compared to this patchset. > > Michael, before reposting your changes, you could fix some style nits? > Mostly missing space before '(' in function calls, missing two spaces > after '.' at the end of the comment. Copyright years will need > updating, too. Will do. > > grep -E '^\+.*(Contributed by|http://|Copyright.*(19[0-9][0-9]|20[01][0-9]|202[0-4])|XXX|FIXME|TODO|\?\?\?|internal_function|\bu_(char|int)|the\s+the|[^ *$({~]\(|\b(if|for|while|do)\b.*\{|[[:space:]]$|\($|\. \*/)' > > Thanks, > Florian >
On 2025-01-07 11:16, Michael Jeanson wrote: >> I've worked on posting a patch for exposing TLS_TP_OFFSET >> unconditionally, but the test case for it hits a snag: we still don't >> have <thread_pointer.h> on all architectures. I see build failures on: >> >> alpha >> arc >> arm >> mips >> s390 >> s390x >> sh >> >> Of these, only mips and sh do not yet assume that GCC's >> __builtin_thread_pointer work. >> >> We could disable the new test, but perhaps we should complete the >> <thread_pointer.h> implementation? Not sure where this puts us in terms >> of inclusion in the upcoming release. I'll try to add a generic >> <thread_pointer.h> and implementations for mips and sh. > > I agree we should complete the <thread_pointer.h> implementation, I was > under the impression it was mostly done with my previous patches. I'll > have a look at the architectures you listed. > > I can do mips and sh if you haven't already started and it saves you > some time? I've tried build-many-glibcs.py builds for all the architectures you mentioned here with your TLS_TP_OFFSET v2 patchset on top of my updated RSEQ ABI patchset and I had no build failures other than something weird on mips-n32 where I had to add an include guard to 'sysdeps/mips/dl-tls.h'. Is there something different in your environment? Maybe some older GCC version on some of those architectures without __builtin_thread_pointer() support?
* Michael Jeanson: > On 2025-01-07 11:16, Michael Jeanson wrote: >>> I've worked on posting a patch for exposing TLS_TP_OFFSET >>> unconditionally, but the test case for it hits a snag: we still don't >>> have <thread_pointer.h> on all architectures. I see build failures on: >>> >>> alpha >>> arc >>> arm >>> mips >>> s390 >>> s390x >>> sh >>> >>> Of these, only mips and sh do not yet assume that GCC's >>> __builtin_thread_pointer work. >>> >>> We could disable the new test, but perhaps we should complete the >>> <thread_pointer.h> implementation? Not sure where this puts us in terms >>> of inclusion in the upcoming release. I'll try to add a generic >>> <thread_pointer.h> and implementations for mips and sh. >> >> I agree we should complete the <thread_pointer.h> implementation, I was >> under the impression it was mostly done with my previous patches. I'll >> have a look at the architectures you listed. >> >> I can do mips and sh if you haven't already started and it saves you >> some time? > > I've tried build-many-glibcs.py builds for all the architectures you > mentioned here with your TLS_TP_OFFSET v2 patchset on top of my > updated RSEQ ABI patchset and I had no build failures other than > something weird on mips-n32 where I had to add an include guard > to 'sysdeps/mips/dl-tls.h'. > > Is there something different in your environment? No, the first patch was buggy, as you indicated: the generic implementation in sysdeps/thread_pointer.h was not found anymore, so I forgot that it existed. After moving it to sysdeps/generic/thread_pointer.h, it worked as expected. Thanks, Florian
On 2025-01-07 15:40, Florian Weimer wrote: > No, the first patch was buggy, as you indicated: the generic > implementation in sysdeps/thread_pointer.h was not found anymore, so I > forgot that it existed. After moving it to > sysdeps/generic/thread_pointer.h, it worked as expected. Ahhh, makes sense, I'll send v15 once I've finished running the tests.