From patchwork Sun Mar 17 11:37:37 2024
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 87283
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com
Subject: [PATCH v3] x86-64: Allocate state buffer space for RDI, RSI and RBX
Date: Sun, 17 Mar 2024 04:37:37 -0700
Message-ID: <20240317113737.681177-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.44.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
List-Id: Libc-alpha mailing list

_dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning the
stack.  After realigning the stack, it saves RCX, RDX, R8, R9, R10 and
R11.  Define TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI,
RSI and RBX, so that the xsave to STATE_SAVE_OFFSET(%rsp) does not
clobber the RDI, RSI and RBX values saved on the stack.

   +==================+<- stack frame start aligned at 8 or 16 bytes
   |                  |<- RDI
   |                  |<- RSI
   |                  |<- RBX
   |                  |<- paddings from stack realignment of 64 bytes
   |------------------|<- xsave buffer end aligned at 64 bytes
   |                  |<-
   |                  |<-
   |                  |<-
   |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
   |                  |<- 8-byte padding
   |                  |<- 8-byte padding
   |                  |<- R11
   |                  |<- R10
   |                  |<- R9
   |                  |<- R8
   |                  |<- RDX
   |                  |<- RCX
   +==================+<- State buffer start aligned at 64 bytes

This fixes BZ #31501.
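Reviewer note: the layout in the diagram can be checked with a little arithmetic. The following is only a sketch, not code from the patch. It reads the diagram bottom-up and assumes every slot is 8 bytes wide. The two macros are copied from the sysdep.h hunk; the enum and both helper functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the x86-64 branch of sysdeps/x86/sysdep.h in this
   patch.  */
#define STATE_SAVE_OFFSET (8 * 7 + 8)
#define TLSDESC_CALL_REGISTER_SAVE_AREA (STATE_SAVE_OFFSET + 24)

/* Hypothetical names: the slots below the xsave buffer, counted upward
   from the state buffer start as drawn in the diagram.  */
enum low_slot
{
  SLOT_RCX, SLOT_RDX, SLOT_R8, SLOT_R9, SLOT_R10, SLOT_R11,
  SLOT_PAD0, SLOT_PAD1,
  SLOT_COUNT
};

/* The six registers plus two 8-byte paddings fill exactly the space
   below the xsave buffer.  */
static_assert (SLOT_COUNT * 8 == STATE_SAVE_OFFSET,
	       "low slots must end where the xsave buffer starts");

/* Byte offset of a low slot from the state buffer start, assuming
   8-byte slots.  */
size_t
low_slot_offset (enum low_slot slot)
{
  return (size_t) slot * 8;
}

/* Extra bytes the patch reserves on top of STATE_SAVE_OFFSET: three
   8-byte registers (RDI, RSI and RBX).  */
size_t
extra_save_bytes (void)
{
  return TLSDESC_CALL_REGISTER_SAVE_AREA - STATE_SAVE_OFFSET;
}
```

Under these assumptions R11 sits at offset 40, the xsave buffer starts at offset 64, and the new macro adds 24 bytes for a total of 88.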
---
 sysdeps/x86/cpu-features.c                 | 11 ++-
 sysdeps/x86/sysdep.h                       | 29 ++++++++
 sysdeps/x86_64/Makefile                    | 19 +++++
 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.c | 19 +++++
 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S | 87 ++++++++++++++++++++++
 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.c | 19 +++++
 sysdeps/x86_64/tst-gnu2-tls2-x86-64.c      | 21 ++++++
 7 files changed, 201 insertions(+), 4 deletions(-)
 create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.c
 create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
 create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.c
 create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64.c

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 4ea373dffa..3d7c2819d7 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -311,7 +311,7 @@ update_active (struct cpu_features *cpu_features)
 	      /* NB: On AMX capable processors, ebx always includes AMX
 		 states.  */
 	      unsigned int xsave_state_full_size
-		= ALIGN_UP (ebx + STATE_SAVE_OFFSET, 64);
+		= ALIGN_UP (ebx + TLSDESC_CALL_REGISTER_SAVE_AREA, 64);
 
 	      cpu_features->xsave_state_size
 		= xsave_state_full_size;
@@ -401,8 +401,10 @@ update_active (struct cpu_features *cpu_features)
 		      unsigned int amx_size
 			= (xstate_amx_comp_offsets[31]
 			   + xstate_amx_comp_sizes[31]);
-		      amx_size = ALIGN_UP (amx_size + STATE_SAVE_OFFSET,
-					   64);
+		      amx_size
+			= ALIGN_UP ((amx_size
+				     + TLSDESC_CALL_REGISTER_SAVE_AREA),
+				    64);
 		      /* Set xsave_state_full_size to the compact AMX
 			 state size for XSAVEC.  NB: xsave_state_full_size
 			 is only used in _dl_tlsdesc_dynamic_xsave and
@@ -410,7 +412,8 @@ update_active (struct cpu_features *cpu_features)
 		      cpu_features->xsave_state_full_size = amx_size;
 #endif
 		      cpu_features->xsave_state_size
-			= ALIGN_UP (size + STATE_SAVE_OFFSET, 64);
+			= ALIGN_UP (size + TLSDESC_CALL_REGISTER_SAVE_AREA,
+				    64);
 		      CPU_FEATURE_SET (cpu_features, XSAVEC);
 		    }
 		}
diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
index db8e576e91..46fcd27345 100644
--- a/sysdeps/x86/sysdep.h
+++ b/sysdeps/x86/sysdep.h
@@ -46,6 +46,34 @@
    red-zone into account.  */
 # define STATE_SAVE_OFFSET (8 * 7 + 8)
 
+/* _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning
+   stack.  After realigning stack, it saves RCX, RDX, R8, R9, R10 and
+   R11.  Allocate space for RDI, RSI and RBX to avoid clobbering saved
+   RDI, RSI and RBX values on stack by xsave.
+
+   +==================+<- stack frame start aligned at 8 or 16 bytes
+   |                  |<- RDI
+   |                  |<- RSI
+   |                  |<- RBX
+   |                  |<- paddings from stack realignment of 64 bytes
+   |------------------|<- xsave buffer end aligned at 64 bytes
+   |                  |<-
+   |                  |<-
+   |                  |<-
+   |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
+   |                  |<- 8-byte padding
+   |                  |<- 8-byte padding
+   |                  |<- R11
+   |                  |<- R10
+   |                  |<- R9
+   |                  |<- R8
+   |                  |<- RDX
+   |                  |<- RCX
+   +==================+<- State buffer start aligned at 64 bytes
+
+*/
+# define TLSDESC_CALL_REGISTER_SAVE_AREA (STATE_SAVE_OFFSET + 24)
+
 /* Save SSE, AVX, AVX512, mask, bound and APX registers.  Bound and
    APX registers are mutually exclusive.  */
 # define STATE_SAVE_MASK		\
@@ -68,6 +96,7 @@
 /* Offset for fxsave/xsave area used by _dl_tlsdesc_dynamic.  Since i386
    doesn't have red-zone, use 0 here.  */
 # define STATE_SAVE_OFFSET 0
+# define TLSDESC_CALL_REGISTER_SAVE_AREA 0
 
 /* Save SSE, AVX, AXV512, mask and bound registers.   */
 # define STATE_SAVE_MASK		\
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 66b21954f3..e21e4b96ab 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -217,6 +217,25 @@ valgrind-suppressions-tst-valgrind-smoke = \
 	--suppressions=$(..)sysdeps/x86_64/tst-valgrind-smoke.supp
 endif
 
+tests += \
+  tst-gnu2-tls2-x86-64 \
+# tests
+
+modules-names += \
+  tst-gnu2-tls2-x86-64-mod0 \
+  tst-gnu2-tls2-x86-64-mod1 \
+  tst-gnu2-tls2-x86-64-mod2 \
+# modules-names
+
+$(objpfx)tst-gnu2-tls2-x86-64: $(shared-thread-library)
+$(objpfx)tst-gnu2-tls2-x86-64.out: \
+  $(objpfx)tst-gnu2-tls2-x86-64-mod0.so \
+  $(objpfx)tst-gnu2-tls2-x86-64-mod1.so \
+  $(objpfx)tst-gnu2-tls2-x86-64-mod2.so
+
+CFLAGS-tst-gnu2-tls2-x86-64-mod0.c += -mtls-dialect=gnu2
+CFLAGS-tst-gnu2-tls2-x86-64-mod2.c += -mtls-dialect=gnu2
+
 endif # $(subdir) == elf
 
 ifeq ($(subdir),csu)
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.c b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.c
new file mode 100644
index 0000000000..40b3ec5c82
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.c
@@ -0,0 +1,19 @@
+/* DSO used by tst-gnu2-tls2-x86-64.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
new file mode 100644
index 0000000000..449ddd5c9d
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
@@ -0,0 +1,87 @@
+/* Check if TLSDESC relocation preserves %rdi, %rsi and %rbx.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+
+/* On AVX512 machines, OFFSET == 104 caused _dl_tlsdesc_dynamic_xsavec
+   to clobber %rdi, %rsi and %rbx.  On Intel AVX CPUs, the state size
+   is 960 bytes and this test didn't fail.  It may be due to the unused
+   last 128 bytes.  On AMD AVX CPUs, the state size is 832 bytes and
+   this test might fail without the fix.  */
+#ifndef OFFSET
+# define OFFSET 104
+#endif
+
+	.text
+	.p2align 4
+	.globl	apply_tls
+	.type	apply_tls, @function
+apply_tls:
+	cfi_startproc
+	_CET_ENDBR
+	pushq	%rbp
+	cfi_def_cfa_offset (16)
+	cfi_offset (6, -16)
+	movdqu	(%RDI_LP), %xmm0
+	lea	tls_var1@TLSDESC(%rip), %RAX_LP
+	mov	%RSP_LP, %RBP_LP
+	cfi_def_cfa_register (6)
+	/* Align stack to 64 bytes.  */
+	and	$-64, %RSP_LP
+	sub	$OFFSET, %RSP_LP
+	pushq	%rbx
+	/* Set %ebx to 0xbadbeef.  */
+	movl	$0xbadbeef, %ebx
+	movl	$0xbadbeef, %esi
+	movq	%rdi, saved_rdi(%rip)
+	movq	%rsi, saved_rsi(%rip)
+	call	*tls_var1@TLSCALL(%RAX_LP)
+	/* Check if _dl_tlsdesc_dynamic preserves %rdi, %rsi and %rbx.  */
+	cmpq	saved_rdi(%rip), %rdi
+	jne	L(hlt)
+	cmpq	saved_rsi(%rip), %rsi
+	jne	L(hlt)
+	cmpl	$0xbadbeef, %ebx
+	jne	L(hlt)
+	add	%fs:0, %RAX_LP
+	movups	%xmm0, 32(%RAX_LP)
+	movdqu	16(%RDI_LP), %xmm1
+	mov	%RAX_LP, %RBX_LP
+	movups	%xmm1, 48(%RAX_LP)
+	lea	32(%RBX_LP), %RAX_LP
+	pop	%rbx
+	leave
+	cfi_def_cfa (7, 8)
+	ret
+L(hlt):
+	hlt
+	cfi_endproc
+	.size	apply_tls, .-apply_tls
+	.hidden	tls_var1
+	.globl	tls_var1
+	.section	.tbss,"awT",@nobits
+	.align 16
+	.type	tls_var1, @object
+	.size	tls_var1, 3200
+tls_var1:
+	.zero	3200
+	.local	saved_rdi
+	.comm	saved_rdi,8,8
+	.local	saved_rsi
+	.comm	saved_rsi,8,8
+	.section	.note.GNU-stack,"",@progbits
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.c b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.c
new file mode 100644
index 0000000000..c12b81a49b
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.c
@@ -0,0 +1,19 @@
+/* DSO used by tst-gnu2-tls2-x86-64.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c b/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c
new file mode 100644
index 0000000000..7d51f488bd
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c
@@ -0,0 +1,21 @@
+/* Check if TLSDESC relocation preserves %rdi, %rsi and %rbx.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#define MOD(i) "tst-gnu2-tls2-x86-64-mod" #i ".so"
+
+#include
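
Reviewer note on the cpu-features.c hunks: the effect of swapping STATE_SAVE_OFFSET for TLSDESC_CALL_REGISTER_SAVE_AREA inside ALIGN_UP can be worked through by hand. The sketch below is not glibc code. align_up is a local stand-in for the ALIGN_UP macro, size_before_patch and size_after_patch are hypothetical helper names, and the raw sizes passed in are sample values chosen for illustration, not measured CPUID output.

```c
#include <assert.h>

/* Values copied from the x86-64 branch of sysdeps/x86/sysdep.h in this
   patch.  */
#define STATE_SAVE_OFFSET (8 * 7 + 8)
#define TLSDESC_CALL_REGISTER_SAVE_AREA (STATE_SAVE_OFFSET + 24)

static_assert (TLSDESC_CALL_REGISTER_SAVE_AREA == 88,
	       "64 bytes of low slots plus 24 bytes for RDI, RSI and RBX");

/* Local stand-in for ALIGN_UP: round VALUE up to a multiple of ALIGN,
   which must be a power of two.  */
unsigned int
align_up (unsigned int value, unsigned int align)
{
  return (value + align - 1) & ~(align - 1);
}

/* Size expression used before the patch (hypothetical helper name).  */
unsigned int
size_before_patch (unsigned int raw_xsave_size)
{
  return align_up (raw_xsave_size + STATE_SAVE_OFFSET, 64);
}

/* Size expression used after the patch (hypothetical helper name).  */
unsigned int
size_after_patch (unsigned int raw_xsave_size)
{
  return align_up (raw_xsave_size + TLSDESC_CALL_REGISTER_SAVE_AREA, 64);
}
```

For a sample raw size of 832 the two expressions give 896 and 960, and for 960 they give 1024 and 1088. For any raw size that is already a multiple of 64, the extra 24 bytes push the result up by one full 64-byte step.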