From patchwork Sun Oct 2 12:34:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58265 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0D467385AC2A for ; Sun, 2 Oct 2022 12:34:47 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id A53103858D32 for ; Sun, 2 Oct 2022 12:34:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A53103858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=QLvUumlvkDt35Af98mJ6IL6kSj4LksLonKMDGKI9z6M=; b=u5IV4wrlIhrk3WBh9kOSiQ2xd6 3X0HlDePU/h9Hl6ziZTRlsU8mKZWO0hih2Nvb6KOQNRu8R9FhRq4IpZ2DdTHu/EwcZB3IKUiNBGUv 514Aj5+VLOxCl6MZzsHrJuNgp4EjVkYmFq5u0bKhDRXIVvytmn0yjGDj+BPIBdbm+gfTMmD9OL9DH s/rfIjVfmXsmD+zomEqPHNXpapjM8vZSSXW7cYP6++Lc5T9c01zxw+JGSpe9B7hDwG6ca6N+E5agG sHWw3maQyUW4PR7J8rcZV5w8qzQY0jLUf6qQvzWUQO0onmswOhkysVHDU+EpLbsZg6U0KMUrnGA/a cSwqDkQA==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oeyAu-00FLU9-IN; Sun, 02 Oct 2022 14:34:28 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oeyAt-00CvFx-2O; Sun, 02 Oct 2022 14:34:27 +0200 From: 
Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH v2 1/6] x86: include BMI1 and BMI2 in x86-64-v3 level Date: Sun, 2 Oct 2022 14:34:19 +0200 Message-Id: <20221002123424.3079805-2-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221002123424.3079805-1-aurelien@aurel32.net> References: <20221002123424.3079805-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The "System V Application Binary Interface AMD64 Architecture Processor Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3 level. 
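As a rough illustration only, the check changed by the diff that follows can be modelled as a plain conjunction of feature flags. The struct `toy_features` and the function `toy_has_v3` are hypothetical names invented for this sketch and are not glibc interfaces; the flag list is limited to the features visible in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's struct cpu_features: one flag per
   feature that is visible in the get_isa_level hunk.  */
struct toy_features
{
  bool avx, avx2, bmi1, bmi2, f16c, fma, lzcnt;
};

/* Return true only if every listed feature is usable, mirroring the
   chain of && conditions in the patched check.  */
static bool
toy_has_v3 (const struct toy_features *f)
{
  assert (f != NULL);
  return f->avx && f->avx2 && f->bmi1 && f->bmi2
	 && f->f16c && f->fma && f->lzcnt;
}
```

With this model, a CPU that reports AVX2 but not BMI2 no longer qualifies for the level, which is the behaviour the patch aims for.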
--- sysdeps/x86/get-isa-level.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sysdeps/x86/get-isa-level.h b/sysdeps/x86/get-isa-level.h index 1ade78ab73..5b4dd5f062 100644 --- a/sysdeps/x86/get-isa-level.h +++ b/sysdeps/x86/get-isa-level.h @@ -47,6 +47,8 @@ get_isa_level (const struct cpu_features *cpu_features) isa_level |= GNU_PROPERTY_X86_ISA_1_V2; if (CPU_FEATURE_USABLE_P (cpu_features, AVX) && CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && CPU_FEATURE_USABLE_P (cpu_features, BMI1) + && CPU_FEATURE_USABLE_P (cpu_features, BMI2) && CPU_FEATURE_USABLE_P (cpu_features, F16C) && CPU_FEATURE_USABLE_P (cpu_features, FMA) && CPU_FEATURE_USABLE_P (cpu_features, LZCNT) From patchwork Sun Oct 2 12:34:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58269 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E1BAF3851AAB for ; Sun, 2 Oct 2022 12:35:30 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id AFAFB385841F for ; Sun, 2 Oct 2022 12:34:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AFAFB385841F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=XyImlBJHW1mIgPwtbaMbfz5fPyxGHL+XiFJRKau7lLQ=; b=I0AOTss2s92FYjCf/I/xYrQ5QB 
z5KHd1XIss1nf7U6TfAT3okqVks2KPmWYfmM3nQcQskfA5PRw5AaJ1tXdeTsk75biYrELMrvqAq18 e5GC9OhBKWgPRWXXs6+BCJWbi9AiTUZvps26QdUajKuTErIVaLPEqehUO+4WINVzq5rYQoU644+Hw AuXqQQ38PUaFEeOtSjdzYP2qQp+rY3MaOkpVYUsvITlUH2ZEknyIhJsH1tsk+Sr5coAJqfRqxCQ4c YgGYEBUTClW2wh/YYIaRQHhFuew+HioS8TjMNLwgOxGVNxPjCGJ1rAE43Wl7EiFeLDnM9itFeYeof UmoUYY6Q==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oeyAu-00FLUA-Ij; Sun, 02 Oct 2022 14:34:28 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oeyAt-00CvG1-2T; Sun, 02 Oct 2022 14:34:27 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations Date: Sun, 2 Oct 2022 14:34:20 +0200 Message-Id: <20221002123424.3079805-3-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221002123424.3079805-1-aurelien@aurel32.net> References: <20221002123424.3079805-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 str*cmp and wcs(n)cmp implementations use the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. 
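The NB above can be checked with a small portable model. TZCNT is encoded as BSF with an F3 prefix that CPUs without TZCNT support ignore, which is why it degrades to BSF there. The functions `model_tzcnt32` and `model_bsf32` below are hypothetical helpers written for this sketch, not glibc code; they only model the architectural results of the two instructions on a 32-bit operand.

```c
#include <stdint.h>

/* Model of TZCNT on a 32-bit operand: the number of trailing zero bits,
   with the operand size (32) returned for a zero input.  */
static unsigned int
model_tzcnt32 (uint32_t x)
{
  unsigned int n = 0;
  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Model of BSF: the index of the lowest set bit.  The real instruction
   leaves its destination undefined for a zero input; this sketch flags
   that case by returning -1 instead of an index.  */
static int
model_bsf32 (uint32_t x)
{
  int i;
  for (i = 0; i < 32; i++)
    if (x & (UINT32_C (1) << i))
      return i;
  return -1;
}
```

The two models agree for every non-zero input and differ only at zero, so code that never feeds zero to 'tzcnt' does not need a separate BMI1 check for it.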
Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 --- sysdeps/x86_64/multiarch/ifunc-impl-list.c | 47 +++++++++++++++------ sysdeps/x86_64/multiarch/ifunc-strcasecmp.h | 1 + sysdeps/x86_64/multiarch/strcmp.c | 4 +- sysdeps/x86_64/multiarch/strncmp.c | 4 +- 4 files changed, 39 insertions(+), 17 deletions(-) diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index a71444eccb..fec8790c11 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -448,13 +448,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strcasecmp, X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2)), __strcasecmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __strcasecmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strcasecmp_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp, @@ -470,13 +473,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strcasecmp_l, X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2)), __strcasecmp_l_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __strcasecmp_l_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strcasecmp_l_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp_l, @@ -585,10 +591,12 @@ __libc_ifunc_impl_list 
(const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __strcmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __strcmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strcmp_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strcmp, @@ -638,13 +646,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strncasecmp, X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2)), __strncasecmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __strncasecmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strncasecmp_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp, @@ -660,13 +671,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strncasecmp_l, X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2)), __strncasecmp_l_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __strncasecmp_l_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strncasecmp_l_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l, @@ -796,10 +810,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __wcscmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, 
wcscmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __wcscmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __wcscmp_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 @@ -816,10 +832,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __wcsncmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __wcsncmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __wcsncmp_avx2_rtm) /* ISA V2 wrapper for GENERIC implementation because the @@ -1162,13 +1180,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strncmp, X86_IFUNC_IMPL_ADD_V4 (array, i, strncmp, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2)), __strncmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __strncmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strncmp_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strncmp, diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h index 68646ef199..7622af259c 100644 --- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h +++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h @@ -34,6 +34,7 @@ IFUNC_SELECTOR (void) const struct cpu_features *cpu_features = __get_cpu_features (); if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, 
AVX_Fast_Unaligned_Load, )) { diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c index fdd5afe3af..9d6c9f66ba 100644 --- a/sysdeps/x86_64/multiarch/strcmp.c +++ b/sysdeps/x86_64/multiarch/strcmp.c @@ -45,12 +45,12 @@ IFUNC_SELECTOR (void) const struct cpu_features *cpu_features = __get_cpu_features (); if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, )) { if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL) - && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW) - && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)) return OPTIMIZE (evex); if (CPU_FEATURE_USABLE_P (cpu_features, RTM)) diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c index 4ebe4bde30..c4f8b6bbb5 100644 --- a/sysdeps/x86_64/multiarch/strncmp.c +++ b/sysdeps/x86_64/multiarch/strncmp.c @@ -41,12 +41,12 @@ IFUNC_SELECTOR (void) const struct cpu_features *cpu_features = __get_cpu_features (); if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, )) { if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL) - && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW) - && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)) return OPTIMIZE (evex); if (CPU_FEATURE_USABLE_P (cpu_features, RTM)) From patchwork Sun Oct 2 12:34:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58267 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with 
ESMTP id EC804385382A for ; Sun, 2 Oct 2022 12:35:08 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id A92AA3858401 for ; Sun, 2 Oct 2022 12:34:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A92AA3858401 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=WLMfjF0fVrim7qFW3MfHCyxqLbNa8bFkeqDJ1LGj5jY=; b=tKZkvnwQ7+rbI7ZF+WsMNmSGoP /YksUh4CUY7Nq/avT/F/14EqQcbU3SwF5pbpUgXFRalCorv7o4eEUTWBhFX6gQNdKe5YEJLlH+M1m yyFPq3l7bZ0FEEPLaSjL+y0vDRcD0e3TeL93o1IRewBdKtgsTuJ7y5eLlQ2YjKJZHTyGX/iv1CG+q bvcMPmlUXs8yIBFKezqpTCVwU4/GUxsQa2X7fgoxYm8A4Ge6sbNXQQcVFym1QdQj1VboYIGq2LEYA QxoUvvqz+fO/MkOejOMxYmw7EJulQhEejIixk6decbwOCvMpYBrxzQV0wC9yzR4eYAemppfCJ7UxJ Xwg/J73g==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oeyAu-00FLUB-Ik; Sun, 02 Oct 2022 14:34:28 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oeyAt-00CvG5-2Y; Sun, 02 Oct 2022 14:34:27 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH v2 3/6] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations Date: Sun, 2 Oct 2022 14:34:21 +0200 Message-Id: <20221002123424.3079805-4-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221002123424.3079805-1-aurelien@aurel32.net> References: <20221002123424.3079805-1-aurelien@aurel32.net> 
MIME-Version: 1.0 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 memchr, rawmemchr and wmemchr implementations use the 'bzhi' and 'sarx' instructions, which belong to the BMI2 CPU feature. Fixes: acfd088a1963 ("x86: Optimize memchr-avx2.S") Partially resolves: BZ #29611 --- sysdeps/x86_64/multiarch/ifunc-impl-list.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index fec8790c11..7c84963d92 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -69,10 +69,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __memchr_evex_rtm) X86_IFUNC_IMPL_ADD_V3 (array, i, memchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __memchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, memchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __memchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 @@ -335,10 +337,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __rawmemchr_evex_rtm) X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __rawmemchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, 
rawmemchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __rawmemchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 @@ -927,10 +931,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __wmemchr_evex_rtm) X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __wmemchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __wmemchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 From patchwork Sun Oct 2 12:34:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58268 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A41CF38515F5 for ; Sun, 2 Oct 2022 12:35:29 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id B1EF73858430 for ; Sun, 2 Oct 2022 12:34:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B1EF73858430 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=sYMnXeQZZ/k1SsHRZR0phiR+xD636vM/P+OBOM5IrvI=; b=gEfYPuFTkdNL2aBW7KyDWQkIiP 
4YCX4oKqIfKcfK125lI2NS7TChj2WLcn3Zkx/qLjAx6CGNUTFD4d5djZa0cWlI6xHH+NYmvi5q7B6 foTRHcOCJj/HOB9B5XIIYDUczzOga9uBqsHg2NJzOf0v4x7TwRtzZ/VxxuMye4+6cI22ZKwqZSC/a PchsCt4w/C6tRom2sojKsSkqWrHamnKTNmnbQBYfxZCwHeQ4++e9wRsd9kvWQUCHrxxqpcKHAud6+ s8WYvNh7U99dkAfSB1nILDJrnJqh27Dvv2t2tQQ9n4FTWeVzzmIYAB7CjHeLCulBfUvvcAj370djE p0txQ5lw==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oeyAu-00FLUC-LU; Sun, 02 Oct 2022 14:34:28 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oeyAt-00CvG9-2d; Sun, 02 Oct 2022 14:34:27 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH v2 4/6] x86-64: Require LZCNT for AVX2 memrchr implementation Date: Sun, 2 Oct 2022 14:34:22 +0200 Message-Id: <20221002123424.3079805-5-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221002123424.3079805-1-aurelien@aurel32.net> References: <20221002123424.3079805-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 memrchr implementation uses the 'lzcnt' instruction, which belongs to the LZCNT CPU feature. 
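As background that the commit message does not spell out: 'lzcnt' is encoded as BSR with an F3 prefix, and a CPU without LZCNT support runs it as BSR, which returns a bit index rather than a count. The models below are a hypothetical sketch (the names `model_lzcnt32` and `model_bsr32` are invented here) showing that the two results differ for non-zero input, so the feature has to be checked explicitly.

```c
#include <stdint.h>

/* Model of LZCNT on a 32-bit operand: the number of leading zero bits,
   with the operand size (32) returned for a zero input.  */
static unsigned int
model_lzcnt32 (uint32_t x)
{
  unsigned int n = 0;
  if (x == 0)
    return 32;
  while ((x & UINT32_C (0x80000000)) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Model of BSR: the index of the highest set bit.  The real instruction
   leaves its destination undefined for a zero input; this sketch flags
   that case by returning -1 instead of an index.  */
static int
model_bsr32 (uint32_t x)
{
  int i;
  for (i = 31; i >= 0; i--)
    if (x & (UINT32_C (1) << i))
      return i;
  return -1;
}
```

For any non-zero 32-bit input the two models are related by lzcnt = 31 - bsr, so they agree only when the highest set bit sits at an index where both sides of that relation coincide, which never happens for integers.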
Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S") Partially resolves: BZ #29611 --- sysdeps/x86/isa-level.h | 1 + sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + sysdeps/x86_64/multiarch/ifunc-impl-list.c | 7 +++++-- 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h index 3c4480aba7..bbb90f5c5e 100644 --- a/sysdeps/x86/isa-level.h +++ b/sysdeps/x86/isa-level.h @@ -80,6 +80,7 @@ #define AVX_X86_ISA_LEVEL 3 #define AVX2_X86_ISA_LEVEL 3 #define BMI2_X86_ISA_LEVEL 3 +#define LZCNT_X86_ISA_LEVEL 3 #define MOVBE_X86_ISA_LEVEL 3 /* ISA level >= 2 guaranteed includes. */ diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h index a57a9952f3..f1741083fd 100644 --- a/sysdeps/x86_64/multiarch/ifunc-avx2.h +++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h @@ -37,6 +37,7 @@ IFUNC_SELECTOR (void) if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, )) { diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index 7c84963d92..4ee28c99bd 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -209,13 +209,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, memrchr, X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (LZCNT)), __memrchr_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (LZCNT)), __memrchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (LZCNT) && CPU_FEATURE_USABLE (RTM)), __memrchr_avx2_rtm) /* ISA V2 wrapper 
for SSE2 implementation because the SSE2 From patchwork Sun Oct 2 12:34:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58266 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7B869385383D for ; Sun, 2 Oct 2022 12:35:08 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id AD8363858413 for ; Sun, 2 Oct 2022 12:34:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AD8363858413 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=tAckQBNK02872JD0mlwC0fw3xc/7aSRm/i23IO0b/4w=; b=eYx9uPfEjbwqeiAj4PK8dEMMFW pXUpWm0vsjBvgDxVu9YpnKOicUgn12tYma4TCbyHateLPGbngPNpgmJ5PM6osjAG4FWLvE5wvfXd5 nynbnApoAdHRtqj6KdH7qsjvr4fbKgW89HNcsZYUO/qk34ivV78ncInpVnfa9Ko7LxqGwjGhqw1wr iR3x9keTsJ8RSGX7vVo3xCjmSsi+y5xrg9xwooQendGK3zqOY1ufOFxEEOpxyLrAvKz1TFvxB0A5+ InPJuqe8ECQZmyAr0Mvakvn9SeunBI7kb2GT4oAxaCdv9PCHg1dBZ4pcl7dex4GCRRjOQkRKzoCJR RZ87SRjQ==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oeyAu-00FLUD-Me; Sun, 02 Oct 2022 14:34:28 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oeyAt-00CvGD-2i; 
Sun, 02 Oct 2022 14:34:27 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH v2 5/6] x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations Date: Sun, 2 Oct 2022 14:34:23 +0200 Message-Id: <20221002123424.3079805-6-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221002123424.3079805-1-aurelien@aurel32.net> References: <20221002123424.3079805-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 strrchr and wcsrchr implementations use the 'blsmsk' instruction, which belongs to the BMI1 CPU feature, and the 'shrx' instruction, which belongs to the BMI2 CPU feature. Fixes: df7e295d18ff ("x86: Optimize {str|wcs}rchr-avx2") Partially resolves: BZ #29611 --- sysdeps/x86/isa-level.h | 1 + sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + sysdeps/x86_64/multiarch/ifunc-impl-list.c | 17 ++++++++++++++--- 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h index bbb90f5c5e..06f6c9663e 100644 --- a/sysdeps/x86/isa-level.h +++ b/sysdeps/x86/isa-level.h @@ -79,6 +79,7 @@ /* ISA level >= 3 guaranteed includes. 
*/ #define AVX_X86_ISA_LEVEL 3 #define AVX2_X86_ISA_LEVEL 3 +#define BMI1_X86_ISA_LEVEL 3 #define BMI2_X86_ISA_LEVEL 3 #define LZCNT_X86_ISA_LEVEL 3 #define MOVBE_X86_ISA_LEVEL 3 diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h index f1741083fd..f2f5e8a211 100644 --- a/sysdeps/x86_64/multiarch/ifunc-avx2.h +++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h @@ -36,6 +36,7 @@ IFUNC_SELECTOR (void) const struct cpu_features *cpu_features = __get_cpu_features (); if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI1) && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index 4ee28c99bd..1c8afa229f 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -575,13 +575,19 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strrchr, X86_IFUNC_IMPL_ADD_V4 (array, i, strrchr, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2)), __strrchr_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2)), __strrchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strrchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 @@ -794,13 +800,18 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, X86_IFUNC_IMPL_ADD_V4 (array, i, wcsrchr, (CPU_FEATURE_USABLE (AVX512VL) && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI1) 
&& CPU_FEATURE_USABLE (BMI2)), __wcsrchr_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2)), __wcsrchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __wcsrchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 From patchwork Sun Oct 2 12:34:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58264 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 12D4B385382D for ; Sun, 2 Oct 2022 12:34:46 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id A5BBB3858C2C for ; Sun, 2 Oct 2022 12:34:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A5BBB3858C2C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=hCsUgYf2Ig+9Wma0re/Kj5f9oUhEjOSKH3ZOZrjK0mY=; b=bde4AbuhKI9Y26/P1htehJhGWv DKWCVgE8kXuys5Z7Yhel4yKde5rD9+43SPNFgQ5yg1bYGdSNJ7ClSQ99bz47iMueD2Z/VZSrRuHHB hSm9+qNKaJzm1fDT8ptZswXQWI933Qs9lP31nDZMNEiFIe57KdbQP9edPVgAZ0LTO4qLcXUhh9FlG fXzLPwUQG715CgkzQUWhcBMobKaIVhFpSJ/+Yn7NfA9IHEeP9di8XmJUKvuy57kS/sv/5ZYYIosRD 
0u+7BIoYr1kEONpEEKXIXTmnt37Korrv8/dZei38pB8LWLKQTaP/pQFMdKXYVk4RtxgYWlIn039w1 Cz2njLsQ==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oeyAu-00FLUE-Mo; Sun, 02 Oct 2022 14:34:28 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oeyAt-00CvGH-2n; Sun, 02 Oct 2022 14:34:27 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH v2 6/6] x86-64: Require BMI2 for AVX2 memrchr implementation Date: Sun, 2 Oct 2022 14:34:24 +0200 Message-Id: <20221002123424.3079805-7-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221002123424.3079805-1-aurelien@aurel32.net> References: <20221002123424.3079805-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 memrchr implementation uses the 'shlxl' instruction, which belongs to the BMI2 CPU feature. 
Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S") Partially resolves: BZ #29611 --- sysdeps/x86_64/multiarch/ifunc-impl-list.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index 1c8afa229f..00a91123d3 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -210,14 +210,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr, (CPU_FEATURE_USABLE (AVX512VL) && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (LZCNT)), __memrchr_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (LZCNT)), __memrchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (LZCNT) && CPU_FEATURE_USABLE (RTM)), __memrchr_avx2_rtm)
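
The selection rule that this series tightens for memrchr can be sketched in portable C as follows. This is a simplified model under stated assumptions, not the glibc selector: `struct toy_cpu` and `toy_select_memrchr` are hypothetical names, the returned strings are only labels, and the real selector also considers conditions such as AVX_Fast_Unaligned_Load, the EVEX variant and RTM, which are left out here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical feature flags; glibc keeps these in struct cpu_features.  */
struct toy_cpu
{
  bool avx2, bmi2, lzcnt;
};

/* Return a label for the memrchr variant this model would choose.  The
   AVX2 variant is picked only when every instruction set extension it
   executes is usable; otherwise the model falls back to a baseline.  */
static const char *
toy_select_memrchr (const struct toy_cpu *cpu)
{
  if (cpu->avx2 && cpu->bmi2 && cpu->lzcnt)
    return "avx2";
  return "sse2";
}
```

In this model a CPU with AVX2 but without BMI2 gets the baseline variant instead of code that would execute an unsupported instruction.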