From patchwork Sat Oct 1 19:09:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58260 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E499E3851C34 for ; Sat, 1 Oct 2022 19:09:35 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id 646FA385841F for ; Sat, 1 Oct 2022 19:09:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 646FA385841F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=QLvUumlvkDt35Af98mJ6IL6kSj4LksLonKMDGKI9z6M=; b=pDDM6XY2ofqosBDhTUDs6YijAt puqlQnIfYHS/USWT9XQDQ0nhw5/7GpHDGcwF7SedfjRw7J2eRw195SSXoxbloCD/oa22/RRBkM46d 2R9+PGytUepLc9kSh9L63j6l9IlCyxh9zxiZQ3x3jUbhiQaJUV/3kuAUrCCfLQgyAZ4v3E3PaZEOh ycKcXcL+XEbWI6BHF9xy/KgZDlPeM5kV7FSX9BhggojWLRk3mqOVUw+G225CzHf9gBQTWxPOCDxyZ w2OEiUKEBvAkataQ3nAd8R1q1B7pO00Nd9QxRkRNwoPja9Sy3Zk/xqS7rxhb6yFC5R8fZG6ZuTODe /6WoPC2Q==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oehrQ-00Ern0-3h; Sat, 01 Oct 2022 21:09:16 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oehrN-00CZ0S-15; Sat, 01 Oct 2022 21:09:13 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH 1/4] x86: include BMI1 and BMI2 in x86-64-v3 level Date: Sat, 1 Oct 2022 21:09:08 +0200 Message-Id: <20221001190911.2994478-2-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221001190911.2994478-1-aurelien@aurel32.net> References: <20221001190911.2994478-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The "System V Application Binary Interface AMD64 Architecture Processor Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3 level. --- sysdeps/x86/get-isa-level.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sysdeps/x86/get-isa-level.h b/sysdeps/x86/get-isa-level.h index 1ade78ab73..5b4dd5f062 100644 --- a/sysdeps/x86/get-isa-level.h +++ b/sysdeps/x86/get-isa-level.h @@ -47,6 +47,8 @@ get_isa_level (const struct cpu_features *cpu_features) isa_level |= GNU_PROPERTY_X86_ISA_1_V2; if (CPU_FEATURE_USABLE_P (cpu_features, AVX) && CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && CPU_FEATURE_USABLE_P (cpu_features, BMI1) + && CPU_FEATURE_USABLE_P (cpu_features, BMI2) && CPU_FEATURE_USABLE_P (cpu_features, F16C) && CPU_FEATURE_USABLE_P (cpu_features, FMA) && CPU_FEATURE_USABLE_P (cpu_features, LZCNT) From patchwork Sat Oct 1 19:09:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58261 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AA2CD386074C for ; Sat, 1 Oct 2022 19:09:39 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id 68D6F3858439 for ; Sat, 1 Oct 2022 19:09:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 68D6F3858439 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=4lzWagvdSzMKdtbanAagGz7uFR6K3E85zVqyRfMbCJg=; b=MeVtDw++kSg8NRtOW+DR/g/7uT 1Xj+y/cdp3UDtc+u4GLnYoVRWtfQZNMtSkMyNzRWNiwbgEaGUDyfXL0DTN+f16DxQw4pA5UgCc1bu tEyXAJ7E3d4ggv4iLKxeEtwt19hKSqQOxYnS9gA8PbyER+gVQ5gQZedC9J52jSljnDgFkmT2hoRDt RguPXar9k3EdhWdjfZGakEdZ6zDZlKdFqaiNH0zERnAm3/TXCeAqQP0sv2Z5IkI6MJsCteMypD0Vk UYYqLsykhOvTYTfi9Th3cKf4PUspzNNKFIKd05ZDJquuKxtTqL7xeVKdTUgHPhgr4jpkfyQR9+FNF GkXTwJSg==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oehrQ-00Ern1-4F; Sat, 01 Oct 2022 21:09:16 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oehrN-00CZ0W-1A; Sat, 01 Oct 2022 21:09:13 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH 2/4] x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp implementations Date: Sat, 1 Oct 2022 21:09:09 +0200 Message-Id: <20221001190911.2994478-3-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221001190911.2994478-1-aurelien@aurel32.net> References: <20221001190911.2994478-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 strncmp, strncasecmp and wcsncmp implementations use the bzhil instructions, which belongs to the BMI2 CPU feature. Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 --- sysdeps/x86_64/multiarch/ifunc-impl-list.c | 25 +++++++++++++++------ sysdeps/x86_64/multiarch/ifunc-strcasecmp.h | 1 + sysdeps/x86_64/multiarch/strncmp.c | 4 ++-- 3 files changed, 21 insertions(+), 9 deletions(-) diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index a71444eccb..ec1a8bff5e 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -638,13 +638,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strncasecmp, X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2)), __strncasecmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __strncasecmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strncasecmp_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp, @@ -660,13 +663,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strncasecmp_l, X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + & CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2)), __strncasecmp_l_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2), __strncasecmp_l_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strncasecmp_l_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l, @@ -816,10 +822,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __wcsncmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __wcsncmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __wcsncmp_avx2_rtm) /* ISA V2 wrapper for GENERIC implementation because the @@ -1162,13 +1170,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strncmp, X86_IFUNC_IMPL_ADD_V4 (array, i, strncmp, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI2)), __strncmp_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __strncmp_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strncmp_avx2_rtm) X86_IFUNC_IMPL_ADD_V2 (array, i, strncmp, diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h index 68646ef199..7622af259c 100644 --- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h +++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h @@ -34,6 +34,7 @@ IFUNC_SELECTOR (void) const struct cpu_features *cpu_features = __get_cpu_features (); if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, )) { diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c index 4ebe4bde30..c4f8b6bbb5 100644 --- a/sysdeps/x86_64/multiarch/strncmp.c +++ b/sysdeps/x86_64/multiarch/strncmp.c @@ -41,12 +41,12 @@ IFUNC_SELECTOR (void) const struct cpu_features *cpu_features = __get_cpu_features (); if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, )) { if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL) - && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW) - && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)) return OPTIMIZE (evex); if (CPU_FEATURE_USABLE_P (cpu_features, RTM)) From patchwork Sat Oct 1 19:09:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58259 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5819D3853561 for ; Sat, 1 Oct 2022 19:09:34 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id 605633858D38 for ; Sat, 1 Oct 2022 19:09:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 605633858D38 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=JTlpRFeJE48CCqkVWkNZ+uf6PGe/Lhvcn23jb0Q4ZfE=; b=wFRO3qyOmHB7m7kXXIsiJ5CQKd lj7sZ/5gDVg+auroPgqrA4tjG/Rb+D3losGArLqcyMntcrwCtHx8vD/yEHs5VUItReJLYeDwAY+xp 3PI7jPb2KYgHcbElVwRjc56h8E6KpZcqRxIq4j9dzpUjTMNXsxgej0qdpaNhGOCvutX+L20n+KTU0 Oyap7eY0XmwYg5LIt8IUKAq9XH+wEHC+mmF8lNExkxe5ge84S6IJ4a1rwUknUB+V9bBAEcU63mH2r k2Jgnpavga0GhBfDINp9GRX9Dur5LxJPqh01LBDJmH84uoq2hoND3CLAY1fcoCROBMYst8oHFAY6E zmVyD0QQ==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oehrQ-00Ern2-3u; Sat, 01 Oct 2022 21:09:16 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oehrN-00CZ0a-1F; Sat, 01 Oct 2022 21:09:13 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH 3/4] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations Date: Sat, 1 Oct 2022 21:09:10 +0200 Message-Id: <20221001190911.2994478-4-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221001190911.2994478-1-aurelien@aurel32.net> References: <20221001190911.2994478-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 memchr, rawmemchr and wmemchr implementations use the bzhiq and sarxl instructions, which belongs to the BMI2 CPU feature. Fixes: acfd088a1963 ("x86: Optimize memchr-avx2.S") Partially resolves: BZ #29611 --- sysdeps/x86_64/multiarch/ifunc-impl-list.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index ec1a8bff5e..c628462d47 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -69,10 +69,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __memchr_evex_rtm) X86_IFUNC_IMPL_ADD_V3 (array, i, memchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __memchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, memchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __memchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 @@ -335,10 +337,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __rawmemchr_evex_rtm) X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __rawmemchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __rawmemchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 @@ -917,10 +921,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, && CPU_FEATURE_USABLE (BMI2)), __wmemchr_evex_rtm) X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2)), __wmemchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __wmemchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 From patchwork Sat Oct 1 19:09:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58258 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F06513853815 for ; Sat, 1 Oct 2022 19:09:32 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id 607BF3858C20 for ; Sat, 1 Oct 2022 19:09:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 607BF3858C20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=d2so11i5O523/PQ9pl72ujvJfPglZ8R4dNzG059XDRQ=; b=uImY1+bAJp6GJf8+kx/H26h1rX wg20VGhkOq4Yl5CUj0+CGV6yGZvoFhzo02XU8xbrVpBgikvZndn+KiFc7xYtZl8Bzdr3o/1YzYclE 1pSRsf4gFUoTwZAyVook9y5a7Wu9Em12MGiyfRo2iyqVUdFQdssX99m+duC2dHoFNPdgBwJtwL8b0 u/QQ7BDz5HExdkO6BDXoKdEiO8xgoQ2HRzSSR6QUVZlcs47f2UAKbd7+AxFjdd80pw3ZvjG4QM5GY zX96w8PdcqSgJIuDMB/JA0jPJo59apU4ZwfCuYmmk2XUoXL0eKTcVpy7NNqDC/9Rp047t6o6/1OmD eOLnSGEw==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oehrQ-00Ern3-3u; Sat, 01 Oct 2022 21:09:16 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1oehrN-00CZ0e-1K; Sat, 01 Oct 2022 21:09:13 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH 4/4] x86-64: Require LZCNT for AVX2 memrchr implementation Date: Sat, 1 Oct 2022 21:09:11 +0200 Message-Id: <20221001190911.2994478-5-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221001190911.2994478-1-aurelien@aurel32.net> References: <20221001190911.2994478-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 memrchr implementation uses the lzcntl and lzcntq instructions, which belongs to the LZCNT CPU feature. Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S") Partially resolves: BZ #29611 --- sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + sysdeps/x86_64/multiarch/ifunc-impl-list.c | 7 +++++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h index a57a9952f3..f1741083fd 100644 --- a/sysdeps/x86_64/multiarch/ifunc-avx2.h +++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h @@ -37,6 +37,7 @@ IFUNC_SELECTOR (void) if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, )) { diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index c628462d47..db5a2032d6 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -209,13 +209,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, memrchr, X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (LZCNT)), __memrchr_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (LZCNT)), __memrchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (LZCNT) && CPU_FEATURE_USABLE (RTM)), __memrchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2