From patchwork Mon Oct 3 19:59:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aurelien Jarno X-Patchwork-Id: 58309 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9F7C03885C1E for ; Mon, 3 Oct 2022 20:01:02 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id ACECD385C338 for ; Mon, 3 Oct 2022 19:59:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org ACECD385C338 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Content-Type:From:Reply-To: Subject:Content-ID:Content-Description:X-Debbugs-Cc; bh=84oP3zTs5rW+DRb/4CGlZWkcNKVhqnqHpwphjFd1N8g=; b=L6A3f25gz2AxKVT+jVbZ0G0joE jWh8P0rTuw4M7z/fm6xIQ4Qn9qwhgy6ra3+HiXijtenmSmX0kG8QzDP5e/xN4kdt94DBnziMjyi5w B6G6S85i8zwJACRTgCu2bZBZw1TyG/u2dxcEJChG1gpKbAbd7iHqW3de4Q+duX4ohYI9CSTsJzJoA BldCbO/YM8gqwqboljw6tHmWpw8CNSCfnKsjjQTF6SnrnB7ABb8uoX/mCK5emdErv6ZmFRYzkLeu3 NkrgsMLgMpEydeBRRX5A2LjcTj7oLg6/neQAUxbBT9OEaBbK4eVfNmOkmAFpbvuXQV5lnzmyd1cZt ThAw6fGQ==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ofRbS-00GHJG-PD; Mon, 03 Oct 2022 21:59:50 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from ) id 1ofRbP-00Djwk-2C; Mon, 03 Oct 2022 21:59:47 +0200 From: Aurelien Jarno To: libc-alpha@sourceware.org Subject: [PATCH v3 8/8] x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations Date: Mon, 3 Oct 2022 21:59:44 +0200 Message-Id: <20221003195944.3274548-9-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221003195944.3274548-1-aurelien@aurel32.net> References: <20221003195944.3274548-1-aurelien@aurel32.net> MIME-Version: 1.0 X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Aurelien Jarno Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The AVX2 strrchr and wcsrchr implementation uses the 'blsmsk' instruction which belongs to the BMI1 CPU feature and the 'shrx' instruction, which belongs to the BMI2 CPU feature. Fixes: df7e295d18ff ("x86: Optimize {str|wcs}rchr-avx2") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein --- sysdeps/x86/isa-level.h | 1 + sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + sysdeps/x86_64/multiarch/ifunc-impl-list.c | 17 ++++++++++++++--- 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h index bbb90f5c5e..06f6c9663e 100644 --- a/sysdeps/x86/isa-level.h +++ b/sysdeps/x86/isa-level.h @@ -79,6 +79,7 @@ /* ISA level >= 3 guaranteed includes. */ #define AVX_X86_ISA_LEVEL 3 #define AVX2_X86_ISA_LEVEL 3 +#define BMI1_X86_ISA_LEVEL 3 #define BMI2_X86_ISA_LEVEL 3 #define LZCNT_X86_ISA_LEVEL 3 #define MOVBE_X86_ISA_LEVEL 3 diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h index f1741083fd..f2f5e8a211 100644 --- a/sysdeps/x86_64/multiarch/ifunc-avx2.h +++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h @@ -36,6 +36,7 @@ IFUNC_SELECTOR (void) const struct cpu_features *cpu_features = __get_cpu_features (); if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI1) && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2) && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index ec1c5b55fb..00a91123d3 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -578,13 +578,19 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, strrchr, X86_IFUNC_IMPL_ADD_V4 (array, i, strrchr, (CPU_FEATURE_USABLE (AVX512VL) - && CPU_FEATURE_USABLE (AVX512BW)), + && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2)), __strrchr_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2)), __strrchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __strrchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2 @@ -797,13 +803,18 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, X86_IFUNC_IMPL_ADD_V4 (array, i, wcsrchr, (CPU_FEATURE_USABLE (AVX512VL) && CPU_FEATURE_USABLE (AVX512BW) + && CPU_FEATURE_USABLE (BMI1) && CPU_FEATURE_USABLE (BMI2)), __wcsrchr_evex) X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr, - CPU_FEATURE_USABLE (AVX2), + (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2)), __wcsrchr_avx2) X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr, (CPU_FEATURE_USABLE (AVX2) + && CPU_FEATURE_USABLE (BMI1) + && CPU_FEATURE_USABLE (BMI2) && CPU_FEATURE_USABLE (RTM)), __wcsrchr_avx2_rtm) /* ISA V2 wrapper for SSE2 implementation because the SSE2