Message ID | 20221001190911.2994478-1-aurelien@aurel32.net |
---|---|
Headers |
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0CCEC385416E for <patchwork@sourceware.org>; Sat, 1 Oct 2022 19:09:57 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1]) by sourceware.org (Postfix) with ESMTPS id 65A033858430 for <libc-alpha@sourceware.org>; Sat, 1 Oct 2022 19:09:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 65A033858430 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=aurel32.net Authentication-Results: sourceware.org; spf=none smtp.mailfrom=aurel32.net DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net ; s=202004.hall; h=Content-Transfer-Encoding:MIME-Version:Message-Id:Date: Subject:Cc:To:From:Content-Type:From:Reply-To:Subject:Content-ID: Content-Description:In-Reply-To:References:X-Debbugs-Cc; bh=lwHJLztcXV/YOKLP+NPdXT5V7cZiJYUlAOFn422g8TU=; b=m3kXnG077j0StNPgCHRDkwIQVg FIPwrF15rDNerFF+JWT16vurqtaSSkUfjiAmrMaB6uqDc4hLEiaS9CZI0SvO7on+iSrL143YFTnMH Azu7u5dIZ79Bn1+SWyJgRbYQolEAlGeBNszow4d9WCzP8/H/2KDCjOFs9+Lgz9UwWmWd6rERPvUDw UUrKv8wjf5FY3rCNik3Q95TNXAx6O3+L3zNURQ21HF2xt/2QcxwDx61aVZl54/jAJCmq645tv7S8F hgFGdVF3O2hLQ7gqxIrESpWUiLMMqPAMiSNwLtRbbdjX9fULi6KIQWpu5GC8xDmN0VEAHLDxq5n4K zQRhgDgA==; Received: from [2a01:e34:ec5d:a741:8a4c:7c4e:dc4c:1787] (helo=ohm.rr44.fr) by hall.aurel32.net with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from <aurelien@aurel32.net>) id 1oehrQ-00Ermz-3h; Sat, 01 Oct 2022 21:09:16 +0200 Received: from aurel32 by ohm.rr44.fr with local (Exim 4.96) (envelope-from <aurelien@aurel32.net>) id 1oehrN-00CZ0P-10; Sat, 01 Oct 2022 21:09:13 +0200 From: Aurelien Jarno <aurelien@aurel32.net> To: libc-alpha@sourceware.org Subject: [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611) Date: Sat, 1 Oct 2022 21:09:07 +0200 Message-Id: <20221001190911.2994478-1-aurelien@aurel32.net> X-Mailer: git-send-email 2.35.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> Cc: Aurelien Jarno <aurelien@aurel32.net> Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
|
|
Message
Aurelien Jarno
Oct. 1, 2022, 7:09 p.m. UTC
Some early Intel Haswell CPU have AVX2 instructions, but do not have BMI2 instructions. Some AVX2 string functions only check for AVX2, but use BMI2 or LZCNT instructions. This patchset tries to fix that. While most fixes only change ifunc-impl-list.c, and thus only concerns the testsuite, the strn(case)cmp is a real issue affecting early Intel Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. On the other hand, the check for LZCNT in memrchr is purely for correctness, I am not aware of a CPU implementing AVX2 without LZCNT. This has been tested by remplacing all BMI2 and LZCNT instruction in the source code by the "ud2" instruction and disabling the BMI1, BMI2 feature detection, and running the testsuite. Resolves: BZ #29611 Aurelien Jarno (4): x86: include BMI1 and BMI2 in x86-64-v3 level x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp implementations x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations x86-64: Require LZCNT for AVX2 memrchr implementation sysdeps/x86/get-isa-level.h | 2 + sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + sysdeps/x86_64/multiarch/ifunc-impl-list.c | 44 +++++++++++++++------ sysdeps/x86_64/multiarch/ifunc-strcasecmp.h | 1 + sysdeps/x86_64/multiarch/strncmp.c | 4 +- 5 files changed, 38 insertions(+), 14 deletions(-)
Comments
On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > use BMI2 or LZCNT instructions. This patchset tries to fix that. > > While most fixes only change ifunc-impl-list.c, and thus only concerns > the testsuite, the strn(case)cmp is a real issue affecting early Intel > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > On the other hand, the check for LZCNT in memrchr is purely for > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > source code by the "ud2" instruction and disabling the BMI1, BMI2 > feature detection, and running the testsuite. > > Resolves: BZ #29611 > > Aurelien Jarno (4): > x86: include BMI1 and BMI2 in x86-64-v3 level > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > implementations > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > x86-64: Require LZCNT for AVX2 memrchr implementation > We also need BMI2 check in ifunc-impl-list for: strcasecmp strcmp strcasecmp_l strrchr wcsrchr wcscmp If you want you can make patches, otherwise I can. > sysdeps/x86/get-isa-level.h | 2 + > sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + > sysdeps/x86_64/multiarch/ifunc-impl-list.c | 44 +++++++++++++++------ > sysdeps/x86_64/multiarch/ifunc-strcasecmp.h | 1 + > sysdeps/x86_64/multiarch/strncmp.c | 4 +- > 5 files changed, 38 insertions(+), 14 deletions(-) > > -- > 2.35.1 >
On 2022-10-01 15:11, Noah Goldstein wrote: > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > > use BMI2 or LZCNT instructions. This patchset tries to fix that. > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns > > the testsuite, the strn(case)cmp is a real issue affecting early Intel > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > > > On the other hand, the check for LZCNT in memrchr is purely for > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > > source code by the "ud2" instruction and disabling the BMI1, BMI2 > > feature detection, and running the testsuite. > > > > Resolves: BZ #29611 > > > > Aurelien Jarno (4): > > x86: include BMI1 and BMI2 in x86-64-v3 level > > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > > implementations > > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > > x86-64: Require LZCNT for AVX2 memrchr implementation > > > > We also need BMI2 check in ifunc-impl-list for: > strcasecmp > strcmp > strcasecmp_l I didn't included those, because 'bzhil' is only used the 'n' case. That said with the 'shlx' you mentioned in the other email, we should indeed include that one. > strrchr > wcsrchr > wcscmp Same for those, I missed 'shlx'. I'll fix that.
On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > > use BMI2 or LZCNT instructions. This patchset tries to fix that. > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns > > the testsuite, the strn(case)cmp is a real issue affecting early Intel > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > > > On the other hand, the check for LZCNT in memrchr is purely for > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > > source code by the "ud2" instruction and disabling the BMI1, BMI2 > > feature detection, and running the testsuite. > > > > Resolves: BZ #29611 > > > > Aurelien Jarno (4): > > x86: include BMI1 and BMI2 in x86-64-v3 level > > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > > implementations > > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > > x86-64: Require LZCNT for AVX2 memrchr implementation > > > > We also need BMI2 check in ifunc-impl-list for: > strcasecmp > strcmp > strcasecmp_l > strrchr > wcsrchr > wcscmp > > If you want you can make patches, otherwise I can. This is a duplicate of a comment I left in the strn(case)cmp patchset, but leaving here so the information is not scattered: The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backport to 2.33, 2.34, 2.35. Also separate changes for ifunc need to be backport to strncmp.c: 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs BMI1. Finally a corresponding fix is needed for strcmp.c as well (there is missing BMI2 check in strcmp.c ifunc selection as well as missing checks in the impl list). > > sysdeps/x86/get-isa-level.h | 2 + > > sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + > > sysdeps/x86_64/multiarch/ifunc-impl-list.c | 44 +++++++++++++++------ > > sysdeps/x86_64/multiarch/ifunc-strcasecmp.h | 1 + > > sysdeps/x86_64/multiarch/strncmp.c | 4 +- > > 5 files changed, 38 insertions(+), 14 deletions(-) > > > > -- > > 2.35.1 > >
On 2022-10-01 15:17, Noah Goldstein via Libc-alpha wrote: > On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > > > use BMI2 or LZCNT instructions. This patchset tries to fix that. > > > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns > > > the testsuite, the strn(case)cmp is a real issue affecting early Intel > > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > > > > > On the other hand, the check for LZCNT in memrchr is purely for > > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > > > > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > > > source code by the "ud2" instruction and disabling the BMI1, BMI2 > > > feature detection, and running the testsuite. > > > > > > Resolves: BZ #29611 > > > > > > Aurelien Jarno (4): > > > x86: include BMI1 and BMI2 in x86-64-v3 level > > > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > > > implementations > > > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > > > x86-64: Require LZCNT for AVX2 memrchr implementation > > > > > > > We also need BMI2 check in ifunc-impl-list for: > > strcasecmp > > strcmp > > strcasecmp_l > > strrchr > > wcsrchr > > wcscmp > > > > If you want you can make patches, otherwise I can. > > This is a duplicate of a comment I left in the strn(case)cmp patchset, > but leaving here so the information is not scattered: > > The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backport > to 2.33, 2.34, 2.35. > > Also separate changes for ifunc need to be backport to strncmp.c: > 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs > BMI1. Is that really correct? According the commit log TZCNT is used in a way that is compatible with BSF: commit 1457016337072d1b6739f571846b619596990cb7 Author: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com> Date: Thu May 3 11:09:30 2018 -0500 x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2 Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector comparison as much as possible. Peak performance observed on a SkyLake machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp, respectively. The larger the comparison length, the more benefit using avx2 functions, except on the strcmp, where peak is observed at length == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT.
On Sun, Oct 2, 2022 at 2:35 AM Aurelien Jarno <aurelien@aurel32.net> wrote: > > On 2022-10-01 15:17, Noah Goldstein via Libc-alpha wrote: > > On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > > > > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > > > > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > > > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > > > > use BMI2 or LZCNT instructions. This patchset tries to fix that. Think you're right. > > > > > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns > > > > the testsuite, the strn(case)cmp is a real issue affecting early Intel > > > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > > > > > > > On the other hand, the check for LZCNT in memrchr is purely for > > > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > > > > > > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > > > > source code by the "ud2" instruction and disabling the BMI1, BMI2 > > > > feature detection, and running the testsuite. > > > > > > > > Resolves: BZ #29611 > > > > > > > > Aurelien Jarno (4): > > > > x86: include BMI1 and BMI2 in x86-64-v3 level > > > > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > > > > implementations > > > > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > > > > x86-64: Require LZCNT for AVX2 memrchr implementation > > > > > > > > > > We also need BMI2 check in ifunc-impl-list for: > > > strcasecmp > > > strcmp > > > strcasecmp_l > > > strrchr > > > wcsrchr > > > wcscmp > > > > > > If you want you can make patches, otherwise I can. > > > > This is a duplicate of a comment I left in the strn(case)cmp patchset, > > but leaving here so the information is not scattered: > > > > The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backport > > to 2.33, 2.34, 2.35. > > > > Also separate changes for ifunc need to be backport to strncmp.c: > > 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs > > BMI1. > > Is that really correct? According the commit log TZCNT is used in a way > that is compatible with BSF: > > commit 1457016337072d1b6739f571846b619596990cb7 > Author: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com> > Date: Thu May 3 11:09:30 2018 -0500 > > x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2 > > Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector > comparison as much as possible. Peak performance observed on a SkyLake > machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp, > respectively. The larger the comparison length, the more benefit using > avx2 functions, except on the strcmp, where peak is observed at length > == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper > is preferred and AVX unaligned load is fast. > > NB: It uses TZCNT instead of BSF since TZCNT produces the same result > as BSF for non-zero input. TZCNT is faster than BSF and is executed > as BSF if machine doesn't support TZCNT. > > -- > Aurelien Jarno GPG: 4096R/1DDD8C9B > aurelien@aurel32.net http://www.aurel32.net