From patchwork Thu Apr 21 03:14:09 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 53084
From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v1 3/5] x86: Add wcsrchr optimized with SSE4_1 in wcsrchr-sse4_1.S
Date: Wed, 20 Apr 2022 22:14:09 -0500
Message-Id: <20220421031410.2142238-3-goldstein.w.n@gmail.com>
In-Reply-To: <20220421031410.2142238-1-goldstein.w.n@gmail.com>
References: <20220421031410.2142238-1-goldstein.w.n@gmail.com>

wcsrchr-sse2 can't use `pminud`, which can speed up the main loop:

len, align, pos, seek,   max_char, freq, New Time / Old Time
256,     1,  64,   23,       1273,    1, 1.082
256,     1,  64,   23, 2147483647,    1, 1.076
256,    15,  64,   23,       1273,    1, 1.061
256,    15,  64,   23, 2147483647,    1, 1.075
256,     2,  64,   23,       1273,    1, 1.108
256,     2,  64,   23, 2147483647,    1, 1.109
256,    30,  64,   23,       1273,    1, 1.072
256,    30,  64,   23, 2147483647,    1, 1.077
256,     3,  64,   23,       1273,    1, 1.108
256,     3,  64,   23, 2147483647,    1, 1.103
256,    45,  64,   23,       1273,    1, 1.076
256,    45,  64,   23, 2147483647,    1, 1.079
256,     4,  64,   23,       1273,    1, 1.119
256,     4,  64,   23, 2147483647,    1, 1.112
256,    60,  64,   23,       1273,    1, 1.117
256,    60,  64,   23, 2147483647,    1, 1.112
256,     5,  64,   23,       1273,    1, 1.21
256,     5,  64,   23, 2147483647,    1, 1.194
256,    75,  64,   23,       1273,    1, 1.055
256,    75,  64,   23, 2147483647,    1, 1.045
256,     6,  64,   23,       1273,    1, 1.264
256,     6,  64,   23, 2147483647,    1, 1.3
256,    90,  64,   23,       1273,    1, 1.022
256,    90,  64,   23, 2147483647,    1, 1.026
256,     7,  64,   23,       1273,    1, 1.316
256,     7,  64,   23, 2147483647,    1, 1.325

Overall this leads to a 5% performance improvement in the benchmark
suite.

Full xcheck passes on x86_64 with and without multiarch enabled.
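
For readers unfamiliar with the trick: `pminud` takes the lane-wise
unsigned minimum of two vectors of 32-bit elements, and an unsigned
minimum is zero exactly when one of its inputs is zero. A loop can
therefore combine two blocks with one minimum and test the result for
a null wide character once instead of testing each block separately.
The following is only a scalar C model of that idea, not the assembly
in strrchr.S; the names min_u32 and blocks_have_null are hypothetical
and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for one 32-bit lane of pminud: unsigned minimum.
   (Hypothetical helper, for illustration only.)  */
static uint32_t
min_u32 (uint32_t a, uint32_t b)
{
  return a < b ? a : b;
}

/* Return 1 if either 4-lane block contains a null wide character.
   The minimum of two unsigned values is zero exactly when one of
   them is zero, so a single zero test on the combined lanes covers
   both input blocks.  (Hypothetical helper, for illustration only.)  */
static int
blocks_have_null (const uint32_t a[4], const uint32_t b[4])
{
  for (size_t i = 0; i < 4; i++)
    if (min_u32 (a[i], b[i]) == 0)
      return 1;
  return 0;
}
```

The comparison has to be unsigned for this to hold: with a signed
minimum, a lane with the top bit set would compare below zero and
could hide a null in the other block.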
---
 sysdeps/x86_64/multiarch/Makefile          |  1 +
 sysdeps/x86_64/multiarch/ifunc-impl-list.c |  3 +++
 sysdeps/x86_64/multiarch/wcsrchr-sse4_1.S  | 21 +++++++++++++++++++++
 sysdeps/x86_64/multiarch/wcsrchr.c         |  3 ++-
 4 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-sse4_1.S

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 0400ea332b..5ad7bc8c25 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -154,6 +154,7 @@ sysdep_routines += \
   wcsrchr-avx2-rtm \
   wcsrchr-evex \
   wcsrchr-sse2 \
+  wcsrchr-sse4_1 \
   wmemchr-avx2 \
   wmemchr-avx2-rtm \
   wmemchr-evex \
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index a8afcf81bb..1cbb6938c8 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -685,6 +685,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			       && CPU_FEATURE_USABLE (AVX512BW)
 			       && CPU_FEATURE_USABLE (BMI2)),
 			      __wcsrchr_evex)
+	      IFUNC_IMPL_ADD (array, i, wcsrchr,
+			      CPU_FEATURE_USABLE (SSE4_1),
+			      __wcsrchr_sse4_1)
 	      IFUNC_IMPL_ADD (array, i, wcsrchr, 1, __wcsrchr_sse2))
 
   /* Support sysdeps/x86_64/multiarch/wcscmp.c.  */
diff --git a/sysdeps/x86_64/multiarch/wcsrchr-sse4_1.S b/sysdeps/x86_64/multiarch/wcsrchr-sse4_1.S
new file mode 100644
index 0000000000..34b92d28eb
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsrchr-sse4_1.S
@@ -0,0 +1,21 @@
+/* wcsrchr optimized with SSE4.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define USE_AS_WCSRCHR	1
+#define STRRCHR	__wcsrchr_sse4_1
+#include "../strrchr.S"
diff --git a/sysdeps/x86_64/multiarch/wcsrchr.c b/sysdeps/x86_64/multiarch/wcsrchr.c
index 8b30c06f2e..eb18038eec 100644
--- a/sysdeps/x86_64/multiarch/wcsrchr.c
+++ b/sysdeps/x86_64/multiarch/wcsrchr.c
@@ -23,7 +23,8 @@
 # undef wcsrchr
 
 # define SYMBOL_NAME wcsrchr
-# include "ifunc-avx2.h"
+
+# include "ifunc-wcslen.h"
 
 libc_ifunc_redirected (__redirect_wcsrchr, wcsrchr, IFUNC_SELECTOR ());
 #endif