From patchwork Wed Jun 23 06:28:22 2021
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 43966
To: libc-alpha@sourceware.org
Subject: [PATCH v1 1/4] x86-64: Add wcslen optimize for sse4.1
Date: Wed, 23 Jun 2021 02:28:22 -0400
Message-Id: <20210623062821.1166822-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

No bug.

This commit adds the ifunc / build infrastructure necessary for wcslen
to prefer the sse4.1 implementation in strlen-vec.S. The selector that
was private to wcsnlen.c moves into a new ifunc-wcslen.h, which both
wcslen.c and wcsnlen.c now include. For wcsnlen this also makes BMI2 a
requirement for the AVX2 variants, not only for EVEX. test-wcslen.c is
passing.

Signed-off-by: Noah Goldstein
Reviewed-by: H.J. Lu
---
Rebased on [PATCH] x86-64: Move strlen.S to multiarch/strlen-vec.S

 sysdeps/x86_64/multiarch/Makefile          |  4 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c |  3 ++
 sysdeps/x86_64/multiarch/ifunc-wcslen.h    | 52 ++++++++++++++++++++++
 sysdeps/x86_64/multiarch/wcslen-sse4_1.S   |  4 ++
 sysdeps/x86_64/multiarch/wcslen.c          |  2 +-
 sysdeps/x86_64/multiarch/wcsnlen.c         | 34 +-------------
 6 files changed, 63 insertions(+), 36 deletions(-)
 create mode 100644 sysdeps/x86_64/multiarch/ifunc-wcslen.h
 create mode 100644 sysdeps/x86_64/multiarch/wcslen-sse4_1.S

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 2c2aad3a7e..26be40959c 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -95,8 +95,8 @@ sysdep_routines += wmemcmp-sse4 wmemcmp-ssse3 wmemcmp-c \
                    wcscpy-ssse3 wcscpy-c \
                    wcschr-sse2 wcschr-avx2 \
                    wcsrchr-sse2 wcsrchr-avx2 \
-                   wcsnlen-sse4_1 wcsnlen-c \
-                   wcslen-sse2 wcslen-avx2 wcsnlen-avx2 \
+                   wcslen-sse2 wcslen-sse4_1 wcslen-avx2 \
+                   wcsnlen-c wcsnlen-sse4_1 wcsnlen-avx2 \
                    wcschr-avx2-rtm \
                    wcscmp-avx2-rtm \
                    wcslen-avx2-rtm \
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 15eda47667..dbd1ebf298 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -684,6 +684,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                && CPU_FEATURE_USABLE (AVX512BW)
                                && CPU_FEATURE_USABLE (BMI2)),
                               __wcslen_evex)
+              IFUNC_IMPL_ADD (array, i, wcslen,
+                              CPU_FEATURE_USABLE (SSE4_1),
+                              __wcslen_sse4_1)
               IFUNC_IMPL_ADD (array, i, wcslen, 1, __wcslen_sse2))

   /* Support sysdeps/x86_64/multiarch/wcsnlen.c.  */
diff --git a/sysdeps/x86_64/multiarch/ifunc-wcslen.h b/sysdeps/x86_64/multiarch/ifunc-wcslen.h
new file mode 100644
index 0000000000..39e3347378
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/ifunc-wcslen.h
@@ -0,0 +1,52 @@
+/* Common definition for ifunc selections for wcslen and wcsnlen
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+  const struct cpu_features* cpu_features = __get_cpu_features ();
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
+      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
+    {
+      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
+          && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
+        return OPTIMIZE (evex);
+
+      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
+        return OPTIMIZE (avx2_rtm);
+
+      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
+        return OPTIMIZE (avx2);
+    }
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
+    return OPTIMIZE (sse4_1);
+
+  return OPTIMIZE (sse2);
+}
diff --git a/sysdeps/x86_64/multiarch/wcslen-sse4_1.S b/sysdeps/x86_64/multiarch/wcslen-sse4_1.S
new file mode 100644
index 0000000000..7e62621afc
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcslen-sse4_1.S
@@ -0,0 +1,4 @@
+#define AS_WCSLEN
+#define strlen __wcslen_sse4_1
+
+#include "strlen-vec.S"
diff --git a/sysdeps/x86_64/multiarch/wcslen.c b/sysdeps/x86_64/multiarch/wcslen.c
index f89bed42a0..3032061d3b 100644
--- a/sysdeps/x86_64/multiarch/wcslen.c
+++ b/sysdeps/x86_64/multiarch/wcslen.c
@@ -24,7 +24,7 @@
 # undef __wcslen

 # define SYMBOL_NAME wcslen
-# include "ifunc-avx2.h"
+# include "ifunc-wcslen.h"

 libc_ifunc_redirected (__redirect_wcslen, __wcslen, IFUNC_SELECTOR ());
 weak_alias (__wcslen, wcslen);
diff --git a/sysdeps/x86_64/multiarch/wcsnlen.c b/sysdeps/x86_64/multiarch/wcsnlen.c
index 4983f1b222..2963fbe059 100644
--- a/sysdeps/x86_64/multiarch/wcsnlen.c
+++ b/sysdeps/x86_64/multiarch/wcsnlen.c
@@ -24,39 +24,7 @@
 # undef __wcsnlen

 # define SYMBOL_NAME wcsnlen
-# include
-
-extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
-
-static inline void *
-IFUNC_SELECTOR (void)
-{
-  const struct cpu_features* cpu_features = __get_cpu_features ();
-
-  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
-      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
-    {
-      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
-          && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
-          && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
-        return OPTIMIZE (evex);
-
-      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
-        return OPTIMIZE (avx2_rtm);
-
-      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
-        return OPTIMIZE (avx2);
-    }
-
-  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
-    return OPTIMIZE (sse4_1);
-
-  return OPTIMIZE (sse2);
-}
+# include "ifunc-wcslen.h"

 libc_ifunc_redirected (__redirect_wcsnlen, __wcsnlen, IFUNC_SELECTOR ());
 weak_alias (__wcsnlen, wcsnlen);