From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v4 4/4] x86: Add avx2 optimized functions for the wchar_t strcpy family
Date: Fri, 4 Nov 2022 16:04:10 -0700
Message-Id: <20221104230410.465050-4-goldstein.w.n@gmail.com>
In-Reply-To: <20221104230410.465050-1-goldstein.w.n@gmail.com>
References: <20221103085314.1069528-2-goldstein.w.n@gmail.com> <20221104230410.465050-1-goldstein.w.n@gmail.com>
X-Patchwork-Id: 59977

Implemented:
    wcscat-avx2{+rtm}  (+ 744 * 2 bytes)
    wcscpy-avx2{+rtm}  (+ 539 * 2 bytes)
    wcpcpy-avx2{+rtm}  (+ 577 * 2 bytes)
    wcsncpy-avx2{+rtm} (+1108 * 2 bytes)
    wcpncpy-avx2{+rtm} (+1214 * 2 bytes)
    wcsncat-avx2{+rtm} (+1085 * 2 bytes)

Performance Changes:
    Times are from N = 10 runs of the benchmark suite and are reported
    as the geometric mean of all ratios of New Implementation / Best
    Old Implementation.  Best Old Implementation was determined with
    the highest ISA implementation.

    wcscat-avx2     -> 0.975
    wcscpy-avx2     -> 0.591
    wcpcpy-avx2     -> 0.698
    wcsncpy-avx2    -> 0.730
    wcpncpy-avx2    -> 0.711
    wcsncat-avx2    -> 0.954

Code Size Changes:
    This change increases the size of libc.so by ~11kb.  For
    reference, the patch optimizing the normal strcpy family functions
    decreases libc.so by ~5.2kb.

Full check passes on x86-64 and build succeeds for all ISA levels w/
and w/o multiarch.
---
 sysdeps/x86_64/multiarch/Makefile           | 12 ++++
 sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 66 +++++++++++++++++++++
 sysdeps/x86_64/multiarch/ifunc-wcs.h        | 11 ++++
 sysdeps/x86_64/multiarch/wcpcpy-avx2-rtm.S  |  3 +
 sysdeps/x86_64/multiarch/wcpcpy-avx2.S      |  8 +++
 sysdeps/x86_64/multiarch/wcpcpy-generic.c   |  2 +-
 sysdeps/x86_64/multiarch/wcpncpy-avx2-rtm.S |  3 +
 sysdeps/x86_64/multiarch/wcpncpy-avx2.S     |  8 +++
 sysdeps/x86_64/multiarch/wcpncpy-generic.c  |  2 +-
 sysdeps/x86_64/multiarch/wcscat-avx2-rtm.S  |  3 +
 sysdeps/x86_64/multiarch/wcscat-avx2.S      | 10 ++++
 sysdeps/x86_64/multiarch/wcscat-generic.c   |  2 +-
 sysdeps/x86_64/multiarch/wcscpy-avx2-rtm.S  |  3 +
 sysdeps/x86_64/multiarch/wcscpy-avx2.S      |  7 +++
 sysdeps/x86_64/multiarch/wcscpy-generic.c   |  2 +-
 sysdeps/x86_64/multiarch/wcscpy.c           |  9 +++
 sysdeps/x86_64/multiarch/wcsncat-avx2-rtm.S |  3 +
 sysdeps/x86_64/multiarch/wcsncat-avx2.S     |  9 +++
 sysdeps/x86_64/multiarch/wcsncat-generic.c  |  2 +-
 sysdeps/x86_64/multiarch/wcsncpy-avx2-rtm.S |  3 +
 sysdeps/x86_64/multiarch/wcsncpy-avx2.S     |  7 +++
 sysdeps/x86_64/multiarch/wcsncpy-generic.c  |  2 +-
 sysdeps/x86_64/wcpcpy-generic.c             |  2 +-
 sysdeps/x86_64/wcpcpy.S                     |  3 +-
 sysdeps/x86_64/wcpncpy-generic.c            |  2 +-
 sysdeps/x86_64/wcpncpy.S                    |  3 +-
 sysdeps/x86_64/wcscat-generic.c             |  2 +-
 sysdeps/x86_64/wcscat.S                     |  3 +-
 sysdeps/x86_64/wcscpy.S                     |  1 +
 sysdeps/x86_64/wcsncat-generic.c            |  2 +-
 sysdeps/x86_64/wcsncat.S                    |  3 +-
 sysdeps/x86_64/wcsncpy-generic.c            |  2 +-
 sysdeps/x86_64/wcsncpy.S                    |  3 +-
 33 files changed, 187 insertions(+), 16 deletions(-)
 create mode 100644 sysdeps/x86_64/multiarch/wcpcpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcpcpy-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcpncpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcpncpy-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcscat-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcscat-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcscpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcscpy-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsncat-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsncat-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsncpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsncpy-avx2.S

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index d6e01940c3..f848fc0e28 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -131,10 +131,16 @@ endif
 
 ifeq ($(subdir),wcsmbs)
 sysdep_routines += \
+  wcpcpy-avx2 \
+  wcpcpy-avx2-rtm \
   wcpcpy-evex \
   wcpcpy-generic \
+  wcpncpy-avx2 \
+  wcpncpy-avx2-rtm \
   wcpncpy-evex \
   wcpncpy-generic \
+  wcscat-avx2 \
+  wcscat-avx2-rtm \
   wcscat-evex \
   wcscat-generic \
   wcschr-avx2 \
@@ -146,6 +152,8 @@ sysdep_routines += \
   wcscmp-avx2-rtm \
   wcscmp-evex \
   wcscmp-sse2 \
+  wcscpy-avx2 \
+  wcscpy-avx2-rtm \
   wcscpy-evex \
   wcscpy-generic \
   wcscpy-ssse3 \
@@ -155,11 +163,15 @@ sysdep_routines += \
   wcslen-evex512 \
   wcslen-sse2 \
   wcslen-sse4_1 \
+  wcsncat-avx2 \
+  wcsncat-avx2-rtm \
   wcsncat-evex \
   wcsncat-generic \
   wcsncmp-avx2 \
   wcsncmp-avx2-rtm \
   wcsncmp-evex \
+  wcsncpy-avx2 \
+  wcsncpy-avx2-rtm \
   wcsncpy-evex \
   wcsncpy-generic \
   wcsnlen-avx2 \
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 959cb0b420..71e8953e91 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -908,6 +908,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                       && CPU_FEATURE_USABLE (BMI2)),
                                      __wcscpy_evex)
               X86_IFUNC_IMPL_ADD_V3 (array, i, wcscpy,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)),
+                                     __wcscpy_avx2)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcscpy,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)
+                                      && CPU_FEATURE_USABLE (RTM)),
+                                     __wcscpy_avx2_rtm)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, wcscpy,
                                      CPU_FEATURE_USABLE (SSSE3),
                                      __wcscpy_ssse3)
               X86_IFUNC_IMPL_ADD_V1 (array, i, wcscpy,
@@ -922,6 +933,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                       && CPU_FEATURE_USABLE (BMI1)
                                       && CPU_FEATURE_USABLE (BMI2)),
                                      __wcsncpy_evex)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncpy,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)),
+                                     __wcsncpy_avx2)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncpy,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)
+                                      && CPU_FEATURE_USABLE (RTM)),
+                                     __wcsncpy_avx2_rtm)
               X86_IFUNC_IMPL_ADD_V1 (array, i, wcpncpy,
                                      1,
                                      __wcsncpy_generic))
@@ -934,6 +956,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                       && CPU_FEATURE_USABLE (BMI1)
                                       && CPU_FEATURE_USABLE (BMI2)),
                                      __wcpcpy_evex)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcpcpy,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)),
+                                     __wcpcpy_avx2)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcpcpy,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)
+                                      && CPU_FEATURE_USABLE (RTM)),
+                                     __wcpcpy_avx2_rtm)
               X86_IFUNC_IMPL_ADD_V1 (array, i, wcpcpy,
                                      1,
                                      __wcpcpy_generic))
@@ -946,6 +979,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                       && CPU_FEATURE_USABLE (BMI1)
                                       && CPU_FEATURE_USABLE (BMI2)),
                                      __wcpncpy_evex)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcpncpy,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)),
+                                     __wcpncpy_avx2)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcpncpy,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)
+                                      && CPU_FEATURE_USABLE (RTM)),
+                                     __wcpncpy_avx2_rtm)
               X86_IFUNC_IMPL_ADD_V1 (array, i, wcsncpy,
                                      1,
                                      __wcpncpy_generic))
@@ -958,6 +1002,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                       && CPU_FEATURE_USABLE (BMI1)
                                       && CPU_FEATURE_USABLE (BMI2)),
                                      __wcscat_evex)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcscat,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)),
+                                     __wcscat_avx2)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcscat,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)
+                                      && CPU_FEATURE_USABLE (RTM)),
+                                     __wcscat_avx2_rtm)
               X86_IFUNC_IMPL_ADD_V1 (array, i, wcscat,
                                      1,
                                      __wcscat_generic))
@@ -970,6 +1025,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                       && CPU_FEATURE_USABLE (BMI1)
                                       && CPU_FEATURE_USABLE (BMI2)),
                                      __wcsncat_evex)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncat,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)),
+                                     __wcsncat_avx2)
+              X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncat,
+                                     (CPU_FEATURE_USABLE (AVX2)
+                                      && CPU_FEATURE_USABLE (BMI1)
+                                      && CPU_FEATURE_USABLE (BMI2)
+                                      && CPU_FEATURE_USABLE (RTM)),
+                                     __wcsncat_avx2_rtm)
               X86_IFUNC_IMPL_ADD_V1 (array, i, wcsncat,
                                      1,
                                      __wcsncat_generic))
diff --git a/sysdeps/x86_64/multiarch/ifunc-wcs.h b/sysdeps/x86_64/multiarch/ifunc-wcs.h
index da6e1b03d0..cda633d8fb 100644
--- a/sysdeps/x86_64/multiarch/ifunc-wcs.h
+++ b/sysdeps/x86_64/multiarch/ifunc-wcs.h
@@ -27,6 +27,9 @@
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
 
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
+
 extern __typeof (REDIRECT_NAME) OPTIMIZE (GENERIC) attribute_hidden;
 
 static inline void *
@@ -43,6 +46,14 @@ IFUNC_SELECTOR (void)
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
           && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
         return OPTIMIZE (evex);
+
+      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
+        return OPTIMIZE (avx2_rtm);
+
+      if (X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
+                                       Prefer_No_VZEROUPPER, !))
+        return OPTIMIZE (avx2);
+
     }
 
   return OPTIMIZE (GENERIC);
diff --git a/sysdeps/x86_64/multiarch/wcpcpy-avx2-rtm.S b/sysdeps/x86_64/multiarch/wcpcpy-avx2-rtm.S
new file mode 100644
index 0000000000..756280a3ab
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcpcpy-avx2-rtm.S
@@ -0,0 +1,3 @@
+#define WCPCPY __wcpcpy_avx2_rtm
+#include "x86-avx2-rtm-vecs.h"
+#include "wcpcpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcpcpy-avx2.S b/sysdeps/x86_64/multiarch/wcpcpy-avx2.S
new file mode 100644
index 0000000000..0fffd912d3
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcpcpy-avx2.S
@@ -0,0 +1,8 @@
+#ifndef WCPCPY
+# define WCPCPY __wcpcpy_avx2
+#endif
+
+#define USE_AS_STPCPY
+#define USE_AS_WCSCPY
+#define STRCPY WCPCPY
+#include "strcpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcpcpy-generic.c b/sysdeps/x86_64/multiarch/wcpcpy-generic.c
index 6039196a3e..0ba29b081f 100644
--- a/sysdeps/x86_64/multiarch/wcpcpy-generic.c
+++ b/sysdeps/x86_64/multiarch/wcpcpy-generic.c
@@ -19,7 +19,7 @@
 /* We always need to build this implementation as strspn-sse4 needs to
    be able to fallback to it.  */
 #include
-#if ISA_SHOULD_BUILD (3)
+#if ISA_SHOULD_BUILD (2)
 
 # define WCPCPY __wcpcpy_generic
 # include
diff --git a/sysdeps/x86_64/multiarch/wcpncpy-avx2-rtm.S b/sysdeps/x86_64/multiarch/wcpncpy-avx2-rtm.S
new file mode 100644
index 0000000000..80600d6b01
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcpncpy-avx2-rtm.S
@@ -0,0 +1,3 @@
+#define WCPNCPY __wcpncpy_avx2_rtm
+#include "x86-avx2-rtm-vecs.h"
+#include "wcpncpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcpncpy-avx2.S b/sysdeps/x86_64/multiarch/wcpncpy-avx2.S
new file mode 100644
index 0000000000..b7e594f7b7
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcpncpy-avx2.S
@@ -0,0 +1,8 @@
+#ifndef WCPNCPY
+# define WCPNCPY __wcpncpy_avx2
+#endif
+
+#define USE_AS_WCSCPY
+#define USE_AS_STPCPY
+#define STRNCPY WCPNCPY
+#include "strncpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcpncpy-generic.c b/sysdeps/x86_64/multiarch/wcpncpy-generic.c
index de8d34320e..4aab4ecdd2 100644
--- a/sysdeps/x86_64/multiarch/wcpncpy-generic.c
+++ b/sysdeps/x86_64/multiarch/wcpncpy-generic.c
@@ -19,7 +19,7 @@
 /* We always need to build this implementation as strspn-sse4 needs to
    be able to fallback to it.  */
 #include
-#if ISA_SHOULD_BUILD (3)
+#if ISA_SHOULD_BUILD (2)
 
 # define WCPNCPY __wcpncpy_generic
 # include
diff --git a/sysdeps/x86_64/multiarch/wcscat-avx2-rtm.S b/sysdeps/x86_64/multiarch/wcscat-avx2-rtm.S
new file mode 100644
index 0000000000..e99449a2dc
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcscat-avx2-rtm.S
@@ -0,0 +1,3 @@
+#define WCSCAT __wcscat_avx2_rtm
+#include "x86-avx2-rtm-vecs.h"
+#include "wcscat-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcscat-avx2.S b/sysdeps/x86_64/multiarch/wcscat-avx2.S
new file mode 100644
index 0000000000..a20f23c09d
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcscat-avx2.S
@@ -0,0 +1,10 @@
+#ifndef WCSCAT
+# define WCSCAT __wcscat_avx2
+#endif
+
+#define USE_AS_WCSCPY
+#define USE_AS_STRCAT
+
+#define STRCPY WCSCAT
+
+#include "strcpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcscat-generic.c b/sysdeps/x86_64/multiarch/wcscat-generic.c
index d86b4d5c00..6476f85bbb 100644
--- a/sysdeps/x86_64/multiarch/wcscat-generic.c
+++ b/sysdeps/x86_64/multiarch/wcscat-generic.c
@@ -19,7 +19,7 @@
 /* We always need to build this implementation as strspn-sse4 needs to
    be able to fallback to it.  */
 #include
-#if ISA_SHOULD_BUILD (3)
+#if ISA_SHOULD_BUILD (2)
 
 # define WCSCAT __wcscat_generic
 # include
diff --git a/sysdeps/x86_64/multiarch/wcscpy-avx2-rtm.S b/sysdeps/x86_64/multiarch/wcscpy-avx2-rtm.S
new file mode 100644
index 0000000000..2f800c8d3e
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcscpy-avx2-rtm.S
@@ -0,0 +1,3 @@
+#define WCSCPY __wcscpy_avx2_rtm
+#include "x86-avx2-rtm-vecs.h"
+#include "wcscpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcscpy-avx2.S b/sysdeps/x86_64/multiarch/wcscpy-avx2.S
new file mode 100644
index 0000000000..6bc509da07
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcscpy-avx2.S
@@ -0,0 +1,7 @@
+#ifndef WCSCPY
+# define WCSCPY __wcscpy_avx2
+#endif
+
+#define USE_AS_WCSCPY
+#define STRCPY WCSCPY
+#include "strcpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcscpy-generic.c b/sysdeps/x86_64/multiarch/wcscpy-generic.c
index 4a1fffae4b..600d606c45 100644
--- a/sysdeps/x86_64/multiarch/wcscpy-generic.c
+++ b/sysdeps/x86_64/multiarch/wcscpy-generic.c
@@ -18,7 +18,7 @@
 
 #include
 
-#if ISA_SHOULD_BUILD (3)
+#if ISA_SHOULD_BUILD (2)
 
 # define WCSCPY __wcscpy_generic
 # include
diff --git a/sysdeps/x86_64/multiarch/wcscpy.c b/sysdeps/x86_64/multiarch/wcscpy.c
index efe32e505f..7f6387817b 100644
--- a/sysdeps/x86_64/multiarch/wcscpy.c
+++ b/sysdeps/x86_64/multiarch/wcscpy.c
@@ -28,6 +28,9 @@
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
 
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
+
 extern __typeof (REDIRECT_NAME) OPTIMIZE (ssse3) attribute_hidden;
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (generic) attribute_hidden;
@@ -45,6 +48,12 @@ IFUNC_SELECTOR (void)
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
           && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
         return OPTIMIZE (evex);
+
+      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
+        return OPTIMIZE (avx2_rtm);
+
+      if (X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
+        return OPTIMIZE (avx2);
     }
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, SSSE3))
diff --git a/sysdeps/x86_64/multiarch/wcsncat-avx2-rtm.S b/sysdeps/x86_64/multiarch/wcsncat-avx2-rtm.S
new file mode 100644
index 0000000000..609d6e69c0
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsncat-avx2-rtm.S
@@ -0,0 +1,3 @@
+#define WCSNCAT __wcsncat_avx2_rtm
+#include "x86-avx2-rtm-vecs.h"
+#include "wcsncat-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcsncat-avx2.S b/sysdeps/x86_64/multiarch/wcsncat-avx2.S
new file mode 100644
index 0000000000..a72105b7e9
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsncat-avx2.S
@@ -0,0 +1,9 @@
+#ifndef WCSNCAT
+# define WCSNCAT __wcsncat_avx2
+#endif
+
+#define USE_AS_WCSCPY
+#define USE_AS_STRCAT
+
+#define STRNCAT WCSNCAT
+#include "strncat-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcsncat-generic.c b/sysdeps/x86_64/multiarch/wcsncat-generic.c
index 4b55cb40bc..9ced02b35e 100644
--- a/sysdeps/x86_64/multiarch/wcsncat-generic.c
+++ b/sysdeps/x86_64/multiarch/wcsncat-generic.c
@@ -19,7 +19,7 @@
 /* We always need to build this implementation as strspn-sse4 needs to
    be able to fallback to it.  */
 #include
-#if ISA_SHOULD_BUILD (3)
+#if ISA_SHOULD_BUILD (2)
 
 # define WCSNCAT __wcsncat_generic
 # include
diff --git a/sysdeps/x86_64/multiarch/wcsncpy-avx2-rtm.S b/sysdeps/x86_64/multiarch/wcsncpy-avx2-rtm.S
new file mode 100644
index 0000000000..cab5a6b820
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsncpy-avx2-rtm.S
@@ -0,0 +1,3 @@
+#define WCSNCPY __wcsncpy_avx2_rtm
+#include "x86-avx2-rtm-vecs.h"
+#include "wcsncpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcsncpy-avx2.S b/sysdeps/x86_64/multiarch/wcsncpy-avx2.S
new file mode 100644
index 0000000000..3a1a8a372c
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsncpy-avx2.S
@@ -0,0 +1,7 @@
+#ifndef WCSNCPY
+# define WCSNCPY __wcsncpy_avx2
+#endif
+
+#define USE_AS_WCSCPY
+#define STRNCPY WCSNCPY
+#include "strncpy-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcsncpy-generic.c b/sysdeps/x86_64/multiarch/wcsncpy-generic.c
index d0e8a86605..693521713b 100644
--- a/sysdeps/x86_64/multiarch/wcsncpy-generic.c
+++ b/sysdeps/x86_64/multiarch/wcsncpy-generic.c
@@ -19,7 +19,7 @@
 /* We always need to build this implementation as strspn-sse4 needs to
    be able to fallback to it.  */
 #include
-#if ISA_SHOULD_BUILD (3)
+#if ISA_SHOULD_BUILD (2)
 
 # define WCSNCPY __wcsncpy_generic
 # include
diff --git a/sysdeps/x86_64/wcpcpy-generic.c b/sysdeps/x86_64/wcpcpy-generic.c
index d52525f288..2e4d69a500 100644
--- a/sysdeps/x86_64/wcpcpy-generic.c
+++ b/sysdeps/x86_64/wcpcpy-generic.c
@@ -24,7 +24,7 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL <= 3
+#if MINIMUM_X86_ISA_LEVEL <= 2
 
 # include
 
diff --git a/sysdeps/x86_64/wcpcpy.S b/sysdeps/x86_64/wcpcpy.S
index 97e9207c16..cfde4309fe 100644
--- a/sysdeps/x86_64/wcpcpy.S
+++ b/sysdeps/x86_64/wcpcpy.S
@@ -24,11 +24,12 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL >= 4
+#if MINIMUM_X86_ISA_LEVEL >= 3
 
 # define WCPCPY __wcpcpy
 
 # define DEFAULT_IMPL_V4 "multiarch/wcpcpy-evex.S"
+# define DEFAULT_IMPL_V3 "multiarch/wcpcpy-avx2.S"
 /* isa-default-impl.h expects DEFAULT_IMPL_V1 to be defined but it
    should never be used from here.  */
 # define DEFAULT_IMPL_V1 "ERROR -- Invalid ISA IMPL"
diff --git a/sysdeps/x86_64/wcpncpy-generic.c b/sysdeps/x86_64/wcpncpy-generic.c
index 871219a445..1f12a0e4c6 100644
--- a/sysdeps/x86_64/wcpncpy-generic.c
+++ b/sysdeps/x86_64/wcpncpy-generic.c
@@ -24,7 +24,7 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL <= 3
+#if MINIMUM_X86_ISA_LEVEL <= 2
 
 # include
 
diff --git a/sysdeps/x86_64/wcpncpy.S b/sysdeps/x86_64/wcpncpy.S
index 2169ed5545..2f89482d30 100644
--- a/sysdeps/x86_64/wcpncpy.S
+++ b/sysdeps/x86_64/wcpncpy.S
@@ -24,11 +24,12 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL >= 4
+#if MINIMUM_X86_ISA_LEVEL >= 3
 
 # define WCPNCPY __wcpncpy
 
 # define DEFAULT_IMPL_V4 "multiarch/wcpncpy-evex.S"
+# define DEFAULT_IMPL_V3 "multiarch/wcpncpy-avx2.S"
 /* isa-default-impl.h expects DEFAULT_IMPL_V1 to be defined but it
    should never be used from here.  */
 # define DEFAULT_IMPL_V1 "ERROR -- Invalid ISA IMPL"
diff --git a/sysdeps/x86_64/wcscat-generic.c b/sysdeps/x86_64/wcscat-generic.c
index 85f981a81f..3552167ebe 100644
--- a/sysdeps/x86_64/wcscat-generic.c
+++ b/sysdeps/x86_64/wcscat-generic.c
@@ -24,7 +24,7 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL <= 3
+#if MINIMUM_X86_ISA_LEVEL <= 2
 
 # include
 
diff --git a/sysdeps/x86_64/wcscat.S b/sysdeps/x86_64/wcscat.S
index 8432087c7c..2e59987e76 100644
--- a/sysdeps/x86_64/wcscat.S
+++ b/sysdeps/x86_64/wcscat.S
@@ -24,11 +24,12 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL >= 4
+#if MINIMUM_X86_ISA_LEVEL >= 3
 
 # define WCSCAT __wcscat
 
 # define DEFAULT_IMPL_V4 "multiarch/wcscat-evex.S"
+# define DEFAULT_IMPL_V3 "multiarch/wcscat-avx2.S"
 /* isa-default-impl.h expects DEFAULT_IMPL_V1 to be defined but it
    should never be used from here.  */
 # define DEFAULT_IMPL_V1 "ERROR -- Invalid ISA IMPL"
diff --git a/sysdeps/x86_64/wcscpy.S b/sysdeps/x86_64/wcscpy.S
index ff8bdd3aea..ab9288ed74 100644
--- a/sysdeps/x86_64/wcscpy.S
+++ b/sysdeps/x86_64/wcscpy.S
@@ -29,6 +29,7 @@
 # define WCSCPY __wcscpy
 
 # define DEFAULT_IMPL_V4 "multiarch/wcscpy-evex.S"
+# define DEFAULT_IMPL_V3 "multiarch/wcscpy-avx2.S"
 # define DEFAULT_IMPL_V2 "multiarch/wcscpy-ssse3.S"
 /* isa-default-impl.h expects DEFAULT_IMPL_V1 to be defined but it
    should never be used from here.  */
diff --git a/sysdeps/x86_64/wcsncat-generic.c b/sysdeps/x86_64/wcsncat-generic.c
index 2cc0f7b11a..47f6a8ad56 100644
--- a/sysdeps/x86_64/wcsncat-generic.c
+++ b/sysdeps/x86_64/wcsncat-generic.c
@@ -24,7 +24,7 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL <= 3
+#if MINIMUM_X86_ISA_LEVEL <= 2
 
 # include
 
diff --git a/sysdeps/x86_64/wcsncat.S b/sysdeps/x86_64/wcsncat.S
index 64e144a9c7..9a55499131 100644
--- a/sysdeps/x86_64/wcsncat.S
+++ b/sysdeps/x86_64/wcsncat.S
@@ -24,11 +24,12 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL >= 4
+#if MINIMUM_X86_ISA_LEVEL >= 3
 
 # define WCSNCAT wcsncat
 
 # define DEFAULT_IMPL_V4 "multiarch/wcsncat-evex.S"
+# define DEFAULT_IMPL_V3 "multiarch/wcsncat-avx2.S"
 /* isa-default-impl.h expects DEFAULT_IMPL_V1 to be defined but it
    should never be used from here.  */
 # define DEFAULT_IMPL_V1 "ERROR -- Invalid ISA IMPL"
diff --git a/sysdeps/x86_64/wcsncpy-generic.c b/sysdeps/x86_64/wcsncpy-generic.c
index 49d06b8ae8..7f19fcaddc 100644
--- a/sysdeps/x86_64/wcsncpy-generic.c
+++ b/sysdeps/x86_64/wcsncpy-generic.c
@@ -24,7 +24,7 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL <= 3
+#if MINIMUM_X86_ISA_LEVEL <= 2
 
 # include
 
diff --git a/sysdeps/x86_64/wcsncpy.S b/sysdeps/x86_64/wcsncpy.S
index 1450c1aa28..dc44b32395 100644
--- a/sysdeps/x86_64/wcsncpy.S
+++ b/sysdeps/x86_64/wcsncpy.S
@@ -24,11 +24,12 @@
 
 #include
 
-#if MINIMUM_X86_ISA_LEVEL >= 4
+#if MINIMUM_X86_ISA_LEVEL >= 3
 
 # define WCSNCPY __wcsncpy
 
 # define DEFAULT_IMPL_V4 "multiarch/wcsncpy-evex.S"
+# define DEFAULT_IMPL_V3 "multiarch/wcsncpy-avx2.S"
 /* isa-default-impl.h expects DEFAULT_IMPL_V1 to be defined but it
    should never be used from here.  */
 # define DEFAULT_IMPL_V1 "ERROR -- Invalid ISA IMPL"
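
Note on the selector hunks in ifunc-wcs.h and wcscpy.c: the order in which the new branches are tried can be sketched in plain C as below. This is only an illustration under stated assumptions. `struct cpu_caps`, `enum wcs_impl`, `select_wcs_impl` and `wcs_selector_self_check` are hypothetical names, not glibc's. The outer AVX2/BMI1/BMI2 guard is an assumption borrowed from the conditions in ifunc-impl-list.c, because the hunks do not show the enclosing `if`. The sketch also ignores the build-time ISA-level folding that the `X86_ISA_*` macros perform.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for glibc's cpu_features; not the real struct.  */
struct cpu_caps
{
  bool avx2, bmi1, bmi2;      /* Assumed outer guard (not shown in the hunks).  */
  bool avx512vl, avx512bw;    /* Both needed for the evex variant.  */
  bool rtm;                   /* Selects the avx2_rtm variant.  */
  bool prefer_no_vzeroupper;  /* When set, the plain avx2 variant is skipped.  */
};

enum wcs_impl { WCS_GENERIC, WCS_AVX2, WCS_AVX2_RTM, WCS_EVEX };

/* Mirrors the branch order the patch adds to IFUNC_SELECTOR in
   ifunc-wcs.h: evex first, then avx2_rtm, then avx2, else generic.  */
enum wcs_impl
select_wcs_impl (const struct cpu_caps *c)
{
  if (c->avx2 && c->bmi1 && c->bmi2)
    {
      if (c->avx512vl && c->avx512bw)
        return WCS_EVEX;

      if (c->rtm)
        return WCS_AVX2_RTM;

      if (!c->prefer_no_vzeroupper)
        return WCS_AVX2;
    }

  return WCS_GENERIC;
}

/* Small self-check of the ordering described above.  */
void
wcs_selector_self_check (void)
{
  struct cpu_caps c = { .avx2 = true, .bmi1 = true, .bmi2 = true };

  assert (select_wcs_impl (&c) == WCS_AVX2);

  c.rtm = true;
  assert (select_wcs_impl (&c) == WCS_AVX2_RTM);

  c.avx512vl = c.avx512bw = true;
  assert (select_wcs_impl (&c) == WCS_EVEX);
}
```

Two things the sketch makes visible: the RTM check comes before the Prefer_No_VZEROUPPER check, so an RTM-capable CPU gets the avx2_rtm variant regardless of that preference, and a CPU with Prefer_No_VZEROUPPER set and no RTM falls through to the generic version. In wcscpy.c the fallthrough additionally tries ssse3 before generic, which the sketch leaves out.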