From patchwork Mon Jul 18 10:38:11 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 56115
To: libc-alpha@sourceware.org
Subject: [PATCH v1] x86: Continue building memmove-ssse3.S as ISA level V3
Date: Mon, 18 Jul 2022 03:38:11 -0700
Message-Id: <20220718103811.1842054-2-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220718103811.1842054-1-goldstein.w.n@gmail.com>
References: <20220718103811.1842054-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

Some V3 processors still strongly prefer memmove-ssse3.S because it is
heavily optimized to avoid unaligned memory accesses.

Tested builds for x86-64 v1, v2, v3, and v4 with and without multiarch.
---
 sysdeps/x86/isa-level.h                    | 15 +++++++++++++++
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 30 ++++++++++++++--------
 sysdeps/x86_64/multiarch/ifunc-memmove.h   | 14 ++++++------
 sysdeps/x86_64/multiarch/memmove-ssse3.S   |  4 +++-
 4 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
index fe56af7e2b..f49336acf3 100644
--- a/sysdeps/x86/isa-level.h
+++ b/sysdeps/x86/isa-level.h
@@ -90,6 +90,14 @@
 
 /* For X86_ISA_CPU_FEATURES_ARCH_P.  */
+
+/* NB: This is just an alias to `AVX_Fast_Unaligned_Load` that will
+   continue doing runtime check up to ISA level >= 4.  This is for
+   some Zhaoxin CPUs which build at ISA level V3 but still have a
+   strong preference for avoiding unaligned `ymm` loads.  */
+#define V4_AVX_Fast_Unaligned_Load_X86_ISA_LEVEL 4
+#define V4_AVX_Fast_Unaligned_Load AVX_Fast_Unaligned_Load
+
 /* NB: This feature is enabled when ISA level >= 3, which was disabled
    for the following CPUs:
       - AMD Excavator
@@ -106,6 +114,13 @@
    this feature don't run on glibc built with ISA level >= 3.  */
 #define Slow_SSE4_2_X86_ISA_LEVEL 3
 
+/* NB: This is just an alias to `Fast_Unaligned_Copy` that will
+   continue doing runtime check up to ISA level >= 3.  This is for
+   some Zhaoxin CPUs which build at ISA level V3 but still have a
+   strong preference for avoiding unaligned `ymm` loads.  */
+#define V3_Fast_Unaligned_Copy_X86_ISA_LEVEL 3
+#define V3_Fast_Unaligned_Copy Fast_Unaligned_Copy
+
 /* Feature(s) enabled when ISA level >= 2.  */
 #define Fast_Unaligned_Load_X86_ISA_LEVEL 2
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index a71444eccb..427f127427 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -143,8 +143,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                              (CPU_FEATURE_USABLE (AVX)
                               && CPU_FEATURE_USABLE (RTM)),
                              __memmove_chk_avx_unaligned_erms_rtm)
-              /* By V3 we assume fast aligned copy.  */
-              X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk,
+              /* Some V3 implementations still heavily prefer aligned
+                 loads so keep SSSE3 implementation around.  */
+              X86_IFUNC_IMPL_ADD_V3 (array, i, __memmove_chk,
                                      CPU_FEATURE_USABLE (SSSE3),
                                      __memmove_chk_ssse3)
               /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -190,8 +191,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                              (CPU_FEATURE_USABLE (AVX)
                               && CPU_FEATURE_USABLE (RTM)),
                              __memmove_avx_unaligned_erms_rtm)
-              /* By V3 we assume fast aligned copy.  */
-              X86_IFUNC_IMPL_ADD_V2 (array, i, memmove,
+              /* Some V3 implementations still heavily prefer aligned
+                 loads so keep SSSE3 implementation around.  */
+              X86_IFUNC_IMPL_ADD_V3 (array, i, memmove,
                                      CPU_FEATURE_USABLE (SSSE3),
                                      __memmove_ssse3)
               /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -1004,8 +1006,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                              (CPU_FEATURE_USABLE (AVX)
                               && CPU_FEATURE_USABLE (RTM)),
                              __memcpy_chk_avx_unaligned_erms_rtm)
-              /* By V3 we assume fast aligned copy.  */
-              X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk,
+              /* Some V3 implementations still heavily prefer aligned
+                 loads so keep SSSE3 implementation around.  */
+              X86_IFUNC_IMPL_ADD_V3 (array, i, __memcpy_chk,
                                      CPU_FEATURE_USABLE (SSSE3),
                                      __memcpy_chk_ssse3)
               /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -1051,8 +1054,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                              (CPU_FEATURE_USABLE (AVX)
                               && CPU_FEATURE_USABLE (RTM)),
                              __memcpy_avx_unaligned_erms_rtm)
-              /* By V3 we assume fast aligned copy.  */
-              X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy,
+              /* Some V3 implementations still heavily prefer aligned
+                 loads so keep SSSE3 implementation around.  */
+              X86_IFUNC_IMPL_ADD_V3 (array, i, memcpy,
                                      CPU_FEATURE_USABLE (SSSE3),
                                      __memcpy_ssse3)
               /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -1098,8 +1102,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                              (CPU_FEATURE_USABLE (AVX)
                               && CPU_FEATURE_USABLE (RTM)),
                              __mempcpy_chk_avx_unaligned_erms_rtm)
-              /* By V3 we assume fast aligned copy.  */
-              X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk,
+              /* Some V3 implementations still heavily prefer aligned
+                 loads so keep SSSE3 implementation around.  */
+              X86_IFUNC_IMPL_ADD_V3 (array, i, __mempcpy_chk,
                                      CPU_FEATURE_USABLE (SSSE3),
                                      __mempcpy_chk_ssse3)
               /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -1145,8 +1150,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                              (CPU_FEATURE_USABLE (AVX)
                               && CPU_FEATURE_USABLE (RTM)),
                              __mempcpy_avx_unaligned_erms_rtm)
-              /* By V3 we assume fast aligned copy.  */
-              X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy,
+              /* Some V3 implementations still heavily prefer aligned
+                 loads so keep SSSE3 implementation around.  */
+              X86_IFUNC_IMPL_ADD_V3 (array, i, mempcpy,
                                      CPU_FEATURE_USABLE (SSSE3),
                                      __mempcpy_ssse3)
               /* ISA V2 wrapper for SSE2 implementation because the SSE2
diff --git a/sysdeps/x86_64/multiarch/ifunc-memmove.h b/sysdeps/x86_64/multiarch/ifunc-memmove.h
index 1643d32887..be0c758783 100644
--- a/sysdeps/x86_64/multiarch/ifunc-memmove.h
+++ b/sysdeps/x86_64/multiarch/ifunc-memmove.h
@@ -72,7 +72,7 @@ IFUNC_SELECTOR (void)
     }
 
   if (X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
-                                   AVX_Fast_Unaligned_Load, ))
+                                   V4_AVX_Fast_Unaligned_Load, ))
     {
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL))
         {
@@ -101,11 +101,13 @@ IFUNC_SELECTOR (void)
     }
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, SSSE3)
-      /* Leave this as runtime check.  The SSSE3 is optimized almost
-         exclusively for avoiding unaligned memory access during the
-         copy and by and large is not better than the sse2
-         implementation as a general purpose memmove.  */
-      && !CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Copy))
+      /* Leave this as runtime check for V2.  By V3 assume it must be
+         set.  The SSSE3 is optimized almost exclusively for avoiding
+         unaligned memory access during the copy and by and large is
+         not better than the sse2 implementation as a general purpose
+         memmove.  */
+      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
+                                      V3_Fast_Unaligned_Copy, !))
     {
       return OPTIMIZE (ssse3);
     }
diff --git a/sysdeps/x86_64/multiarch/memmove-ssse3.S b/sysdeps/x86_64/multiarch/memmove-ssse3.S
index 57599752c7..15cafee766 100644
--- a/sysdeps/x86_64/multiarch/memmove-ssse3.S
+++ b/sysdeps/x86_64/multiarch/memmove-ssse3.S
@@ -20,7 +20,9 @@
 
 #include <isa-level.h>
 
-#if ISA_SHOULD_BUILD (2)
+/* Continue building up to ISA level V3 as some V3 CPUs strongly
+   prefer this implementation.  */
+#if ISA_SHOULD_BUILD (3)
 
 # include <sysdep.h>
 # ifndef MEMMOVE
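
For readers unfamiliar with the ISA-level gating used above, here is a
minimal, self-contained C sketch of the idea the new
V3_Fast_Unaligned_Copy alias relies on.  It is not part of the patch:
the wrapper macro and the feature variable below are illustrative
stand-ins, not the actual X86_ISA_CPU_FEATURES_ARCH_P and cpu_features
definitions from sysdeps/x86/isa-level.h.

#include <stdbool.h>
#include <stdio.h>

/* Assumed build-time ISA level; in a real glibc build this comes from
   the configured minimum ISA level, not a hard-coded define.  */
#define MINIMUM_X86_ISA_LEVEL 3

/* Same threshold value the patch introduces for the alias.  */
#define V3_Fast_Unaligned_Copy_X86_ISA_LEVEL 3

/* Hypothetical stand-in for the runtime bit that CPU_FEATURES_ARCH_P
   would read out of cpu_features.  */
static bool fast_unaligned_copy_bit;

/* Illustrative wrapper (not glibc's macro): once the build-time level
   exceeds the feature's threshold, the check folds to a compile-time
   constant and the other ifunc branch can be discarded; at or below
   the threshold the runtime bit still decides which memmove is used.  */
#define LEVEL_OR_RUNTIME_P(threshold, runtime_bit) \
  ((threshold) < MINIMUM_X86_ISA_LEVEL ? true : (runtime_bit))

int
main (void)
{
  /* E.g. a V3 CPU that prefers aligned (SSSE3-style) copies.  */
  fast_unaligned_copy_bit = false;
  printf ("fast-unaligned-copy check: %d\n",
          LEVEL_OR_RUNTIME_P (V3_Fast_Unaligned_Copy_X86_ISA_LEVEL,
                              fast_unaligned_copy_bit));
  return 0;
}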