From patchwork Mon Jul 18 10:38:10 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 56114
To: libc-alpha@sourceware.org
Subject: [PATCH v1] x86: Fix type of `Slow_SSE4_2` def in isa-level.h
Date: Mon, 18 Jul 2022 03:38:10 -0700
Message-Id: <20220718103811.1842054-1-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.34.1
From: Noah Goldstein

Change from `Slow_SSE42_X86_ISA_LEVEL` to `Slow_SSE4_2_X86_ISA_LEVEL`.
Currently the def is unused, so there is no need to change anything
else.
---
 sysdeps/x86/isa-level.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
index 3c4480aba7..fe56af7e2b 100644
--- a/sysdeps/x86/isa-level.h
+++ b/sysdeps/x86/isa-level.h
@@ -104,7 +104,7 @@
 /* NB: This feature is disable when ISA level >= 3.  All CPUs with
    this feature don't run on glibc built with ISA level >= 3.  */
-#define Slow_SSE42_X86_ISA_LEVEL 3
+#define Slow_SSE4_2_X86_ISA_LEVEL 3
 
 /* Feature(s) enabled when ISA level >= 2.
  */
 #define Fast_Unaligned_Load_X86_ISA_LEVEL 2

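The `*_X86_ISA_LEVEL` defines above matter because the ISA-level gate
macros look them up by token-pasting the feature name with
`_X86_ISA_LEVEL`, so the define has to spell the feature name exactly;
the next patch in the thread relies on the same mechanism by adding
alias names gated at a different level.  As a rough illustration only
-- `MINIMUM_X86_ISA_LEVEL` does exist in glibc, but `ISA_ARCH_P` and
`runtime_arch_p` below are simplified stand-ins rather than the real
glibc macros -- a self-contained sketch of the pattern:

#include <stdio.h>

/* Build-time minimum ISA level the library is configured for
   (1 = baseline x86-64, 2 = x86-64-v2, 3 = x86-64-v3, ...).  */
#define MINIMUM_X86_ISA_LEVEL 3

/* Per-feature level defines in the style of isa-level.h.  The name
   must match the feature name exactly because the gate macro below
   forms it by token pasting -- which is why `Slow_SSE42_X86_ISA_LEVEL`
   is renamed to `Slow_SSE4_2_X86_ISA_LEVEL` above.  */
#define Fast_Unaligned_Load_X86_ISA_LEVEL 2
#define Fast_Unaligned_Copy_X86_ISA_LEVEL 3

/* Stand-in for the real runtime CPU feature query.  */
static int
runtime_arch_p (const char *feature)
{
  (void) feature;
  return 0;	/* Pretend the CPU lacks the feature.  */
}

/* Once the build's minimum level implies the feature, the check folds
   to a compile-time constant and the runtime query disappears.  */
#define ISA_ARCH_P(name) \
  (name##_X86_ISA_LEVEL <= MINIMUM_X86_ISA_LEVEL \
   ? 1 : runtime_arch_p (#name))

int
main (void)
{
  /* Both fold to 1 when MINIMUM_X86_ISA_LEVEL is 3; building at a
     lower level re-enables the runtime check for that feature.  */
  printf ("Fast_Unaligned_Load: %d\n", ISA_ARCH_P (Fast_Unaligned_Load));
  printf ("Fast_Unaligned_Copy: %d\n", ISA_ARCH_P (Fast_Unaligned_Copy));
  return 0;
}
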
From patchwork Mon Jul 18 13:25:30 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 56125
To: libc-alpha@sourceware.org
Subject: [PATCH v3 2/2] x86: Continue building memmove-ssse3.S as ISA level V3
Date: Mon, 18 Jul 2022 06:25:30 -0700
Message-Id: <20220718132530.1847929-2-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220718132530.1847929-1-goldstein.w.n@gmail.com>
References: <20220718103811.1842054-1-goldstein.w.n@gmail.com>
 <20220718132530.1847929-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

Some V3 processors still strongly prefer memmove-ssse3.S because it is
heavily optimized to avoid unaligned memory accesses.

Tested builds for x86-64 v1, v2, v3, and v4 with and without multiarch.
---
 sysdeps/x86/isa-level.h                    | 15 +++++++++++
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 30 +++++++++++++---------
 sysdeps/x86_64/multiarch/ifunc-memmove.h   | 14 +++++-----
 sysdeps/x86_64/multiarch/memmove-ssse3.S   |  4 ++-
 4 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
index fe56af7e2b..f49336acf3 100644
--- a/sysdeps/x86/isa-level.h
+++ b/sysdeps/x86/isa-level.h
@@ -90,6 +90,14 @@
 /* For X86_ISA_CPU_FEATURES_ARCH_P.  */
+
+/* NB: This is just an alias to `AVX_Fast_Unaligned_Load` that will
+   continue doing runtime check up to ISA level >= 4.  This is for
+   some Zhaoxin CPUs which build at ISA level V3 but still have a
+   strong preference for avoiding unaligned `ymm` loads.  */
+#define V4_AVX_Fast_Unaligned_Load_X86_ISA_LEVEL 4
+#define V4_AVX_Fast_Unaligned_Load AVX_Fast_Unaligned_Load
+
 /* NB: This feature is enabled when ISA level >= 3, which was disabled
    for the following CPUs:
      - AMD Excavator
@@ -106,6 +114,13 @@
    this feature don't run on glibc built with ISA level >= 3.  */
 #define Slow_SSE4_2_X86_ISA_LEVEL 3
 
+/* NB: This is just an alias to `Fast_Unaligned_Copy` that will
+   continue doing runtime check up to ISA level >= 3.  This is for
+   some Zhaoxin CPUs which build at ISA level V3 but still have a
+   strong preference for avoiding unaligned `ymm` loads.  */
+#define V3_Fast_Unaligned_Copy_X86_ISA_LEVEL 3
+#define V3_Fast_Unaligned_Copy Fast_Unaligned_Copy
+
 /* Feature(s) enabled when ISA level >= 2.
  */
 #define Fast_Unaligned_Load_X86_ISA_LEVEL 2
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index a71444eccb..427f127427 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -143,8 +143,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
			      (CPU_FEATURE_USABLE (AVX)
			       && CPU_FEATURE_USABLE (RTM)),
			      __memmove_chk_avx_unaligned_erms_rtm)
-	      /* By V3 we assume fast aligned copy.  */
-	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk,
+	      /* Some V3 implementations still heavily prefer aligned
+		 loads so keep SSSE3 implementation around.  */
+	      X86_IFUNC_IMPL_ADD_V3 (array, i, __memmove_chk,
				     CPU_FEATURE_USABLE (SSSE3),
				     __memmove_chk_ssse3)
	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -190,8 +191,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
			      (CPU_FEATURE_USABLE (AVX)
			       && CPU_FEATURE_USABLE (RTM)),
			      __memmove_avx_unaligned_erms_rtm)
-	      /* By V3 we assume fast aligned copy.  */
-	      X86_IFUNC_IMPL_ADD_V2 (array, i, memmove,
+	      /* Some V3 implementations still heavily prefer aligned
+		 loads so keep SSSE3 implementation around.  */
+	      X86_IFUNC_IMPL_ADD_V3 (array, i, memmove,
				     CPU_FEATURE_USABLE (SSSE3),
				     __memmove_ssse3)
	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -1004,8 +1006,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
			      (CPU_FEATURE_USABLE (AVX)
			       && CPU_FEATURE_USABLE (RTM)),
			      __memcpy_chk_avx_unaligned_erms_rtm)
-	      /* By V3 we assume fast aligned copy.  */
-	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk,
+	      /* Some V3 implementations still heavily prefer aligned
+		 loads so keep SSSE3 implementation around.  */
+	      X86_IFUNC_IMPL_ADD_V3 (array, i, __memcpy_chk,
				     CPU_FEATURE_USABLE (SSSE3),
				     __memcpy_chk_ssse3)
	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -1051,8 +1054,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
			      (CPU_FEATURE_USABLE (AVX)
			       && CPU_FEATURE_USABLE (RTM)),
			      __memcpy_avx_unaligned_erms_rtm)
-	      /* By V3 we assume fast aligned copy.  */
-	      X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy,
+	      /* Some V3 implementations still heavily prefer aligned
+		 loads so keep SSSE3 implementation around.  */
+	      X86_IFUNC_IMPL_ADD_V3 (array, i, memcpy,
				     CPU_FEATURE_USABLE (SSSE3),
				     __memcpy_ssse3)
	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -1098,8 +1102,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
			      (CPU_FEATURE_USABLE (AVX)
			       && CPU_FEATURE_USABLE (RTM)),
			      __mempcpy_chk_avx_unaligned_erms_rtm)
-	      /* By V3 we assume fast aligned copy.  */
-	      X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk,
+	      /* Some V3 implementations still heavily prefer aligned
+		 loads so keep SSSE3 implementation around.  */
+	      X86_IFUNC_IMPL_ADD_V3 (array, i, __mempcpy_chk,
				     CPU_FEATURE_USABLE (SSSE3),
				     __mempcpy_chk_ssse3)
	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -1145,8 +1150,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
			      (CPU_FEATURE_USABLE (AVX)
			       && CPU_FEATURE_USABLE (RTM)),
			      __mempcpy_avx_unaligned_erms_rtm)
-	      /* By V3 we assume fast aligned copy.  */
-	      X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy,
+	      /* Some V3 implementations still heavily prefer aligned
+		 loads so keep SSSE3 implementation around.  */
+	      X86_IFUNC_IMPL_ADD_V3 (array, i, mempcpy,
				     CPU_FEATURE_USABLE (SSSE3),
				     __mempcpy_ssse3)
	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
diff --git a/sysdeps/x86_64/multiarch/ifunc-memmove.h b/sysdeps/x86_64/multiarch/ifunc-memmove.h
index 1643d32887..be0c758783 100644
--- a/sysdeps/x86_64/multiarch/ifunc-memmove.h
+++ b/sysdeps/x86_64/multiarch/ifunc-memmove.h
@@ -72,7 +72,7 @@ IFUNC_SELECTOR (void)
     }
 
   if (X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
-				   AVX_Fast_Unaligned_Load, ))
+				   V4_AVX_Fast_Unaligned_Load, ))
     {
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL))
	{
@@ -101,11 +101,13 @@ IFUNC_SELECTOR (void)
     }
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, SSSE3)
-      /* Leave this as runtime check.  The SSSE3 is optimized almost
-	 exclusively for avoiding unaligned memory access during the
-	 copy and by and large is not better than the sse2
-	 implementation as a general purpose memmove.  */
-      && !CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Copy))
+      /* Leave this as runtime check for V2.  By V3 assume it must be
+	 set.  The SSSE3 is optimized almost exclusively for avoiding
+	 unaligned memory access during the copy and by and large is
+	 not better than the sse2 implementation as a general purpose
+	 memmove.  */
+      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
+				      V3_Fast_Unaligned_Copy, !))
     {
       return OPTIMIZE (ssse3);
     }
diff --git a/sysdeps/x86_64/multiarch/memmove-ssse3.S b/sysdeps/x86_64/multiarch/memmove-ssse3.S
index 57599752c7..15cafee766 100644
--- a/sysdeps/x86_64/multiarch/memmove-ssse3.S
+++ b/sysdeps/x86_64/multiarch/memmove-ssse3.S
@@ -20,7 +20,9 @@
 #include
 
-#if ISA_SHOULD_BUILD (2)
+/* Continue building up to ISA level V3 as some V3 CPUs strongly
+   prefer this implementation.  */
+#if ISA_SHOULD_BUILD (3)
 
 # include
 
 # ifndef MEMMOVE