From patchwork Mon Mar 15 14:25:19 2021
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 42549
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Subject: [PATCH v2 09/10] x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
Date: Mon, 15 Mar 2021 07:25:19 -0700
Message-Id: <20210315142520.1661407-10-hjl.tools@gmail.com>
In-Reply-To: <20210315142520.1661407-1-hjl.tools@gmail.com>
References: <20210315142520.1661407-1-hjl.tools@gmail.com>

Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
---
 sysdeps/x86_64/multiarch/ifunc-impl-list.c         | 14 +++++++++-----
 sysdeps/x86_64/multiarch/ifunc-memset.h            | 13 ++++++++-----
 sysdeps/x86_64/multiarch/ifunc-wmemset.h           | 12 ++++++------
 .../multiarch/memset-avx512-unaligned-erms.S       | 16 ++++++++--------
 4 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 024913065b..37f17075fa 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -211,10 +211,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                && CPU_FEATURE_USABLE (AVX512BW)),
                               __memset_chk_evex_unaligned_erms)
               IFUNC_IMPL_ADD (array, i, __memset_chk,
-                              CPU_FEATURE_USABLE (AVX512F),
+                              (CPU_FEATURE_USABLE (AVX512VL)
+                               && CPU_FEATURE_USABLE (AVX512BW)),
                               __memset_chk_avx512_unaligned_erms)
               IFUNC_IMPL_ADD (array, i, __memset_chk,
-                              CPU_FEATURE_USABLE (AVX512F),
+                              (CPU_FEATURE_USABLE (AVX512VL)
+                               && CPU_FEATURE_USABLE (AVX512BW)),
                               __memset_chk_avx512_unaligned)
               IFUNC_IMPL_ADD (array, i, __memset_chk,
                               CPU_FEATURE_USABLE (AVX512F),
@@ -252,10 +254,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                                && CPU_FEATURE_USABLE (AVX512BW)),
                               __memset_evex_unaligned_erms)
               IFUNC_IMPL_ADD (array, i, memset,
-                              CPU_FEATURE_USABLE (AVX512F),
+                              (CPU_FEATURE_USABLE (AVX512VL)
+                               && CPU_FEATURE_USABLE (AVX512BW)),
                               __memset_avx512_unaligned_erms)
               IFUNC_IMPL_ADD (array, i, memset,
-                              CPU_FEATURE_USABLE (AVX512F),
+                              (CPU_FEATURE_USABLE (AVX512VL)
+                               && CPU_FEATURE_USABLE (AVX512BW)),
                               __memset_avx512_unaligned)
               IFUNC_IMPL_ADD (array, i, memset,
                               CPU_FEATURE_USABLE (AVX512F),
@@ -719,7 +723,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                               CPU_FEATURE_USABLE (AVX512VL),
                               __wmemset_evex_unaligned)
               IFUNC_IMPL_ADD (array, i, wmemset,
-                              CPU_FEATURE_USABLE (AVX512F),
+                              CPU_FEATURE_USABLE (AVX512VL),
                               __wmemset_avx512_unaligned))
 
 #ifdef SHARED
diff --git a/sysdeps/x86_64/multiarch/ifunc-memset.h b/sysdeps/x86_64/multiarch/ifunc-memset.h
index 43655fb684..502f946a84 100644
--- a/sysdeps/x86_64/multiarch/ifunc-memset.h
+++ b/sysdeps/x86_64/multiarch/ifunc-memset.h
@@ -53,13 +53,16 @@ IFUNC_SELECTOR (void)
   if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
       && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_AVX512))
     {
-      if (CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
-        return OPTIMIZE (avx512_no_vzeroupper);
+      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
+          && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
+        {
+          if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
+            return OPTIMIZE (avx512_unaligned_erms);
 
-      if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
-        return OPTIMIZE (avx512_unaligned_erms);
+          return OPTIMIZE (avx512_unaligned);
+        }
 
-      return OPTIMIZE (avx512_unaligned);
+      return OPTIMIZE (avx512_no_vzeroupper);
     }
 
   if (CPU_FEATURE_USABLE_P (cpu_features, AVX2))
diff --git a/sysdeps/x86_64/multiarch/ifunc-wmemset.h b/sysdeps/x86_64/multiarch/ifunc-wmemset.h
index 8d952eff99..756f0ccdbf 100644
--- a/sysdeps/x86_64/multiarch/ifunc-wmemset.h
+++ b/sysdeps/x86_64/multiarch/ifunc-wmemset.h
@@ -33,13 +33,13 @@ IFUNC_SELECTOR (void)
   if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
       && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
     {
-      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
-          && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_AVX512)
-          && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
-        return OPTIMIZE (avx512_unaligned);
-
       if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL))
-        return OPTIMIZE (evex_unaligned);
+        {
+          if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_AVX512))
+            return OPTIMIZE (avx512_unaligned);
+
+          return OPTIMIZE (evex_unaligned);
+        }
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
         return OPTIMIZE (avx2_unaligned_rtm);
diff --git a/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
index 0783979ca5..22e7b187c8 100644
--- a/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
@@ -1,22 +1,22 @@
 #if IS_IN (libc)
 # define VEC_SIZE       64
-# define VEC(i)         zmm##i
+# define XMM0           xmm16
+# define YMM0           ymm16
+# define VEC0           zmm16
+# define VEC(i)         VEC##i
 # define VMOVU          vmovdqu64
 # define VMOVA          vmovdqa64
+# define VZEROUPPER
 
 # define MEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
-  vmovd d, %xmm0; \
   movq r, %rax; \
-  vpbroadcastb %xmm0, %xmm0; \
-  vpbroadcastq %xmm0, %zmm0
+  vpbroadcastb d, %VEC0
 
 # define WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
-  vmovd d, %xmm0; \
   movq r, %rax; \
-  vpbroadcastd %xmm0, %xmm0; \
-  vpbroadcastq %xmm0, %zmm0
+  vpbroadcastd d, %VEC0
 
-# define SECTION(p)             p##.avx512
+# define SECTION(p)             p##.evex512
 # define MEMSET_SYMBOL(p,s)     p##_avx512_##s
 # define WMEMSET_SYMBOL(p,s)    p##_avx512_##s
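
For reference, the memset selection order that results from the
ifunc-memset.h hunk can be modeled as a small standalone C sketch.
Everything named here (struct cpu_model and its fields, enum memset_impl,
select_memset) is a hypothetical stand-in for illustration and is not
glibc's cpu_features or IFUNC_SELECTOR; only the branch order is taken
from the diff. The AVX2 and older tiers are collapsed into one fallback
value.

```c
#include <stdbool.h>

/* Hypothetical model of the CPU feature bits consulted by the patched
   selector.  Field names are illustrative, not glibc's.  */
struct cpu_model
{
  bool avx512f;
  bool avx512vl;
  bool avx512bw;
  bool erms;
  bool prefer_no_avx512;
};

/* Hypothetical labels for the variants the selector can return.  */
enum memset_impl
{
  MEMSET_AVX512_UNALIGNED_ERMS,
  MEMSET_AVX512_UNALIGNED,
  MEMSET_AVX512_NO_VZEROUPPER,
  MEMSET_FALLBACK		/* AVX2 and older tiers, not modeled.  */
};

/* Mirrors the branch order of the patched ifunc-memset.h hunk:
   the ZMM16-ZMM31 variants require AVX512VL and AVX512BW; with
   AVX512F alone the no-vzeroupper variant is chosen.  */
enum memset_impl
select_memset (const struct cpu_model *cpu)
{
  if (cpu->avx512f && !cpu->prefer_no_avx512)
    {
      if (cpu->avx512vl && cpu->avx512bw)
	{
	  if (cpu->erms)
	    return MEMSET_AVX512_UNALIGNED_ERMS;

	  return MEMSET_AVX512_UNALIGNED;
	}

      return MEMSET_AVX512_NO_VZEROUPPER;
    }

  return MEMSET_FALLBACK;
}
```

In this model, Prefer_No_VZEROUPPER no longer appears in the AVX512 branch,
matching the removed lines in the hunk: the choice between the unaligned
variants and avx512_no_vzeroupper depends only on AVX512VL and AVX512BW.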