From patchwork Fri Aug 27 05:03:04 2021
X-Patchwork-Submitter: Naohiro Tamura
X-Patchwork-Id: 44798
From: Naohiro Tamura
To: Wilco Dijkstra, libc-alpha@sourceware.org
Subject: [PATCH] AArch64: Update A64FX memset not to degrade at 16KB
Date: Fri, 27 Aug 2021 05:03:04 +0000
Message-Id: <20210827050304.543471-1-naohirot@fujitsu.com>

This patch updates the unroll8 code so that performance no longer
degrades at 16KB, the peak-performance size, on both FX1000 and FX700.
The two instructions inserted at the beginning of the unroll8 loop, a
cmp and a branch, are a workaround that was found heuristically.

Reviewed-by: Wilco Dijkstra
---
 sysdeps/aarch64/multiarch/memset_a64fx.S | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 7bf759b6a753..f7dfdaace7cf 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -96,7 +96,14 @@ L(vl_agnostic): // VL Agnostic
 L(unroll8):
 	sub	count, count, tmp1
 	.p2align 4
-1:	st1b_unroll 0, 7
+	// The 2 instructions at the beginning of the following loop,
+	// cmp and branch, are a workaround so as not to degrade at
+	// the peak performance 16KB.
+	// It is found heuristically and the branch condition, b.ne,
+	// is chosen intentionally never to jump.
+1:	cmp	xzr, xzr
+	b.ne	1b
+	st1b_unroll 0, 7
 	add	dst, dst, tmp1
 	subs	count, count, tmp1
 	b.hi	1b
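
Reading aid for the workaround in the hunk above: a minimal Python sketch of why `b.ne` can never be taken after `cmp xzr, xzr`. It assumes the usual AArch64 semantics, where `cmp a, b` sets the NZCV flags as a subtraction `a - b` would, and `b.ne` branches only when the Z flag is clear. The helper names `cmp_flags` and `branch_ne_taken` are hypothetical and are not part of glibc. The sketch models only the architectural flag behaviour, not whatever A64FX pipeline effect makes the workaround help at 16KB.

```python
MASK64 = (1 << 64) - 1  # 64-bit register width


def cmp_flags(a: int, b: int) -> dict:
    """Model the NZCV flags produced by `cmp a, b` on 64-bit operands."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    sign_a, sign_b, sign_r = a >> 63, b >> 63, result >> 63
    return {
        "N": sign_r,                  # result is negative when read as signed
        "Z": int(result == 0),        # result is zero
        "C": int(a >= b),             # carry set means no unsigned borrow
        # signed overflow: operand signs differ and the result sign flipped
        "V": int(sign_a != sign_b and sign_r != sign_a),
    }


def branch_ne_taken(flags: dict) -> bool:
    """`b.ne` is taken only when the Z flag is clear."""
    return flags["Z"] == 0


if __name__ == "__main__":
    # xzr always reads as zero, so `cmp xzr, xzr` compares 0 with 0.
    flags = cmp_flags(0, 0)
    print(flags, "b.ne taken:", branch_ne_taken(flags))
```

Because both operands are the zero register, Z is always set, so the added branch falls through on every iteration and `st1b_unroll 0, 7` runs exactly as before.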