From patchwork Mon Aug 9 13:11:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44610
X-Patchwork-Delegate: szabolcs.nagy@arm.com
To: "naohirot@fujitsu.com"
Subject: [PATCH v4 3/5] AArch64: Improve A64FX memset for remaining bytes
Date: Mon, 9 Aug 2021 13:11:57 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

v4: no changes

Simplify handling of remaining bytes. Avoid lots of taken branches and complex
whilelo computations, instead unconditionally write vectors from the end.

Reviewed-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 6bc8ef5e0c84dbb59a57d114ae6ec8e3fa3822ad..55f28b644defdffb140c88da0635ef099235546c 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -130,38 +130,19 @@ L(unroll8):
 	b	1b
 
 L(last):
-	whilelo	p0.b, xzr, rest
-	whilelo	p1.b, vector_length, rest
-	b.last	1f
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 1	// vector_length * 2
-	whilelo	p2.b, tmp1, rest
-	incb	tmp1
-	whilelo	p3.b, tmp1, rest
-	b.last	1f
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	st1b	z0.b, p2, [dst, #2, mul vl]
-	st1b	z0.b, p3, [dst, #3, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 2	// vector_length * 4
-	whilelo	p4.b, tmp1, rest
-	incb	tmp1
-	whilelo	p5.b, tmp1, rest
-	incb	tmp1
-	whilelo	p6.b, tmp1, rest
-	incb	tmp1
-	whilelo	p7.b, tmp1, rest
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	st1b	z0.b, p2, [dst, #2, mul vl]
-	st1b	z0.b, p3, [dst, #3, mul vl]
-	st1b	z0.b, p4, [dst, #4, mul vl]
-	st1b	z0.b, p5, [dst, #5, mul vl]
-	st1b	z0.b, p6, [dst, #6, mul vl]
-	st1b	z0.b, p7, [dst, #7, mul vl]
+	cmp	count, vector_length, lsl 1
+	b.ls	2f
+	add	tmp2, vector_length, vector_length, lsl 2
+	cmp	count, tmp2
+	b.ls	5f
+	st1b	z0.b, p0, [dstend, -8, mul vl]
+	st1b	z0.b, p0, [dstend, -7, mul vl]
+	st1b	z0.b, p0, [dstend, -6, mul vl]
+5:	st1b	z0.b, p0, [dstend, -5, mul vl]
+	st1b	z0.b, p0, [dstend, -4, mul vl]
+	st1b	z0.b, p0, [dstend, -3, mul vl]
+2:	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
 	ret
 
 L(L1_prefetch): // if rest >= L1_SIZE
@@ -199,7 +180,6 @@ L(L2):
 	subs	count, count, CACHE_LINE_SIZE
 	b.hi	1b
 	add	count, count, CACHE_LINE_SIZE
-	add	dst, dst, CACHE_LINE_SIZE
 	b	L(last)
 
 END (MEMSET)
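
For readers who want to check the new L(last) logic without SVE hardware, the
tail handling can be modelled in plain C. This is only a rough sketch, not the
glibc code: set_tail and store_vec_from_end are hypothetical names, VL is fixed
at 64 bytes (A64FX implements 512-bit SVE), p0 is assumed to be all-true, and
the caller is assumed to guarantee that 0 < count <= 8 * VL and that the bytes
before the remainder belong to the same buffer and may be rewritten.

```c
#include <stddef.h>
#include <string.h>

/* Assumed vector length in bytes (A64FX implements 512-bit SVE). */
enum { VL = 64 };

/* Models "st1b z0.b, p0, [dstend, -k, mul vl]" with p0 assumed all-true:
   one full vector of byte c stored k vectors before dstend. */
static void store_vec_from_end(unsigned char *dstend, int k, unsigned char c)
{
    memset(dstend - (size_t)k * VL, c, VL);
}

/* Hypothetical model of the new L(last): picks 2, 5 or 8 trailing vectors
   with two compares and stores them unconditionally.
   Assumes 0 < count <= 8 * VL and that the n * VL bytes before dstend
   all belong to the buffer being set. Returns the number of stores. */
static int set_tail(unsigned char *dstend, size_t count, unsigned char c)
{
    int n;

    if (count <= 2 * VL)        /* cmp count, vector_length, lsl 1; b.ls 2f */
        n = 2;
    else if (count <= 5 * VL)   /* tmp2 = vl + vl * 4; cmp count, tmp2; b.ls 5f */
        n = 5;
    else
        n = 8;

    /* Fall through the store sequence from -n down to -1. */
    for (int k = n; k >= 1; k--)
        store_vec_from_end(dstend, k, c);
    return n;
}
```

With VL = 64, a remainder of 200 bytes takes the middle path and issues five
stores covering the last 320 bytes; the extra 120 bytes simply rewrite data the
main loop has already set, which is the price paid for dropping the whilelo
predicates.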