From patchwork Thu Jul 22 16:04:34 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44458
To: "naohirot@fujitsu.com"
Subject: [PATCH v3 5/5] AArch64: Improve A64FX memset
Date: Thu, 22 Jul 2021 16:04:34 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra
Cc: 'GNU C Library'

Simplify the code for memsets smaller than L1. Improve the unroll8 and
L1_prefetch loops.

Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 8665c272431b46dadea53c63ab74829c3aa99312..36628e101db33a9a8ff5234b98dd5a3a5c9ed73c 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -30,7 +30,6 @@
 #define L2_SIZE		(8*1024*1024)	// L2 8MB - 1MB
 #define CACHE_LINE_SIZE	256
 #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
-#define rest		x2
 #define vector_length	x9
 
 #if HAVE_AARCH64_SVE_ASM
@@ -89,29 +88,19 @@ ENTRY (MEMSET)
 
 	.p2align 4
 L(vl_agnostic): // VL Agnostic
-	mov	rest, count
 	mov	dst, dstin
-	add	dstend, dstin, count
-	// if rest >= L2_SIZE && vector_length == 64 then L(L2)
-	mov	tmp1, 64
-	cmp	rest, L2_SIZE
-	ccmp	vector_length, tmp1, 0, cs
-	b.eq	L(L2)
-	// if rest >= L1_SIZE && vector_length == 64 then L(L1_prefetch)
-	cmp	rest, L1_SIZE
-	ccmp	vector_length, tmp1, 0, cs
-	b.eq	L(L1_prefetch)
-
+	cmp	count, L1_SIZE
+	b.hi	L(L1_prefetch)
 
+	// count >= 8 * vector_length
 L(unroll8):
-	lsl	tmp1, vector_length, 3
-	.p2align 3
-1:	cmp	rest, tmp1
-	b.cc	L(last)
-	st1b_unroll
+	sub	count, count, tmp1
+	.p2align 4
+1:	subs	count, count, tmp1
+	st1b_unroll 0, 7
 	add	dst, dst, tmp1
-	sub	rest, rest, tmp1
-	b	1b
+	b.hi	1b
+	add	count, count, tmp1
 
 L(last):
 	cmp	count, vector_length, lsl 1
@@ -129,18 +118,22 @@ L(last):
 	st1b	z0.b, p0, [dstend, -1, mul vl]
 	ret
 
-L(L1_prefetch): // if rest >= L1_SIZE
+	// count >= L1_SIZE
 	.p2align 3
+L(L1_prefetch):
+	cmp	count, L2_SIZE
+	b.hs	L(L2)
+	cmp	vector_length, 64
+	b.ne	L(unroll8)
 1:	st1b_unroll 0, 3
 	prfm	pstl1keep, [dst, PF_DIST_L1]
 	st1b_unroll 4, 7
 	prfm	pstl1keep, [dst, PF_DIST_L1 + CACHE_LINE_SIZE]
 	add	dst, dst, CACHE_LINE_SIZE * 2
-	sub	rest, rest, CACHE_LINE_SIZE * 2
-	cmp	rest, L1_SIZE
-	b.ge	1b
-	cbnz	rest, L(unroll8)
-	ret
+	sub	count, count, CACHE_LINE_SIZE * 2
+	cmp	count, PF_DIST_L1
+	b.hs	1b
+	b	L(unroll8)
 
 	// count >= L2_SIZE
 L(L2):
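
The rewritten L(unroll8) loop drops the separate compare. count is biased down by one block before the loop, subs sets the flags, and b.hi keeps looping only while more than one block remained before the subtraction. A minimal C sketch of that control flow follows. It assumes tmp1 already holds 8 * vector_length when L(unroll8) is entered; the hunks remove the lsl that used to compute it there, so it is presumably set earlier in the file. The names unroll8_model and block are hypothetical, and the C memset call merely stands in for the eight SVE stores of st1b_unroll 0, 7.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the rewritten L(unroll8) loop.
   `block` stands in for tmp1 (assumed to be 8 * vector_length bytes).
   Precondition, taken from the patch comment: count >= block.
   Returns the number of bytes left over for the L(last) tail.  */
static size_t
unroll8_model (unsigned char *dst, int c, size_t count, size_t block)
{
  assert (block > 0 && count >= block);

  count -= block;                 /* sub   count, count, tmp1 */
  for (;;)
    {
      size_t before = count;      /* value that subs compares against tmp1 */
      count -= block;             /* subs  count, count, tmp1 (may wrap) */
      memset (dst, c, block);     /* st1b_unroll 0, 7 */
      dst += block;               /* add   dst, dst, tmp1 */
      if (!(before > block))      /* b.hi  1b: unsigned "higher" */
        break;
    }
  return count + block;           /* add   count, count, tmp1 undoes the bias */
}
```

With block = 512 and count = 1300, the model stores 1024 bytes and returns 276, which is the amount L(last) would then have to cover. The unsigned wrap on the final subs is harmless because the closing add restores the true remainder.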