From patchwork Thu Jul 22 15:59:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44456
To: "naohirot@fujitsu.com"
Subject: [PATCH v3 1/5] AArch64: Improve A64FX memset
Date: Thu, 22 Jul 2021 15:59:19 +0000
From: Wilco Dijkstra
Cc: 'GNU C Library'

Improve performance of small memsets by reducing instruction counts and
improving code alignment.  Bench-memset shows 35-45% performance gain for
small sizes.

Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index ce54e5418b08c8bc0ecc7affff68a59272ba6397..f7fcc7b323e1553f50a2e005b8ccef344a08127d 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -30,7 +30,6 @@
 #define L2_SIZE		(8*1024*1024)	// L2 8MB - 1MB
 #define CACHE_LINE_SIZE	256
 #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
-#define ZF_DIST		(CACHE_LINE_SIZE * 21)	// Zerofill distance
 #define rest		x8
 #define vector_length	x9
 #define vl_remainder	x10	// vector_length remainder
@@ -51,78 +50,54 @@
 	.endm
 
 	.macro st1b_unroll first=0, last=7
-	st1b	z0.b, p0, [dst, #\first, mul vl]
+	st1b	z0.b, p0, [dst, \first, mul vl]
 	.if \last-\first
 	st1b_unroll "(\first+1)", \last
 	.endif
 	.endm
 
-	.macro shortcut_for_small_size exit
-	// if rest <= vector_length * 2
-	whilelo	p0.b, xzr, count
-	whilelo	p1.b, vector_length, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	ret
-1:	// if rest > vector_length * 8
-	cmp	count, vector_length, lsl 3	// vector_length * 8
-	b.hi	\exit
-	// if rest <= vector_length * 4
-	lsl	tmp1, vector_length, 1	// vector_length * 2
-	whilelo	p2.b, tmp1, count
-	incb	tmp1
-	whilelo	p3.b, tmp1, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	ret
-1:	// if rest <= vector_length * 8
-	lsl	tmp1, vector_length, 2	// vector_length * 4
-	whilelo	p4.b, tmp1, count
-	incb	tmp1
-	whilelo	p5.b, tmp1, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	st1b	z0.b, p4, [dstin, #4, mul vl]
-	st1b	z0.b, p5, [dstin, #5, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 2	// vector_length * 4
-	incb	tmp1			// vector_length * 5
-	incb	tmp1			// vector_length * 6
-	whilelo	p6.b, tmp1, count
-	incb	tmp1
-	whilelo	p7.b, tmp1, count
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	st1b	z0.b, p4, [dstin, #4, mul vl]
-	st1b	z0.b, p5, [dstin, #5, mul vl]
-	st1b	z0.b, p6, [dstin, #6, mul vl]
-	st1b	z0.b, p7, [dstin, #7, mul vl]
-	ret
-	.endm
 
-ENTRY (MEMSET)
+#undef BTI_C
+#define BTI_C
 
+ENTRY (MEMSET)
 	PTR_ARG (0)
 	SIZE_ARG (2)
 
-	cbnz	count, 1f
-	ret
-1:	dup	z0.b, valw
 	cntb	vector_length
-	// shortcut for less than vector_length * 8
-	// gives a free ptrue to p0.b for n >= vector_length
-	shortcut_for_small_size L(vl_agnostic)
-	// end of shortcut
+	dup	z0.b, valw
+	whilelo	p0.b, vector_length, count
+	b.last	1f
+	whilelo	p1.b, xzr, count
+	st1b	z0.b, p1, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	ret
+
+	// count >= vector_length * 2
+1:	cmp	count, vector_length, lsl 2
+	add	dstend, dstin, count
+	b.hi	1f
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
+	ret
+
+	// count > vector_length * 4
+1:	lsl	tmp1, vector_length, 3
+	cmp	count, tmp1
+	b.hi	L(vl_agnostic)
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	st1b	z0.b, p0, [dstin, 2, mul vl]
+	st1b	z0.b, p0, [dstin, 3, mul vl]
+	st1b	z0.b, p0, [dstend, -4, mul vl]
+	st1b	z0.b, p0, [dstend, -3, mul vl]
+	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
+	ret
 
+	.p2align 4
 L(vl_agnostic): // VL Agnostic
 	mov	rest, count
 	mov	dst, dstin

From patchwork Thu Jul 22 16:00:44 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44457
To: "naohirot@fujitsu.com"
Subject: [PATCH v3 2/5] AArch64: Improve A64FX memset
Date: Thu, 22 Jul 2021 16:00:44 +0000
From: Wilco Dijkstra
Cc: 'GNU C Library'

Improve performance of large memsets (count >= L2_SIZE) and simplify the
alignment code.  For zero memsets, use DC ZVA, which almost doubles
performance.  For non-zero memsets, use the unroll8 loop, which is about
10% faster.
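The alignment and DC ZVA logic of the new L(L2) path can be modelled in C
as below.  This is a hypothetical sketch, not glibc code.  It assumes a
512-bit SVE vector (as on A64FX), so the four st1b stores cover exactly one
256-byte line; memset() stands in for DC ZVA, for the unroll8 loop and for
the L(last) tail code (not shown in this patch); and the names VL_BYTES,
dc_zva_model, memset_zero_large_model and memset_large_model are invented
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions: 256-byte cache lines (CACHE_LINE_SIZE in the patch) and a
   512-bit SVE vector (VL_BYTES is a hypothetical name). */
#define CACHE_LINE_SIZE 256
#define VL_BYTES 64

/* Stand-in for one "dc zva": zero a single line-aligned cache line. */
static void dc_zva_model(unsigned char *line)
{
    assert(((uintptr_t)line & (CACHE_LINE_SIZE - 1)) == 0);
    memset(line, 0, CACHE_LINE_SIZE);
}

/* Zero path of L(L2).  Requires count > 2 * CACHE_LINE_SIZE; the asm only
   reaches it for count >= L2_SIZE. */
static void memset_zero_large_model(unsigned char *dst, size_t count)
{
    /* Four st1b stores: write the first (possibly unaligned) line. */
    memset(dst, 0, 4 * VL_BYTES);

    /* Round dst up to the next line boundary.  An already aligned dst
       skips a whole line, which the stores above have covered. */
    size_t skip = CACHE_LINE_SIZE - ((uintptr_t)dst & (CACHE_LINE_SIZE - 1));
    dst += skip;
    count -= skip;

    /* Same exit condition as the subs/b.hi loop: stop once at most one
       line remains, leaving 1..CACHE_LINE_SIZE bytes. */
    do {
        dc_zva_model(dst);
        dst += CACHE_LINE_SIZE;
        count -= CACHE_LINE_SIZE;
    } while (count > CACHE_LINE_SIZE);

    /* Tail: handled by L(last) in the asm, a plain memset here. */
    memset(dst, 0, count);
}

/* Dispatch on the fill byte, like "tst valw, 255; b.ne L(unroll8)". */
static void memset_large_model(unsigned char *dst, int c, size_t count)
{
    if ((c & 255) != 0) {
        memset(dst, c, count);  /* stand-in for the unroll8 store loop */
        return;
    }
    memset_zero_large_model(dst, count);
}
```

Because only the low byte of valw is tested, a fill value such as 0x100
also takes the DC ZVA path, which matches memset's conversion of its int
argument to unsigned char.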
Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index f7fcc7b323e1553f50a2e005b8ccef344a08127d..608e0e2e2ff5259178e2fdadf1eea8816194d879 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -30,10 +30,8 @@
 #define L2_SIZE		(8*1024*1024)	// L2 8MB - 1MB
 #define CACHE_LINE_SIZE	256
 #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
-#define rest		x8
+#define rest		x2
 #define vector_length	x9
-#define vl_remainder	x10	// vector_length remainder
-#define cl_remainder	x11	// CACHE_LINE_SIZE remainder
 
 #if HAVE_AARCH64_SVE_ASM
 # if IS_IN (libc)
@@ -41,14 +39,6 @@
 
 	.arch armv8.2-a+sve
 
-	.macro dc_zva times
-	dc	zva, tmp1
-	add	tmp1, tmp1, CACHE_LINE_SIZE
-	.if \times-1
-	dc_zva "(\times-1)"
-	.endif
-	.endm
-
 	.macro st1b_unroll first=0, last=7
 	st1b	z0.b, p0, [dst, \first, mul vl]
 	.if \last-\first
@@ -187,54 +177,29 @@ L(L1_prefetch): // if rest >= L1_SIZE
 	cbnz	rest, L(unroll32)
 	ret
 
+	// count >= L2_SIZE
 L(L2):
-	// align dst address at vector_length byte boundary
-	sub	tmp1, vector_length, 1
-	ands	tmp2, dst, tmp1
-	// if vl_remainder == 0
-	b.eq	1f
-	sub	vl_remainder, vector_length, tmp2
-	// process remainder until the first vector_length boundary
-	whilelt	p2.b, xzr, vl_remainder
-	st1b	z0.b, p2, [dst]
-	add	dst, dst, vl_remainder
-	sub	rest, rest, vl_remainder
-	// align dstin address at CACHE_LINE_SIZE byte boundary
-1:	mov	tmp1, CACHE_LINE_SIZE
-	ands	tmp2, dst, CACHE_LINE_SIZE - 1
-	// if cl_remainder == 0
-	b.eq	L(L2_dc_zva)
-	sub	cl_remainder, tmp1, tmp2
-	// process remainder until the first CACHE_LINE_SIZE boundary
-	mov	tmp1, xzr	// index
-2:	whilelt	p2.b, tmp1, cl_remainder
-	st1b	z0.b, p2, [dst, tmp1]
-	incb	tmp1
-	cmp	tmp1, cl_remainder
-	b.lo	2b
-	add	dst, dst, cl_remainder
-	sub	rest, rest, cl_remainder
-
-L(L2_dc_zva):
-	// zero fill
-	mov	tmp1, dst
-	dc_zva	(ZF_DIST / CACHE_LINE_SIZE) - 1
-	mov	zva_len, ZF_DIST
-	add	tmp1, zva_len, CACHE_LINE_SIZE * 2
-	// unroll
-	.p2align 3
-1:	st1b_unroll 0, 3
-	add	tmp2, dst, zva_len
-	dc	zva, tmp2
-	st1b_unroll 4, 7
-	add	tmp2, tmp2, CACHE_LINE_SIZE
-	dc	zva, tmp2
-	add	dst, dst, CACHE_LINE_SIZE * 2
-	sub	rest, rest, CACHE_LINE_SIZE * 2
-	cmp	rest, tmp1	// ZF_DIST + CACHE_LINE_SIZE * 2
-	b.ge	1b
-	cbnz	rest, L(unroll8)
-	ret
+	tst	valw, 255
+	b.ne	L(unroll8)
+	// align dst to CACHE_LINE_SIZE byte boundary
+	and	tmp2, dst, CACHE_LINE_SIZE - 1
+	sub	tmp2, tmp2, CACHE_LINE_SIZE
+	st1b	z0.b, p0, [dst, 0, mul vl]
+	st1b	z0.b, p0, [dst, 1, mul vl]
+	st1b	z0.b, p0, [dst, 2, mul vl]
+	st1b	z0.b, p0, [dst, 3, mul vl]
+	sub	dst, dst, tmp2
+	add	count, count, tmp2
+
+	// clear cachelines using DC ZVA
+	sub	count, count, CACHE_LINE_SIZE
+	.p2align 4
+1:	dc	zva, dst
+	add	dst, dst, CACHE_LINE_SIZE
+	subs	count, count, CACHE_LINE_SIZE
+	b.hi	1b
+	add	count, count, CACHE_LINE_SIZE
+	b	L(last)
 
 END (MEMSET)
 libc_hidden_builtin_def (MEMSET)

From patchwork Thu Jul 22 16:02:08 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44459
libc-alpha@sourceware.org Received: from EUR05-AM6-obe.outbound.protection.outlook.com (mail-am6eur05on2066.outbound.protection.outlook.com [40.107.22.66]) by sourceware.org (Postfix) with ESMTPS id 3CDAA393A416 for ; Thu, 22 Jul 2021 16:05:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3CDAA393A416 Received: from DU2PR04CA0288.eurprd04.prod.outlook.com (2603:10a6:10:28c::23) by HE1PR0801MB2028.eurprd08.prod.outlook.com (2603:10a6:3:56::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4331.29; Thu, 22 Jul 2021 16:05:37 +0000 Received: from DB5EUR03FT008.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:28c:cafe::1b) by DU2PR04CA0288.outlook.office365.com (2603:10a6:10:28c::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4352.26 via Frontend Transport; Thu, 22 Jul 2021 16:05:37 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; sourceware.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;sourceware.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DB5EUR03FT008.mail.protection.outlook.com (10.152.20.98) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4352.24 via Frontend Transport; Thu, 22 Jul 2021 16:05:37 +0000 Received: ("Tessian outbound 809237f40a36:v99"); Thu, 22 Jul 2021 16:05:37 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 2dda691bf480a030 X-CR-MTA-TID: 64aa7808 Received: from 0dd2aba2be9b.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 5281CEDB-8690-48DB-877D-C704A0C3503E.1; Thu, 22 Jul 2021 16:02:10 +0000 
Received: from EUR05-DB8-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 0dd2aba2be9b.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 22 Jul 2021 16:02:10 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=BZv2a3+DkFmjjpkzsgNLpKM0LU1we/CWBN//hk3k5Ic5vFPXWBIgVdF3cxAWSFfQjcFVLNA6y4STqdL6tllhHED/SOfSH96Vp9ayQWCFD2vAXRRfJuqTUa7QIOI3yf4cC0oLAfX39MTGRG1GpNMHFJEd6Tg5W5UjmedGDY+wLo7g4M2DAvVFWmEvYJz6Nb7Q44u6O81kUjh2GYmi47KD+7r1qCQV+sCNyDFJJwcqtakz5PsC0UMSaVFAi0gQZBqfC0EE+jguajFHqtPD5G9ZMszsuPBkXYEwUISBSvoEuvawq82Fe2qY42zTMRwtgyiki1e+LP7jUP3k/ZvpdCp5sw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=OjtyJJO/jDZstu1YvbNcjhf27NxM7eqKsItgC8/uUes=; b=fI74gW5qQqbMTjFPY6rRBXuWuaJeQcVHiRct5Rm+eKA3Gy+cb2gylAcZ2Z0l0ekGwp8DNkWOTnsHSeBt1hiE8ka3zbucUz0in20NR+Akv1RSdv89Q74/gsNILfhurm/JY4aunFwrG4F1K/rvmrjNJTlDWnqknJhpNGsIvcVIZGAB2Jsw9b4CgDG9Urh0FQIqiM6mJhhbkxwsforRimmPgndzklrmmjYSxYZfadKRTVsoPtrZgdbILD9WdsVYAHmrq+OeWWLCIiJOh5sR4mkrYXiOtl+dL0juJCdJ3MSkOoyEqYMot6Oz/pwkH1ZnZwB8ASMqUqoM2nCOhQZ8cmYcjQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Received: from VE1PR08MB5599.eurprd08.prod.outlook.com (2603:10a6:800:1a1::12) by VI1PR08MB5551.eurprd08.prod.outlook.com (2603:10a6:803:f1::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4331.29; Thu, 22 Jul 2021 16:02:09 +0000 Received: from VE1PR08MB5599.eurprd08.prod.outlook.com ([fe80::5ccd:ab57:a64f:e07e]) by VE1PR08MB5599.eurprd08.prod.outlook.com ([fe80::5ccd:ab57:a64f:e07e%7]) with mapi id 15.20.4352.025; Thu, 22 Jul 2021 16:02:09 +0000 To: "naohirot@fujitsu.com" Subject: [PATCH v3 3/5] AArch64: Improve A64FX memset Thread-Topic: [PATCH 
v3 3/5] AArch64: Improve A64FX memset Thread-Index: AQHXfxLTfxEck9F5oEu0Sp/qNCbIIw== Date: Thu, 22 Jul 2021 16:02:08 +0000 Message-ID: Accept-Language: en-GB, en-US Content-Language: en-GB X-MS-Has-Attach: X-MS-TNEF-Correlator: Authentication-Results-Original: fujitsu.com; dkim=none (message not signed) header.d=none;fujitsu.com; dmarc=none action=none header.from=arm.com; x-ms-publictraffictype: Email X-MS-Office365-Filtering-Correlation-Id: 7e572ea2-8070-46b1-3384-08d94d2a8b96 x-ms-traffictypediagnostic: VI1PR08MB5551:|HE1PR0801MB2028: X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true nodisclaimer: true x-ms-oob-tlc-oobclassifiers: OLM:4125;OLM:4125; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 43fCqH0yVUNkBE8whPljKL+LJlxKGG5hhTe4ThwE6mG9W4mRYwneNElDI2ZBq8YqbvdQ38Bkqmts+lrisJkBQa2qU66m9MPfYIOvHNq+ExzXofFZajlunMrvQmYFTUTw9QNYdij96aIn5zHk12b0huvJpYAqR/mivQtl2R5uXO/kQ5H1T7ST7AQbyCL4XDE6cdQg/0g6+hEnePcmS6g4dO0l6EYm3NpG+I7VehehrEWls7IrX5la1teTabnoXV+udrGHUaV07PvbIU07TPWTV2KFnYHt8vsf0VZWPMwklg+XQv1AK303Ilv6467M0hQHA1BrejEEL7DooTSOYL/CU2VvOuFOpO4+1kPFY1kc2qsHJC+flbfP6Ao9gU6nCutdHvzM+rY1AqTtLxH88EETS7cKU4iglrtfTHt7juQ0zvyTYjIg7+27sLd+L9NdaN+u/xqden21XJTuKQjmctyzvJDuyga35uFf/CNZ4fqpxi8jG5IYUvn7iUB4MGLlhFyDo3eYTIgEc3dcOxBL9tZ8Gz/TqCBnwppdeT2hfFJRnbj92Zuu/BbzkdiSkNxpQCfKervROfrfNQIoFjnbPrs3XdKFtVTEyLBpfHQ2+o6UtUgR6CmW3si0pIDsSBq/OulTmXei4RfydtErdXSAoAvvMZu1SzIe8XqfstE9BXD1u9Tgbfh3/alae7YxLnf2XBiJ7FCvOh23xvk0rR2UxMFkOAPNXcjU0f9sYhmyTgy70Ac= X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VE1PR08MB5599.eurprd08.prod.outlook.com; PTR:; CAT:NONE; 
SFS:(4636009)(366004)(136003)(39850400004)(346002)(376002)(396003)(76116006)(64756008)(38100700002)(4326008)(478600001)(66556008)(66446008)(2906002)(66946007)(55016002)(5660300002)(66476007)(52536014)(33656002)(9686003)(6506007)(91956017)(8936002)(186003)(122000001)(86362001)(26005)(316002)(71200400001)(6916009)(7696005)(8676002)(38070700004)(357404004); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?iso-8859-1?q?WDw7jnJ6qkMo3WJKn49Cttj?= =?iso-8859-1?q?mDnaq6+0M2y3ZnShLgZwYt4Ccgyyl5qr9Al0zWR8Nq5VMc0NnlNi5CwQsIzO?= =?iso-8859-1?q?HZi/Jkj06hOfExXMzxqsXYJ1DkrOIzn+GifzxwAePh4wdoqYjKFSVHrFEWKN?= =?iso-8859-1?q?1jDjd8ay2bMijJaK2z7Olm6AE9PLfJtGxkAIFDr2kDGlSLnsyrK0Jy+uUdxR?= =?iso-8859-1?q?CMer1qjhJaWK8Y+sKdhrVf4nyhXuKqDdebDS2r/AZQTszjcDf8OHarSl9x7k?= =?iso-8859-1?q?ejdi71YFq6j4/arptI8OhrKZIclBb3GNN/OgkbAhBYkzYn5VgfLd30cQSRbB?= =?iso-8859-1?q?0oUOkwnmDL3OIHBGZ3zmYaCGBpzhTksNPOPhsjPpqCM5Qgr8rcmo8HxDOYeE?= =?iso-8859-1?q?73ZzY6dIwmXH327jhyODsOG9V+kJqDwbz6KiNvD0NE+Mgo16Z0mZXubicYUt?= =?iso-8859-1?q?0R7TCo2DCBkCp768uGaaREMahcEDcivqXr/d88bHgGe8lE18B4rD/YsDnX+F?= =?iso-8859-1?q?aMAvb2PFiI6u973rtBOkhTH2iXa6FK+HEYUbKQXxqdYfEgvFpcxof1M4Kl85?= =?iso-8859-1?q?hkniKJpysWjyhEzlUXFt5rHpGxKrCy1+NqJVzJeX7SX2qq54cKFP80qFMBtK?= =?iso-8859-1?q?LtqNX3PGh0sDmzF71LebICsbrSrxp7xGFidIrErE6rcIgY1e9eOoQzOsU8VD?= =?iso-8859-1?q?6VeQTFoHk2aHoJS6ONYqinXDji6bQdAYRtzJbtKNr76G91vdM8txN133wOt5?= =?iso-8859-1?q?0x4XxBWbpoL7YjZ3ASpGZLe/TVjq6roumedscnHYsfhvR9dfbpM9DOThp2B0?= =?iso-8859-1?q?jU4k5zxj12nDf1lNffWz3EaVGRbJZA/vnFMk5cUO3845xz8F+E70065WyA/1?= =?iso-8859-1?q?J6liUpkBJHwzoJ5W8zpTYUQMbCsarz8SKEqFG6Fv9sU56kZLCjpyYNc2CP1Y?= =?iso-8859-1?q?Hd/TOe4cnGsu6ixOlDKVPFfNvgrDqt3/WZb9C3wOK6PzL3GyEABN94NSv+w3?= =?iso-8859-1?q?STXpn1LoCmPx4XWDNksRK0ncPUTuKr6qzK7qXwKRxv6uCbSGH6fRw39lsOpC?= =?iso-8859-1?q?e4K96Tm+99OrYWnfRvjMUfADklV0MMavG3HAJM0TEkdm5+AHeFD0uraIFAMp?= =?iso-8859-1?q?bSG47j4iTTT3WmCq96nKcO2y2OHQRQXD+F61KAXVwj6Yb8nhoC+hvv1lfrOA?= 
=?iso-8859-1?q?Mo8sNOKRLeT4rsanUyfMthMXhOtGo7tykRhLGMBKh4W0lrfxQHKn9WJueQOI?= =?iso-8859-1?q?GdPWBgMSXr5WEw2yGrbJ1XvqMgTYhfGFuBl+wGbZCA/vc4TxARdgKNaT3hED?= =?iso-8859-1?q?N6bSFN6puU8DrcPmnVtF2nqISvKVAZdAKF7vbKoU=3D?= x-ms-exchange-transport-forked: True MIME-Version: 1.0 X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB5551 Original-Authentication-Results: fujitsu.com; dkim=none (message not signed) header.d=none;fujitsu.com; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB5EUR03FT008.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: eb15861e-1f75-4632-a7f0-08d94d2a0f5c X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: +YyXx0dslav98PzQHtv/Yh7t/XgoOeeKuULwm2KhhcWCUi7zgWG6r1JJb1urC7ucXf9BVpm3qZ6nEm+LCPNeOs7q9Lup/Bqj8nYqLcbIl9r0sddJbKrLkj9UgJx6c2uPJ3gCzKSGstmhScTrKudsksZ6mH1EO3dcumasb0iu+R7WjG0rtWOJtZgxSlyo6DqF5ciRvfRKoGkUb4NwM7e4a+yBG6O79nQ37nqWGwZ3fN5FahLruDpe3CvnvN5TGZdixknL22/3tP0BkcSZiDit+2g1JOrfQjeS0KmyIuOqfj6Fg067OWfaOPy1MdkhC91ua50DzM+zQaeDW063khmU8Eq2Z4P+KiZXp2jPkyZZ78aX0N2LN+hU/zACvKkL9zAHIJxra+565zuB1D4O3kSPoobwSrE33tqikTOCkkEVEtUZ4vMEPYERUc3AN+JdUoHlkDWvV7AvNrVL2VvpjmHyAAbXfA/lPkcEPo6US8zXIFZ7GAEkEdhBieptcC7wTDb1pfyA6D4RypZN+lJWKtQzfbpmBE7OR2Bbxld9OUISSuKafXBeAWZVW/wLhXYQEOZxPPVrvC9UfHlIsdKNIbPN+kQxyHmoWFab/YOuhvERalbcamWz8qETR1p0lUeTCACUTEVZqh4m2oeLhcxBqrkZxQq0xaA9c+BqF5es39TA8GTYirBe6Z7AODJW1lMx33l3a35iL4fYLq+CZPPu466rxACBDiswYMQuh2NusfRQDaM= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; 
SFS:(4636009)(39850400004)(136003)(346002)(376002)(396003)(36840700001)(46966006)(36860700001)(82310400003)(478600001)(4326008)(336012)(26005)(70586007)(8676002)(33656002)(186003)(2906002)(6862004)(86362001)(82740400003)(316002)(52536014)(47076005)(8936002)(356005)(7696005)(5660300002)(9686003)(55016002)(81166007)(70206006)(6506007)(357404004); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Jul 2021 16:05:37.5407 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 7e572ea2-8070-46b1-3384-08d94d2a8b96 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DB5EUR03FT008.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0801MB2028 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha From: Wilco Dijkstra Reply-To: Wilco Dijkstra Cc: 'GNU C Library' Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" Simplify handling of remaining bytes. Avoid lots of taken branches and complex whilelo computations, instead unconditionally write vectors from the end. 
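To make the end-anchored approach concrete, here is a scalar C model of the new `L(last)` sequence. It is a hypothetical sketch, not glibc code. `vl` stands in for the SVE vector length in bytes, `store_vec` and `set_tail` are illustrative names, and each `st1b` with the all-true predicate `p0` is modelled as a `memset` of `vl` bytes. The model assumes `count <= 8 * vl` and that the `8 * vl` bytes before `dstend` belong to the buffer. Under that assumption, stores that start below the unwritten tail only rewrite bytes that already hold the fill value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar model of the new L(last) tail, not glibc code.
   vl stands in for the SVE vector length in bytes; store_vec() models
   one st1b with the all-true predicate p0.  */
void store_vec(unsigned char *p, unsigned char c, size_t vl)
{
    memset(p, c, vl);
}

/* Write the final `count` bytes before `dstend`.  Assumes
   count <= 8 * vl and that the 8 * vl bytes before dstend belong to the
   buffer and already hold `c` wherever they precede the tail, so the
   overlapping stores are harmless.  */
void set_tail(unsigned char *dstend, unsigned char c, size_t count, size_t vl)
{
    assert(count <= 8 * vl);

    size_t nvec;
    if (count <= 2 * vl)          /* cmp count, vector_length, lsl 1; b.ls 2f */
        nvec = 2;
    else if (count <= 5 * vl)     /* tmp2 = 5 * vector_length; cmp count, tmp2; b.ls 5f */
        nvec = 5;
    else                          /* fall through: eight stores */
        nvec = 8;

    /* Full-width stores that end exactly at dstend: [dstend, -nvec .. -1, mul vl]. */
    for (size_t i = nvec; i > 0; i--)
        store_vec(dstend - i * vl, c, vl);
}
```

For example, with `vl = 8` a 19-byte tail exceeds `2 * vl` but not `5 * vl`, so five stores cover bytes 24 to 63 of a 64-byte buffer. Choosing between 2, 5 or 8 stores takes at most two compare-and-branch pairs, with no `whilelo` predicate computation and no loop.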
Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 608e0e2e2ff5259178e2fdadf1eea8816194d879..fce257fa68120c2b101f29b438c397e10b4c275e 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -130,38 +130,19 @@ L(unroll8):
 	b 1b
 
 L(last):
-	whilelo p0.b, xzr, rest
-	whilelo p1.b, vector_length, rest
-	b.last 1f
-	st1b z0.b, p0, [dst, #0, mul vl]
-	st1b z0.b, p1, [dst, #1, mul vl]
-	ret
-1:	lsl tmp1, vector_length, 1 // vector_length * 2
-	whilelo p2.b, tmp1, rest
-	incb tmp1
-	whilelo p3.b, tmp1, rest
-	b.last 1f
-	st1b z0.b, p0, [dst, #0, mul vl]
-	st1b z0.b, p1, [dst, #1, mul vl]
-	st1b z0.b, p2, [dst, #2, mul vl]
-	st1b z0.b, p3, [dst, #3, mul vl]
-	ret
-1:	lsl tmp1, vector_length, 2 // vector_length * 4
-	whilelo p4.b, tmp1, rest
-	incb tmp1
-	whilelo p5.b, tmp1, rest
-	incb tmp1
-	whilelo p6.b, tmp1, rest
-	incb tmp1
-	whilelo p7.b, tmp1, rest
-	st1b z0.b, p0, [dst, #0, mul vl]
-	st1b z0.b, p1, [dst, #1, mul vl]
-	st1b z0.b, p2, [dst, #2, mul vl]
-	st1b z0.b, p3, [dst, #3, mul vl]
-	st1b z0.b, p4, [dst, #4, mul vl]
-	st1b z0.b, p5, [dst, #5, mul vl]
-	st1b z0.b, p6, [dst, #6, mul vl]
-	st1b z0.b, p7, [dst, #7, mul vl]
+	cmp count, vector_length, lsl 1
+	b.ls 2f
+	add tmp2, vector_length, vector_length, lsl 2
+	cmp count, tmp2
+	b.ls 5f
+	st1b z0.b, p0, [dstend, -8, mul vl]
+	st1b z0.b, p0, [dstend, -7, mul vl]
+	st1b z0.b, p0, [dstend, -6, mul vl]
+5:	st1b z0.b, p0, [dstend, -5, mul vl]
+	st1b z0.b, p0, [dstend, -4, mul vl]
+	st1b z0.b, p0, [dstend, -3, mul vl]
+2:	st1b z0.b, p0, [dstend, -2, mul vl]
+	st1b z0.b, p0, [dstend, -1, mul vl]
 	ret
 
 L(L1_prefetch): // if rest >= L1_SIZE

From patchwork Thu Jul 22 16:03:20 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44460
From: Wilco Dijkstra
To: "naohirot@fujitsu.com"
Cc: 'GNU C Library'
Subject: [PATCH v3 4/5] AArch64: Improve A64FX memset
Date: Thu, 22 Jul 2021 16:03:20 +0000

Remove unroll32 code since it doesn't improve performance.

Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index fce257fa68120c2b101f29b438c397e10b4c275e..8665c272431b46dadea53c63ab74829c3aa99312 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -102,22 +102,6 @@ L(vl_agnostic): // VL Agnostic
 	ccmp vector_length, tmp1, 0, cs
 	b.eq L(L1_prefetch)
 
-L(unroll32):
-	lsl tmp1, vector_length, 3 // vector_length * 8
-	lsl tmp2, vector_length, 5 // vector_length * 32
-	.p2align 3
-1:	cmp rest, tmp2
-	b.cc L(unroll8)
-	st1b_unroll
-	add dst, dst, tmp1
-	st1b_unroll
-	add dst, dst, tmp1
-	st1b_unroll
-	add dst, dst, tmp1
-	st1b_unroll
-	add dst, dst, tmp1
-	sub rest, rest, tmp2
-	b 1b
 
 L(unroll8):
 	lsl tmp1, vector_length, 3
@@ -155,7 +139,7 @@ L(L1_prefetch): // if rest >= L1_SIZE
 	sub rest, rest, CACHE_LINE_SIZE * 2
 	cmp rest, L1_SIZE
 	b.ge 1b
-	cbnz rest, L(unroll32)
+	cbnz rest, L(unroll8)
 	ret
 
 	// count >= L2_SIZE

From patchwork Thu Jul 22 16:04:34 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44458
X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 574C1393AC0E for ; Thu, 22 Jul 2021 16:06:30 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 574C1393AC0E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1626969990; bh=qjKhxTYhq1TbVwTi+V76Nxn19An/EfBG4Gnx9eYM0Gg=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=sbHR8RDf6V8YbMpyosk1Pr1nLuvtRZwPWKdOTcNlznF2mkdelFTp5m/3Grw9bRjy+ N7jXFQuDl9tsl0/Ew/Y6FtfPu9Hz/CGmu5UDlnqPscv6ysSZyix/OOG5TX2+P5rs31 BzWVTr+seppNHssQ+6liMwCMujpiIdA49k8sHETw= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR05-VI1-obe.outbound.protection.outlook.com (mail-vi1eur05on2083.outbound.protection.outlook.com [40.107.21.83]) by sourceware.org (Postfix) with ESMTPS id A19E23889827 for ; Thu, 22 Jul 2021 16:05:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A19E23889827 Received: from DB6P192CA0008.EURP192.PROD.OUTLOOK.COM (2603:10a6:4:b8::18) by AM6PR08MB3384.eurprd08.prod.outlook.com (2603:10a6:20b:4a::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4331.28; Thu, 22 Jul 2021 16:05:30 +0000 Received: from DB5EUR03FT057.eop-EUR03.prod.protection.outlook.com (2603:10a6:4:b8:cafe::7b) by DB6P192CA0008.outlook.office365.com (2603:10a6:4:b8::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4352.25 via Frontend Transport; Thu, 22 Jul 2021 16:05:30 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; sourceware.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;sourceware.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: 
domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DB5EUR03FT057.mail.protection.outlook.com (10.152.20.235) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4352.24 via Frontend Transport; Thu, 22 Jul 2021 16:05:30 +0000 Received: ("Tessian outbound b81a99a0393d:v99"); Thu, 22 Jul 2021 16:05:30 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 467b0e1e8536e8bd X-CR-MTA-TID: 64aa7808 Received: from a2f4ed4e03b4.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id BB302076-5B80-42BC-9CA4-85F6F99810AF.1; Thu, 22 Jul 2021 16:04:35 +0000 Received: from EUR03-AM5-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id a2f4ed4e03b4.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 22 Jul 2021 16:04:35 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=g/fXntQ+1eeDzOnuaVK7ovDldgpsz/GKmYXOejMLd50nmVQGXMKR3/CZLK0/Vj9wTMquTfCoaJz1sdaw5is28iEJiTKXYwHInOflp8KcrspKm+14mexQSLVfKCIq5VpjreTTFNdkSwgH48vBAvLUTYlg08v0iVoWjXW/zBtJ5g2oXjjnSBQvIBBORqFkQwHAIJsEZQlMhZk6EtH0IqHNtZXjBQZS4NnMX9+u4FlCGgMOQ/HIJOlVLroeY0UhYaXonPtcW/IjXciYsaxoCGmNDColY/RvjSoFpeoQyh6/KnSuAj0RtY5a/VahbiElgaL41mAzksCNESZTvSRx59MKZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=qjKhxTYhq1TbVwTi+V76Nxn19An/EfBG4Gnx9eYM0Gg=; 
b=kfq6D+eG5mTdNCFGlNgL9f/+RjybwWVIcZFBQgUbG1ZRgEul4dGW2Oq7YewjugFltqUEEflC3+n7HOaIWNnvaeAGo/tL5kPt+A3XkUt9IJvu1PM3mXNpyGpG88ScMClAm2aQteZJ9cybWVdyqUQiYDAPsyicuPTnpFWsYUi/g4Wv7dRHSYP1oA18ipPd+MA18aeGTvyNgUxD+Cp1dEhdM4LrNdnqdw9t5i88vUZf0h5sQE+108atAPsYslNUnbkQt1X1UZcQPFKe8RC4M6MRWcl6LhTsBc+G+ZIaprfGZAUD1wYUbIW7jWxNGX8TDZiBI2ZG1hMUEsh4zPVyzlQgnw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Received: from VE1PR08MB5599.eurprd08.prod.outlook.com (2603:10a6:800:1a1::12) by VI1PR08MB3664.eurprd08.prod.outlook.com (2603:10a6:803:81::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4352.25; Thu, 22 Jul 2021 16:04:34 +0000 Received: from VE1PR08MB5599.eurprd08.prod.outlook.com ([fe80::5ccd:ab57:a64f:e07e]) by VE1PR08MB5599.eurprd08.prod.outlook.com ([fe80::5ccd:ab57:a64f:e07e%7]) with mapi id 15.20.4352.025; Thu, 22 Jul 2021 16:04:34 +0000 To: "naohirot@fujitsu.com" Subject: [PATCH v3 5/5] AArch64: Improve A64FX memset Thread-Topic: [PATCH v3 5/5] AArch64: Improve A64FX memset Thread-Index: AQHXfxMmDOwcjJYmP0eOZ48wdBomwQ== Date: Thu, 22 Jul 2021 16:04:34 +0000 Message-ID: Accept-Language: en-GB, en-US Content-Language: en-GB X-MS-Has-Attach: X-MS-TNEF-Correlator: Authentication-Results-Original: fujitsu.com; dkim=none (message not signed) header.d=none;fujitsu.com; dmarc=none action=none header.from=arm.com; x-ms-publictraffictype: Email X-MS-Office365-Filtering-Correlation-Id: b7b3246b-c183-47e0-b6df-08d94d2a8791 x-ms-traffictypediagnostic: VI1PR08MB3664:|AM6PR08MB3384: X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true nodisclaimer: true x-ms-oob-tlc-oobclassifiers: OLM:4714;OLM:4714; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 
From: Wilco Dijkstra
Cc: 'GNU C Library'

Simplify the code for memsets smaller than L1. Improve the unroll8 and
L1_prefetch loops.

Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 8665c272431b46dadea53c63ab74829c3aa99312..36628e101db33a9a8ff5234b98dd5a3a5c9ed73c 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -30,7 +30,6 @@
 #define L2_SIZE         (8*1024*1024)	// L2 8MB - 1MB
 #define CACHE_LINE_SIZE	256
 #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
-#define rest		x2
 #define vector_length	x9
 
 #if HAVE_AARCH64_SVE_ASM
@@ -89,29 +88,19 @@ ENTRY (MEMSET)
 
 	.p2align 4
 L(vl_agnostic): // VL Agnostic
-	mov	rest, count
 	mov	dst, dstin
-	add	dstend, dstin, count
-	// if rest >= L2_SIZE && vector_length == 64 then L(L2)
-	mov	tmp1, 64
-	cmp	rest, L2_SIZE
-	ccmp	vector_length, tmp1, 0, cs
-	b.eq	L(L2)
-	// if rest >= L1_SIZE && vector_length == 64 then L(L1_prefetch)
-	cmp	rest, L1_SIZE
-	ccmp	vector_length, tmp1, 0, cs
-	b.eq	L(L1_prefetch)
-
+	cmp	count, L1_SIZE
+	b.hi	L(L1_prefetch)
 
+	// count >= 8 * vector_length
 L(unroll8):
-	lsl	tmp1, vector_length, 3
-	.p2align 3
-1:	cmp	rest, tmp1
-	b.cc	L(last)
-	st1b_unroll
+	sub	count, count, tmp1
+	.p2align 4
+1:	subs	count, count, tmp1
+	st1b_unroll 0, 7
 	add	dst, dst, tmp1
-	sub	rest, rest, tmp1
-	b	1b
+	b.hi	1b
+	add	count, count, tmp1
 
 L(last):
 	cmp	count, vector_length, lsl 1
@@ -129,18 +118,22 @@ L(last):
 	st1b	z0.b, p0, [dstend, -1, mul vl]
 	ret
 
-L(L1_prefetch): // if rest >= L1_SIZE
+	// count >= L1_SIZE
 	.p2align 3
+L(L1_prefetch):
+	cmp	count, L2_SIZE
+	b.hs	L(L2)
+	cmp	vector_length, 64
+	b.ne	L(unroll8)
 1:	st1b_unroll 0, 3
 	prfm	pstl1keep, [dst, PF_DIST_L1]
 	st1b_unroll 4, 7
 	prfm	pstl1keep, [dst, PF_DIST_L1 + CACHE_LINE_SIZE]
 	add	dst, dst, CACHE_LINE_SIZE * 2
-	sub	rest, rest, CACHE_LINE_SIZE * 2
-	cmp	rest, L1_SIZE
-	b.ge	1b
-	cbnz	rest, L(unroll8)
-	ret
+	sub	count, count, CACHE_LINE_SIZE * 2
+	cmp	count, PF_DIST_L1
+	b.hs	1b
+	b	L(unroll8)
 
 	// count >= L2_SIZE
 L(L2):