From patchwork Mon Aug 9 13:07:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44609
X-Patchwork-Delegate: szabolcs.nagy@arm.com
To: "naohirot@fujitsu.com"
Subject: [PATCH v4 1/5] AArch64: Improve A64FX memset for small sizes
Date: Mon, 9 Aug 2021 13:07:02 +0000
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

v4: Don't remove ZF_DIST yet

Improve performance of small memsets by reducing instruction counts and
improving alignment. Bench-memset shows 35-45% performance gain for small
sizes.

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index ce54e5418b08c8bc0ecc7affff68a59272ba6397..cf3d402ef681a9d98964d1751537945692a1ae68 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -51,78 +51,54 @@
 	.endm
 	.macro st1b_unroll first=0, last=7
-	st1b	z0.b, p0, [dst, #\first, mul vl]
+	st1b	z0.b, p0, [dst, \first, mul vl]
 	.if \last-\first
 	st1b_unroll "(\first+1)", \last
 	.endif
 	.endm
-	.macro	shortcut_for_small_size exit
-	// if rest <= vector_length * 2
-	whilelo	p0.b, xzr, count
-	whilelo	p1.b, vector_length, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	ret
-1:	// if rest > vector_length * 8
-	cmp	count, vector_length, lsl 3	// vector_length * 8
-	b.hi	\exit
-	// if rest <= vector_length * 4
-	lsl	tmp1, vector_length, 1	// vector_length * 2
-	whilelo	p2.b, tmp1, count
-	incb	tmp1
-	whilelo	p3.b, tmp1, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	ret
-1:	// if rest <= vector_length * 8
-	lsl	tmp1, vector_length, 2	// vector_length * 4
-	whilelo	p4.b, tmp1, count
-	incb	tmp1
-	whilelo	p5.b, tmp1, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	st1b	z0.b, p4, [dstin, #4, mul vl]
-	st1b	z0.b, p5, [dstin, #5, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 2	// vector_length * 4
-	incb	tmp1			// vector_length * 5
-	incb	tmp1			// vector_length * 6
-	whilelo	p6.b, tmp1, count
-	incb	tmp1
-	whilelo	p7.b, tmp1, count
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	st1b	z0.b, p4, [dstin, #4, mul vl]
-	st1b	z0.b, p5, [dstin, #5, mul vl]
-	st1b	z0.b, p6, [dstin, #6, mul vl]
-	st1b	z0.b, p7, [dstin, #7, mul vl]
-	ret
-	.endm
-ENTRY (MEMSET)
+#undef BTI_C
+#define BTI_C
+ENTRY (MEMSET)
 	PTR_ARG (0)
 	SIZE_ARG (2)
-	cbnz	count, 1f
-	ret
-1:	dup	z0.b, valw
 	cntb	vector_length
-	// shortcut for less than vector_length * 8
-	// gives a free ptrue to p0.b for n >= vector_length
-	shortcut_for_small_size L(vl_agnostic)
-	// end of shortcut
+	dup	z0.b, valw
+	whilelo	p0.b, vector_length, count
+	b.last	1f
+	whilelo	p1.b, xzr, count
+	st1b	z0.b, p1, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	ret
+
+	// count >= vector_length * 2
+1:	cmp	count, vector_length, lsl 2
+	add	dstend, dstin, count
+	b.hi	1f
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
+	ret
+
+	// count > vector_length * 4
+1:	lsl	tmp1, vector_length, 3
+	cmp	count, tmp1
+	b.hi	L(vl_agnostic)
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	st1b	z0.b, p0, [dstin, 2, mul vl]
+	st1b	z0.b, p0, [dstin, 3, mul vl]
+	st1b	z0.b, p0, [dstend, -4, mul vl]
+	st1b	z0.b, p0, [dstend, -3, mul vl]
+	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
+	ret
+	.p2align 4
 L(vl_agnostic): // VL Agnostic
 	mov	rest, count
 	mov	dst, dstin

From patchwork Mon Aug 9 13:11:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44610
X-Patchwork-Delegate: szabolcs.nagy@arm.com
Received: from VE1EUR03FT031.eop-EUR03.prod.protection.outlook.com (2603:10a6:d10:48:cafe::fd) by
 FR0P281CA0047.outlook.office365.com (2603:10a6:d10:48::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4415.7 via Frontend Transport; Mon, 9 Aug 2021 13:12:12 +0000
To: "naohirot@fujitsu.com"
Subject: [PATCH v4 3/5] AArch64: Improve A64FX memset for remaining bytes
Date: Mon, 9 Aug 2021 13:11:57 +0000
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

v4: no changes

Simplify handling of remaining bytes. Avoid lots of taken branches and complex
whilelo computations, instead unconditionally write vectors from the end.

Reviewed-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 6bc8ef5e0c84dbb59a57d114ae6ec8e3fa3822ad..55f28b644defdffb140c88da0635ef099235546c 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -130,38 +130,19 @@ L(unroll8):
 	b	1b
 L(last):
-	whilelo	p0.b, xzr, rest
-	whilelo	p1.b, vector_length, rest
-	b.last	1f
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 1	// vector_length * 2
-	whilelo	p2.b, tmp1, rest
-	incb	tmp1
-	whilelo	p3.b, tmp1, rest
-	b.last	1f
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	st1b	z0.b, p2, [dst, #2, mul vl]
-	st1b	z0.b, p3, [dst, #3, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 2	// vector_length * 4
-	whilelo	p4.b, tmp1, rest
-	incb	tmp1
-	whilelo	p5.b, tmp1, rest
-	incb	tmp1
-	whilelo	p6.b, tmp1, rest
-	incb	tmp1
-	whilelo	p7.b, tmp1, rest
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	st1b	z0.b, p2, [dst, #2, mul vl]
-	st1b	z0.b, p3, [dst, #3, mul vl]
-	st1b	z0.b, p4, [dst, #4, mul vl]
-	st1b	z0.b, p5, [dst, #5, mul vl]
-	st1b	z0.b, p6, [dst, #6, mul vl]
-	st1b	z0.b, p7, [dst, #7, mul vl]
+	cmp	count, vector_length, lsl 1
+	b.ls	2f
+	add	tmp2, vector_length, vector_length, lsl 2
+	cmp	count, tmp2
+	b.ls	5f
+	st1b	z0.b, p0, [dstend, -8, mul vl]
+	st1b	z0.b, p0, [dstend, -7, mul vl]
+	st1b	z0.b, p0, [dstend, -6, mul vl]
+5:	st1b	z0.b, p0, [dstend, -5, mul vl]
+	st1b	z0.b, p0, [dstend, -4, mul vl]
+	st1b	z0.b, p0, [dstend, -3, mul vl]
+2:	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
 	ret
 L(L1_prefetch): // if rest >= L1_SIZE
@@ -199,7 +180,6 @@ L(L2):
 	subs	count, count, CACHE_LINE_SIZE
 	b.hi	1b
 	add	count, count, CACHE_LINE_SIZE
-	add	dst, dst, CACHE_LINE_SIZE
 	b	L(last)
 END (MEMSET)

From patchwork Mon Aug 9 13:13:30 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44611
X-Patchwork-Delegate: szabolcs.nagy@arm.com
Received: from EUR05-VI1-obe.outbound.protection.outlook.com
 (mail-vi1eur05on2083.outbound.protection.outlook.com [40.107.21.83]) by sourceware.org (Postfix) with ESMTPS id 26DAF385B83F for ; Mon, 9 Aug 2021 13:13:50 +0000 (GMT)
To: "naohirot@fujitsu.com"
Subject: [PATCH v4 4/5] AArch64: Improve A64FX memset by removing unroll32
Date: Mon, 9 Aug 2021 13:13:30 +0000
X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VE1PR08MB5599.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(366004)(38100700002)(8936002)(38070700005)(91956017)(316002)(66946007)(76116006)(66476007)(66556008)(8676002)(2906002)(122000001)(64756008)(33656002)(66446008)(7696005)(4326008)(6506007)(86362001)(5660300002)(52536014)(186003)(55016002)(26005)(9686003)(6916009)(508600001)(71200400001); DIR:OUT;
SFP:1101; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?iso-8859-1?q?eS5zqusPEZ+p8Hx88MzI7tq?= =?iso-8859-1?q?VQbyBbX3mvpJjfwX35yGwV2VRK3SOcxFhEB1hSeMSEi8U04D5H7cNTMZn+wW?= =?iso-8859-1?q?fnQDT+8pOObx/uhFqGAkKYvrUqw2AEBFiFUwqqMYXlqUzQxMXD5LDncy3YFo?= =?iso-8859-1?q?DllxNZ5aM44DAuqw9HxKs5zXLE+p5ocngtLYim3iUsavOp13gdMmPCp5Y6Tl?= =?iso-8859-1?q?wnBzFjMcNp6Gcua10Y662JwkuVryQXOZ9v9qxg/YFen1PB11jFIdWlEUrD3S?= =?iso-8859-1?q?4/mMhB8MOpHTIAA81STV6WQQe+usX4s684PjPiURTAtwwzOSgeboe8rCQRkl?= =?iso-8859-1?q?1yd1ucfWjJLPmXxVqCA+YlMRHA5cs/5K/03sy6GU80tIKqReQqpYQ7iSqqUD?= =?iso-8859-1?q?ejO1fKIYmaYHNaDstd+Y1CNNIJye8TiTm/VLhMZlv9bEARmKNrBt9auSeLBX?= =?iso-8859-1?q?wouoaYYMwmEm5cFbIxTWs4FuAYdHFw+h3Ji3Y+MlmH6B8d9K3kbhDdF2qw1J?= =?iso-8859-1?q?HzTVAbW0OXJ1rdrLLt4Qr+edgBxoQTVUpQcr0SjQxw10bkEN5zA9rbuFDETm?= =?iso-8859-1?q?RyeaS5Zr+ZQE4sxEPkVsQHiF8DyJsgs5L4oCubcN5pITpX/70RcdJt3d74f2?= =?iso-8859-1?q?EoJmNc4G1NNOy2JJuAdte67teD3ZFGMqtJj4D4V8VufMLz3ImlLJr4IbuZj7?= =?iso-8859-1?q?TzTxtnqRAajgx9pVgR38covBpuI5wWinqLd8F4Cj8m4uWjnBRMO/yEhNXb3D?= =?iso-8859-1?q?sjeS+0Egxnk0zBVYQ03jQEoVT81C8vRulMHExPU9noB/iAAJ3mxpp/OOcwR4?= =?iso-8859-1?q?SNdhJx9pdtTebi0mv2g+r9fQ3FC6YTqRiBR8mp7G2IXCgkjRBfM/gqMbVOuR?= =?iso-8859-1?q?+NHT0Bxk+yfec0qsW2FodQTK68ARZ3Yy1o+GFOSnUnp4bN5VDaZSQQgvhMnl?= =?iso-8859-1?q?3SHuAL0nk563v3+oGlzssJUSrV9f8Hf+X75xavEJ6qL9ttpyaEdbfzeSbZ6p?= =?iso-8859-1?q?jFMHgCu+mEGha3HZwM5GBTKzhavdCWxNAtzx0WVcpZk1DUBPjkQ7KsQLKUrQ?= =?iso-8859-1?q?mcdCK85euvBTY6rENKoKWXhU7Mz/TiiQkSVFFtH+F1Zc6pDCCXIBG6AqtdQI?= =?iso-8859-1?q?DwdouQwoTBrK8kibSEsn9giM9rgf98zKFr37VRJ8vlH4OL0BGvYRxQZ2IhvS?= =?iso-8859-1?q?LOwJO2LS+8qeYhSzKfn958JxMWjMOo7O+8m3WiqscB4XSs3Jp8zQl7EnU4P4?= =?iso-8859-1?q?7bk9g4+fqbZBdRi3tlPdDNafbJbKNcBLhnF2Ws68teacQGrZf1d4fAyjCUn+?= =?iso-8859-1?q?PgLZ/0wNo+6rib7FUaS2G5UcgmBGsXabVTCDhYBU=3D?= x-ms-exchange-transport-forked: True MIME-Version: 1.0 X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB3919 Original-Authentication-Results: 
fujitsu.com; dkim=none (message not signed) header.d=none;fujitsu.com; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT063.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 0748d01a-9e64-4a4d-f167-08d95b377b9b X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 5IOYosstyd+D42rg8XsKLtxmDqDerm9nuEr+mIelPNF25CJbfveVE5nbi+yigU83tIIipnAWizlzlE52VIFXf2UIxZc1pY3YAcOpJboZ8kDsLIlvsfWYnYE6o/oX+NxVjvG5ZlPv4exTU6r5NcShyLjLTEUaTs1kGzxD9Zgr1L/kUwz/9Qk5ZHCbYTIkvJJ7/OYUfzRLr/qAebMGDVCM+kVEJ/NuGb52V0/9tio6iZq7XFIlEPEryZriSdn46ShIip/1x9gpGBkxFFCJZy3Q2fHJwkJtxS6qvLshOZeX3R2LA9qWStbCPNm6OmvxotmhZ7QD4jboXOBzO5BUru9nyD8b+tLSfMhCoFRhUYCKnPvHwoMVxUhSEs/P881Yk8A0e46A6HBvS4P1NNO5GNyuWA6vjIgcpPfJfOwQUs05aZEz3wF9uVRfRvBt0yPcxuwpmaWiQzJTbbLAOLDYeW0IbgMHVdbddxo1aFLsITuJLske26DwH5R+jgaukieyrnOrgonEZUZbImXkrcGD6EcDQM8urf1glbOffQRjS5GZdfUSGEQPBUqDZD5KzUKM+xztZJ3/RYmgclGnK2yAQoglmEeFzUJpXbeZSbo11Erpz26JZp5eZcrmOgTPFNDcKJkYpEpjOeSOAb3er2aeyYqlDKUFtfUnnhBXjo0SpSK5XMhrjfWVYCb8ntxTTZEQQN1p3XEFPmGgtfA7YI8lHdhiiA== X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(46966006)(36840700001)(33656002)(186003)(9686003)(8936002)(47076005)(6862004)(82310400003)(86362001)(8676002)(36860700001)(81166007)(6506007)(4326008)(55016002)(336012)(2906002)(508600001)(26005)(356005)(316002)(7696005)(5660300002)(52536014)(70206006)(70586007); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 09 Aug 2021 13:13:39.2085 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 3fbaf264-bccb-460d-e6d2-08d95b3780e5 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; 
Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT063.eop-EUR03.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DBBPR08MB4505
X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"

v4: no changes

Remove unroll32 code since it doesn't improve performance.
Reviewed-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 55f28b644defdffb140c88da0635ef099235546c..89dba912588c243e67a9527a56b4d3a44659d542 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -102,22 +102,6 @@ L(vl_agnostic): // VL Agnostic
 	ccmp vector_length, tmp1, 0, cs
 	b.eq L(L1_prefetch)
 
-L(unroll32):
-	lsl tmp1, vector_length, 3 // vector_length * 8
-	lsl tmp2, vector_length, 5 // vector_length * 32
-	.p2align 3
-1:	cmp rest, tmp2
-	b.cc L(unroll8)
-	st1b_unroll
-	add dst, dst, tmp1
-	st1b_unroll
-	add dst, dst, tmp1
-	st1b_unroll
-	add dst, dst, tmp1
-	st1b_unroll
-	add dst, dst, tmp1
-	sub rest, rest, tmp2
-	b 1b
 
 L(unroll8):
 	lsl tmp1, vector_length, 3
@@ -155,7 +139,7 @@ L(L1_prefetch): // if rest >= L1_SIZE
 	sub rest, rest, CACHE_LINE_SIZE * 2
 	cmp rest, L1_SIZE
 	b.ge 1b
-	cbnz rest, L(unroll32)
+	cbnz rest, L(unroll8)
 	ret
 
 	// count >= L2_SIZE

From patchwork Mon Aug 9 13:15:45 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44612
X-Patchwork-Delegate: szabolcs.nagy@arm.com
Return-Path:
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 44D223894422 for ; Mon, 9 Aug 2021 13:16:29 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 44D223894422
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1628514989; bh=L/iVi1R3x6VCETU0akmC7RrR28vFuYdCceGOnotz/8U=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=sjY6CbCGzk7WgeMjshGJ1eCPV+LzV4R2ipbJUwfCwkRfMfxrMtQNbPwrS8YA1iu06 R8ofWTAXmjfF0UvAT0A4symbI/XmBHiiDDhLudq5smUoPXc6AqxJgiSy/OUyYyHLk1 LVZhQZzs6JItF8f+o16A/uo2ka87//isSlZe+wF8=
X-Original-To:
libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR03-DB5-obe.outbound.protection.outlook.com (mail-eopbgr40047.outbound.protection.outlook.com [40.107.4.47]) by sourceware.org (Postfix) with ESMTPS id BE8F4385482F for ; Mon, 9 Aug 2021 13:16:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BE8F4385482F Received: from AM6P195CA0035.EURP195.PROD.OUTLOOK.COM (2603:10a6:209:81::48) by VE1PR08MB5022.eurprd08.prod.outlook.com (2603:10a6:803:114::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4394.16; Mon, 9 Aug 2021 13:16:01 +0000 Received: from AM5EUR03FT012.eop-EUR03.prod.protection.outlook.com (2603:10a6:209:81:cafe::aa) by AM6P195CA0035.outlook.office365.com (2603:10a6:209:81::48) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4394.15 via Frontend Transport; Mon, 9 Aug 2021 13:16:01 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; sourceware.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;sourceware.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM5EUR03FT012.mail.protection.outlook.com (10.152.16.161) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4394.15 via Frontend Transport; Mon, 9 Aug 2021 13:16:00 +0000 Received: ("Tessian outbound ab45ca2b67bc:v101"); Mon, 09 Aug 2021 13:16:00 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 5205adce920cea79 X-CR-MTA-TID: 64aa7808 Received: from 0368c1a59114.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 
21ABFDE9-2689-4145-BFCD-AB9C2670181E.1; Mon, 09 Aug 2021 13:15:54 +0000 Received: from EUR04-HE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 0368c1a59114.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Mon, 09 Aug 2021 13:15:54 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=i6Kjox7COIZIIineOO65oYzvUNk5RNe/Vega+H6T6yK/q+gUzSshnnZ8EVOXBISGXvfl91t17rhMm6B5PSAZAw+BUr4RtDYSSCiyyPvDWU0A6eplDRw8ahCR1BGw32zMHgA4eA9YojAO5xRzKlf1EhaSOzf5kQth1Vc9k7ABoJ1WdGdaEotlwZ0N9M2IoGB8JiQBzfLx0tbPvXiG0JoGQl730Jn78s0keB4Iim1z4/YXVswQ1POW5ExVk80/4HnV6AN+FoV6VEhOGiQlzVCTSw+LzZ6vO8i8vfJXrSxR7WUMqrb4+dl++nWsWQ4h8FgQIXXEEjISzHf1mgKRNEAwVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=L/iVi1R3x6VCETU0akmC7RrR28vFuYdCceGOnotz/8U=; b=d/zhNxRRmENB3mO2z8Slc0ZMWbtQmWPAMd43meGNLQHeISUT0dwnZj2r2to+AsHLm+uclyP23AR7mCQUtQeVcxYYVdP4VI24PxkVFfDB0ecBWx1UJnI9UyFQtUz9VRrpTq2KpcKILbz204C2M7+dl6ql5k0bgYnksxeihxnDdYdzu6jtGpWjgh+Av4xlfDPufu/JWAwEBzVwNASZCkGSl2kkN+qrFxj7bp+AH97Z6F8X9aztwG3XUA6d1XTpzXc+xT0h3YE0/2QbqpuhrUp+m9lAUL99tDJMCnkUtggVt7C+N3WMk3qoBfiNq+7/m+4eBgYh8WHAfbtopSz4wgcXGQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Received: from VE1PR08MB5599.eurprd08.prod.outlook.com (2603:10a6:800:1a1::12) by VI1PR08MB3392.eurprd08.prod.outlook.com (2603:10a6:803:7b::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4394.16; Mon, 9 Aug 2021 13:15:45 +0000 Received: from VE1PR08MB5599.eurprd08.prod.outlook.com ([fe80::c437:fa2b:33:c8ba]) by VE1PR08MB5599.eurprd08.prod.outlook.com ([fe80::c437:fa2b:33:c8ba%7]) with mapi id 15.20.4394.023; Mon, 9 Aug 2021 13:15:45 +0000 To: "naohirot@fujitsu.com" Subject: 
[PATCH v4 5/5] AArch64: Improve A64FX memset medium loops Thread-Topic: [PATCH v4 5/5] AArch64: Improve A64FX memset medium loops Thread-Index: AQHXjSBqQUk1uBbeZkiGnGpCiCyx6w== Date: Mon, 9 Aug 2021 13:15:45 +0000 Message-ID: Accept-Language: en-GB, en-US Content-Language: en-GB X-MS-Has-Attach: X-MS-TNEF-Correlator: Authentication-Results-Original: fujitsu.com; dkim=none (message not signed) header.d=none;fujitsu.com; dmarc=none action=none header.from=arm.com; x-ms-publictraffictype: Email X-MS-Office365-Filtering-Correlation-Id: ffbe9b0d-22cd-4fd3-a362-08d95b37d552 x-ms-traffictypediagnostic: VI1PR08MB3392:|VE1PR08MB5022: X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true nodisclaimer: true x-ms-oob-tlc-oobclassifiers: OLM:4714;OLM:4714; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: dc9lCfFWOZoHsqQr80AgIDjvUBbZvs0/cfpR27zVszyj4O5cNSu2hEDhrmZKbrma8K/mz6Md3thlzk8VGECKAYYwoDPnlPYSOBDNViZgU3hUSShfzRD4kFj2IYo8Zr8EbP96pEQE1FC+Zfv5Xrwx/gfQICcvbk3LRa6opIdGoh+S/z0wCl92ZIZVzudp9QA4lv+1mGxJ/MhkAcAsGP8aZIi7bMoKQt0vOXpDip0gJB3xYanwR3s2Y8v2BqPuTT8meZOBSj5kAt+gJGzEbUkB50Okc9E/1Q0azhaCNbui9mMXiA9Ei45Yk6Ial2MOkJkqm8PBhqM7ieRlCuiJdNUUCOkmGGLRk5/sLoqJMwxlLzmpnQul/HrLVjXLDFZJZw/5yq4AgAvfHFH65EPDB+n4NQ+BG5dhggQp3syKlaHkg/PSdzQuCuTP0CAnM3AIq6bZRc2bzcwC0B79YKCo1b7G8j6BIcfZrqYp0mBe3IKVNhb3sna3dDFgiyI81UkbC47+FbsksL9S3m1R5pzGNijxIJAHo0C9PpN4hlxVoOLQo0n+qnJOpfXPt3Su38YOmQitAFEYFNh5zC02HL4Cc1whaNKjmEvrn0jBfQX6hIc5JPdTQocg3Y2agkuRM1NFFj52WVZOXVDvfyHzwJS3pYczTjDjeN0tZQr4R+ZRtNxhAd5caLMLJtXUEiqdI+xAzAfXu6Ue7HIgmwagtFDLhAk3f3KJeMC96vtX2UgcaW40g6nESH0sk0xSl4/ttsik7dDV0ag/cNZxoAwZLdM3aId+kg== X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VE1PR08MB5599.eurprd08.prod.outlook.com; PTR:; CAT:NONE; 
SFS:(4636009)(366004)(346002)(396003)(39850400004)(136003)(376002)(55016002)(76116006)(9686003)(8936002)(8676002)(91956017)(38070700005)(5660300002)(6916009)(7696005)(66946007)(66476007)(122000001)(66446008)(38100700002)(52536014)(66556008)(64756008)(71200400001)(26005)(186003)(2906002)(86362001)(316002)(33656002)(6506007)(478600001)(4326008)(473944003)(357404004); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?iso-8859-1?q?9qlI6ZIDrlQ5vqX7+5LQA5Z?= =?iso-8859-1?q?KiV+WLwyWTM5ZHYzVgITe1pYWnf9jK5QBl1kJ8I7BBbRhBffJZV38ncfbOeo?= =?iso-8859-1?q?2JOaMl4ulbECEDQ3ztD9gduy/XqwaY9Hw6x8UxYvBAO8L2aYAn4IWEDZCFXM?= =?iso-8859-1?q?cQzNtvnaDzV9eJHvpTWrpmqfctsNisAVuXokrr3CKB15f0pYsOMUb91UkXFN?= =?iso-8859-1?q?5O/B4qgFLWkGmvNKVEOrSm3W7KutqNioSjBaUX1ug+tc202UVGczl8cBK0x9?= =?iso-8859-1?q?1B0oiqfrRvTfH5eZYIChwL1G72uVcmu0/+2kyyl+zps0JWu5my0YCPfZnpcg?= =?iso-8859-1?q?Ca/nFbvg+73eAyNbLqyloDDQ/KOTLfNAyX9nO7PhPjqP6D03C+4UljobLZmN?= =?iso-8859-1?q?qiWs3MWXNFVddi2m21gIlllChM8HTgiXEbj4JFIlUsfnRF5a24UsNaXBtp31?= =?iso-8859-1?q?WmIvLhhhGQ1Hs0ZUHF0bOmADgLWEd5oN89jkfXyOFt23R/t523JxMd/ycku/?= =?iso-8859-1?q?mnC4JKfrFqDQAWiyWV1qrbuGabs/1PSQUbyb2omtT009zaKcGJjyVL0uFiOh?= =?iso-8859-1?q?CHtaLIFui6caXuJWnSswmjVDUvszRJgu5zVArB+YQ2nWzXOsVzcAQeMp/SqQ?= =?iso-8859-1?q?AQB5tPQHpv9r6D6Y+YLh0i96ccR/2Ks5wZ22EoGMkq/LLlVFIEd6c/M61nA2?= =?iso-8859-1?q?1lOz0tf5fxhwRXK7gSzA4pbOqe0UhxAGHB86NE8iQNgoWOYQFrclxm/JqnWr?= =?iso-8859-1?q?/AQ3RM6Rlv4iv5SHqrvAMstrAcWgeiqPm1q2ZpeLt8Kl7IksB9qA9sZ39Z9A?= =?iso-8859-1?q?xHP6SO098AnUgRbQiglcFg++figvpE51mVQJV3mAAwrogrTQm3kJHhMhvHVW?= =?iso-8859-1?q?9aRgoOlBuuJdOjYDBDhKG05S41Ym4Rd/zaJm6NDamM+lEwoZOXF8PILfwa8G?= =?iso-8859-1?q?nJswD7Jt+gKBZEmqfgFyluhmf0vLNEfqsMHBRl/J2p9XFlYi3rxjDm94egmu?= =?iso-8859-1?q?QrrpE9FBz2pJi7ZnaAF5+v8drf0BSDzi77RwdvaCxc/8zRyx9bcMZGIVV7G4?= =?iso-8859-1?q?uKov+jQegE9Q50id7DqcN4ebwPB3LY8CkhbBXJSDRn3kLswi+2o0mGNEbla7?= =?iso-8859-1?q?N2p9jOso4f5NFw4DzOmRwwx3mMTh9Pev8gnmDhG+LfmOl/j49RThefUfF9rE?= 
=?iso-8859-1?q?l0xfcYwKZe9jYf8PYFX9TffvG1uhyXdeeTewim7/3igzlJK1OhXjxdiHG21M?= =?iso-8859-1?q?41A5WbP1Dznn0BC2fxGT2SMRRZdA/F0PG4yzdnV6d45em7du7GRLt+soN3Lv?= =?iso-8859-1?q?Dg9qcy9EmVgQIerRVFcKaduyiAfzjbRQJ/4c8L8I=3D?= x-ms-exchange-transport-forked: True MIME-Version: 1.0 X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB3392 Original-Authentication-Results: fujitsu.com; dkim=none (message not signed) header.d=none;fujitsu.com; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM5EUR03FT012.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 4e156455-d941-45d9-5f77-08d95b37cc58 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: jNPEftvvBz9UP/vItRnWP8mVzp043pE+fb2vK8mDlL63tLjOOt6jR++KP3G2t2DnBI7WIdc5EN4jZbiSVkaiT5YhgJhdv1O29+as9oHt9ZOgasxr1UEDqZ1id04x5+Z+7meQHvwuVpUVrnVZNwq7e4nS3c7D77e+FlYN4OQBiSgptABlogLaEmTSCjsxOnngfJxqM+izyOiS+EMlgLI9xmETG9EDwXM7C2y+vpAKckJxmFvYeqS7rfPsUtqHuKiibyvEqu7s24E9yeIJIBXpW8XIdkEcb8oxTB3HqgRayZQI/ely/pRXKvkWOIsKm70JM0ruVtaXoMltQvE7Zj4lAi9lvPpbVZWVdkLQw+QIoAzuUZB2b12x/WmfaH9kuNGKKuAXOvMy2J1M9roDwR4n2ekZU+iV48CNrWrA8E37fSESHA0flGMbj25zzXfbGAgvHspKphTKH4qy/I8XmnSznm033OhMJQuE7nSHVQg1pRreM3ZOBaWh5Tx9P1flN8Xi6jSTNPU1YS2LPyLdMXmxTSRKJixR1Z7o9sa90P8f/MVD33XfFFeE2kl0mFhbBLCbsWl3DpA6FGS3es45T/7+I78GJ45AjJvHDDcgrcuO6/QMuNtobCGB7WHGy38zahUMPhL4bw7kcl6k0Gl0QspZPQGZxh3/e3rCpftT8eUD1VlxGzwY7awgByDy8ozpJjjDZeZVh30APuy3Ojhjql8Detq06HVWv5Pdvk+xq1tpGAPKifNQ3aaUpan7q0RacwTK X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; 
SFS:(4636009)(376002)(396003)(346002)(136003)(39850400004)(46966006)(36840700001)(356005)(81166007)(82310400003)(8676002)(36860700001)(86362001)(8936002)(26005)(478600001)(2906002)(316002)(186003)(82740400003)(336012)(33656002)(52536014)(70586007)(70206006)(6862004)(9686003)(5660300002)(4326008)(6506007)(7696005)(47076005)(55016002)(473944003)(357404004); DIR:OUT; SFP:1101;
X-OriginatorOrg: arm.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 09 Aug 2021 13:16:00.9383 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: ffbe9b0d-22cd-4fd3-a362-08d95b37d552
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-AuthSource: AM5EUR03FT012.eop-EUR03.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1PR08MB5022
X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"

v4: minor loop change

Simplify the code for memsets smaller than L1. Improve the unroll8 and L1_prefetch loops.
Reviewed-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 89dba912588c243e67a9527a56b4d3a44659d542..318c6350a31e0fad788b5f2139de645ddc51493f 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -30,7 +30,6 @@
 #define L2_SIZE (8*1024*1024) // L2 8MB
 #define CACHE_LINE_SIZE 256
 #define PF_DIST_L1 (CACHE_LINE_SIZE * 16) // Prefetch distance L1
-#define rest x2
 #define vector_length x9
 
 #if HAVE_AARCH64_SVE_ASM
@@ -89,29 +88,19 @@ ENTRY (MEMSET)
 
 	.p2align 4
 L(vl_agnostic): // VL Agnostic
-	mov rest, count
 	mov dst, dstin
-	add dstend, dstin, count
-	// if rest >= L2_SIZE && vector_length == 64 then L(L2)
-	mov tmp1, 64
-	cmp rest, L2_SIZE
-	ccmp vector_length, tmp1, 0, cs
-	b.eq L(L2)
-	// if rest >= L1_SIZE && vector_length == 64 then L(L1_prefetch)
-	cmp rest, L1_SIZE
-	ccmp vector_length, tmp1, 0, cs
-	b.eq L(L1_prefetch)
-
+	cmp count, L1_SIZE
+	b.hi L(L1_prefetch)
 
+	// count >= 8 * vector_length
 L(unroll8):
-	lsl tmp1, vector_length, 3
-	.p2align 3
-1:	cmp rest, tmp1
-	b.cc L(last)
-	st1b_unroll
+	sub count, count, tmp1
+	.p2align 4
+1:	st1b_unroll 0, 7
 	add dst, dst, tmp1
-	sub rest, rest, tmp1
-	b 1b
+	subs count, count, tmp1
+	b.hi 1b
+	add count, count, tmp1
 
 L(last):
 	cmp count, vector_length, lsl 1
@@ -129,18 +118,22 @@ L(last):
 	st1b z0.b, p0, [dstend, -1, mul vl]
 	ret
 
-L(L1_prefetch): // if rest >= L1_SIZE
+	// count >= L1_SIZE
 	.p2align 3
+L(L1_prefetch):
+	cmp count, L2_SIZE
+	b.hs L(L2)
+	cmp vector_length, 64
+	b.ne L(unroll8)
 1:	st1b_unroll 0, 3
 	prfm pstl1keep, [dst, PF_DIST_L1]
 	st1b_unroll 4, 7
 	prfm pstl1keep, [dst, PF_DIST_L1 + CACHE_LINE_SIZE]
 	add dst, dst, CACHE_LINE_SIZE * 2
-	sub rest, rest, CACHE_LINE_SIZE * 2
-	cmp rest, L1_SIZE
-	b.ge 1b
-	cbnz rest, L(unroll8)
-	ret
+	sub count, count, CACHE_LINE_SIZE * 2
+	cmp count, PF_DIST_L1
+	b.hs 1b
+	b L(unroll8)
 
 	// count >= L2_SIZE
 	.p2align 3