From patchwork Thu Jan 12 15:53:09 2023
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 63106
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Optimize strlen
Date: Thu, 12 Jan 2023 15:53:09 +0000
From: Wilco Dijkstra

Optimize strlen by unrolling the main loop.  Large strings are 64% faster on
modern CPUs.  Passes regress.

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/strlen.S b/sysdeps/aarch64/strlen.S
index b3c92d9dc9b3c52e29e05ebbb89b929f177dc2cf..133ef933425fa260e61642a7840d73391168507d 100644
--- a/sysdeps/aarch64/strlen.S
+++ b/sysdeps/aarch64/strlen.S
@@ -43,12 +43,9 @@
 #define dend		d2
 
 /* Core algorithm:
-
-   For each 16-byte chunk we calculate a 64-bit nibble mask value with four bits
-   per byte. We take 4 bits of every comparison byte with shift right and narrow
-   by 4 instruction. Since the bits in the nibble mask reflect the order in
-   which things occur in the original string, counting trailing zeros identifies
-   exactly which byte matched.  */
+   Process the string in 16-byte aligned chunks. Compute a 64-bit mask with
+   four bits per byte using the shrn instruction. A count trailing zeros then
+   identifies the first zero byte.  */
 
 ENTRY (STRLEN)
 	PTR_ARG (0)
@@ -68,18 +65,25 @@ ENTRY (STRLEN)
 
 	.p2align 5
 L(loop):
-	ldr	data, [src, 16]!
+	ldr	data, [src, 16]
+	cmeq	vhas_nul.16b, vdata.16b, 0
+	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	fmov	synd, dend
+	cbnz	synd, L(loop_end)
+	ldr	data, [src, 32]!
 	cmeq	vhas_nul.16b, vdata.16b, 0
 	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
 	fmov	synd, dend
 	cbz	synd, L(loop)
-
+	sub	src, src, 16
+L(loop_end):
 	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	sub	result, src, srcin
 	fmov	synd, dend
 #ifndef __AARCH64EB__
 	rbit	synd, synd
 #endif
+	add	result, result, 16
 	clz	tmp, synd
 	add	result, result, tmp, lsr 2
 	ret
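
For readers who want to check the control flow of the unrolled loop, the
core algorithm described in the comment above can be modeled in portable C.
This is only a sketch under stated assumptions, not the glibc implementation:
nibble_mask, ctz64, model_strlen and model_strlen_at are hypothetical names,
little-endian nibble order is assumed, and the buffer base stands in for the
16-byte aligned address that the assembly derives from srcin.  The buffer is
assumed to be a multiple of 16 bytes long and to contain a NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of cmeq + shrn #4: nibble i of the result is 0xF when chunk[i] == 0. */
static uint64_t nibble_mask(const unsigned char *chunk)
{
    uint64_t mask = 0;
    for (int i = 0; i < 16; i++)
        if (chunk[i] == 0)
            mask |= (uint64_t)0xF << (4 * i);
    return mask;
}

/* Count trailing zeros (rbit + clz in the assembly).  Requires x != 0. */
static unsigned ctz64(uint64_t x)
{
    unsigned n = 0;
    assert(x != 0);
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Length of the string starting at base + off.  base is treated as a
   16-byte chunk boundary (assumption of this model). */
static size_t model_strlen(const unsigned char *base, size_t off)
{
    const unsigned char *srcin = base + off;
    const unsigned char *src = base + (off & ~(size_t)15);
    unsigned shift = (unsigned)(off & 15) * 4;

    /* First chunk: shift out the nibbles that precede the string start. */
    uint64_t synd = nibble_mask(src) >> shift;
    if (synd != 0)
        return ctz64(synd) >> 2;

    for (;;) {
        synd = nibble_mask(src + 16);   /* ldr data, [src, 16] */
        if (synd != 0)
            break;                      /* cbnz synd, L(loop_end) */
        src += 32;                      /* ldr data, [src, 32]! */
        synd = nibble_mask(src);
        if (synd != 0) {
            src -= 16;                  /* sub src, src, 16 */
            break;
        }
    }
    /* L(loop_end): the matching chunk always starts at src + 16. */
    return (size_t)((src + 16 + (ctz64(synd) >> 2)) - srcin);
}

/* Hypothetical helper: place s at offset off in a zeroed scratch buffer
   and run the model on it. */
static size_t model_strlen_at(const char *s, size_t off)
{
    unsigned char buf[256] = {0};
    size_t n = strlen(s);
    assert(off + n < sizeof buf);
    memcpy(buf + off, s, n);
    return model_strlen(buf, off);
}
```

Both exits from the loop reach L(loop_end) with the matching chunk at
src + 16, which is why the patch adds a single "add result, result, 16"
after the loop instead of adjusting src on the first exit.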