From patchwork Thu Jan 12 16:01:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 63112
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Optimize strcpy
Date: Thu, 12 Jan 2023 16:01:16 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra

Unroll the main loop. Large strings are around 20% faster on modern CPUs.
Passes regress.

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/strcpy.S b/sysdeps/aarch64/strcpy.S
index bc14dd9f79d1d0d17727cad522761807e7eef5b8..8b045a92c2b329c6351d523d9375a799d38d168e 100644
--- a/sysdeps/aarch64/strcpy.S
+++ b/sysdeps/aarch64/strcpy.S
@@ -30,7 +30,6 @@
  * MTE compatible.
  */
 
-/* Arguments and results. */
 #define dstin		x0
 #define srcin		x1
 #define result		x0
@@ -76,14 +75,14 @@ ENTRY (STRCPY)
 	ld1	{vdata.16b}, [src]
 	cmeq	vhas_nul.16b, vdata.16b, 0
 	lsl	shift, srcin, 2
-	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
+	shrn	vend.8b, vhas_nul.8h, 4
 	fmov	synd, dend
 	lsr	synd, synd, shift
 	cbnz	synd, L(tail)
 
 	ldr	dataq, [src, 16]!
 	cmeq	vhas_nul.16b, vdata.16b, 0
-	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
+	shrn	vend.8b, vhas_nul.8h, 4
 	fmov	synd, dend
 	cbz	synd, L(start_loop)
 
@@ -102,13 +101,10 @@ ENTRY (STRCPY)
 	IFSTPCPY (add result, dstin, len)
 	ret
 
-	.p2align 4,,8
 L(tail):
 	rbit	synd, synd
 	clz	len, synd
 	lsr	len, len, 2
-
-	.p2align 4
 L(less16):
 	tbz	len, 3, L(less8)
 	sub	tmp, len, 7
@@ -141,31 +137,37 @@ L(zerobyte):
 
 	.p2align 4
 L(start_loop):
-	sub	len, src, srcin
+	sub	tmp, srcin, dstin
 	ldr	dataq2, [srcin]
-	add	dst, dstin, len
+	sub	dst, src, tmp
 	str	dataq2, [dstin]
-
-	.p2align 5
 L(loop):
-	str	dataq, [dst], 16
-	ldr	dataq, [src, 16]!
+	str	dataq, [dst], 32
+	ldr	dataq, [src, 16]
+	cmeq	vhas_nul.16b, vdata.16b, 0
+	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	fmov	synd, dend
+	cbnz	synd, L(loopend)
+	str	dataq, [dst, -16]
+	ldr	dataq, [src, 32]!
 	cmeq	vhas_nul.16b, vdata.16b, 0
 	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
 	fmov	synd, dend
 	cbz	synd, L(loop)
-
+	add	dst, dst, 16
+L(loopend):
 	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	fmov	synd, dend
+	sub	dst, dst, 31
 #ifndef __AARCH64EB__
 	rbit	synd, synd
 #endif
 	clz	len, synd
 	lsr	len, len, 2
-	sub	tmp, len, 15
-	ldr	dataq, [src, tmp]
-	str	dataq, [dst, tmp]
-	IFSTPCPY (add result, dst, len)
+	add	dst, dst, len
+	ldr	dataq, [dst, tmp]
+	str	dataq, [dst]
+	IFSTPCPY (add result, dst, 15)
 	ret
 
 END (STRCPY)
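
For readers less familiar with the syndrome trick used at L(tail) and L(loopend), the following is a rough scalar model in C, not part of the patch. It assumes the little-endian case, where cmeq followed by shrn #4 leaves one nibble per input byte in synd, and rbit + clz + lsr #2 then yields the index of the first NUL byte. The function names nul_syndrome and first_nul_index are made up for illustration and do not exist in glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of "cmeq ..., 0" followed by "shrn ..., 4":
   the 128-bit compare result is narrowed to 64 bits, leaving one
   nibble per input byte.  Nibble i is 0xf when chunk[i] is NUL.
   (Hypothetical helper, little-endian view only.)  */
static uint64_t nul_syndrome(const unsigned char chunk[16])
{
    uint64_t synd = 0;
    for (int i = 0; i < 16; i++)
        if (chunk[i] == 0)
            synd |= (uint64_t)0xf << (4 * i);
    return synd;
}

/* Illustrative model of "rbit; clz; lsr ..., 2": count the trailing
   zero bits of the syndrome, then divide by four to convert the bit
   position into a byte index.  (Hypothetical helper.)  */
static size_t first_nul_index(uint64_t synd)
{
    /* The assembly only reaches this point once a NUL was found.  */
    assert(synd != 0);

    size_t bit = 0;
    while ((synd & 1) == 0) {
        synd >>= 1;
        bit++;
    }
    return bit >> 2;
}
```

With a 16-byte chunk holding "hello" followed by a NUL, first_nul_index (nul_syndrome (chunk)) gives 5, which corresponds to the value the assembly leaves in len before the final overlapping 16-byte copy.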