From patchwork Thu Jan 12 16:05:17 2023
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 63113
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Improve strrchr
Date: Thu, 12 Jan 2023 16:05:17 +0000
From: Wilco Dijkstra

Use shrn for narrowing the mask, which simplifies code and speeds up small
strings.  Unroll the first search loop to improve performance on large strings.
Passes regress.

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/strrchr.S b/sysdeps/aarch64/strrchr.S
index b4b8b44aadf75d900589e25ebbc49640ef43f0bf..528034a5183b4479634dfb346426102d6aa4b64a 100644
--- a/sysdeps/aarch64/strrchr.S
+++ b/sysdeps/aarch64/strrchr.S
@@ -22,19 +22,16 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64
- * Neon Available.
+ * ARMv8-a, AArch64, Advanced SIMD.
  * MTE compatible.
  */
 
-/* Arguments and results. */
 #define srcin		x0
 #define chrin		w1
 #define result		x0
 
 #define src		x2
 #define tmp		x3
-#define wtmp		w3
 #define synd		x3
 #define shift		x4
 #define src_match	x4
@@ -46,7 +43,6 @@
 #define vhas_nul	v2
 #define vhas_chr	v3
 #define vrepmask	v4
-#define vrepmask2	v5
 #define vend		v5
 #define dend		d5
 
@@ -58,59 +54,71 @@
    the relevant byte matched the requested character; bits 2-3 are set
    if the relevant byte matched the NUL end of string. */
 
-ENTRY(strrchr)
+ENTRY (strrchr)
 	PTR_ARG (0)
 	bic	src, srcin, 15
 	dup	vrepchr.16b, chrin
-	mov	wtmp, 0x3003
-	dup	vrepmask.8h, wtmp
-	tst	srcin, 15
-	beq	L(loop1)
-
-	ld1	{vdata.16b}, [src], 16
+	movi	vrepmask.16b, 0x33
+	ld1	{vdata.16b}, [src]
 	cmeq	vhas_nul.16b, vdata.16b, 0
 	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
-	mov	wtmp, 0xf00f
-	dup	vrepmask2.8h, wtmp
 	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
-	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
-	addp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	shrn	vend.8b, vhas_nul.8h, 4
 	lsl	shift, srcin, 2
 	fmov	synd, dend
 	lsr	synd, synd, shift
 	lsl	synd, synd, shift
 	ands	nul_match, synd, 0xcccccccccccccccc
 	bne	L(tail)
-	cbnz	synd, L(loop2)
+	cbnz	synd, L(loop2_start)
 
-	.p2align 5
+	.p2align 4
 L(loop1):
-	ld1	{vdata.16b}, [src], 16
+	ldr	q1, [src, 16]
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
+	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	fmov	synd, dend
+	cbnz	synd, L(loop1_end)
+	ldr	q1, [src, 32]!
 	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
 	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
 	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
 	fmov	synd, dend
 	cbz	synd, L(loop1)
-
+	sub	src, src, 16
+L(loop1_end):
+	add	src, src, 16
 	cmeq	vhas_nul.16b, vdata.16b, 0
+#ifdef __AARCH64EB__
+	bif	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+	shrn	vend.8b, vhas_nul.8h, 4
+	fmov	synd, dend
+	rbit	synd, synd
+#else
 	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
-	bic	vhas_nul.8h, 0x0f, lsl 8
-	addp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	shrn	vend.8b, vhas_nul.8h, 4
 	fmov	synd, dend
+#endif
 	ands	nul_match, synd, 0xcccccccccccccccc
-	beq	L(loop2)
-
+	beq	L(loop2_start)
 L(tail):
 	sub	nul_match, nul_match, 1
 	and	chr_match, synd, 0x3333333333333333
 	ands	chr_match, chr_match, nul_match
-	sub	result, src, 1
+	add	result, src, 15
 	clz	tmp, chr_match
 	sub	result, result, tmp, lsr 2
 	csel	result, result, xzr, ne
 	ret
 
 	.p2align 4
+	nop
+	nop
+L(loop2_start):
+	add	src, src, 16
+	bic	vrepmask.8h, 0xf0
+
 L(loop2):
 	cmp	synd, 0
 	csel	src_match, src, src_match, ne
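
The syndrome layout described in the comment above ENTRY (strrchr) (one nibble per byte, bits 0-1 for a character match, bits 2-3 for NUL), the new shrn narrowing that produces it, and the L(tail) arithmetic can be modelled in scalar C. This is an illustrative sketch, not part of the patch: the function names chunk_syndrome, clear_before and tail_index are hypothetical, it models only the little-endian path, and it assumes GCC or Clang for __builtin_clzll. Each comment names the instruction the line stands in for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model (little-endian path only) of the syndrome built
   for one 16-byte chunk.  Byte k owns nibble k of the result: bits 0-1 are
   set if the byte equals c, bits 2-3 if the byte is NUL.  */
static uint64_t chunk_syndrome (const uint8_t chunk[16], uint8_t c)
{
  uint8_t merged[16];
  for (int i = 0; i < 16; i++)
    {
      uint8_t has_nul = chunk[i] == 0 ? 0xff : 0x00;  /* cmeq vhas_nul, vdata, 0 */
      uint8_t has_chr = chunk[i] == c ? 0xff : 0x00;  /* cmeq vhas_chr, vdata, vrepchr */
      /* bit vhas_nul, vhas_chr, vrepmask with vrepmask = 0x33: bits where the
         mask is 1 come from has_chr, the other bits keep has_nul.  */
      merged[i] = (uint8_t) ((has_nul & 0xcc) | (has_chr & 0x33));
    }
  uint64_t synd = 0;
  for (int j = 0; j < 8; j++)
    {
      /* shrn vend.8b, vhas_nul.8h, 4: shift each 16-bit lane right by 4 and
         keep the low 8 bits, i.e. the high nibble of byte 2j and the low
         nibble of byte 2j+1.  fmov synd, dend then reads all 8 bytes.  */
      uint16_t lane = (uint16_t) (merged[2 * j] | (merged[2 * j + 1] << 8));
      synd |= (uint64_t) (uint8_t) (lane >> 4) << (8 * j);
    }
  return synd;
}

/* Mimics lsl shift, srcin, 2 followed by lsr/lsl synd, shift: drop the
   nibbles of bytes that precede the string start at offset off.  */
static uint64_t clear_before (uint64_t synd, unsigned off)
{
  assert (off < 16);
  return (synd >> (4 * off)) << (4 * off);
}

/* Mimics L(tail) for a chunk that contains a NUL: returns the index of the
   last byte equal to c at or before the first NUL, or -1 (a NULL result).  */
static int tail_index (uint64_t synd)
{
  uint64_t nul_match = synd & 0xccccccccccccccccULL;
  assert (nul_match != 0);              /* L(tail) is only reached after a NUL hit.  */
  /* nul_match - 1 sets every bit below the first NUL's bit 2 and clears that
     bit, so the AND keeps character bits up to and including the first NUL's
     nibble (which makes strrchr (s, 0) return the terminator).  */
  uint64_t chr_match = synd & 0x3333333333333333ULL & (nul_match - 1);
  if (chr_match == 0)
    return -1;                          /* csel result, result, xzr, ne */
  int lz = __builtin_clzll (chr_match); /* clz tmp, chr_match */
  return 15 - (lz >> 2);                /* add result, src, 15; sub ..., tmp, lsr 2 */
}
```

Because bit only inserts the character bits where vrepmask is 0x33, both nibbles of each merged byte carry the same pattern, so it does not matter that shrn keeps the high nibble of even bytes and the low nibble of odd bytes. The removed lines show that the old sequence needed a second mask (vrepmask2) and an addp to reach the same layout.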
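
The unrolled L(loop1) does not build the full nibble syndrome. It only asks whether a 16-byte chunk contains either the character or a NUL, and it answers with a single unsigned compare (cmhs) of the cmeq result against the data. A minimal scalar sketch of why one compare is enough; the function names are hypothetical and the umaxp + fmov + cbz/cbnz reduction is modelled as a plain "any byte set" loop:

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-byte model of
     cmeq vhas_chr, vdata, vrepchr   -> 0xff where byte == c, else 0x00
     cmhs vhas_nul, vhas_chr, vdata  -> set where has_chr >= byte (unsigned)
   0xff is >= every byte, and 0x00 is >= only a 0 byte, so the compare is
   true exactly when the byte equals c or is NUL.  */
static bool chr_or_nul (uint8_t byte, uint8_t c)
{
  uint8_t has_chr = byte == c ? 0xff : 0x00;
  return has_chr >= byte;
}

/* Stand-in for umaxp vend, vhas_nul, vhas_nul; fmov synd, dend; cbz/cbnz:
   the 64-bit value is nonzero iff any of the 16 compare lanes is set.  */
static bool chunk_has_chr_or_nul (const uint8_t chunk[16], uint8_t c)
{
  for (int i = 0; i < 16; i++)
    if (chr_or_nul (chunk[i], c))
      return true;
  return false;
}
```

Only when this cheap test fires does the loop fall through to L(loop1_end), which recomputes the exact syndrome with cmeq, bit (or bif on big-endian) and shrn to tell a NUL from a character match.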