From patchwork Thu Jan 12 15:54:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 63107
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Optimize strnlen
Date: Thu, 12 Jan 2023 15:54:39 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra

Optimize strnlen using the shrn instruction and improve the main loop.
Small strings are around 10% faster, large strings are 40% faster on
modern CPUs. Passes regress.

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/strnlen.S b/sysdeps/aarch64/strnlen.S
index 35fd14804d42ab90573f995b20cf65ba75042978..21112fbf760b7a99a6d153c00f9cd6b6bc144f3a 100644
--- a/sysdeps/aarch64/strnlen.S
+++ b/sysdeps/aarch64/strnlen.S
@@ -44,19 +44,16 @@
 
 /*
    Core algorithm:
-
-   For each 16-byte chunk we calculate a 64-bit nibble mask value with four bits
-   per byte. We take 4 bits of every comparison byte with shift right and narrow
-   by 4 instruction. Since the bits in the nibble mask reflect the order in
-   which things occur in the original string, counting trailing zeros identifies
-   exactly which byte matched.  */
+   Process the string in 16-byte aligned chunks. Compute a 64-bit mask with
+   four bits per byte using the shrn instruction. A count trailing zeros then
+   identifies the first zero byte.  */
 
 ENTRY (__strnlen)
 	PTR_ARG (0)
 	SIZE_ARG (1)
 	bic	src, srcin, 15
 	cbz	cntin, L(nomatch)
-	ld1	{vdata.16b}, [src], 16
+	ld1	{vdata.16b}, [src]
 	cmeq	vhas_chr.16b, vdata.16b, 0
 	lsl	shift, srcin, 2
 	shrn	vend.8b, vhas_chr.8h, 4		/* 128->64 */
@@ -71,36 +68,40 @@ L(finish):
 	csel	result, cntin, result, ls
 	ret
 
+L(nomatch):
+	mov	result, cntin
+	ret
+
 L(start_loop):
 	sub	tmp, src, srcin
+	add	tmp, tmp, 17
 	subs	cntrem, cntin, tmp
-	b.ls	L(nomatch)
+	b.lo	L(nomatch)
 
 	/* Make sure that it won't overread by a 16-byte chunk */
-	add	tmp, cntrem, 15
-	tbnz	tmp, 4, L(loop32_2)
-
+	tbz	cntrem, 4, L(loop32_2)
+	sub	src, src, 16
 	.p2align 5
 L(loop32):
-	ldr	qdata, [src], 16
+	ldr	qdata, [src, 32]!
 	cmeq	vhas_chr.16b, vdata.16b, 0
 	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
 	fmov	synd, dend
 	cbnz	synd, L(end)
 L(loop32_2):
-	ldr	qdata, [src], 16
+	ldr	qdata, [src, 16]
 	subs	cntrem, cntrem, 32
 	cmeq	vhas_chr.16b, vdata.16b, 0
-	b.ls	L(end)
+	b.lo	L(end_2)
 	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
 	fmov	synd, dend
 	cbz	synd, L(loop32)
-
+L(end_2):
+	add	src, src, 16
 L(end):
 	shrn	vend.8b, vhas_chr.8h, 4		/* 128->64 */
-	sub	src, src, 16
-	mov	synd, vend.d[0]
 	sub	result, src, srcin
+	fmov	synd, dend
 #ifndef __AARCH64EB__
 	rbit	synd, synd
 #endif
@@ -110,10 +111,6 @@ L(end):
 	csel	result, cntin, result, ls
 	ret
 
-L(nomatch):
-	mov	result, cntin
-	ret
-
 END (__strnlen)
 libc_hidden_def (__strnlen)
 weak_alias (__strnlen, strnlen)
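
For readers unfamiliar with the shrn trick in the "Core algorithm" comment,
the following is a scalar C sketch of the nibble-mask idea. It is not the
code in this patch: nibble_mask, ctz64 and sketch_strnlen are hypothetical
names chosen for illustration, the nibble order assumes little-endian, and
unlike the assembly (which relies on 16-byte aligned loads) the sketch
assumes that all maxlen bytes of the buffer are readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for "cmeq" followed by "shrn ..., 4": build a 64-bit
   mask with one nibble per byte of a 16-byte chunk.  Nibble i is 0xf
   when byte i of the chunk is zero (little-endian nibble order).  */
static uint64_t
nibble_mask (const unsigned char *chunk)
{
  uint64_t mask = 0;
  for (int i = 0; i < 16; i++)
    if (chunk[i] == 0)
      mask |= (uint64_t) 0xf << (4 * i);
  return mask;
}

/* Portable count of trailing zero bits; X must be non-zero.  */
static unsigned
ctz64 (uint64_t x)
{
  assert (x != 0);
  unsigned n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Illustrative strnlen built on the mask.  Assumption: BUF is readable
   for all MAXLEN bytes.  Bytes beyond MAXLEN in the final chunk are
   padded with a non-zero value so they can never match.  */
static size_t
sketch_strnlen (const unsigned char *buf, size_t maxlen)
{
  for (size_t off = 0; off < maxlen; off += 16)
    {
      unsigned char chunk[16];
      size_t n = maxlen - off < 16 ? maxlen - off : 16;
      memset (chunk, 0xff, sizeof chunk);
      memcpy (chunk, buf + off, n);
      uint64_t mask = nibble_mask (chunk);
      /* Four mask bits per byte, so the byte index is ctz / 4.  */
      if (mask != 0)
        return off + (ctz64 (mask) >> 2);
    }
  return maxlen;
}
```

Because each byte contributes four mask bits, dividing the trailing-zero
count by four yields the index of the first zero byte within the chunk.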