From patchwork Wed Jun 23 15:22:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 43972
To: 'GNU C Library'
Subject: [PATCH] AArch64: Improve strnlen performance
Date: Wed, 23 Jun 2021 15:22:57 +0000
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00,
	DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, RCVD_IN_DNSWL_NONE,
	RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY
	autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: Szabolcs Nagy
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"

Optimize strnlen by avoiding UMINV which is slow on most cores.
On Neoverse N1 large strings are 1.8x faster than the current version,
and bench-strnlen is 50% faster overall. This version is MTE compatible.

Passes GLIBC regress, OK for commit?

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/strnlen.S b/sysdeps/aarch64/strnlen.S
index 2b57575c55cc41a5c6aa813af216c6e34f6cb7b0..37e9eed4120750f4e03d563938438b8c5384f75d 100644
--- a/sysdeps/aarch64/strnlen.S
+++ b/sysdeps/aarch64/strnlen.S
@@ -22,197 +22,105 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64
+ * ARMv8-a, AArch64, Advanced SIMD.
+ * MTE compatible.
  */
 
-/* Arguments and results. */
 #define srcin		x0
-#define len		x0
-#define limit		x1
+#define cntin		x1
+#define result		x0
 
-/* Locals and temporaries. */
 #define src		x2
-#define data1		x3
-#define data2		x4
-#define data2a		x5
-#define has_nul1	x6
-#define has_nul2	x7
-#define tmp1		x8
-#define tmp2		x9
-#define tmp3		x10
-#define tmp4		x11
-#define zeroones	x12
-#define pos		x13
-#define limit_wd	x14
-
-#define dataq		q2
-#define datav		v2
-#define datab2		b3
-#define dataq2		q3
-#define datav2		v3
-#define REP8_01 0x0101010101010101
-#define REP8_7f 0x7f7f7f7f7f7f7f7f
-#define REP8_80 0x8080808080808080
-
-ENTRY_ALIGN_AND_PAD (__strnlen, 6, 9)
+#define synd		x3
+#define shift		x4
+#define wtmp		w4
+#define tmp		x4
+#define cntrem		x5
+
+#define qdata		q0
+#define vdata		v0
+#define vhas_chr	v1
+#define vrepmask	v2
+#define vend		v3
+#define dend		d3
+
+/*
+   Core algorithm:
+
+   For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
+   per byte. For even bytes, bits 0-3 are set if the relevant byte matched the
+   requested character or the byte is NUL. Bits 4-7 must be zero. Bits 4-7 are
+   set likewise for odd bytes so that adjacent bytes can be merged. Since the
+   bits in the syndrome reflect the order in which things occur in the original
+   string, counting trailing zeros identifies exactly which byte matched. */
+
+ENTRY (__strnlen)
 	PTR_ARG (0)
 	SIZE_ARG (1)
-	cbz	limit, L(hit_limit)
-	mov	zeroones, #REP8_01
-	bic	src, srcin, #15
-	ands	tmp1, srcin, #15
-	b.ne	L(misaligned)
-	/* Calculate the number of full and partial words -1. */
-	sub	limit_wd, limit, #1	/* Limit != 0, so no underflow. */
-	lsr	limit_wd, limit_wd, #4	/* Convert to Qwords. */
-
-	/* NUL detection works on the principle that (X - 1) & (~X) & 0x80
-	   (=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and
-	   can be done in parallel across the entire word. */
-	/* The inner loop deals with two Dwords at a time. This has a
-	   slightly higher start-up cost, but we should win quite quickly,
-	   especially on cores with a high number of issue slots per
-	   cycle, as we get much better parallelism out of the operations. */
-
-	/* Start of critial section -- keep to one 64Byte cache line. */
-
-	ldp	data1, data2, [src], #16
-L(realigned):
-	sub	tmp1, data1, zeroones
-	orr	tmp2, data1, #REP8_7f
-	sub	tmp3, data2, zeroones
-	orr	tmp4, data2, #REP8_7f
-	bic	has_nul1, tmp1, tmp2
-	bic	has_nul2, tmp3, tmp4
-	subs	limit_wd, limit_wd, #1
-	orr	tmp1, has_nul1, has_nul2
-	ccmp	tmp1, #0, #0, pl	/* NZCV = 0000 */
-	b.eq	L(loop)
-	/* End of critical section -- keep to one 64Byte cache line. */
-
-	orr	tmp1, has_nul1, has_nul2
-	cbz	tmp1, L(hit_limit)	/* No null in final Qword. */
-
-	/* We know there's a null in the final Qword. The easiest thing
-	   to do now is work out the length of the string and return
-	   MIN (len, limit). */
-
-	sub	len, src, srcin
-	cbz	has_nul1, L(nul_in_data2)
-#ifdef __AARCH64EB__
-	mov	data2, data1
+	bic	src, srcin, 15
+	mov	wtmp, 0xf00f
+	cbz	cntin, L(nomatch)
+	ld1	{vdata.16b}, [src], 16
+	dup	vrepmask.8h, wtmp
+	cmeq	vhas_chr.16b, vdata.16b, 0
+	lsl	shift, srcin, 2
+	and	vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	synd, dend
+	lsr	synd, synd, shift
+	cbz	synd, L(start_loop)
+L(finish):
+	rbit	synd, synd
+	clz	synd, synd
+	lsr	result, synd, 2
+	cmp	cntin, result
+	csel	result, cntin, result, ls
+	ret
+
+L(start_loop):
+	sub	tmp, src, srcin
+	subs	cntrem, cntin, tmp
+	b.ls	L(nomatch)
+
+	/* Make sure that it won't overread by a 16-byte chunk */
+	add	tmp, cntrem, 15
+	tbnz	tmp, 4, L(loop32_2)
+
+	.p2align 5
+L(loop32):
+	ldr	qdata, [src], 16
+	cmeq	vhas_chr.16b, vdata.16b, 0
+	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	synd, dend
+	cbnz	synd, L(end)
+L(loop32_2):
+	ldr	qdata, [src], 16
+	subs	cntrem, cntrem, 32
+	cmeq	vhas_chr.16b, vdata.16b, 0
+	b.ls	L(end)
+	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	synd, dend
+	cbz	synd, L(loop32)
+
+L(end):
+	and	vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	sub	src, src, 16
+	mov	synd, vend.d[0]
+	sub	result, src, srcin
+#ifndef __AARCH64EB__
+	rbit	synd, synd
 #endif
-	sub	len, len, #8
-	mov	has_nul2, has_nul1
-L(nul_in_data2):
-#ifdef __AARCH64EB__
-	/* For big-endian, carry propagation (if the final byte in the
-	   string is 0x01) means we cannot use has_nul directly. The
-	   easiest way to get the correct byte is to byte-swap the data
-	   and calculate the syndrome a second time. */
-	rev	data2, data2
-	sub	tmp1, data2, zeroones
-	orr	tmp2, data2, #REP8_7f
-	bic	has_nul2, tmp1, tmp2
-#endif
-	sub	len, len, #8
-	rev	has_nul2, has_nul2
-	clz	pos, has_nul2
-	add	len, len, pos, lsr #3		/* Bits to bytes. */
-	cmp	len, limit
-	csel	len, len, limit, ls		/* Return the lower value. */
-	RET
-
-L(loop):
-	ldr	dataq, [src], #16
-	uminv	datab2, datav.16b
-	mov	tmp1, datav2.d[0]
-	subs	limit_wd, limit_wd, #1
-	ccmp	tmp1, #0, #4, pl	/* NZCV = 0000 */
-	b.eq	L(loop_end)
-	ldr	dataq, [src], #16
-	uminv	datab2, datav.16b
-	mov	tmp1, datav2.d[0]
-	subs	limit_wd, limit_wd, #1
-	ccmp	tmp1, #0, #4, pl	/* NZCV = 0000 */
-	b.ne	L(loop)
-L(loop_end):
-	/* End of critical section -- keep to one 64Byte cache line. */
-
-	cbnz	tmp1, L(hit_limit)	/* No null in final Qword. */
-
-	/* We know there's a null in the final Qword. The easiest thing
-	   to do now is work out the length of the string and return
-	   MIN (len, limit). */
-
-#ifdef __AARCH64EB__
-	rev64	datav.16b, datav.16b
-#endif
-	/* Set te NULL byte as 0xff and the rest as 0x00, move the data into a
-	   pair of scalars and then compute the length from the earliest NULL
-	   byte. */
-
-	cmeq	datav.16b, datav.16b, #0
-#ifdef __AARCH64EB__
-	mov	data1, datav.d[1]
-	mov	data2, datav.d[0]
-#else
-	mov	data1, datav.d[0]
-	mov	data2, datav.d[1]
-#endif
-	cmp	data1, 0
-	csel	data1, data1, data2, ne
-	sub	len, src, srcin
-	sub	len, len, #16
-	rev	data1, data1
-	add	tmp2, len, 8
-	clz	tmp1, data1
-	csel	len, len, tmp2, ne
-	add	len, len, tmp1, lsr 3
-	cmp	len, limit
-	csel	len, len, limit, ls		/* Return the lower value. */
-	RET
-
-L(misaligned):
-	/* Deal with a partial first word.
-	   We're doing two things in parallel here;
-	   1) Calculate the number of words (but avoiding overflow if
-	      limit is near ULONG_MAX) - to do this we need to work out
-	      limit + tmp1 - 1 as a 65-bit value before shifting it;
-	   2) Load and mask the initial data words - we force the bytes
-	      before the ones we are interested in to 0xff - this ensures
-	      early bytes will not hit any zero detection. */
-	sub	limit_wd, limit, #1
-	neg	tmp4, tmp1
-	cmp	tmp1, #8
-
-	and	tmp3, limit_wd, #15
-	lsr	limit_wd, limit_wd, #4
-	mov	tmp2, #~0
-
-	ldp	data1, data2, [src], #16
-	lsl	tmp4, tmp4, #3		/* Bytes beyond alignment -> bits. */
-	add	tmp3, tmp3, tmp1
-
-#ifdef __AARCH64EB__
-	/* Big-endian. Early bytes are at MSB. */
-	lsl	tmp2, tmp2, tmp4	/* Shift (tmp1 & 63). */
-#else
-	/* Little-endian. Early bytes are at LSB. */
-	lsr	tmp2, tmp2, tmp4	/* Shift (tmp1 & 63). */
-#endif
-	add	limit_wd, limit_wd, tmp3, lsr #4
-
-	orr	data1, data1, tmp2
-	orr	data2a, data2, tmp2
+	clz	synd, synd
+	add	result, result, synd, lsr 2
+	cmp	cntin, result
+	csel	result, cntin, result, ls
+	ret
 
-	csinv	data1, data1, xzr, le
-	csel	data2, data2, data2a, le
-	b	L(realigned)
+L(nomatch):
+	mov	result, cntin
+	ret
 
-L(hit_limit):
-	mov	len, limit
-	RET
 END (__strnlen)
 libc_hidden_def (__strnlen)
 weak_alias (__strnlen, strnlen)
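
For anyone reviewing the "Core algorithm" comment, here is a rough scalar C model of the four-bits-per-byte syndrome. It is only an illustrative sketch and not part of the patch. The names nul_syndrome, ctz64 and strnlen_model are hypothetical, it assumes little-endian byte order, and it assumes the caller pads the buffer to a multiple of 16 readable bytes (the assembly instead relies on aligned 16-byte loads). The shift applied to the first, possibly misaligned, chunk is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nibble i of the result is 0xf when chunk[i] is NUL and 0x0 otherwise.
   This is meant to mirror what cmeq, the 0xf00f mask and addp produce
   on little-endian; it is a model, not the patch's code.  */
static uint64_t nul_syndrome(const unsigned char chunk[16])
{
    uint64_t synd = 0;
    for (int i = 0; i < 16; i++)
        if (chunk[i] == 0)
            synd |= (uint64_t)0xf << (4 * i);
    return synd;
}

/* Count trailing zero bits of a non-zero value (the rbit + clz pair).  */
static unsigned ctz64(uint64_t x)
{
    unsigned n = 0;
    assert(x != 0);
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical reference model: buf must be readable up to maxlen
   rounded up to a multiple of 16 bytes.  */
static size_t strnlen_model(const unsigned char *buf, size_t maxlen)
{
    for (size_t off = 0; off < maxlen; off += 16) {
        uint64_t synd = nul_syndrome(buf + off);
        if (synd != 0) {
            /* Four syndrome bits per byte, so bit index / 4 = byte index.  */
            size_t len = off + ctz64(synd) / 4;
            return len < maxlen ? len : maxlen;
        }
    }
    return maxlen;
}
```

As in L(finish) and L(end), the model clamps the result to the count because a NUL may be found in the last chunk beyond the requested limit.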