From patchwork Thu Jan 12 15:58:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 63110
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Optimize strchr
Date: Thu, 12 Jan 2023 15:58:55 +0000
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra

Simplify calculation of the mask using shrn.  Unroll the main loop.
Small strings are 20% faster on modern CPUs.  Passes regress.

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/strchr.S b/sysdeps/aarch64/strchr.S
index 900ef15944c2b8a82943cc0fbdaf0b40907c40e1..14ae1513a7330a62cf5985d06e1fb6a8bab78d63 100644
--- a/sysdeps/aarch64/strchr.S
+++ b/sysdeps/aarch64/strchr.S
@@ -32,8 +32,7 @@
 
 #define src		x2
 #define tmp1		x1
-#define wtmp2		w3
-#define tmp3		x3
+#define tmp2		x3
 
 #define vrepchr		v0
 #define vdata		v1
@@ -41,39 +40,30 @@
 #define vhas_nul	v2
 #define vhas_chr	v3
 #define vrepmask	v4
-#define vrepmask2	v5
-#define vend		v6
-#define dend		d6
+#define vend		v5
+#define dend		d5
 
 /* Core algorithm.
 
    For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
-   per byte. For even bytes, bits 0-1 are set if the relevant byte matched the
-   requested character, bits 2-3 are set if the byte is NUL (or matched), and
-   bits 4-7 are not used and must be zero if none of bits 0-3 are set). Odd
-   bytes set bits 4-7 so that adjacent bytes can be merged. Since the bits
-   in the syndrome reflect the order in which things occur in the original
-   string, counting trailing zeros identifies exactly which byte matched.  */
+   per byte. Bits 0-1 are set if the relevant byte matched the requested
+   character, bits 2-3 are set if the byte is NUL or matched. Count trailing
+   zeroes gives the position of the matching byte if it is a multiple of 4.
+   If it is not a multiple of 4, there was no match.  */
 
 ENTRY (strchr)
 	PTR_ARG (0)
 	bic	src, srcin, 15
 	dup	vrepchr.16b, chrin
 	ld1	{vdata.16b}, [src]
-	mov	wtmp2, 0x3003
-	dup	vrepmask.8h, wtmp2
+	movi	vrepmask.16b, 0x33
 	cmeq	vhas_nul.16b, vdata.16b, 0
 	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
-	mov	wtmp2, 0xf00f
-	dup	vrepmask2.8h, wtmp2
-
 	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
-	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
-	lsl	tmp3, srcin, 2
-	addp	vend.16b, vhas_nul.16b, vhas_nul.16b	/* 128->64 */
-
+	lsl	tmp2, srcin, 2
+	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	fmov	tmp1, dend
-	lsr	tmp1, tmp1, tmp3
+	lsr	tmp1, tmp1, tmp2
 	cbz	tmp1, L(loop)
 
 	rbit	tmp1, tmp1
@@ -87,28 +77,34 @@ ENTRY (strchr)
 
 	.p2align 4
 L(loop):
-	ldr	qdata, [src, 16]!
+	ldr	qdata, [src, 16]
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
+	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	fmov	tmp1, dend
+	cbnz	tmp1, L(end)
+	ldr	qdata, [src, 32]!
 	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
 	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
 	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
 	fmov	tmp1, dend
 	cbz	tmp1, L(loop)
 
+	sub	src, src, 16
+L(end):
 #ifdef __AARCH64EB__
 	bif	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
-	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
-	addp	vend.16b, vhas_nul.16b, vhas_nul.16b	/* 128->64 */
+	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	fmov	tmp1, dend
 #else
 	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
-	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
-	addp	vend.16b, vhas_nul.16b, vhas_nul.16b	/* 128->64 */
+	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	fmov	tmp1, dend
 	rbit	tmp1, tmp1
 #endif
+	add	src, src, 16
 	clz	tmp1, tmp1
-	/* Tmp1 is an even multiple of 2 if the target character was
-	   found first. Otherwise we've found the end of string.  */
+	/* Tmp1 is a multiple of 4 if the target character was found.  */
 	tst	tmp1, 2
 	add	result, src, tmp1, lsr 2
 	csel	result, result, xzr, eq
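
Illustrative note, not part of the patch: the nibble-per-byte syndrome described in the "Core algorithm" comment can be modelled in scalar C. This is only a sketch, assuming little-endian byte order and the layout the comment states (bits 0-1 for a match, bits 2-3 for NUL or a match). The names chunk_syndrome, count_trailing_zeros and chunk_strchr are hypothetical and do not exist in glibc; the loop stands in for the cmeq/bit/shrn sequence rather than reproducing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the 64-bit syndrome for one 16-byte chunk (little-endian
   order assumed).  Byte i contributes nibble i: bits 0-1 are set if the byte
   equals c, bits 2-3 are set if the byte is NUL or equals c.  */
static uint64_t
chunk_syndrome (const unsigned char *chunk, unsigned char c)
{
  assert (chunk != NULL);
  uint64_t syndrome = 0;
  for (int i = 0; i < 16; i++)
    {
      uint64_t nibble = 0;
      if (chunk[i] == c)
        nibble |= 0x3;
      if (chunk[i] == 0 || chunk[i] == c)
        nibble |= 0xc;
      syndrome |= nibble << (4 * i);
    }
  return syndrome;
}

/* Portable count of trailing zero bits; x must be nonzero.  */
static int
count_trailing_zeros (uint64_t x)
{
  int n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Returns the index of the first c in the chunk, -1 if a NUL comes first,
   or 16 if the chunk contains neither.  */
static int
chunk_strchr (const unsigned char *chunk, unsigned char c)
{
  uint64_t syndrome = chunk_syndrome (chunk, c);
  if (syndrome == 0)
    return 16;
  int tz = count_trailing_zeros (syndrome);
  /* A multiple of 4 means bits 0-1 of the first nonzero nibble are set,
     i.e. a match.  Otherwise bit 1 of tz is set and the byte was NUL.  */
  if (tz & 2)
    return -1;
  return tz >> 2;
}
```

For a 16-byte chunk holding "hello, world" padded with NULs, chunk_strchr (buf, 'w') gives 7 and chunk_strchr (buf, 'z') gives -1, which corresponds to the tst tmp1, 2 / csel pair at the end of the routine.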