Message ID | PAWPR08MB8982BE5A8C1635DBCB89700D83FD9@PAWPR08MB8982.eurprd08.prod.outlook.com |
---|---|
State | Committed |
Commit | 1bbb1a2022e126f21810d3d0ebe0a975d5243e43 |
Series | AArch64: Improve strlen_asimd |
Checks
Context | Check | Description |
---|---|---|
dj/TryBot-apply_patch | success | Patch applied to master at the time it was sent |
dj/TryBot-32bit | success | Build for i686 |
Commit Message
Wilco Dijkstra
Jan. 12, 2023, 3:51 p.m. UTC
Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment. Performance improves slightly as a result. Passes regress. ---
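As a rough illustration of the shrn-based mask, here is a minimal portable C sketch. It is not the glibc code: it emulates `cmeq` and `shrn ..., 4` with scalar loops and assumes little-endian lane order. The helper names (`cmeq_zero_16b`, `shrn4_to_u64`, `first_nul_in_chunk`) are hypothetical. Each input byte ends up as one 4-bit nibble of a 64-bit syndrome, so the position of the lowest set bit divided by 4 is the byte index of the first NUL, which corresponds to the `lsr 2` in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates `cmeq v.16b, data.16b, 0`: 0xff where the byte is NUL, else 0x00. */
static void cmeq_zero_16b(const unsigned char data[16], uint8_t mask[16])
{
    for (size_t i = 0; i < 16; i++)
        mask[i] = (data[i] == 0) ? 0xff : 0x00;
}

/* Emulates `shrn v.8b, v.8h, 4` followed by moving the low 64 bits to a
   general register. Assumes little-endian lane order: halfword i is built
   from bytes 2i (low) and 2i+1 (high). */
static uint64_t shrn4_to_u64(const uint8_t mask[16])
{
    uint64_t synd = 0;
    for (size_t i = 0; i < 8; i++) {
        uint16_t h = (uint16_t)(mask[2 * i] | (mask[2 * i + 1] << 8));
        uint8_t narrowed = (uint8_t)(h >> 4);  /* keep bits 4..11 of h */
        synd |= (uint64_t)narrowed << (8 * i);
    }
    return synd;
}

/* Returns the index of the first NUL in a 16-byte chunk, or 16 if none. */
static size_t first_nul_in_chunk(const unsigned char data[16])
{
    uint8_t mask[16];
    cmeq_zero_16b(data, mask);
    uint64_t synd = shrn4_to_u64(mask);
    if (synd == 0)
        return 16;
    size_t bit = 0;
    while ((synd & 1) == 0) {  /* count trailing zero bits */
        synd >>= 1;
        bit++;
    }
    assert(bit % 4 == 0);      /* each byte maps to a whole nibble */
    return bit >> 2;           /* 4 syndrome bits per input byte */
}
```

For a chunk holding "hello" followed by NUL bytes, `first_nul_in_chunk` returns 5; for a chunk with no NUL byte it returns 16.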
Comments
The 01/12/2023 15:51, Wilco Dijkstra wrote:
> Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment.
> Performance improves slightly as a result. Passes regress.
>

I prefer to commit this and the other string function optimization
patches and not delay to next release so start using and widely
testing them sooner (we can fix and backport perf regressions).

please commit it, thanks.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

> ---
>
> diff --git a/sysdeps/aarch64/multiarch/strlen_asimd.S b/sysdeps/aarch64/multiarch/strlen_asimd.S
> index ca6ab96ecf2de45def79539facd8e0b86f4edc95..490439491d19c3f14b0228f42248bc8aa6e9e8bd 100644
> --- a/sysdeps/aarch64/multiarch/strlen_asimd.S
> +++ b/sysdeps/aarch64/multiarch/strlen_asimd.S
> @@ -48,6 +48,7 @@
>  #define tmp x2
>  #define tmpw w2
>  #define synd x3
> +#define syndw w3
>  #define shift x4
>
>  /* For the first 32 bytes, NUL detection works on the principle that
> @@ -87,7 +88,6 @@
>
>  ENTRY (__strlen_asimd)
>  	PTR_ARG (0)
> -
>  	and tmp1, srcin, MIN_PAGE_SIZE - 1
>  	cmp tmp1, MIN_PAGE_SIZE - 32
>  	b.hi L(page_cross)
> @@ -123,7 +123,6 @@ ENTRY (__strlen_asimd)
>  	add len, len, tmp1, lsr 3
>  	ret
>
> -	.p2align 3
>  	/* Look for a NUL byte at offset 16..31 in the string. */
>  L(bytes16_31):
>  	ldp data1, data2, [srcin, 16]
> @@ -151,6 +150,7 @@ L(bytes16_31):
>  	add len, len, tmp1, lsr 3
>  	ret
>
> +	nop
>  L(loop_entry):
>  	bic src, srcin, 31
>
> @@ -166,18 +166,12 @@ L(loop):
>  	/* Low 32 bits of synd are non-zero if a NUL was found in datav1. */
>  	cmeq maskv.16b, datav1.16b, 0
>  	sub len, src, srcin
> -	tst synd, 0xffffffff
> -	b.ne 1f
> +	cbnz syndw, 1f
>  	cmeq maskv.16b, datav2.16b, 0
>  	add len, len, 16
>  1:
>  	/* Generate a bitmask and compute correct byte offset. */
> -#ifdef __AARCH64EB__
> -	bic maskv.8h, 0xf0
> -#else
> -	bic maskv.8h, 0x0f, lsl 8
> -#endif
> -	umaxp maskv.16b, maskv.16b, maskv.16b
> +	shrn maskv.8b, maskv.8h, 4
>  	fmov synd, maskd
>  #ifndef __AARCH64EB__
>  	rbit synd, synd
> @@ -186,8 +180,6 @@ L(loop):
>  	add len, len, tmp, lsr 2
>  	ret
>
> -	.p2align 4
> -
>  L(page_cross):
>  	bic src, srcin, 31
>  	mov tmpw, 0x0c03
On 1/13/23 07:25, Szabolcs Nagy via Libc-alpha wrote:
> The 01/12/2023 15:51, Wilco Dijkstra wrote:
>> Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment.
>> Performance improves slightly as a result. Passes regress.
>>
>
> I prefer to commit this and the other string function optimization
> patches and not delay to next release so start using and widely
> testing them sooner (we can fix and backport perf regressions).
>
> please commit it, thanks.
>
> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

Yes, please commit this ASAP.

As RM I would like to see machine testing start next week and so I'd like to
see such changes committed early this week.
Hi Carlos,

>> I prefer to commit this and the other string function optimization
>> patches and not delay to next release so start using and widely
>> testing them sooner (we can fix and backport perf regressions).
>>
>> please commit it, thanks.
>>
>> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
>
> Yes, please commit this ASAP.
>
> As RM I would like to see machine testing start next week and so I'd like to
> see such changes committed early this week.

Thanks, I've pushed it.

Cheers,
Wilco
diff --git a/sysdeps/aarch64/multiarch/strlen_asimd.S b/sysdeps/aarch64/multiarch/strlen_asimd.S
index ca6ab96ecf2de45def79539facd8e0b86f4edc95..490439491d19c3f14b0228f42248bc8aa6e9e8bd 100644
--- a/sysdeps/aarch64/multiarch/strlen_asimd.S
+++ b/sysdeps/aarch64/multiarch/strlen_asimd.S
@@ -48,6 +48,7 @@
 #define tmp x2
 #define tmpw w2
 #define synd x3
+#define syndw w3
 #define shift x4
 
 /* For the first 32 bytes, NUL detection works on the principle that
@@ -87,7 +88,6 @@
 
 ENTRY (__strlen_asimd)
 	PTR_ARG (0)
-
 	and tmp1, srcin, MIN_PAGE_SIZE - 1
 	cmp tmp1, MIN_PAGE_SIZE - 32
 	b.hi L(page_cross)
@@ -123,7 +123,6 @@ ENTRY (__strlen_asimd)
 	add len, len, tmp1, lsr 3
 	ret
 
-	.p2align 3
 	/* Look for a NUL byte at offset 16..31 in the string. */
 L(bytes16_31):
 	ldp data1, data2, [srcin, 16]
@@ -151,6 +150,7 @@ L(bytes16_31):
 	add len, len, tmp1, lsr 3
 	ret
 
+	nop
 L(loop_entry):
 	bic src, srcin, 31
 
@@ -166,18 +166,12 @@ L(loop):
 	/* Low 32 bits of synd are non-zero if a NUL was found in datav1. */
 	cmeq maskv.16b, datav1.16b, 0
 	sub len, src, srcin
-	tst synd, 0xffffffff
-	b.ne 1f
+	cbnz syndw, 1f
 	cmeq maskv.16b, datav2.16b, 0
 	add len, len, 16
 1:
 	/* Generate a bitmask and compute correct byte offset. */
-#ifdef __AARCH64EB__
-	bic maskv.8h, 0xf0
-#else
-	bic maskv.8h, 0x0f, lsl 8
-#endif
-	umaxp maskv.16b, maskv.16b, maskv.16b
+	shrn maskv.8b, maskv.8h, 4
 	fmov synd, maskd
 #ifndef __AARCH64EB__
 	rbit synd, synd
@@ -186,8 +180,6 @@ L(loop):
 	add len, len, tmp, lsr 2
 	ret
 
-	.p2align 4
-
 L(page_cross):
 	bic src, srcin, 31
 	mov tmpw, 0x0c03
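
The tst+b.ne to cbnz merge in the loop hunk relies on `syndw` (w3) being the low 32 bits of `synd` (x3). The following C sketch, with hypothetical function names, models only the branch decision of the two sequences and suggests why they should agree for any syndrome value. It does not model condition flags: `tst` updates them, while `cbnz` leaves them untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old sequence: tst synd, 0xffffffff ; b.ne 1f
   The branch is taken when any of the low 32 bits is set. */
static bool branch_taken_tst(uint64_t synd)
{
    return (synd & UINT64_C(0xffffffff)) != 0;
}

/* New sequence: cbnz syndw, 1f
   syndw is the 32-bit view of the same register, modelled as a truncation. */
static bool branch_taken_cbnz(uint64_t synd)
{
    uint32_t syndw = (uint32_t)synd;
    return syndw != 0;
}

/* Checks that both models make the same branch decision for one value. */
static void check_same_branch(uint64_t synd)
{
    assert(branch_taken_tst(synd) == branch_taken_cbnz(synd));
}
```

A syndrome with bits set only in the upper half, such as 0x0000000100000000, falls through in both models, which matches the diff comment that the low 32 bits are non-zero only when the NUL is in datav1.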