From patchwork Wed Jun 3 09:43:25 2020
X-Patchwork-Submitter: Andrea Corallo
X-Patchwork-Id: 39431
From: Andrea Corallo
To: libc-alpha@sourceware.org
Cc: nd@arm.com, Wilco.Dijkstra@arm.com
Subject: [PATCH] aarch64: MTE compatible strchrnul
Date: Wed, 03 Jun 2020 11:43:25 +0200

Hi all,

I'd like to submit this patch introducing an Arm MTE compatible
strchrnul implementation.

A performance comparison of the strchrnul benchmark, run on Cortex-A72,
Cortex-A53 and Neoverse N1, follows.

| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
|     32 |         0 |           1.16x |           1.07x |          1.35x |
|     32 |         1 |           1.25x |           1.16x |          1.15x |
|     64 |         0 |           1.26x |           0.97x |          1.20x |
|     64 |         2 |           1.35x |           1.04x |          1.30x |
|    128 |         0 |           1.12x |           0.84x |          1.22x |
|    128 |         3 |           1.25x |           0.87x |          1.30x |
|    256 |         0 |           1.14x |           0.84x |          1.16x |
|    256 |         4 |           1.24x |           0.81x |          1.16x |
|    512 |         0 |           1.15x |           0.80x |          1.13x |
|    512 |         5 |           1.17x |           0.81x |          1.14x |
|   1024 |         0 |           1.14x |           0.78x |          1.08x |
|   1024 |         6 |           1.03x |           0.78x |          1.10x |
|   2048 |         0 |           1.12x |           0.76x |          1.08x |
|   2048 |         7 |           1.14x |           0.77x |          1.09x |
|     64 |         1 |           1.35x |           1.04x |          1.37x |
|     64 |         1 |           1.36x |           1.04x |          1.37x |
|     64 |         2 |           1.36x |           1.04x |          1.37x |
|     64 |         2 |           1.37x |           1.04x |          1.38x |
|     64 |         3 |           1.38x |           1.04x |          1.36x |
|     64 |         3 |           1.40x |           1.04x |          1.36x |
|     64 |         4 |           1.41x |           1.04x |          1.36x |
|     64 |         4 |           1.36x |           1.04x |          1.36x |
|     64 |         5 |           1.34x |           1.04x |          1.40x |
|     64 |         5 |           1.35x |           1.04x |          1.36x |
|     64 |         6 |           1.34x |           1.04x |          1.37x |
|     64 |         6 |           1.41x |           1.04x |          1.37x |
|     64 |         7 |           1.39x |           1.04x |          1.36x |
|     64 |         7 |           1.34x |           1.04x |          1.37x |
|      0 |         0 |           1.18x |           1.63x |          1.66x |
|      0 |         0 |           1.18x |           1.63x |          1.66x |
|      1 |         0 |           1.18x |           1.63x |          1.66x |
|      1 |         0 |           1.18x |           1.63x |          1.67x |
|      2 |         0 |           1.18x |           1.63x |          1.66x |
|      2 |         0 |           1.18x |           1.63x |          1.65x |
|      3 |         0 |           1.18x |           1.63x |          1.66x |
|      3 |         0 |           1.18x |           1.63x |          1.66x |
|      4 |         0 |           1.18x |           1.63x |          1.65x |
|      4 |         0 |           1.18x |           1.63x |          1.66x |
|      5 |         0 |           1.18x |           1.63x |          1.66x |
|      5 |         0 |           1.18x |           1.63x |          1.66x |
|      6 |         0 |           1.18x |           1.63x |          1.66x |
|      6 |         0 |           1.18x |           1.63x |          1.66x |
|      7 |         0 |           1.18x |           1.63x |          1.66x |
|      7 |         0 |           1.18x |           1.63x |          1.64x |
|      8 |         0 |           1.18x |           1.63x |          1.66x |
|      8 |         0 |           1.18x |           1.63x |          1.66x |
|      9 |         0 |           1.18x |           1.63x |          1.65x |
|      9 |         0 |           1.18x |           1.63x |          1.66x |
|     10 |         0 |           1.18x |           1.63x |          1.66x |
|     10 |         0 |           1.18x |           1.63x |          1.66x |
|     11 |         0 |           1.18x |           1.63x |          1.64x |
|     11 |         0 |           1.18x |           1.63x |          1.63x |
|     12 |         0 |           1.18x |           1.63x |          1.63x |
|     12 |         0 |           1.18x |           1.63x |          1.66x |
|     13 |         0 |           1.18x |           1.63x |          1.63x |
|     13 |         0 |           1.18x |           1.63x |          1.63x |
|     14 |         0 |           1.18x |           1.63x |          1.63x |
|     14 |         0 |           1.18x |           1.63x |          1.22x |
|     15 |         0 |           1.19x |           1.63x |          1.22x |
|     15 |         0 |           1.18x |           1.63x |          1.63x |
|     16 |         0 |           1.03x |           0.96x |          1.15x |
|     16 |         0 |           1.03x |           0.96x |          1.13x |
|     17 |         0 |           1.03x |           0.96x |          0.98x |
|     17 |         0 |           1.03x |           0.96x |          0.98x |
|     18 |         0 |           1.03x |           0.96x |          0.98x |
|     18 |         0 |           1.03x |           0.96x |          0.98x |
|     19 |         0 |           1.04x |           0.96x |          0.98x |
|     19 |         0 |           1.04x |           0.96x |          0.98x |
|     20 |         0 |           1.04x |           0.96x |          1.00x |
|     20 |         0 |           1.03x |           0.96x |          0.99x |
|     21 |         0 |           1.04x |           0.96x |          0.99x |
|     21 |         0 |           1.03x |           0.96x |          1.14x |
|     22 |         0 |           1.04x |           0.96x |          1.14x |
|     22 |         0 |           1.03x |           0.96x |          1.14x |
|     23 |         0 |           1.03x |           0.96x |          1.13x |
|     23 |         0 |           1.03x |           0.96x |          1.15x |
|     24 |         0 |           1.04x |           0.96x |          1.13x |
|     24 |         0 |           1.04x |           0.95x |          1.13x |
|     25 |         0 |           1.03x |           0.96x |          1.15x |
|     25 |         0 |           1.04x |           0.96x |          1.12x |
|     26 |         0 |           1.04x |           0.96x |          1.13x |
|     26 |         0 |           1.02x |           0.96x |          1.13x |
|     27 |         0 |           1.04x |           0.96x |          1.13x |
|     27 |         0 |           1.03x |           0.96x |          1.13x |
|     28 |         0 |           1.03x |           0.96x |          0.98x |
|     28 |         0 |           1.04x |           0.96x |          1.05x |
|     29 |         0 |           1.02x |           0.96x |          1.00x |
|     29 |         0 |           1.03x |           0.96x |          1.00x |
|     30 |         0 |           1.04x |           0.96x |          1.00x |
|     30 |         0 |           1.04x |           0.96x |          1.00x |
|     31 |         0 |           1.04x |           0.96x |          0.99x |
|     31 |         0 |           1.03x |           0.96x |          0.99x |
|     32 |         0 |           1.09x |           1.07x |          1.09x |
|     32 |         1 |           1.25x |           1.15x |          1.38x |
|     64 |         0 |           1.27x |           0.98x |          1.20x |
|     64 |         2 |           1.41x |           1.04x |          1.30x |
|    128 |         0 |           1.15x |           0.84x |          1.22x |
|    128 |         3 |           1.23x |           0.87x |          1.30x |
|    256 |         0 |           1.16x |           0.84x |          1.16x |
|    256 |         4 |           1.23x |           0.81x |          1.17x |
|    512 |         0 |           1.14x |           0.80x |          1.12x |
|    512 |         5 |           1.18x |           0.81x |          1.14x |
|   1024 |         0 |           1.16x |           0.78x |          1.09x |
|   1024 |         6 |           1.03x |           0.78x |          1.11x |
|   2048 |         0 |           1.14x |           0.76x |          1.08x |
|   2048 |         7 |           1.14x |           0.77x |          1.09x |
|     64 |         1 |           1.40x |           1.04x |          1.37x |
|     64 |         1 |           1.40x |           1.04x |          1.37x |
|     64 |         2 |           1.35x |           1.04x |          1.37x |
|     64 |         2 |           1.38x |           1.04x |          1.37x |
|     64 |         3 |           1.36x |           1.04x |          1.37x |
|     64 |         3 |           1.34x |           1.04x |          1.37x |
|     64 |         4 |           1.41x |           1.04x |          1.37x |
|     64 |         4 |           1.38x |           1.04x |          1.37x |
|     64 |         5 |           1.36x |           1.04x |          1.37x |
|     64 |         5 |           1.36x |           1.04x |          1.37x |
|     64 |         6 |           1.35x |           1.04x |          1.37x |
|     64 |         6 |           1.40x |           1.04x |          1.37x |
|     64 |         7 |           1.35x |           1.04x |          1.37x |
|     64 |         7 |           1.40x |           1.04x |          1.37x |
|      0 |         0 |           1.19x |           1.63x |          1.66x |
|      0 |         0 |           1.19x |           1.63x |          1.66x |
|      1 |         0 |           1.19x |           1.63x |          1.66x |
|      1 |         0 |           1.19x |           1.63x |          1.66x |
|      2 |         0 |           1.18x |           1.63x |          1.63x |
|      2 |         0 |           1.18x |           1.63x |          1.66x |
|      3 |         0 |           1.18x |           1.63x |          1.66x |
|      3 |         0 |           1.20x |           1.63x |          1.63x |
|      4 |         0 |           1.18x |           1.63x |          1.63x |
|      4 |         0 |           1.18x |           1.63x |          1.66x |
|      5 |         0 |           1.18x |           1.63x |          1.66x |
|      5 |         0 |           1.18x |           1.63x |          1.66x |
|      6 |         0 |           1.18x |           1.63x |          1.66x |
|      6 |         0 |           1.18x |           1.63x |          1.66x |
|      7 |         0 |           1.18x |           1.63x |          1.66x |
|      7 |         0 |           1.18x |           1.63x |          1.66x |
|      8 |         0 |           1.18x |           1.63x |          1.25x |
|      8 |         0 |           1.18x |           1.63x |          1.66x |
|      9 |         0 |           1.18x |           1.63x |          1.66x |
|      9 |         0 |           1.18x |           1.63x |          1.66x |
|     10 |         0 |           1.18x |           1.63x |          1.66x |
|     10 |         0 |           1.18x |           1.63x |          1.66x |
|     11 |         0 |           1.18x |           1.63x |          1.66x |
|     11 |         0 |           1.18x |           1.63x |          1.66x |
|     12 |         0 |           1.18x |           1.63x |          1.66x |
|     12 |         0 |           1.19x |           1.63x |          1.66x |
|     13 |         0 |           1.18x |           1.63x |          1.66x |
|     13 |         0 |           1.18x |           1.63x |          1.66x |
|     14 |         0 |           1.19x |           1.63x |          1.66x |
|     14 |         0 |           1.19x |           1.63x |          1.66x |
|     15 |         0 |           1.18x |           1.63x |          1.66x |
|     15 |         0 |           1.18x |           1.63x |          1.66x |
|     16 |         0 |           1.03x |           0.96x |          1.00x |
|     16 |         0 |           1.03x |           0.96x |          1.00x |
|     17 |         0 |           1.03x |           0.96x |          1.00x |
|     17 |         0 |           1.03x |           0.96x |          1.15x |
|     18 |         0 |           1.03x |           0.96x |          1.14x |
|     18 |         0 |           1.04x |           0.96x |          1.15x |
|     19 |         0 |           1.04x |           0.96x |          1.15x |
|     19 |         0 |           1.04x |           0.96x |          1.15x |
|     20 |         0 |           1.04x |           0.96x |          1.15x |
|     20 |         0 |           1.03x |           0.96x |          1.15x |
|     21 |         0 |           1.04x |           0.96x |          1.15x |
|     21 |         0 |           1.03x |           0.96x |          1.15x |
|     22 |         0 |           1.02x |           0.96x |          1.15x |
|     22 |         0 |           1.03x |           0.96x |          1.15x |
|     23 |         0 |           1.03x |           0.96x |          1.15x |
|     23 |         0 |           1.03x |           0.96x |          1.15x |
|     24 |         0 |           1.03x |           0.96x |          1.00x |
|     24 |         0 |           1.02x |           0.96x |          1.00x |
|     25 |         0 |           1.04x |           0.96x |          1.00x |
|     25 |         0 |           1.03x |           0.96x |          1.16x |
|     26 |         0 |           1.04x |           0.96x |          1.15x |
|     26 |         0 |           1.03x |           0.96x |          1.15x |
|     27 |         0 |           1.04x |           0.96x |          1.00x |
|     27 |         0 |           1.03x |           0.96x |          1.00x |
|     28 |         0 |           1.04x |           0.96x |          1.00x |
|     28 |         0 |           1.04x |           0.96x |          1.00x |
|     29 |         0 |           1.03x |           0.96x |          1.00x |
|     29 |         0 |           1.04x |           0.96x |          1.15x |
|     30 |         0 |           1.04x |           0.96x |          1.15x |
|     30 |         0 |           1.03x |           0.95x |          1.00x |
|     31 |         0 |           1.03x |           0.96x |          1.00x |
|     31 |         0 |           1.04x |           0.96x |          1.00x |

This patch is passing GLIBC tests.

Regards

  Andrea

8< --- 8< --- 8<
Introduce an Arm MTE compatible strchrnul implementation.

Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1 does not show
performance regressions.

Co-authored-by: Wilco Dijkstra

diff --git a/sysdeps/aarch64/strchrnul.S b/sysdeps/aarch64/strchrnul.S
index a65be6cba8..1ae4598f82 100644
--- a/sysdeps/aarch64/strchrnul.S
+++ b/sysdeps/aarch64/strchrnul.S
@@ -22,109 +22,75 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64
- * Neon Available.
+ * ARMv8-a, AArch64, Advanced SIMD.
+ * MTE compatible.
  */
 
-/* Arguments and results.  */
 #define srcin		x0
 #define chrin		w1
-
 #define result		x0
 
-/* Locals and temporaries.  */
-
 #define src		x2
-#define tmp1		x3
-#define wtmp2		w4
-#define tmp3		x5
+#define tmp1		x1
+#define tmp2		x3
+#define tmp2w		w3
 
 #define vrepchr		v0
-#define vdata1		v1
-#define vdata2		v2
-#define vhas_nul1	v3
-#define vhas_nul2	v4
-#define vhas_chr1	v5
-#define vhas_chr2	v6
-#define vrepmask	v7
-#define vend1		v16
-
-/* Core algorithm.
-
-   For each 32-byte hunk we calculate a 64-bit syndrome value, with
-   two bits per byte (LSB is always in bits 0 and 1, for both big
-   and little-endian systems).  For each tuple, bit 0 is set iff
-   the relevant byte matched the requested character or nul.  Since the
-   bits in the syndrome reflect exactly the order in which things occur
-   in the original string a count_trailing_zeros() operation will
-   identify exactly which byte is causing the termination.  */
+#define vdata		v1
+#define qdata		q1
+#define vhas_nul	v2
+#define vhas_chr	v3
+#define vrepmask	v4
+#define vend		v5
+#define dend		d5
+
+/* Core algorithm:
+
+   For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
+   per byte. For even bytes, bits 0-3 are set if the relevant byte matched the
+   requested character or the byte is NUL. Bits 4-7 must be zero. Bits 4-7 are
+   set likewise for odd bytes so that adjacent bytes can be merged. Since the
+   bits in the syndrome reflect the order in which things occur in the original
+   string, counting trailing zeros identifies exactly which byte matched.  */
 
 ENTRY (__strchrnul)
 	DELOUSE (0)
-	/* Magic constant 0x40100401 to allow us to identify which lane
-	   matches the termination condition.  */
-	mov	wtmp2, #0x0401
-	movk	wtmp2, #0x4010, lsl #16
+	bic	src, srcin, 15
 	dup	vrepchr.16b, chrin
-	bic	src, srcin, #31		/* Work with aligned 32-byte hunks.  */
-	dup	vrepmask.4s, wtmp2
-	ands	tmp1, srcin, #31
-	b.eq	L(loop)
-
-	/* Input string is not 32-byte aligned.  Rather than forcing
-	   the padding bytes to a safe value, we calculate the syndrome
-	   for all the bytes, but then mask off those bits of the
-	   syndrome that are related to the padding.  */
-	ld1	{vdata1.16b, vdata2.16b}, [src], #32
-	neg	tmp1, tmp1
-	cmeq	vhas_nul1.16b, vdata1.16b, #0
-	cmeq	vhas_chr1.16b, vdata1.16b, vrepchr.16b
-	cmeq	vhas_nul2.16b, vdata2.16b, #0
-	cmeq	vhas_chr2.16b, vdata2.16b, vrepchr.16b
-	orr	vhas_chr1.16b, vhas_chr1.16b, vhas_nul1.16b
-	orr	vhas_chr2.16b, vhas_chr2.16b, vhas_nul2.16b
-	and	vhas_chr1.16b, vhas_chr1.16b, vrepmask.16b
-	and	vhas_chr2.16b, vhas_chr2.16b, vrepmask.16b
-	lsl	tmp1, tmp1, #1
-	addp	vend1.16b, vhas_chr1.16b, vhas_chr2.16b	// 256->128
-	mov	tmp3, #~0
-	addp	vend1.16b, vend1.16b, vend1.16b		// 128->64
-	lsr	tmp1, tmp3, tmp1
-
-	mov	tmp3, vend1.2d[0]
-	bic	tmp1, tmp3, tmp1	// Mask padding bits.
-	cbnz	tmp1, L(tail)
+	ld1	{vdata.16b}, [src]
+	mov	tmp2w, 0xf00f
+	dup	vrepmask.8h, tmp2w
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs	vhas_chr.16b, vhas_chr.16b, vdata.16b
+	lsl	tmp2, srcin, 2
+	and	vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	tmp1, dend
+	lsr	tmp1, tmp1, tmp2	/* Mask padding bits.  */
+	cbz	tmp1, L(loop)
 
+	rbit	tmp1, tmp1
+	clz	tmp1, tmp1
+	add	result, srcin, tmp1, lsr 2
+	ret
+
+	.p2align 4
 L(loop):
-	ld1	{vdata1.16b, vdata2.16b}, [src], #32
-	cmeq	vhas_nul1.16b, vdata1.16b, #0
-	cmeq	vhas_chr1.16b, vdata1.16b, vrepchr.16b
-	cmeq	vhas_nul2.16b, vdata2.16b, #0
-	cmeq	vhas_chr2.16b, vdata2.16b, vrepchr.16b
-	/* Use a fast check for the termination condition.  */
-	orr	vhas_chr1.16b, vhas_nul1.16b, vhas_chr1.16b
-	orr	vhas_chr2.16b, vhas_nul2.16b, vhas_chr2.16b
-	orr	vend1.16b, vhas_chr1.16b, vhas_chr2.16b
-	addp	vend1.2d, vend1.2d, vend1.2d
-	mov	tmp1, vend1.2d[0]
+	ldr	qdata, [src, 16]!
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs	vhas_chr.16b, vhas_chr.16b, vdata.16b
+	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b
+	fmov	tmp1, dend
 	cbz	tmp1, L(loop)
 
-	/* Termination condition found.  Now need to establish exactly why
-	   we terminated.  */
-	and	vhas_chr1.16b, vhas_chr1.16b, vrepmask.16b
-	and	vhas_chr2.16b, vhas_chr2.16b, vrepmask.16b
-	addp	vend1.16b, vhas_chr1.16b, vhas_chr2.16b		// 256->128
-	addp	vend1.16b, vend1.16b, vend1.16b		// 128->64
-
-	mov	tmp1, vend1.2d[0]
-L(tail):
-	/* Count the trailing zeros, by bit reversing...  */
+	and	vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	tmp1, dend
+#ifndef __AARCH64EB__
 	rbit	tmp1, tmp1
-	/* Re-bias source.  */
-	sub	src, src, #32
-	clz	tmp1, tmp1	/* ... and counting the leading zeros.  */
-	/* tmp1 is twice the offset into the fragment.  */
-	add	result, src, tmp1, lsr #1
+#endif
+	clz	tmp1, tmp1
+	add	result, src, tmp1, lsr 2
 	ret
 
 END(__strchrnul)
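
For anyone who wants to check the syndrome arithmetic from the new "Core
algorithm" comment without reading the SIMD code, here is a minimal scalar
C sketch. It is not part of the patch and not how the assembly is written:
the function names (chunk_syndrome, first_match_index, first_chunk_match)
are made up for illustration, it models the little-endian case only, and it
assumes the caller already holds the 16 bytes of the aligned chunk in a
buffer. Nibble i of the syndrome stands for byte i of the chunk, so the
trailing-zero count divided by four (the "lsr 2" in the patch) gives the
byte index, and shifting right by four bits per padding byte drops matches
that lie before the start of the string.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 64-bit syndrome for one 16-byte chunk: nibble i
   is 0xf when chunk[i] equals c or is NUL, and 0 otherwise.  */
uint64_t
chunk_syndrome (const unsigned char *chunk, unsigned char c)
{
  uint64_t syndrome = 0;
  for (int i = 0; i < 16; i++)
    if (chunk[i] == c || chunk[i] == 0)
      syndrome |= (uint64_t) 0xf << (4 * i);
  return syndrome;
}

/* Byte index of the first set nibble: count trailing zero bits and
   divide by four.  The syndrome must be nonzero.  */
unsigned
first_match_index (uint64_t syndrome)
{
  assert (syndrome != 0);
  unsigned tz = 0;
  while ((syndrome & 1) == 0)
    {
      syndrome >>= 1;
      tz++;
    }
  return tz >> 2;
}

/* Model of the entry path: the string starts `offset' bytes into the
   aligned chunk.  Shifting the syndrome right by 4 * offset discards
   the padding bytes.  Returns the distance from the string start to
   the first match or NUL, or -1 if this chunk contains neither.  */
int
first_chunk_match (const unsigned char *chunk, unsigned offset,
                   unsigned char c)
{
  assert (offset < 16);
  uint64_t syndrome = chunk_syndrome (chunk, c) >> (4 * offset);
  return syndrome != 0 ? (int) first_match_index (syndrome) : -1;
}
```

With a chunk holding "abc/def" followed by NUL padding, searching for '/'
from offset 0 yields 3. Starting at offset 4 the '/' nibble is shifted out,
so the first hit is the terminating NUL, three bytes after the new start.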