From patchwork Wed Jun 3 09:49:04 2020
X-Patchwork-Submitter: Andrea Corallo
X-Patchwork-Id: 39432
From: Andrea Corallo
To: libc-alpha@sourceware.org
Cc: nd@arm.com, Wilco.Dijkstra@arm.com
Subject: [PATCH] aarch64: MTE compatible strchr
Date: Wed, 03 Jun 2020 11:49:04 +0200

Hi all,

I'd like to submit this patch introducing an Arm MTE compatible strchr
implementation. Below is a performance comparison of the strchr benchmark
run on Cortex-A72, Cortex-A53 and Neoverse N1.

| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
|     32 |         0 |           1.91x |           1.10x |          1.33x |
|     32 |         1 |           2.06x |           1.22x |          1.41x |
|     64 |         0 |           1.61x |           1.00x |          1.18x |
|     64 |         2 |           1.69x |           1.08x |          1.15x |
|    128 |         0 |           1.51x |           0.85x |          1.06x |
|    128 |         3 |           1.57x |           0.90x |          1.15x |
|    256 |         0 |           1.37x |           0.84x |          1.09x |
|    256 |         4 |           1.41x |           0.83x |          1.15x |
|    512 |         0 |           1.18x |           0.80x |          1.09x |
|    512 |         5 |           1.19x |           0.82x |          1.14x |
|   1024 |         0 |           1.15x |           0.78x |          1.09x |
|   1024 |         6 |           1.05x |           0.79x |          1.09x |
|   2048 |         0 |           1.15x |           0.76x |          1.08x |
|   2048 |         7 |           1.13x |           0.77x |          1.08x |
|     64 |         1 |           1.28x |           1.08x |          1.33x |
|     64 |         1 |           1.28x |           1.08x |          1.31x |
|     64 |         2 |           1.28x |           1.08x |          1.31x |
|     64 |         2 |           1.28x |           1.08x |          1.15x |
|     64 |         3 |           1.28x |           1.08x |          1.15x |
|     64 |         3 |           1.28x |           1.08x |          1.31x |
|     64 |         4 |           1.28x |           1.08x |          1.31x |
|     64 |         4 |           1.28x |           1.08x |          1.31x |
|     64 |         5 |           1.28x |           1.08x |          1.31x |
|     64 |         5 |           1.28x |           1.08x |          1.31x |
|     64 |         6 |           1.28x |           1.08x |          1.31x |
|     64 |         6 |           1.28x |           1.08x |          1.31x |
|     64 |         7 |           1.28x |           1.08x |          1.31x |
|     64 |         7 |           1.28x |           1.08x |          1.31x |
|      0 |         0 |           1.32x |           1.63x |          1.53x |
|      0 |         0 |           1.32x |           1.63x |          1.53x |
|      1 |         0 |           1.31x |           1.64x |          1.53x |
|      1 |         0 |           1.32x |           1.67x |          1.53x |
|      2 |         0 |           1.32x |           1.63x |          1.52x |
|      2 |         0 |           1.32x |           1.69x |          1.52x |
|      3 |         0 |           1.32x |           1.67x |          1.51x |
|      3 |         0 |           1.32x |           1.66x |          1.52x |
|      4 |         0 |           1.32x |           1.69x |          1.52x |
|      4 |         0 |           1.32x |           1.69x |          1.52x |
|      5 |         0 |           1.32x |           1.69x |          1.26x |
|      5 |         0 |           1.32x |           1.69x |          1.26x |
|      6 |         0 |           1.32x |           1.69x |          1.26x |
|      6 |         0 |           1.32x |           1.68x |          1.51x |
|      7 |         0 |           1.32x |           1.63x |          1.54x |
|      7 |         0 |           1.32x |           1.63x |          1.52x |
|      8 |         0 |           1.32x |           1.69x |          1.53x |
|      8 |         0 |           1.32x |           1.65x |          1.53x |
|      9 |         0 |           1.32x |           1.63x |          1.54x |
|      9 |         0 |           1.32x |           1.68x |          1.52x |
|     10 |         0 |           1.32x |           1.63x |          1.52x |
|     10 |         0 |           1.32x |           1.69x |          1.51x |
|     11 |         0 |           1.32x |           1.64x |          1.52x |
|     11 |         0 |           1.32x |           1.63x |          1.52x |
|     12 |         0 |           1.32x |           1.64x |          1.52x |
|     12 |         0 |           1.32x |           1.68x |          1.54x |
|     13 |         0 |           1.32x |           1.63x |          1.53x |
|     13 |         0 |           1.32x |           1.67x |          1.52x |
|     14 |         0 |           1.32x |           1.65x |          1.53x |
|     14 |         0 |           1.32x |           1.63x |          1.52x |
|     15 |         0 |           1.32x |           1.67x |          1.52x |
|     15 |         0 |           1.32x |           1.65x |          1.26x |
|     16 |         0 |           1.08x |           1.00x |          1.03x |
|     16 |         0 |           1.08x |           1.00x |          1.03x |
|     17 |         0 |           1.09x |           1.00x |          1.03x |
|     17 |         0 |           1.09x |           1.00x |          1.03x |
|     18 |         0 |           1.09x |           1.00x |          1.03x |
|     18 |         0 |           1.08x |           1.00x |          1.03x |
|     19 |         0 |           1.08x |           1.00x |          1.03x |
|     19 |         0 |           1.08x |           1.00x |          1.03x |
|     20 |         0 |           1.08x |           1.00x |          1.03x |
|     20 |         0 |           1.09x |           1.00x |          1.03x |
|     21 |         0 |           1.08x |           1.00x |          1.03x |
|     21 |         0 |           1.08x |           1.00x |          1.08x |
|     22 |         0 |           1.09x |           1.00x |          1.09x |
|     22 |         0 |           1.08x |           1.00x |          1.09x |
|     23 |         0 |           1.08x |           1.00x |          1.08x |
|     23 |         0 |           1.08x |           1.00x |          1.08x |
|     24 |         0 |           1.08x |           1.00x |          1.08x |
|     24 |         0 |           1.08x |           1.00x |          1.09x |
|     25 |         0 |           1.08x |           1.00x |          1.10x |
|     25 |         0 |           1.08x |           1.00x |          1.09x |
|     26 |         0 |           1.08x |           1.00x |          1.08x |
|     26 |         0 |           1.08x |           1.00x |          1.08x |
|     27 |         0 |           1.09x |           1.00x |          1.08x |
|     27 |         0 |           1.08x |           1.00x |          1.08x |
|     28 |         0 |           1.08x |           1.00x |          1.08x |
|     28 |         0 |           1.08x |           1.00x |          1.08x |
|     29 |         0 |           1.08x |           1.00x |          1.09x |
|     29 |         0 |           1.08x |           1.00x |          1.08x |
|     30 |         0 |           1.08x |           1.00x |          1.08x |
|     30 |         0 |           1.08x |           1.00x |          1.08x |
|     31 |         0 |           1.09x |           1.00x |          1.08x |
|     31 |         0 |           1.08x |           1.00x |          1.08x |
|     32 |         0 |           1.27x |           1.10x |          1.25x |
|     32 |         1 |           1.38x |           1.21x |          1.38x |
|     64 |         0 |           1.17x |           1.00x |          1.20x |
|     64 |         2 |           1.28x |           1.08x |          1.33x |
|    128 |         0 |           1.17x |           0.85x |          1.17x |
|    128 |         3 |           1.23x |           0.90x |          1.29x |
|    256 |         0 |           1.17x |           0.84x |          1.15x |
|    256 |         4 |           1.21x |           0.83x |          1.21x |
|    512 |         0 |           1.16x |           0.80x |          1.08x |
|    512 |         5 |           1.19x |           0.82x |          1.14x |
|   1024 |         0 |           1.15x |           0.78x |          1.09x |
|   1024 |         6 |           1.05x |           0.79x |          1.09x |
|   2048 |         0 |           1.15x |           0.76x |          1.08x |
|   2048 |         7 |           1.14x |           0.77x |          1.08x |
|     64 |         1 |           1.20x |           1.08x |          1.33x |
|     64 |         1 |           1.28x |           1.08x |          1.33x |
|     64 |         2 |           1.28x |           1.08x |          1.35x |
|     64 |         2 |           1.28x |           1.08x |          1.35x |
|     64 |         3 |           1.28x |           1.08x |          1.15x |
|     64 |         3 |           1.28x |           1.08x |          1.15x |
|     64 |         4 |           1.28x |           1.08x |          1.35x |
|     64 |         4 |           1.28x |           1.08x |          1.31x |
|     64 |         5 |           1.28x |           1.08x |          1.35x |
|     64 |         5 |           1.28x |           1.08x |          1.35x |
|     64 |         6 |           1.28x |           1.08x |          1.31x |
|     64 |         6 |           1.28x |           1.08x |          1.31x |
|     64 |         7 |           1.28x |           1.08x |          1.35x |
|     64 |         7 |           1.28x |           1.08x |          1.35x |
|      0 |         0 |           1.32x |           1.68x |          1.52x |
|      0 |         0 |           1.32x |           1.63x |          1.53x |
|      1 |         0 |           1.32x |           1.69x |          1.52x |
|      1 |         0 |           1.32x |           1.68x |          1.52x |
|      2 |         0 |           1.32x |           1.69x |          1.51x |
|      2 |         0 |           1.32x |           1.69x |          1.52x |
|      3 |         0 |           1.32x |           1.67x |          1.51x |
|      3 |         0 |           1.32x |           1.69x |          1.52x |
|      4 |         0 |           1.32x |           1.67x |          1.52x |
|      4 |         0 |           1.32x |           1.69x |          1.56x |
|      5 |         0 |           1.32x |           1.69x |          1.52x |
|      5 |         0 |           1.32x |           1.69x |          1.52x |
|      6 |         0 |           1.32x |           1.69x |          1.51x |
|      6 |         0 |           1.32x |           1.69x |          1.52x |
|      7 |         0 |           1.32x |           1.63x |          1.52x |
|      7 |         0 |           1.32x |           1.63x |          1.53x |
|      8 |         0 |           1.32x |           1.65x |          1.52x |
|      8 |         0 |           1.32x |           1.63x |          1.52x |
|      9 |         0 |           1.32x |           1.63x |          1.51x |
|      9 |         0 |           1.32x |           1.64x |          1.52x |
|     10 |         0 |           1.32x |           1.63x |          1.52x |
|     10 |         0 |           1.32x |           1.65x |          1.52x |
|     11 |         0 |           1.32x |           1.63x |          1.52x |
|     11 |         0 |           1.32x |           1.63x |          1.51x |
|     12 |         0 |           1.32x |           1.63x |          1.53x |
|     12 |         0 |           1.32x |           1.63x |          1.51x |
|     13 |         0 |           1.32x |           1.63x |          1.52x |
|     13 |         0 |           1.32x |           1.65x |          1.52x |
|     14 |         0 |           1.32x |           1.66x |          1.53x |
|     14 |         0 |           1.32x |           1.64x |          1.26x |
|     15 |         0 |           1.32x |           1.68x |          1.26x |
|     15 |         0 |           1.32x |           1.69x |          1.26x |
|     16 |         0 |           1.08x |           1.00x |          1.03x |
|     16 |         0 |           1.08x |           1.00x |          1.05x |
|     17 |         0 |           1.08x |           1.00x |          1.08x |
|     17 |         0 |           1.09x |           1.00x |          1.03x |
|     18 |         0 |           1.09x |           1.00x |          1.08x |
|     18 |         0 |           1.08x |           1.00x |          1.08x |
|     19 |         0 |           1.08x |           1.00x |          1.08x |
|     19 |         0 |           1.08x |           1.00x |          1.09x |
|     20 |         0 |           1.09x |           1.00x |          1.08x |
|     20 |         0 |           1.08x |           1.00x |          1.08x |
|     21 |         0 |           1.08x |           1.00x |          1.09x |
|     21 |         0 |           1.08x |           1.00x |          1.08x |
|     22 |         0 |           1.09x |           1.00x |          1.08x |
|     22 |         0 |           1.08x |           1.00x |          1.09x |
|     23 |         0 |           1.08x |           1.00x |          1.08x |
|     23 |         0 |           1.08x |           1.00x |          1.08x |
|     24 |         0 |           1.08x |           1.00x |          1.08x |
|     24 |         0 |           1.08x |           1.00x |          1.08x |
|     25 |         0 |           1.08x |           1.00x |          1.08x |
|     25 |         0 |           1.08x |           1.00x |          1.09x |
|     26 |         0 |           1.08x |           1.00x |          1.08x |
|     26 |         0 |           1.08x |           1.00x |          1.09x |
|     27 |         0 |           1.09x |           1.00x |          1.08x |
|     27 |         0 |           1.08x |           1.00x |          1.08x |
|     28 |         0 |           1.08x |           1.00x |          1.08x |
|     28 |         0 |           1.09x |           1.00x |          1.03x |
|     29 |         0 |           1.08x |           1.00x |          1.03x |
|     29 |         0 |           1.08x |           1.00x |          1.03x |
|     30 |         0 |           1.08x |           1.00x |          1.08x |
|     30 |         0 |           1.08x |           1.00x |          1.08x |
|     31 |         0 |           1.09x |           1.00x |          1.08x |
|     31 |         0 |           1.08x |           1.00x |          1.08x |

This patch passes the glibc tests.

Regards

  Andrea

8< --- 8< --- 8<
Introduce an Arm MTE compatible strchr implementation.

Benchmarking on Cortex-A72, Cortex-A53 and Neoverse N1 does not show
performance regressions.

Co-authored-by: Wilco Dijkstra

diff --git a/sysdeps/aarch64/strchr.S b/sysdeps/aarch64/strchr.S
index 4a75e73945..fd1b941666 100644
--- a/sysdeps/aarch64/strchr.S
+++ b/sysdeps/aarch64/strchr.S
@@ -22,118 +22,98 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64
+ * ARMv8-a, AArch64, Advanced SIMD.
+ * MTE compatible.
  */
 
-/* Arguments and results. */
 #define srcin		x0
 #define chrin		w1
-
 #define result		x0
 
 #define src		x2
-#define tmp1		x3
-#define wtmp2		w4
-#define tmp3		x5
+#define tmp1		x1
+#define wtmp2		w3
+#define tmp3		x3
 
 #define vrepchr		v0
-#define vdata1		v1
-#define vdata2		v2
-#define vhas_nul1	v3
-#define vhas_nul2	v4
-#define vhas_chr1	v5
-#define vhas_chr2	v6
-#define vrepmask_0	v7
-#define vrepmask_c	v16
-#define vend1		v17
-#define vend2		v18
-
-	/* Core algorithm.
-	   For each 32-byte hunk we calculate a 64-bit syndrome value, with
-	   two bits per byte (LSB is always in bits 0 and 1, for both big
-	   and little-endian systems). Bit 0 is set iff the relevant byte
-	   matched the requested character. Bit 1 is set iff the
-	   relevant byte matched the NUL end of string (we trigger off bit0
-	   for the special case of looking for NUL). Since the bits
-	   in the syndrome reflect exactly the order in which things occur
-	   in the original string a count_trailing_zeros() operation will
-	   identify exactly which byte is causing the termination, and why. */
-
-/* Locals and temporaries. */
+#define vdata		v1
+#define qdata		q1
+#define vhas_nul	v2
+#define vhas_chr	v3
+#define vrepmask	v4
+#define vrepmask2	v5
+#define vend		v6
+#define dend		d6
+
+/* Core algorithm.
+
+   For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
+   per byte. For even bytes, bits 0-1 are set if the relevant byte matched the
+   requested character, bits 2-3 are set if the byte is NUL (or matched), and
+   bits 4-7 are not used and must be zero if none of bits 0-3 are set). Odd
+   bytes set bits 4-7 so that adjacent bytes can be merged. Since the bits
+   in the syndrome reflect the order in which things occur in the original
+   string, counting trailing zeros identifies exactly which byte matched. */
 
 ENTRY (strchr)
 	DELOUSE (0)
-	mov	wtmp2, #0x0401
-	movk	wtmp2, #0x4010, lsl #16
+	bic	src, srcin, 15
 	dup	vrepchr.16b, chrin
-	bic	src, srcin, #31
-	dup	vrepmask_c.4s, wtmp2
-	ands	tmp1, srcin, #31
-	add	vrepmask_0.4s, vrepmask_c.4s, vrepmask_c.4s	// lsl #1
-	b.eq	L(loop)
-
-	/* Input string is not 32-byte aligned. Rather than forcing
-	   the padding bytes to a safe value, we calculate the syndrome
-	   for all the bytes, but then mask off those bits of the
-	   syndrome that are related to the padding. */
-	ld1	{vdata1.16b, vdata2.16b}, [src], #32
-	neg	tmp1, tmp1
-	cmeq	vhas_nul1.16b, vdata1.16b, #0
-	cmeq	vhas_chr1.16b, vdata1.16b, vrepchr.16b
-	cmeq	vhas_nul2.16b, vdata2.16b, #0
-	cmeq	vhas_chr2.16b, vdata2.16b, vrepchr.16b
-	and	vhas_nul1.16b, vhas_nul1.16b, vrepmask_0.16b
-	and	vhas_nul2.16b, vhas_nul2.16b, vrepmask_0.16b
-	and	vhas_chr1.16b, vhas_chr1.16b, vrepmask_c.16b
-	and	vhas_chr2.16b, vhas_chr2.16b, vrepmask_c.16b
-	orr	vend1.16b, vhas_nul1.16b, vhas_chr1.16b
-	orr	vend2.16b, vhas_nul2.16b, vhas_chr2.16b
-	lsl	tmp1, tmp1, #1
-	addp	vend1.16b, vend1.16b, vend2.16b	// 256->128
-	mov	tmp3, #~0
-	addp	vend1.16b, vend1.16b, vend2.16b	// 128->64
-	lsr	tmp1, tmp3, tmp1
-
-	mov	tmp3, vend1.2d[0]
-	bic	tmp1, tmp3, tmp1	// Mask padding bits.
-	cbnz	tmp1, L(tail)
+	ld1	{vdata.16b}, [src]
+	mov	wtmp2, 0x3003
+	dup	vrepmask.8h, wtmp2
+	cmeq	vhas_nul.16b, vdata.16b, 0
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	mov	wtmp2, 0xf00f
+	dup	vrepmask2.8h, wtmp2
+
+	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
+	lsl	tmp3, srcin, 2
+	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
+
+	fmov	tmp1, dend
+	lsr	tmp1, tmp1, tmp3
+	cbz	tmp1, L(loop)
+
+	rbit	tmp1, tmp1
+	clz	tmp1, tmp1
+	/* Tmp1 is an even multiple of 2 if the target character was
+	   found first. Otherwise we've found the end of string. */
+	tst	tmp1, 2
+	add	result, srcin, tmp1, lsr 2
+	csel	result, result, xzr, eq
+	ret
 
+	.p2align 4
 L(loop):
-	ld1	{vdata1.16b, vdata2.16b}, [src], #32
-	cmeq	vhas_nul1.16b, vdata1.16b, #0
-	cmeq	vhas_chr1.16b, vdata1.16b, vrepchr.16b
-	cmeq	vhas_nul2.16b, vdata2.16b, #0
-	cmeq	vhas_chr2.16b, vdata2.16b, vrepchr.16b
-	/* Use a fast check for the termination condition. */
-	orr	vend1.16b, vhas_nul1.16b, vhas_chr1.16b
-	orr	vend2.16b, vhas_nul2.16b, vhas_chr2.16b
-	orr	vend1.16b, vend1.16b, vend2.16b
-	addp	vend1.2d, vend1.2d, vend1.2d
-	mov	tmp1, vend1.2d[0]
+	ldr	qdata, [src, 16]!
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
+	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	fmov	tmp1, dend
 	cbz	tmp1, L(loop)
 
-	/* Termination condition found. Now need to establish exactly why
-	   we terminated. */
-	and	vhas_nul1.16b, vhas_nul1.16b, vrepmask_0.16b
-	and	vhas_nul2.16b, vhas_nul2.16b, vrepmask_0.16b
-	and	vhas_chr1.16b, vhas_chr1.16b, vrepmask_c.16b
-	and	vhas_chr2.16b, vhas_chr2.16b, vrepmask_c.16b
-	orr	vend1.16b, vhas_nul1.16b, vhas_chr1.16b
-	orr	vend2.16b, vhas_nul2.16b, vhas_chr2.16b
-	addp	vend1.16b, vend1.16b, vend2.16b	// 256->128
-	addp	vend1.16b, vend1.16b, vend2.16b	// 128->64
-
-	mov	tmp1, vend1.2d[0]
-L(tail):
-	sub	src, src, #32
+#ifdef __AARCH64EB__
+	bif	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
+	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
+	fmov	tmp1, dend
+#else
+	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
+	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
+	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
+	fmov	tmp1, dend
 	rbit	tmp1, tmp1
+#endif
 	clz	tmp1, tmp1
-	/* Tmp1 is even if the target charager was found first. Otherwise
-	   we've found the end of string and we weren't looking for NUL. */
-	tst	tmp1, #1
-	add	result, src, tmp1, lsr #1
+	/* Tmp1 is an even multiple of 2 if the target character was
+	   found first. Otherwise we've found the end of string. */
+	tst	tmp1, 2
+	add	result, src, tmp1, lsr 2
 	csel	result, result, xzr, eq
 	ret
+
 END (strchr)
 libc_hidden_builtin_def (strchr)
 weak_alias (strchr, index)
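
For readers following the "Core algorithm" comment in the patch, the four-bits-per-byte syndrome can be modelled in scalar C. This is only an illustrative sketch of the little-endian view, not code from the patch: `chunk_syndrome` and `chunk_search` are hypothetical names, the chunk is passed as an array instead of being loaded from an aligned address, and `skip` stands in for the low four bits of `srcin` that the first-chunk path shifts out.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model (an assumption for illustration, not the patch's code) of
   the 64-bit syndrome for one 16-byte chunk.  Each byte contributes one
   nibble: bits 0-1 are set if the byte matches c, bits 2-3 are set if
   the byte is NUL or matches c.  */
static uint64_t
chunk_syndrome (const unsigned char chunk[16], unsigned char c)
{
  uint64_t syndrome = 0;
  for (int i = 0; i < 16; i++)
    {
      uint64_t nibble = 0;
      if (chunk[i] == c)
	nibble |= 0x3;
      if (chunk[i] == 0 || chunk[i] == c)
	nibble |= 0xc;
      syndrome |= nibble << (4 * i);
    }
  return syndrome;
}

/* Hypothetical helper: index of the first match at or after `skip`,
   -1 if a NUL comes first, -2 if the chunk contains neither.  */
static int
chunk_search (const unsigned char chunk[16], unsigned skip, unsigned char c)
{
  assert (skip < 16);
  /* Drop the nibbles of the bytes that precede the string start.  */
  uint64_t syndrome = chunk_syndrome (chunk, c) >> (4 * skip);
  if (syndrome == 0)
    return -2;

  /* Count trailing zeros (rbit + clz in the assembly).  */
  int tz = 0;
  while (((syndrome >> tz) & 1) == 0)
    tz++;

  /* tz is a multiple of 4 for a match; bit 1 of tz is set for NUL only.  */
  if (tz & 2)
    return -1;
  return (int) skip + tz / 4;
}
```

Searching for NUL itself sets the whole nibble, so the trailing-zero count stays a multiple of 4 and the terminator is reported as a match, which is the behaviour strchr requires for `c == 0`.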