Message ID | gkrv9k88oxu.fsf@arm.com |
---|---|
State | Committed |
Commit | f7de454f20c05a748b5d421ed22d96a5232b6093 |
Headers | From: Andrea Corallo <andrea.corallo@arm.com> To: libc-alpha@sourceware.org Cc: nd@arm.com, Wilco.Dijkstra@arm.com Subject: [PATCH] aarch64: MTE compatible strchrnul Date: Wed, 03 Jun 2020 11:43:25 +0200 Message-ID: <gkrv9k88oxu.fsf@arm.com> List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> |
Series | aarch64: MTE compatible strchrnul |
Commit Message
Andrea Corallo
June 3, 2020, 9:43 a.m. UTC
Hi all,

I'd like to submit this patch introducing an Arm MTE compatible
strchrnul implementation.

Follows a performance comparison of the strchrnul benchmark run on
Cortex-A72, Cortex-A53, Neoverse N1.

| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
| 32 | 0 | 1.16x | 1.07x | 1.35x |
| 32 | 1 | 1.25x | 1.16x | 1.15x |
| 64 | 0 | 1.26x | 0.97x | 1.20x |
| 64 | 2 | 1.35x | 1.04x | 1.30x |
| 128 | 0 | 1.12x | 0.84x | 1.22x |
| 128 | 3 | 1.25x | 0.87x | 1.30x |
| 256 | 0 | 1.14x | 0.84x | 1.16x |
| 256 | 4 | 1.24x | 0.81x | 1.16x |
| 512 | 0 | 1.15x | 0.80x | 1.13x |
| 512 | 5 | 1.17x | 0.81x | 1.14x |
| 1024 | 0 | 1.14x | 0.78x | 1.08x |
| 1024 | 6 | 1.03x | 0.78x | 1.10x |
| 2048 | 0 | 1.12x | 0.76x | 1.08x |
| 2048 | 7 | 1.14x | 0.77x | 1.09x |
| 64 | 1 | 1.35x | 1.04x | 1.37x |
| 64 | 1 | 1.36x | 1.04x | 1.37x |
| 64 | 2 | 1.36x | 1.04x | 1.37x |
| 64 | 2 | 1.37x | 1.04x | 1.38x |
| 64 | 3 | 1.38x | 1.04x | 1.36x |
| 64 | 3 | 1.40x | 1.04x | 1.36x |
| 64 | 4 | 1.41x | 1.04x | 1.36x |
| 64 | 4 | 1.36x | 1.04x | 1.36x |
| 64 | 5 | 1.34x | 1.04x | 1.40x |
| 64 | 5 | 1.35x | 1.04x | 1.36x |
| 64 | 6 | 1.34x | 1.04x | 1.37x |
| 64 | 6 | 1.41x | 1.04x | 1.37x |
| 64 | 7 | 1.39x | 1.04x | 1.36x |
| 64 | 7 | 1.34x | 1.04x | 1.37x |
| 0 | 0 | 1.18x | 1.63x | 1.66x |
| 0 | 0 | 1.18x | 1.63x | 1.66x |
| 1 | 0 | 1.18x | 1.63x | 1.66x |
| 1 | 0 | 1.18x | 1.63x | 1.67x |
| 2 | 0 | 1.18x | 1.63x | 1.66x |
| 2 | 0 | 1.18x | 1.63x | 1.65x |
| 3 | 0 | 1.18x | 1.63x | 1.66x |
| 3 | 0 | 1.18x | 1.63x | 1.66x |
| 4 | 0 | 1.18x | 1.63x | 1.65x |
| 4 | 0 | 1.18x | 1.63x | 1.66x |
| 5 | 0 | 1.18x | 1.63x | 1.66x |
| 5 | 0 | 1.18x | 1.63x | 1.66x |
| 6 | 0 | 1.18x | 1.63x | 1.66x |
| 6 | 0 | 1.18x | 1.63x | 1.66x |
| 7 | 0 | 1.18x | 1.63x | 1.66x |
| 7 | 0 | 1.18x | 1.63x | 1.64x |
| 8 | 0 | 1.18x | 1.63x | 1.66x |
| 8 | 0 | 1.18x | 1.63x | 1.66x |
| 9 | 0 | 1.18x | 1.63x | 1.65x |
| 9 | 0 | 1.18x | 1.63x | 1.66x |
| 10 | 0 | 1.18x | 1.63x | 1.66x |
| 10 | 0 | 1.18x | 1.63x | 1.66x |
| 11 | 0 | 1.18x | 1.63x | 1.64x |
| 11 | 0 | 1.18x | 1.63x | 1.63x |
| 12 | 0 | 1.18x | 1.63x | 1.63x |
| 12 | 0 | 1.18x | 1.63x | 1.66x |
| 13 | 0 | 1.18x | 1.63x | 1.63x |
| 13 | 0 | 1.18x | 1.63x | 1.63x |
| 14 | 0 | 1.18x | 1.63x | 1.63x |
| 14 | 0 | 1.18x | 1.63x | 1.22x |
| 15 | 0 | 1.19x | 1.63x | 1.22x |
| 15 | 0 | 1.18x | 1.63x | 1.63x |
| 16 | 0 | 1.03x | 0.96x | 1.15x |
| 16 | 0 | 1.03x | 0.96x | 1.13x |
| 17 | 0 | 1.03x | 0.96x | 0.98x |
| 17 | 0 | 1.03x | 0.96x | 0.98x |
| 18 | 0 | 1.03x | 0.96x | 0.98x |
| 18 | 0 | 1.03x | 0.96x | 0.98x |
| 19 | 0 | 1.04x | 0.96x | 0.98x |
| 19 | 0 | 1.04x | 0.96x | 0.98x |
| 20 | 0 | 1.04x | 0.96x | 1.00x |
| 20 | 0 | 1.03x | 0.96x | 0.99x |
| 21 | 0 | 1.04x | 0.96x | 0.99x |
| 21 | 0 | 1.03x | 0.96x | 1.14x |
| 22 | 0 | 1.04x | 0.96x | 1.14x |
| 22 | 0 | 1.03x | 0.96x | 1.14x |
| 23 | 0 | 1.03x | 0.96x | 1.13x |
| 23 | 0 | 1.03x | 0.96x | 1.15x |
| 24 | 0 | 1.04x | 0.96x | 1.13x |
| 24 | 0 | 1.04x | 0.95x | 1.13x |
| 25 | 0 | 1.03x | 0.96x | 1.15x |
| 25 | 0 | 1.04x | 0.96x | 1.12x |
| 26 | 0 | 1.04x | 0.96x | 1.13x |
| 26 | 0 | 1.02x | 0.96x | 1.13x |
| 27 | 0 | 1.04x | 0.96x | 1.13x |
| 27 | 0 | 1.03x | 0.96x | 1.13x |
| 28 | 0 | 1.03x | 0.96x | 0.98x |
| 28 | 0 | 1.04x | 0.96x | 1.05x |
| 29 | 0 | 1.02x | 0.96x | 1.00x |
| 29 | 0 | 1.03x | 0.96x | 1.00x |
| 30 | 0 | 1.04x | 0.96x | 1.00x |
| 30 | 0 | 1.04x | 0.96x | 1.00x |
| 31 | 0 | 1.04x | 0.96x | 0.99x |
| 31 | 0 | 1.03x | 0.96x | 0.99x |
| 32 | 0 | 1.09x | 1.07x | 1.09x |
| 32 | 1 | 1.25x | 1.15x | 1.38x |
| 64 | 0 | 1.27x | 0.98x | 1.20x |
| 64 | 2 | 1.41x | 1.04x | 1.30x |
| 128 | 0 | 1.15x | 0.84x | 1.22x |
| 128 | 3 | 1.23x | 0.87x | 1.30x |
| 256 | 0 | 1.16x | 0.84x | 1.16x |
| 256 | 4 | 1.23x | 0.81x | 1.17x |
| 512 | 0 | 1.14x | 0.80x | 1.12x |
| 512 | 5 | 1.18x | 0.81x | 1.14x |
| 1024 | 0 | 1.16x | 0.78x | 1.09x |
| 1024 | 6 | 1.03x | 0.78x | 1.11x |
| 2048 | 0 | 1.14x | 0.76x | 1.08x |
| 2048 | 7 | 1.14x | 0.77x | 1.09x |
| 64 | 1 | 1.40x | 1.04x | 1.37x |
| 64 | 1 | 1.40x | 1.04x | 1.37x |
| 64 | 2 | 1.35x | 1.04x | 1.37x |
| 64 | 2 | 1.38x | 1.04x | 1.37x |
| 64 | 3 | 1.36x | 1.04x | 1.37x |
| 64 | 3 | 1.34x | 1.04x | 1.37x |
| 64 | 4 | 1.41x | 1.04x | 1.37x |
| 64 | 4 | 1.38x | 1.04x | 1.37x |
| 64 | 5 | 1.36x | 1.04x | 1.37x |
| 64 | 5 | 1.36x | 1.04x | 1.37x |
| 64 | 6 | 1.35x | 1.04x | 1.37x |
| 64 | 6 | 1.40x | 1.04x | 1.37x |
| 64 | 7 | 1.35x | 1.04x | 1.37x |
| 64 | 7 | 1.40x | 1.04x | 1.37x |
| 0 | 0 | 1.19x | 1.63x | 1.66x |
| 0 | 0 | 1.19x | 1.63x | 1.66x |
| 1 | 0 | 1.19x | 1.63x | 1.66x |
| 1 | 0 | 1.19x | 1.63x | 1.66x |
| 2 | 0 | 1.18x | 1.63x | 1.63x |
| 2 | 0 | 1.18x | 1.63x | 1.66x |
| 3 | 0 | 1.18x | 1.63x | 1.66x |
| 3 | 0 | 1.20x | 1.63x | 1.63x |
| 4 | 0 | 1.18x | 1.63x | 1.63x |
| 4 | 0 | 1.18x | 1.63x | 1.66x |
| 5 | 0 | 1.18x | 1.63x | 1.66x |
| 5 | 0 | 1.18x | 1.63x | 1.66x |
| 6 | 0 | 1.18x | 1.63x | 1.66x |
| 6 | 0 | 1.18x | 1.63x | 1.66x |
| 7 | 0 | 1.18x | 1.63x | 1.66x |
| 7 | 0 | 1.18x | 1.63x | 1.66x |
| 8 | 0 | 1.18x | 1.63x | 1.25x |
| 8 | 0 | 1.18x | 1.63x | 1.66x |
| 9 | 0 | 1.18x | 1.63x | 1.66x |
| 9 | 0 | 1.18x | 1.63x | 1.66x |
| 10 | 0 | 1.18x | 1.63x | 1.66x |
| 10 | 0 | 1.18x | 1.63x | 1.66x |
| 11 | 0 | 1.18x | 1.63x | 1.66x |
| 11 | 0 | 1.18x | 1.63x | 1.66x |
| 12 | 0 | 1.18x | 1.63x | 1.66x |
| 12 | 0 | 1.19x | 1.63x | 1.66x |
| 13 | 0 | 1.18x | 1.63x | 1.66x |
| 13 | 0 | 1.18x | 1.63x | 1.66x |
| 14 | 0 | 1.19x | 1.63x | 1.66x |
| 14 | 0 | 1.19x | 1.63x | 1.66x |
| 15 | 0 | 1.18x | 1.63x | 1.66x |
| 15 | 0 | 1.18x | 1.63x | 1.66x |
| 16 | 0 | 1.03x | 0.96x | 1.00x |
| 16 | 0 | 1.03x | 0.96x | 1.00x |
| 17 | 0 | 1.03x | 0.96x | 1.00x |
| 17 | 0 | 1.03x | 0.96x | 1.15x |
| 18 | 0 | 1.03x | 0.96x | 1.14x |
| 18 | 0 | 1.04x | 0.96x | 1.15x |
| 19 | 0 | 1.04x | 0.96x | 1.15x |
| 19 | 0 | 1.04x | 0.96x | 1.15x |
| 20 | 0 | 1.04x | 0.96x | 1.15x |
| 20 | 0 | 1.03x | 0.96x | 1.15x |
| 21 | 0 | 1.04x | 0.96x | 1.15x |
| 21 | 0 | 1.03x | 0.96x | 1.15x |
| 22 | 0 | 1.02x | 0.96x | 1.15x |
| 22 | 0 | 1.03x | 0.96x | 1.15x |
| 23 | 0 | 1.03x | 0.96x | 1.15x |
| 23 | 0 | 1.03x | 0.96x | 1.15x |
| 24 | 0 | 1.03x | 0.96x | 1.00x |
| 24 | 0 | 1.02x | 0.96x | 1.00x |
| 25 | 0 | 1.04x | 0.96x | 1.00x |
| 25 | 0 | 1.03x | 0.96x | 1.16x |
| 26 | 0 | 1.04x | 0.96x | 1.15x |
| 26 | 0 | 1.03x | 0.96x | 1.15x |
| 27 | 0 | 1.04x | 0.96x | 1.00x |
| 27 | 0 | 1.03x | 0.96x | 1.00x |
| 28 | 0 | 1.04x | 0.96x | 1.00x |
| 28 | 0 | 1.04x | 0.96x | 1.00x |
| 29 | 0 | 1.03x | 0.96x | 1.00x |
| 29 | 0 | 1.04x | 0.96x | 1.15x |
| 30 | 0 | 1.04x | 0.96x | 1.15x |
| 30 | 0 | 1.03x | 0.95x | 1.00x |
| 31 | 0 | 1.03x | 0.96x | 1.00x |
| 31 | 0 | 1.04x | 0.96x | 1.00x |

This patch is passing GLIBC tests.

Regards

  Andrea

8< --- 8< --- 8<
Introduce an Arm MTE compatible strchrnul implementation.

Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1 does not show
performance regressions.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
Comments
* Andrea Corallo:

> Introduce an Arm MTE compatible strchrnul implementation.
>
> Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1 does not show
> performance regressions.
>
> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>

As a very high-level comment, I would expect some sort of markup in the
file that this implementation is now MTE-safe, similar to what we have
for executable stacks.

Or do you plan to handle that in some other fashion?

Thanks,
Florian
Florian Weimer <fweimer@redhat.com> writes:

> * Andrea Corallo:
>
>> Introduce an Arm MTE compatible strchrnul implementation.
>>
>> Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1 does not show
>> performance regressions.
>>
>> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
>
> As a very high-level comment, I would expect some sort of markup in the
> file that this implementation is now MTE-safe, similar to what we have
> for executable stacks.
>
> Or do you plan to handle that in some other fashion?
>
> Thanks,
> Florian

Hi Florian,

Now the only markup is the comment on the top of the file stating the
MTE compatibility of the routine.

I'm not aware of how this marking is done for executable stacks; could
you perhaps give a hook on where to look?

Just to make sure we are on the same page, I wanted to add: these
functions are supposed to be backward compatible with what they are
replacing, so I'm not sure a marking is necessary.

Thanks

  Andrea
* Andrea Corallo:

> Florian Weimer <fweimer@redhat.com> writes:
>
>> * Andrea Corallo:
>>
>>> Introduce an Arm MTE compatible strchrnul implementation.
>>>
>>> Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1 does not show
>>> performance regressions.
>>>
>>> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
>>
>> As a very high-level comment, I would expect some sort of markup in the
>> file that this implementation is now MTE-safe, similar to what we have
>> for executable stacks.
>>
>> Or do you plan to handle that in some other fashion?
>>
>> Thanks,
>> Florian
>
> Hi Florian,
>
> Now the only markup is the comment on the top of the file stating the
> MTE compatibility of the routine.
>
> I'm not aware of how this marking is done for executable stacks; could
> you perhaps give a hook on where to look?

Typically, the -z noexecstack flag or a special .note.GNU-stack section
is used for that.

> Just to make sure we are on the same page, I wanted to add: these
> functions are supposed to be backward compatible with what they are
> replacing, so I'm not sure a marking is necessary.

It's MTE that isn't backwards-compatible without such markup.

Thanks,
Florian
On 03/06/2020 11:24, Florian Weimer via Libc-alpha wrote:
> * Andrea Corallo:
>
>> Florian Weimer <fweimer@redhat.com> writes:
>>
>>> * Andrea Corallo:
>>>
>>>> Introduce an Arm MTE compatible strchrnul implementation.
>>>>
>>>> Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1 does not show
>>>> performance regressions.
>>>>
>>>> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
>>>
>>> As a very high-level comment, I would expect some sort of markup in the
>>> file that this implementation is now MTE-safe, similar to what we have
>>> for executable stacks.
>>>
>>> Or do you plan to handle that in some other fashion?
>>>
>>> Thanks,
>>> Florian
>>
>> Hi Florian,
>>
>> Now the only markup is the comment on the top of the file stating the
>> MTE compatibility of the routine.
>>
>> I'm not aware of how this marking is done for executable stacks; could
>> you perhaps give a hook on where to look?
>
> Typically, the -z noexecstack flag or a special .note.GNU-stack section
> is used for that.
>
>> Just to make sure we are on the same page, I wanted to add: these
>> functions are supposed to be backward compatible with what they are
>> replacing, so I'm not sure a marking is necessary.
>
> It's MTE that isn't backwards-compatible without such markup.

AFAIU there is no need to add any marking for MTE; the main difference
is that it enforces 16-byte granularity reads. I think you are confusing
it with BTI, which does require the GNU note.
On 03/06/2020 06:43, Andrea Corallo wrote:
> Hi all,
>
> I'd like to submit this patch introducing an Arm MTE compatible
> strchrnul implementation.
>
> Follows a performance comparison of the strchrnul benchmark run on
> Cortex-A72, Cortex-A53, Neoverse N1.

How were these performance numbers calculated? Did you use the glibc
benchtests or an external benchmark?

Besides that, the commit message gives neither an overall description of
the changes it makes to the generic implementation nor the requirements
of MTE support. You could use the description in optimized-routines to
state that it changes the granularity of the read instruction to 16 bytes
from the 32-byte strategy, and briefly describe the MTE requirements for
string routines.

I would also like to ask you to send patches inline instead of as
attachments (it makes them easier to review in most email clients).

>
> | length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
> |--------+-----------+-----------------+-----------------+----------------|
> [...]
>
> This patch is passing GLIBC tests.
>
> Regards
>
> Andrea
>
> 8< --- 8< --- 8<
> Introduce an Arm MTE compatible strchrnul implementation.
>
> Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1 does not show
> performance regressions.
>
> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
>
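
To make the 16-byte versus 32-byte point concrete, here is a rough sketch. It is not the code from the patch: it assumes a simplified scanner that loads whole, naturally aligned blocks until it has loaded the block containing the terminator, and it relies on MTE assigning one tag per 16-byte granule. The helper names `last_byte_loaded` and `reads_past_granule` are hypothetical. The check asks whether the final load touches a granule other than the one holding the terminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MTE assigns one allocation tag per 16-byte granule. */
#define MTE_GRANULE 16u

/* Hypothetical helper: address of the last byte touched by a scanner that
   loads whole, naturally aligned blocks of block_size bytes and stops after
   the block that contains the terminator at address term.  */
static uint64_t last_byte_loaded(uint64_t term, uint64_t block_size)
{
    /* The masking below is only valid for power-of-two block sizes. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    uint64_t block_start = term & ~(block_size - 1);
    return block_start + block_size - 1;
}

/* Hypothetical helper: true if that final load touches a granule beyond
   the one holding the terminator, i.e. memory that may carry another tag.  */
static bool reads_past_granule(uint64_t term, uint64_t block_size)
{
    uint64_t term_granule = term / MTE_GRANULE;
    uint64_t last_granule = last_byte_loaded(term, block_size) / MTE_GRANULE;
    return last_granule > term_granule;
}
```

Under these assumptions an aligned 16-byte block is exactly one granule, so the check never fires for `block_size == 16`; with 32-byte blocks it fires whenever the terminator sits in the lower half of the block.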
The 06/03/2020 11:33, Adhemerval Zanella via Libc-alpha wrote:
> On 03/06/2020 11:24, Florian Weimer via Libc-alpha wrote:
> > * Andrea Corallo:
> >> Florian Weimer <fweimer@redhat.com> writes:
> >>> As a very high-level comment, I would expect some sort of markup in the
> >>> file that this implementation is now MTE-safe, similar to what we have
> >>> for executable stacks.
> >>>
> >>> Or do you plan to handle that in some other fashion?
> >>>
> >>> Thanks,
> >>> Florian
> >>
> >> Hi Florian,
> >>
> >> Now the only markup is the comment on the top of the file stating the
> >> MTE compatibility of the routine.
> >>
> >> I'm not aware of how this marking is done for executable stacks; could
> >> you perhaps give a hook on where to look?
> >
> > Typically, the -z noexecstack flag or a special .note.GNU-stack section
> > is used for that.
> >
> >> Just to make sure we are on the same page, I wanted to add: these
> >> functions are supposed to be backward compatible with what they are
> >> replacing, so I'm not sure a marking is necessary.
> >
> > It's MTE that isn't backwards-compatible without such markup.
>
> AFAIU there is no need to add any marking for MTE; the main difference
> is that it enforces 16-byte granularity reads. I think you are confusing
> it with BTI, which does require the GNU note.

in principle existing binaries may not be mte safe:

(1) the top byte may be used by user code,
(2) page size granularity may be assumed for memory protection.

so it is a valid question if we want to mark binaries that
are mte-safe to make mte an opt-in feature.

the problem is that then we cannot use heap tagging for debugging
heap corruption problems in existing binaries. so even if we
apply an mte markup we have to allow the user to override that.

we probably don't want heap tagging on by default since there is
an overhead so in the end it will be a user choice if mte is
enabled or not.

the difference compared to noexecstack is that the runtime can
fall back to executable stack if there are non-marked binaries,
but with mte once it's on the runtime cannot fall back, just reject
incompatible binaries (and it has to be on very early: before
malloc calls are made). i don't know if it makes sense to introduce
an object markup just for aborting / rejecting loading libraries
when the user asked for mte.
The 06/03/2020 15:53, Szabolcs Nagy wrote:
> The 06/03/2020 11:33, Adhemerval Zanella via Libc-alpha wrote:
> > On 03/06/2020 11:24, Florian Weimer via Libc-alpha wrote:
> > > * Andrea Corallo:
> > >> Florian Weimer <fweimer@redhat.com> writes:
> > >>> As a very high-level comment, I would expect some sort of markup in the
> > >>> file that this implementation is now MTE-safe, similar to what we have
> > >>> for executable stacks.
> > >>>
> > >>> Or do you plan to handle that in some other fashion?
> > >>>
> > >>> Thanks,
> > >>> Florian
> > >>
> > >> Hi Florian,
> > >>
> > >> Now the only markup is the comment on the top of the file stating the
> > >> MTE compatibility of the routine.
> > >>
> > >> I'm not aware of how this marking is done for executable stacks, perhaps
> > >> could you give an hook on where to look for?
> > >
> > > Typically, the -z noexecstack flag or a special .note.GNU-stack section
> > > is used for that.
> > >
> > >> Just to make sure we are one the same page wanted to add: these
> > >> functions are supposed to be backward compatible with what they are
> > >> replacing, so I'm not sure a marking is necessary.
> > >
> > > It's MTE that isn't backwards-compatible without such markup.
> >
> > Afaiu there is no need to add any marking for MTE, the main difference
> > it enforce 16-byte granularity read. I think you are confusing with
> > BTI, which does require the GNU note.
>
> in principle existing binaries may not be mte safe:
>
> (1) the top byte may be used by user code,
> (2) page size granularity may be assumed for memory protection.
>
> so it is a valid question if we want to mark binaries that
> are mte-safe to make mte an opt-in feature.
>
> the problem is that then we cannot use heap tagging for debugging
> heap corruption problems in existing binaries. so even if we
> apply an mte markup we have to allow the user to override that.
>
> we probably don't want heap tagging on by default since there
> is an overhead so in the end it will be a user choice if mte
> is enabled or not.
>
> the difference compared to noexecstack is that the runtime can
> fall back to executable stack if there are non-marked binaries,
> but with mte once it's on the runtime cannot fall back, just
> reject incompatible binaries (and it has to be on very early:
> before malloc calls are made).

this is not entirely true: (2) can be fixed at runtime by
keeping tags but turning tag checks off (it's a per thread
setting so a bit tricky to do across all threads), but (1)
cannot be fixed at runtime: if there are tagged pointers
already being passed around we cannot get rid of them, so
we have to reject the incompatible library.

we could have separate markup for problems (1) and (2) but
neither of them is easily discoverable by the compiler
(well conforming c code is not supposed to do either), so
i don't know how the markup would be added to object files
in a reliable way.

>
> i don't know if it makes sense to introduce an object markup just
> for aborting / rejecting loading libraries when the user asked
> for mte.

--
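Problem (1) above concerns user code that keeps its own data in the top byte of a pointer. The following is a minimal C sketch of that pattern, under the assumption of 64-bit addresses where bits 63:56 are ignored on access and MTE places its logical tag in bits 59:56. The helper names are hypothetical and belong to neither glibc nor the kernel ABI.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only.  Addresses are modelled
   as plain 64-bit integers so the sketch runs on any host.  */

/* Store a user-chosen byte in bits 63:56 of an address.  */
static uint64_t with_top_byte(uint64_t addr, uint8_t value)
{
    return (addr & UINT64_C(0x00ffffffffffffff)) | ((uint64_t)value << 56);
}

/* Read back the byte kept in bits 63:56.  */
static uint8_t top_byte(uint64_t addr)
{
    return (uint8_t)(addr >> 56);
}

/* The 4-bit field (bits 59:56) that MTE would interpret as the
   logical tag of the same address.  */
static uint8_t mte_logical_tag(uint64_t addr)
{
    return (uint8_t)((addr >> 56) & 0xf);
}
```

A program that stores, say, 0x5a with `with_top_byte` ends up with a pointer whose MTE logical tag reads as 0xa, which need not match the tag the allocator assigned to that memory. This is why such pointers cannot be repaired once they are already being passed around.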
* Adhemerval Zanella via Libc-alpha:

> Afaiu there is no need to add any marking for MTE, the main difference
> it enforce 16-byte granularity read. I think you are confusing with
> BTI, which does require the GNU note.

If there is no compatibility issue, then why are these changes to the
glibc string functions needed?

Clearly I'm confused.

Thanks,
Florian
On 03/06/2020 11:53, Szabolcs Nagy wrote:
> The 06/03/2020 11:33, Adhemerval Zanella via Libc-alpha wrote:
>> On 03/06/2020 11:24, Florian Weimer via Libc-alpha wrote:
>>> * Andrea Corallo:
>>>> Florian Weimer <fweimer@redhat.com> writes:
>>>>> As a very high-level comment, I would expect some sort of markup in the
>>>>> file that this implementation is now MTE-safe, similar to what we have
>>>>> for executable stacks.
>>>>>
>>>>> Or do you plan to handle that in some other fashion?
>>>>>
>>>>> Thanks,
>>>>> Florian
>>>>
>>>> Hi Florian,
>>>>
>>>> Now the only markup is the comment on the top of the file stating the
>>>> MTE compatibility of the routine.
>>>>
>>>> I'm not aware of how this marking is done for executable stacks, perhaps
>>>> could you give an hook on where to look for?
>>>
>>> Typically, the -z noexecstack flag or a special .note.GNU-stack section
>>> is used for that.
>>>
>>>> Just to make sure we are one the same page wanted to add: these
>>>> functions are supposed to be backward compatible with what they are
>>>> replacing, so I'm not sure a marking is necessary.
>>>
>>> It's MTE that isn't backwards-compatible without such markup.
>>
>> Afaiu there is no need to add any marking for MTE, the main difference
>> it enforce 16-byte granularity read. I think you are confusing with
>> BTI, which does require the GNU note.
>
> in principle existing binaries may not be mte safe:
>
> (1) the top byte may be used by user code,
> (2) page size granularity may be assumed for memory protection.
>
> so it is a valid question if we want to mark binaries that
> are mte-safe to make mte an opt-in feature.

My understanding is that MTE binary tagging support is an orthogonal
discussion, although related. This change, afaiu, is mainly an algorithm
change and is backwards compatible; it was motivated by MTE support but
it could also be reviewed independently of that support.

>
> the problem is that then we cannot use heap tagging for debugging
> heap corruption problems in existing binaries. so even if we
> apply an mte markup we have to allow the user to override that.
>
> we probably don't want heap tagging on by default since there
> is an overhead so in the end it will be a user choice if mte
> is enabled or not.
>
> the difference compared to noexecstack is that the runtime can
> fall back to executable stack if there are non-marked binaries,
> but with mte once it's on the runtime cannot fall back, just
> reject incompatible binaries (and it has to be on very early:
> before malloc calls are made).
>
> i don't know if it makes sense to introduce an object markup just
> for aborting / rejecting loading libraries when the user asked
> for mte.
The 06/03/2020 17:04, Florian Weimer via Libc-alpha wrote:
> * Adhemerval Zanella via Libc-alpha:
>
> > Afaiu there is no need to add any marking for MTE, the main difference
> > it enforce 16-byte granularity read. I think you are confusing with
> > BTI, which does require the GNU note.
>
> If there is no compatibility issue, then why are these changes to the
> glibc string functions needed?
>
> Clearly I'm confused.

string functions have problem (2): they assume that
if an address is accessible then everything on that
page is accessible via the same pointer (which
is no longer true with mte).

i think we can add the mte-safe string functions and
tackle the abi compatibility issues separately:

if we introduce mte-safe markups we will have to add
that to all asm code to make libc.so marked.
(which will be tricky since we have non-mte-safe ifunc
variants that are only selected on non-mte hw so the
code is not safe but the way it is used is safe)
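Problem (2) can be illustrated with plain address arithmetic. The sketch below assumes the 16-byte tag granule mentioned in this thread and checks whether an aligned block load that contains a given address stays inside that address's granule. `aligned_load_stays_in_granule` is a hypothetical helper, not a glibc function.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTE_GRANULE 16u  /* tag granule size in bytes, as discussed above */

/* Return true when the block-aligned load covering addr touches only the
   granule that addr itself lives in.  block must be a power of two.  */
static bool aligned_load_stays_in_granule(uint64_t addr, uint64_t block)
{
    uint64_t start = addr & ~(block - 1);   /* round down to block alignment */
    uint64_t end = start + block - 1;       /* last byte read by the load */
    uint64_t granule = addr / MTE_GRANULE;  /* granule index of addr */

    return start / MTE_GRANULE == granule && end / MTE_GRANULE == granule;
}
```

With a 16-byte block the check always holds, while a 32-byte block covers two granules and so can read memory carrying a different tag even though it never leaves the page.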
* Szabolcs Nagy:

> The 06/03/2020 17:04, Florian Weimer via Libc-alpha wrote:
>> * Adhemerval Zanella via Libc-alpha:
>>
>> > Afaiu there is no need to add any marking for MTE, the main difference
>> > it enforce 16-byte granularity read. I think you are confusing with
>> > BTI, which does require the GNU note.
>>
>> If there is no compatibility issue, then why are these changes to the
>> glibc string functions needed?
>>
>> Clearly I'm confused.
>
> string functions have problem (2): they assume that
> if an address is accessible then everything on that
> page is accessible via the same pointer (which
> is no longer true with mte).

Ahh.

> i think we can add the mte-safe string functions and
> tackle the abi compatibility issues separately:

I agree that markup is a separate issue.

Thanks,
Florian
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> On 03/06/2020 06:43, Andrea Corallo wrote:
>> Hi all,
>>
>> I'd like to submit this patch introducing an Arm MTE compatible
>> strchrnul implementation.
>>
>> Follows a performance comparison of the strchrnul benchmark run on
>> Cortex-A72, Cortex-A53, Neoverse N1.
>
> How these performance numbers were calculated? Did you use glibc benchtests
> or an external one?

Yes the glibc benchtests has been used, forgot to mention sorry.

> Besides it the commit message does not give an overall description of
> which changes it does to the generic implementation neither the requirements
> of MTE support.

I'll improve the commit message as suggested. Regarding the MTE
requirements given the function is backward compatible not sure what
should be mentioned.

Thanks

Andrea
On Wed, Jun 3, 2020 at 9:02 AM Andrea Corallo <andrea.corallo@arm.com> wrote:
>
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>
> > On 03/06/2020 06:43, Andrea Corallo wrote:
> >> Hi all,
> >>
> >> I'd like to submit this patch introducing an Arm MTE compatible
> >> strchrnul implementation.
> >>
> >> Follows a performance comparison of the strchrnul benchmark run on
> >> Cortex-A72, Cortex-A53, Neoverse N1.
> >
> > How these performance numbers were calculated? Did you use glibc benchtests
> > or an external one?
>
> Yes the glibc benchtests has been used, forgot to mention sorry.
>
> > Besides it the commit message does not give an overall description of
> > which changes it does to the generic implementation neither the requirements
> > of MTE support.
>
> I'll improve the commit message as suggested. Regarding the MTE
> requirements given the function is backward compatible not sure what
> should be mentioned.
>

My impression is if some object files aren't MTE compatible, you can't enable
MTE in the executable. Is that correct?
* H.J. Lu via Libc-alpha <libc-alpha@sourceware.org> [2020-06-04 05:04:23 -0700]:
> On Wed, Jun 3, 2020 at 9:02 AM Andrea Corallo <andrea.corallo@arm.com> wrote:
> > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> > > Besides it the commit message does not give an overall description of
> > > which changes it does to the generic implementation neither the requirements
> > > of MTE support.
> >
> > I'll improve the commit message as suggested. Regarding the MTE
> > requirements given the function is backward compatible not sure what
> > should be mentioned.
> >
>
> My impression is if some object files aren't MTE compatible, you can't enable
> MTE in the executable. Is that correct?

there can be code that is not mte compatible that would
crash when memory tagging and tag checking are enabled.
currently there is no object file marking for compatibility.

i think we should mention in the commit message in what
way the old code is incompatible with mte (e.g. it does
out of bound loads that can go across a tag granule
boundaries which can cause tag check failures with mte)
On Thu, Jun 4, 2020 at 12:49 PM Szabolcs Nagy <nsz@port70.net> wrote:
>
> * H.J. Lu via Libc-alpha <libc-alpha@sourceware.org> [2020-06-04 05:04:23 -0700]:
> > On Wed, Jun 3, 2020 at 9:02 AM Andrea Corallo <andrea.corallo@arm.com> wrote:
> > > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> > > > Besides it the commit message does not give an overall description of
> > > > which changes it does to the generic implementation neither the requirements
> > > > of MTE support.
> > >
> > > I'll improve the commit message as suggested. Regarding the MTE
> > > requirements given the function is backward compatible not sure what
> > > should be mentioned.
> > >
> >
> > My impression is if some object files aren't MTE compatible, you can't enable
> > MTE in the executable. Is that correct?
>
> there can be code that is not mte compatible that would
> crash when memory tagging and tag checking are enabled.
> currently there is no object file marking for compatibility.
>
> i think we should mention in the commit message in what
> way the old code is incompatible with mte (e.g. it does
> out of bound loads that can go across a tag granule
> boundaries which can cause tag check failures with mte)

Should we add a marker to indicate that an object file is
mte compatible?
The 06/04/2020 13:03, H.J. Lu via Libc-alpha wrote:
> On Thu, Jun 4, 2020 at 12:49 PM Szabolcs Nagy <nsz@port70.net> wrote:
> >
> > * H.J. Lu via Libc-alpha <libc-alpha@sourceware.org> [2020-06-04 05:04:23 -0700]:
> > > On Wed, Jun 3, 2020 at 9:02 AM Andrea Corallo <andrea.corallo@arm.com> wrote:
> > > > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> > > > > Besides it the commit message does not give an overall description of
> > > > > which changes it does to the generic implementation neither the requirements
> > > > > of MTE support.
> > > >
> > > > I'll improve the commit message as suggested. Regarding the MTE
> > > > requirements given the function is backward compatible not sure what
> > > > should be mentioned.
> > > >
> > >
> > > My impression is if some object files aren't MTE compatible, you can't enable
> > > MTE in the executable. Is that correct?
> >
> > there can be code that is not mte compatible that would
> > crash when memory tagging and tag checking are enabled.
> > currently there is no object file marking for compatibility.
> >
> > i think we should mention in the commit message in what
> > way the old code is incompatible with mte (e.g. it does
> > out of bound loads that can go across a tag granule
> > boundaries which can cause tag check failures with mte)
>
> Should we add a marker to indicate that an object file is
> mte compatible?

i think we will need a marking for 'compatible with tagged
pointers' (on aarch64 pointer that's a new opt-in kernel abi
and user code may use the top byte of pointers) and another
one for 'compatible with 16byte granules'.

e.g. the old string asm would have the first marking but not
the second one.

the details of the semantics should be discussed once we
post the patches that turn mte on.
Hi,

>> Should we add a marker to indicate that an object file is
>> mte compatible?
>
> i think we will need a marking for 'compatible with tagged
> pointers' (on aarch64 pointer that's a new opt-in kernel abi
> and user code may use the top byte of pointers) and another
> one for 'compatible with 16byte granules'.
>
> e.g. the old string asm would have the first marking but not
> the second one.

I don't see how adding different markings would help. String functions
which are not MTE compatible simply cannot be used if MTE is enabled,
and if they are not fixed or ifunced then all of GLIBC is not compatible
with MTE.

Compatibility is a dynamic property, ie. an incompatible string function is
perfectly fine if it is ifunced. So having various different markings is not useful.
Either way all of this is unrelated to these patches.

Cheers,
Wilco
The 06/05/2020 09:45, Wilco Dijkstra wrote:
> Hi,
>
> >> Should we add a marker to indicate that an object file is
> >> mte compatible?
> >
> > i think we will need a marking for 'compatible with tagged
> > pointers' (on aarch64 pointer that's a new opt-in kernel abi
> > and user code may use the top byte of pointers) and another
> > one for 'compatible with 16byte granules'.
> >
> > e.g. the old string asm would have the first marking but not
> > the second one.
>
> I don't see how adding different markings would help. String functions
> which are not MTE compatible simply cannot be used if MTE is enabled,
> and if they are not fixed or ifunced then all of GLIBC is not compatible
> with MTE.
>
> Compatibility is a dynamic property, ie. an incompatible string function is
> perfectly fine if it is ifunced. So having various different markings is not useful.
> Either way all of this is unrelated to these patches.

this is for user objects so we can print a reasonable error
when an incompatible lib is loaded or disable tag checking
at startup time based on the marking.

not for string functions
diff --git a/sysdeps/aarch64/strchrnul.S b/sysdeps/aarch64/strchrnul.S
index a65be6cba8..1ae4598f82 100644
--- a/sysdeps/aarch64/strchrnul.S
+++ b/sysdeps/aarch64/strchrnul.S
@@ -22,109 +22,75 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64
- * Neon Available.
+ * ARMv8-a, AArch64, Advanced SIMD.
+ * MTE compatible.
  */
 
-/* Arguments and results. */
 #define srcin		x0
 #define chrin		w1
-
 #define result		x0
 
-/* Locals and temporaries. */
-
 #define src		x2
-#define tmp1		x3
-#define wtmp2		w4
-#define tmp3		x5
+#define tmp1		x1
+#define tmp2		x3
+#define tmp2w		w3
 
 #define vrepchr		v0
-#define vdata1		v1
-#define vdata2		v2
-#define vhas_nul1	v3
-#define vhas_nul2	v4
-#define vhas_chr1	v5
-#define vhas_chr2	v6
-#define vrepmask	v7
-#define vend1		v16
-
-/* Core algorithm.
-
-   For each 32-byte hunk we calculate a 64-bit syndrome value, with
-   two bits per byte (LSB is always in bits 0 and 1, for both big
-   and little-endian systems). For each tuple, bit 0 is set iff
-   the relevant byte matched the requested character or nul. Since the
-   bits in the syndrome reflect exactly the order in which things occur
-   in the original string a count_trailing_zeros() operation will
-   identify exactly which byte is causing the termination. */
+#define vdata		v1
+#define qdata		q1
+#define vhas_nul	v2
+#define vhas_chr	v3
+#define vrepmask	v4
+#define vend		v5
+#define dend		d5
+
+/* Core algorithm:
+
+   For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
+   per byte. For even bytes, bits 0-3 are set if the relevant byte matched the
+   requested character or the byte is NUL. Bits 4-7 must be zero. Bits 4-7 are
+   set likewise for odd bytes so that adjacent bytes can be merged. Since the
+   bits in the syndrome reflect the order in which things occur in the original
+   string, counting trailing zeros identifies exactly which byte matched. */
 
 ENTRY (__strchrnul)
 	DELOUSE (0)
-	/* Magic constant 0x40100401 to allow us to identify which lane
-	   matches the termination condition. */
-	mov	wtmp2, #0x0401
-	movk	wtmp2, #0x4010, lsl #16
+	bic	src, srcin, 15
 	dup	vrepchr.16b, chrin
-	bic	src, srcin, #31		/* Work with aligned 32-byte hunks. */
-	dup	vrepmask.4s, wtmp2
-	ands	tmp1, srcin, #31
-	b.eq	L(loop)
-
-	/* Input string is not 32-byte aligned. Rather than forcing
-	   the padding bytes to a safe value, we calculate the syndrome
-	   for all the bytes, but then mask off those bits of the
-	   syndrome that are related to the padding. */
-	ld1	{vdata1.16b, vdata2.16b}, [src], #32
-	neg	tmp1, tmp1
-	cmeq	vhas_nul1.16b, vdata1.16b, #0
-	cmeq	vhas_chr1.16b, vdata1.16b, vrepchr.16b
-	cmeq	vhas_nul2.16b, vdata2.16b, #0
-	cmeq	vhas_chr2.16b, vdata2.16b, vrepchr.16b
-	orr	vhas_chr1.16b, vhas_chr1.16b, vhas_nul1.16b
-	orr	vhas_chr2.16b, vhas_chr2.16b, vhas_nul2.16b
-	and	vhas_chr1.16b, vhas_chr1.16b, vrepmask.16b
-	and	vhas_chr2.16b, vhas_chr2.16b, vrepmask.16b
-	lsl	tmp1, tmp1, #1
-	addp	vend1.16b, vhas_chr1.16b, vhas_chr2.16b	// 256->128
-	mov	tmp3, #~0
-	addp	vend1.16b, vend1.16b, vend1.16b		// 128->64
-	lsr	tmp1, tmp3, tmp1
-
-	mov	tmp3, vend1.2d[0]
-	bic	tmp1, tmp3, tmp1	// Mask padding bits.
-	cbnz	tmp1, L(tail)
+	ld1	{vdata.16b}, [src]
+	mov	tmp2w, 0xf00f
+	dup	vrepmask.8h, tmp2w
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs	vhas_chr.16b, vhas_chr.16b, vdata.16b
+	lsl	tmp2, srcin, 2
+	and	vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	tmp1, dend
+	lsr	tmp1, tmp1, tmp2	/* Mask padding bits. */
+	cbz	tmp1, L(loop)
 
+	rbit	tmp1, tmp1
+	clz	tmp1, tmp1
+	add	result, srcin, tmp1, lsr 2
+	ret
+
+	.p2align 4
 L(loop):
-	ld1	{vdata1.16b, vdata2.16b}, [src], #32
-	cmeq	vhas_nul1.16b, vdata1.16b, #0
-	cmeq	vhas_chr1.16b, vdata1.16b, vrepchr.16b
-	cmeq	vhas_nul2.16b, vdata2.16b, #0
-	cmeq	vhas_chr2.16b, vdata2.16b, vrepchr.16b
-	/* Use a fast check for the termination condition. */
-	orr	vhas_chr1.16b, vhas_nul1.16b, vhas_chr1.16b
-	orr	vhas_chr2.16b, vhas_nul2.16b, vhas_chr2.16b
-	orr	vend1.16b, vhas_chr1.16b, vhas_chr2.16b
-	addp	vend1.2d, vend1.2d, vend1.2d
-	mov	tmp1, vend1.2d[0]
+	ldr	qdata, [src, 16]!
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs	vhas_chr.16b, vhas_chr.16b, vdata.16b
+	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b
+	fmov	tmp1, dend
 	cbz	tmp1, L(loop)
 
-	/* Termination condition found. Now need to establish exactly why
-	   we terminated. */
-	and	vhas_chr1.16b, vhas_chr1.16b, vrepmask.16b
-	and	vhas_chr2.16b, vhas_chr2.16b, vrepmask.16b
-	addp	vend1.16b, vhas_chr1.16b, vhas_chr2.16b		// 256->128
-	addp	vend1.16b, vend1.16b, vend1.16b		// 128->64
-
-	mov	tmp1, vend1.2d[0]
-L(tail):
-	/* Count the trailing zeros, by bit reversing... */
+	and	vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	tmp1, dend
+#ifndef __AARCH64EB__
 	rbit	tmp1, tmp1
-	/* Re-bias source. */
-	sub	src, src, #32
-	clz	tmp1, tmp1	/* ... and counting the leading zeros. */
-	/* tmp1 is twice the offset into the fragment. */
-	add	result, src, tmp1, lsr #1
+#endif
+	clz	tmp1, tmp1
+	add	result, src, tmp1, lsr 2
 	ret
 
 END(__strchrnul)
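The "Core algorithm" comment in the patch can be modelled in portable C. What follows is a minimal scalar sketch under stated assumptions, not the committed implementation: `syndrome16`, `ctz64` and `model_strchrnul` are hypothetical names, the array origin `base` is treated as 16-byte aligned, and the array is assumed to be padded to a multiple of 16 bytes and to contain a NUL at or after the start offset. The model indexes bytes one at a time where the assembly uses vector loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the 64-bit syndrome for one 16-byte chunk: nibble i is 0xF when
   chunk[i] equals c or is NUL, and 0 otherwise (four bits per byte).  */
static uint64_t syndrome16(const unsigned char *chunk, unsigned char c)
{
    uint64_t s = 0;
    for (int i = 0; i < 16; i++)
        if (chunk[i] == c || chunk[i] == 0)
            s |= (uint64_t)0xF << (4 * i);
    return s;
}

/* Count trailing zero bits; s must be nonzero.  */
static unsigned ctz64(uint64_t s)
{
    unsigned n = 0;
    while ((s & 1) == 0) {
        s >>= 1;
        n++;
    }
    return n;
}

/* Return the index of the first byte at or after off that equals c or
   is NUL.  Only whole 16-byte chunks are examined, mirroring the
   16-byte granularity of the new code.  */
static size_t model_strchrnul(const unsigned char *base, size_t off,
                              unsigned char c)
{
    size_t chunk = off & ~(size_t)15;       /* like: bic src, srcin, 15 */
    uint64_t s = syndrome16(base + chunk, c);

    s >>= 4 * (off & 15);                   /* discard bytes before the start */
    if (s != 0)
        return off + (ctz64(s) >> 2);       /* four syndrome bits per byte */

    for (;;) {
        chunk += 16;                        /* like: ldr qdata, [src, 16]! */
        s = syndrome16(base + chunk, c);
        if (s != 0)
            return chunk + (ctz64(s) >> 2);
    }
}
```

Shifting the first syndrome right by four bits per skipped byte plays the role of the "Mask padding bits" step, so matches that lie before the start of the string inside the same aligned chunk are ignored.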