From patchwork Fri Jun 5 15:18:49 2020
X-Patchwork-Submitter: Andrea Corallo
X-Patchwork-Id: 39482
From: Andrea Corallo
To: libc-alpha@sourceware.org
Cc: nd@arm.com, Wilco.Dijkstra@arm.com
Subject: [PATCH v2] aarch64: MTE compatible strchrnul
Date: Fri, 05 Jun 2020 17:18:49 +0200

Hi all,

I'd like to submit this patch introducing an Arm MTE compatible
strchrnul implementation.

Below is a performance comparison (obtained using the glibc benchtests)
for the strchrnul benchmark run on Cortex-A72, Cortex-A53 and
Neoverse N1.

| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
| 32 | 0 | 1.16x | 1.07x | 1.35x |
| 32 | 1 | 1.25x | 1.16x | 1.15x |
| 64 | 0 | 1.26x | 0.97x | 1.20x |
| 64 | 2 | 1.35x | 1.04x | 1.30x |
| 128 | 0 | 1.12x | 0.84x | 1.22x |
| 128 | 3 | 1.25x | 0.87x | 1.30x |
| 256 | 0 | 1.14x | 0.84x | 1.16x |
| 256 | 4 | 1.24x | 0.81x | 1.16x |
| 512 | 0 | 1.15x | 0.80x | 1.13x |
| 512 | 5 | 1.17x | 0.81x | 1.14x |
| 1024 | 0 | 1.14x | 0.78x | 1.08x |
| 1024 | 6 | 1.03x | 0.78x | 1.10x |
| 2048 | 0 | 1.12x | 0.76x | 1.08x |
| 2048 | 7 | 1.14x | 0.77x | 1.09x |
| 64 | 1 | 1.35x | 1.04x | 1.37x |
| 64 | 1 | 1.36x | 1.04x | 1.37x |
| 64 | 2 | 1.36x | 1.04x | 1.37x |
| 64 | 2 | 1.37x | 1.04x | 1.38x |
| 64 | 3 | 1.38x | 1.04x | 1.36x |
| 64 | 3 | 1.40x | 1.04x | 1.36x |
| 64 | 4 | 1.41x | 1.04x | 1.36x |
| 64 | 4 | 1.36x | 1.04x | 1.36x |
| 64 | 5 | 1.34x | 1.04x | 1.40x |
| 64 | 5 | 1.35x | 1.04x | 1.36x |
| 64 | 6 | 1.34x | 1.04x | 1.37x |
| 64 | 6 | 1.41x | 1.04x | 1.37x |
| 64 | 7 | 1.39x | 1.04x | 1.36x |
| 64 | 7 | 1.34x | 1.04x | 1.37x |
| 0 | 0 | 1.18x | 1.63x | 1.66x |
| 0 | 0 | 1.18x | 1.63x | 1.66x |
| 1 | 0 | 1.18x | 1.63x | 1.66x |
| 1 | 0 | 1.18x | 1.63x | 1.67x |
| 2 | 0 | 1.18x | 1.63x | 1.66x |
| 2 | 0 | 1.18x | 1.63x | 1.65x |
| 3 | 0 | 1.18x | 1.63x | 1.66x |
| 3 | 0 | 1.18x | 1.63x | 1.66x |
| 4 | 0 | 1.18x | 1.63x | 1.65x |
| 4 | 0 | 1.18x | 1.63x | 1.66x |
| 5 | 0 | 1.18x | 1.63x | 1.66x |
| 5 | 0 | 1.18x | 1.63x | 1.66x |
| 6 | 0 | 1.18x | 1.63x | 1.66x |
| 6 | 0 | 1.18x | 1.63x | 1.66x |
| 7 | 0 | 1.18x | 1.63x | 1.66x |
| 7 | 0 | 1.18x | 1.63x | 1.64x |
| 8 | 0 | 1.18x | 1.63x | 1.66x |
| 8 | 0 | 1.18x | 1.63x | 1.66x |
| 9 | 0 | 1.18x | 1.63x | 1.65x |
| 9 | 0 | 1.18x | 1.63x | 1.66x |
| 10 | 0 | 1.18x | 1.63x | 1.66x |
| 10 | 0 | 1.18x | 1.63x | 1.66x |
| 11 | 0 | 1.18x | 1.63x | 1.64x |
| 11 | 0 | 1.18x | 1.63x | 1.63x |
| 12 | 0 | 1.18x | 1.63x | 1.63x |
| 12 | 0 | 1.18x | 1.63x | 1.66x |
| 13 | 0 | 1.18x | 1.63x | 1.63x |
| 13 | 0 | 1.18x | 1.63x | 1.63x |
| 14 | 0 | 1.18x | 1.63x | 1.63x |
| 14 | 0 | 1.18x | 1.63x | 1.22x |
| 15 | 0 | 1.19x | 1.63x | 1.22x |
| 15 | 0 | 1.18x | 1.63x | 1.63x |
| 16 | 0 | 1.03x | 0.96x | 1.15x |
| 16 | 0 | 1.03x | 0.96x | 1.13x |
| 17 | 0 | 1.03x | 0.96x | 0.98x |
| 17 | 0 | 1.03x | 0.96x | 0.98x |
| 18 | 0 | 1.03x | 0.96x | 0.98x |
| 18 | 0 | 1.03x | 0.96x | 0.98x |
| 19 | 0 | 1.04x | 0.96x | 0.98x |
| 19 | 0 | 1.04x | 0.96x | 0.98x |
| 20 | 0 | 1.04x | 0.96x | 1.00x |
| 20 | 0 | 1.03x | 0.96x | 0.99x |
| 21 | 0 | 1.04x | 0.96x | 0.99x |
| 21 | 0 | 1.03x | 0.96x | 1.14x |
| 22 | 0 | 1.04x | 0.96x | 1.14x |
| 22 | 0 | 1.03x | 0.96x | 1.14x |
| 23 | 0 | 1.03x | 0.96x | 1.13x |
| 23 | 0 | 1.03x | 0.96x | 1.15x |
| 24 | 0 | 1.04x | 0.96x | 1.13x |
| 24 | 0 | 1.04x | 0.95x | 1.13x |
| 25 | 0 | 1.03x | 0.96x | 1.15x |
| 25 | 0 | 1.04x | 0.96x | 1.12x |
| 26 | 0 | 1.04x | 0.96x | 1.13x |
| 26 | 0 | 1.02x | 0.96x | 1.13x |
| 27 | 0 | 1.04x | 0.96x | 1.13x |
| 27 | 0 | 1.03x | 0.96x | 1.13x |
| 28 | 0 | 1.03x | 0.96x | 0.98x |
| 28 | 0 | 1.04x | 0.96x | 1.05x |
| 29 | 0 | 1.02x | 0.96x | 1.00x |
| 29 | 0 | 1.03x | 0.96x | 1.00x |
| 30 | 0 | 1.04x | 0.96x | 1.00x |
| 30 | 0 | 1.04x | 0.96x | 1.00x |
| 31 | 0 | 1.04x | 0.96x | 0.99x |
| 31 | 0 | 1.03x | 0.96x | 0.99x |
| 32 | 0 | 1.09x | 1.07x | 1.09x |
| 32 | 1 | 1.25x | 1.15x | 1.38x |
| 64 | 0 | 1.27x | 0.98x | 1.20x |
| 64 | 2 | 1.41x | 1.04x | 1.30x |
| 128 | 0 | 1.15x | 0.84x | 1.22x |
| 128 | 3 | 1.23x | 0.87x | 1.30x |
| 256 | 0 | 1.16x | 0.84x | 1.16x |
| 256 | 4 | 1.23x | 0.81x | 1.17x |
| 512 | 0 | 1.14x | 0.80x | 1.12x |
| 512 | 5 | 1.18x | 0.81x | 1.14x |
| 1024 | 0 | 1.16x | 0.78x | 1.09x |
| 1024 | 6 | 1.03x | 0.78x | 1.11x |
| 2048 | 0 | 1.14x | 0.76x | 1.08x |
| 2048 | 7 | 1.14x | 0.77x | 1.09x |
| 64 | 1 | 1.40x | 1.04x | 1.37x |
| 64 | 1 | 1.40x | 1.04x | 1.37x |
| 64 | 2 | 1.35x | 1.04x | 1.37x |
| 64 | 2 | 1.38x | 1.04x | 1.37x |
| 64 | 3 | 1.36x | 1.04x | 1.37x |
| 64 | 3 | 1.34x | 1.04x | 1.37x |
| 64 | 4 | 1.41x | 1.04x | 1.37x |
| 64 | 4 | 1.38x | 1.04x | 1.37x |
| 64 | 5 | 1.36x | 1.04x | 1.37x |
| 64 | 5 | 1.36x | 1.04x | 1.37x |
| 64 | 6 | 1.35x | 1.04x | 1.37x |
| 64 | 6 | 1.40x | 1.04x | 1.37x |
| 64 | 7 | 1.35x | 1.04x | 1.37x |
| 64 | 7 | 1.40x | 1.04x | 1.37x |
| 0 | 0 | 1.19x | 1.63x | 1.66x |
| 0 | 0 | 1.19x | 1.63x | 1.66x |
| 1 | 0 | 1.19x | 1.63x | 1.66x |
| 1 | 0 | 1.19x | 1.63x | 1.66x |
| 2 | 0 | 1.18x | 1.63x | 1.63x |
| 2 | 0 | 1.18x | 1.63x | 1.66x |
| 3 | 0 | 1.18x | 1.63x | 1.66x |
| 3 | 0 | 1.20x | 1.63x | 1.63x |
| 4 | 0 | 1.18x | 1.63x | 1.63x |
| 4 | 0 | 1.18x | 1.63x | 1.66x |
| 5 | 0 | 1.18x | 1.63x | 1.66x |
| 5 | 0 | 1.18x | 1.63x | 1.66x |
| 6 | 0 | 1.18x | 1.63x | 1.66x |
| 6 | 0 | 1.18x | 1.63x | 1.66x |
| 7 | 0 | 1.18x | 1.63x | 1.66x |
| 7 | 0 | 1.18x | 1.63x | 1.66x |
| 8 | 0 | 1.18x | 1.63x | 1.25x |
| 8 | 0 | 1.18x | 1.63x | 1.66x |
| 9 | 0 | 1.18x | 1.63x | 1.66x |
| 9 | 0 | 1.18x | 1.63x | 1.66x |
| 10 | 0 | 1.18x | 1.63x | 1.66x |
| 10 | 0 | 1.18x | 1.63x | 1.66x |
| 11 | 0 | 1.18x | 1.63x | 1.66x |
| 11 | 0 | 1.18x | 1.63x | 1.66x |
| 12 | 0 | 1.18x | 1.63x | 1.66x |
| 12 | 0 | 1.19x | 1.63x | 1.66x |
| 13 | 0 | 1.18x | 1.63x | 1.66x |
| 13 | 0 | 1.18x | 1.63x | 1.66x |
| 14 | 0 | 1.19x | 1.63x | 1.66x |
| 14 | 0 | 1.19x | 1.63x | 1.66x |
| 15 | 0 | 1.18x | 1.63x | 1.66x |
| 15 | 0 | 1.18x | 1.63x | 1.66x |
| 16 | 0 | 1.03x | 0.96x | 1.00x |
| 16 | 0 | 1.03x | 0.96x | 1.00x |
| 17 | 0 | 1.03x | 0.96x | 1.00x |
| 17 | 0 | 1.03x | 0.96x | 1.15x |
| 18 | 0 | 1.03x | 0.96x | 1.14x |
| 18 | 0 | 1.04x | 0.96x | 1.15x |
| 19 | 0 | 1.04x | 0.96x | 1.15x |
| 19 | 0 | 1.04x | 0.96x | 1.15x |
| 20 | 0 | 1.04x | 0.96x | 1.15x |
| 20 | 0 | 1.03x | 0.96x | 1.15x |
| 21 | 0 | 1.04x | 0.96x | 1.15x |
| 21 | 0 | 1.03x | 0.96x | 1.15x |
| 22 | 0 | 1.02x | 0.96x | 1.15x |
| 22 | 0 | 1.03x | 0.96x | 1.15x |
| 23 | 0 | 1.03x | 0.96x | 1.15x |
| 23 | 0 | 1.03x | 0.96x | 1.15x |
| 24 | 0 | 1.03x | 0.96x | 1.00x |
| 24 | 0 | 1.02x | 0.96x | 1.00x |
| 25 | 0 | 1.04x | 0.96x | 1.00x |
| 25 | 0 | 1.03x | 0.96x | 1.16x |
| 26 | 0 | 1.04x | 0.96x | 1.15x |
| 26 | 0 | 1.03x | 0.96x | 1.15x |
| 27 | 0 | 1.04x | 0.96x | 1.00x |
| 27 | 0 | 1.03x | 0.96x | 1.00x |
| 28 | 0 | 1.04x | 0.96x | 1.00x |
| 28 | 0 | 1.04x | 0.96x | 1.00x |
| 29 | 0 | 1.03x | 0.96x | 1.00x |
| 29 | 0 | 1.04x | 0.96x | 1.15x |
| 30 | 0 | 1.04x | 0.96x | 1.15x |
| 30 | 0 | 1.03x | 0.95x | 1.00x |
| 31 | 0 | 1.03x | 0.96x | 1.00x |
| 31 | 0 | 1.04x | 0.96x | 1.00x |

This patch is passing GLIBC tests.

Regards

  Andrea

8< --- 8< --- 8<
Introduce an Arm MTE compatible strchrnul implementation.

The existing implementation assumes that any access to the pages in
which the string resides is safe. This assumption is not true when
MTE is enabled. This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance.

Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.

Co-authored-by: Wilco Dijkstra

diff --git a/sysdeps/aarch64/strchrnul.S b/sysdeps/aarch64/strchrnul.S
index a65be6cba8..1ae4598f82 100644
--- a/sysdeps/aarch64/strchrnul.S
+++ b/sysdeps/aarch64/strchrnul.S
@@ -22,109 +22,75 @@

 /* Assumptions:
  *
- * ARMv8-a, AArch64
- * Neon Available.
+ * ARMv8-a, AArch64, Advanced SIMD.
+ * MTE compatible.
  */

-/* Arguments and results. */
 #define srcin x0
 #define chrin w1
-
 #define result x0

-/* Locals and temporaries. */
-
 #define src x2
-#define tmp1 x3
-#define wtmp2 w4
-#define tmp3 x5
+#define tmp1 x1
+#define tmp2 x3
+#define tmp2w w3

 #define vrepchr v0
-#define vdata1 v1
-#define vdata2 v2
-#define vhas_nul1 v3
-#define vhas_nul2 v4
-#define vhas_chr1 v5
-#define vhas_chr2 v6
-#define vrepmask v7
-#define vend1 v16
-
-/* Core algorithm.
-
-   For each 32-byte hunk we calculate a 64-bit syndrome value, with
-   two bits per byte (LSB is always in bits 0 and 1, for both big
-   and little-endian systems). For each tuple, bit 0 is set iff
-   the relevant byte matched the requested character or nul. Since the
-   bits in the syndrome reflect exactly the order in which things occur
-   in the original string a count_trailing_zeros() operation will
-   identify exactly which byte is causing the termination. */
+#define vdata v1
+#define qdata q1
+#define vhas_nul v2
+#define vhas_chr v3
+#define vrepmask v4
+#define vend v5
+#define dend d5
+
+/* Core algorithm:
+
+   For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
+   per byte. For even bytes, bits 0-3 are set if the relevant byte matched the
+   requested character or the byte is NUL. Bits 4-7 must be zero. Bits 4-7 are
+   set likewise for odd bytes so that adjacent bytes can be merged. Since the
+   bits in the syndrome reflect the order in which things occur in the original
+   string, counting trailing zeros identifies exactly which byte matched. */

 ENTRY (__strchrnul)
 	DELOUSE (0)
-	/* Magic constant 0x40100401 to allow us to identify which lane
-	   matches the termination condition. */
-	mov wtmp2, #0x0401
-	movk wtmp2, #0x4010, lsl #16
+	bic src, srcin, 15
 	dup vrepchr.16b, chrin
-	bic src, srcin, #31 /* Work with aligned 32-byte hunks. */
-	dup vrepmask.4s, wtmp2
-	ands tmp1, srcin, #31
-	b.eq L(loop)
-
-	/* Input string is not 32-byte aligned. Rather than forcing
-	   the padding bytes to a safe value, we calculate the syndrome
-	   for all the bytes, but then mask off those bits of the
-	   syndrome that are related to the padding. */
-	ld1 {vdata1.16b, vdata2.16b}, [src], #32
-	neg tmp1, tmp1
-	cmeq vhas_nul1.16b, vdata1.16b, #0
-	cmeq vhas_chr1.16b, vdata1.16b, vrepchr.16b
-	cmeq vhas_nul2.16b, vdata2.16b, #0
-	cmeq vhas_chr2.16b, vdata2.16b, vrepchr.16b
-	orr vhas_chr1.16b, vhas_chr1.16b, vhas_nul1.16b
-	orr vhas_chr2.16b, vhas_chr2.16b, vhas_nul2.16b
-	and vhas_chr1.16b, vhas_chr1.16b, vrepmask.16b
-	and vhas_chr2.16b, vhas_chr2.16b, vrepmask.16b
-	lsl tmp1, tmp1, #1
-	addp vend1.16b, vhas_chr1.16b, vhas_chr2.16b // 256->128
-	mov tmp3, #~0
-	addp vend1.16b, vend1.16b, vend1.16b // 128->64
-	lsr tmp1, tmp3, tmp1
-
-	mov tmp3, vend1.2d[0]
-	bic tmp1, tmp3, tmp1 // Mask padding bits.
-	cbnz tmp1, L(tail)
+	ld1 {vdata.16b}, [src]
+	mov tmp2w, 0xf00f
+	dup vrepmask.8h, tmp2w
+	cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs vhas_chr.16b, vhas_chr.16b, vdata.16b
+	lsl tmp2, srcin, 2
+	and vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp vend.16b, vhas_chr.16b, vhas_chr.16b /* 128->64 */
+	fmov tmp1, dend
+	lsr tmp1, tmp1, tmp2 /* Mask padding bits. */
+	cbz tmp1, L(loop)
+	rbit tmp1, tmp1
+	clz tmp1, tmp1
+	add result, srcin, tmp1, lsr 2
+	ret
+
+	.p2align 4

 L(loop):
-	ld1 {vdata1.16b, vdata2.16b}, [src], #32
-	cmeq vhas_nul1.16b, vdata1.16b, #0
-	cmeq vhas_chr1.16b, vdata1.16b, vrepchr.16b
-	cmeq vhas_nul2.16b, vdata2.16b, #0
-	cmeq vhas_chr2.16b, vdata2.16b, vrepchr.16b
-	/* Use a fast check for the termination condition. */
-	orr vhas_chr1.16b, vhas_nul1.16b, vhas_chr1.16b
-	orr vhas_chr2.16b, vhas_nul2.16b, vhas_chr2.16b
-	orr vend1.16b, vhas_chr1.16b, vhas_chr2.16b
-	addp vend1.2d, vend1.2d, vend1.2d
-	mov tmp1, vend1.2d[0]
+	ldr qdata, [src, 16]!
+	cmeq vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs vhas_chr.16b, vhas_chr.16b, vdata.16b
+	umaxp vend.16b, vhas_chr.16b, vhas_chr.16b
+	fmov tmp1, dend
 	cbz tmp1, L(loop)

-	/* Termination condition found. Now need to establish exactly why
-	   we terminated. */
-	and vhas_chr1.16b, vhas_chr1.16b, vrepmask.16b
-	and vhas_chr2.16b, vhas_chr2.16b, vrepmask.16b
-	addp vend1.16b, vhas_chr1.16b, vhas_chr2.16b // 256->128
-	addp vend1.16b, vend1.16b, vend1.16b // 128->64
-
-	mov tmp1, vend1.2d[0]
-L(tail):
-	/* Count the trailing zeros, by bit reversing... */
+	and vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp vend.16b, vhas_chr.16b, vhas_chr.16b /* 128->64 */
+	fmov tmp1, dend
+#ifndef __AARCH64EB__
 	rbit tmp1, tmp1
-	/* Re-bias source. */
-	sub src, src, #32
-	clz tmp1, tmp1 /* ... and counting the leading zeros. */
-	/* tmp1 is twice the offset into the fragment. */
-	add result, src, tmp1, lsr #1
+#endif
+	clz tmp1, tmp1
+	add result, src, tmp1, lsr 2
 	ret

 END(__strchrnul)
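
For readers who want to check the syndrome arithmetic without reading
the Advanced SIMD code, here is a minimal scalar sketch in C of the
scheme described in the "Core algorithm" comment: four syndrome bits
per byte, a right shift to discard bytes before the start of the
string, and a trailing-zero count divided by four to get the byte
index. It is not part of the patch. The names chunk_syndrome, ctz64 and
model_strchrnul are hypothetical, the model assumes little-endian nibble
order, and it takes a buffer whose start stands in for a 16-byte
aligned address plus an offset, so that it never reads outside the
buffer it is given.

```c
#include <stddef.h>
#include <stdint.h>

/* Syndrome for one 16-byte chunk: nibble i (bits 4*i .. 4*i+3) is set
   when byte i equals c or is NUL.  This models the result of the
   cmeq/cmhs/and/addp sequence on a little-endian system.  */
static uint64_t chunk_syndrome(const unsigned char *chunk, unsigned char c)
{
    uint64_t syn = 0;
    for (int i = 0; i < 16; i++)
        if (chunk[i] == c || chunk[i] == 0)
            syn |= (uint64_t)0xf << (4 * i);
    return syn;
}

/* Count trailing zero bits; x must be non-zero.  */
static unsigned ctz64(uint64_t x)
{
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical model: base stands in for a 16-byte aligned address and
   the string starts at base + off.  Returns the offset from base of the
   first byte equal to c, or of the terminating NUL.  */
size_t model_strchrnul(const unsigned char *base, size_t off, unsigned char c)
{
    size_t chunk = off & ~(size_t)15;           /* bic src, srcin, 15 */
    uint64_t syn = chunk_syndrome(base + chunk, c);

    syn >>= 4 * (off & 15);                     /* drop bytes before the start */
    if (syn != 0)
        return off + ctz64(syn) / 4;            /* match in the first chunk */

    do {                                        /* whole aligned chunks only */
        chunk += 16;
        syn = chunk_syndrome(base + chunk, c);
    } while (syn == 0);
    return chunk + ctz64(syn) / 4;
}
```

With a zero-filled 48-byte buffer holding "hello, world" at offset 5,
model_strchrnul (buf, 5, 'w') returns 12 and a search for a character
that is absent returns 17, the offset of the terminating NUL. The NUL
bytes at offsets 0-4 are ignored because the shift removes their
nibbles, which mirrors the "Mask padding bits" step in the entry code.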