From patchwork Fri Feb 3 13:05:39 2023
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 64249
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Improve SVE memcpy and memmove
Date: Fri, 3 Feb 2023 13:05:39 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra

Improve SVE memcpy/memmove by copying 2 vectors if the size is small
enough (no more than 2 * vector length).  This improves performance of
random memcpy by ~9% on Neoverse V1, and memcpy/memmove of 33-64 bytes
become ~16% faster.

Passes regress, OK for commit?

Reviewed-by: Szabolcs Nagy

diff --git a/sysdeps/aarch64/multiarch/memcpy_sve.S b/sysdeps/aarch64/multiarch/memcpy_sve.S
index f4dc214f60bf25e818eb6b8de2d4093ad0c886e1..d11be6a44301af4bfd7fa4900555b769dc58d34d 100644
--- a/sysdeps/aarch64/multiarch/memcpy_sve.S
+++ b/sysdeps/aarch64/multiarch/memcpy_sve.S
@@ -67,14 +67,15 @@ ENTRY (__memcpy_sve)
 
 	cmp	count, 128
 	b.hi	L(copy_long)
-	cmp	count, 32
+	cntb	vlen
+	cmp	count, vlen, lsl 1
 	b.hi	L(copy32_128)
-
 	whilelo p0.b, xzr, count
-	cntb	vlen
-	tbnz	vlen, 4, L(vlen128)
-	ld1b	z0.b, p0/z, [src]
-	st1b	z0.b, p0, [dstin]
+	whilelo p1.b, vlen, count
+	ld1b	z0.b, p0/z, [src, 0, mul vl]
+	ld1b	z1.b, p1/z, [src, 1, mul vl]
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z1.b, p1, [dstin, 1, mul vl]
 	ret
 
 	/* Medium copies: 33..128 bytes.  */
@@ -102,14 +103,6 @@ L(copy96):
 	stp	C_q, D_q, [dstend, -32]
 	ret
 
-L(vlen128):
-	whilelo p1.b, vlen, count
-	ld1b	z0.b, p0/z, [src, 0, mul vl]
-	ld1b	z1.b, p1/z, [src, 1, mul vl]
-	st1b	z0.b, p0, [dstin, 0, mul vl]
-	st1b	z1.b, p1, [dstin, 1, mul vl]
-	ret
-
 	.p2align 4
 	/* Copy more than 128 bytes.  */
 L(copy_long):
@@ -158,14 +151,15 @@ ENTRY (__memmove_sve)
 
 	cmp	count, 128
 	b.hi	L(move_long)
-	cmp	count, 32
+	cntb	vlen
+	cmp	count, vlen, lsl 1
 	b.hi	L(copy32_128)
-
 	whilelo p0.b, xzr, count
-	cntb	vlen
-	tbnz	vlen, 4, L(vlen128)
-	ld1b	z0.b, p0/z, [src]
-	st1b	z0.b, p0, [dstin]
+	whilelo p1.b, vlen, count
+	ld1b	z0.b, p0/z, [src, 0, mul vl]
+	ld1b	z1.b, p1/z, [src, 1, mul vl]
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z1.b, p1, [dstin, 1, mul vl]
 	ret
 
 	.p2align 4
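
For readers unfamiliar with SVE predication, the small-copy path in the patch can be modelled in portable C. This is only an illustrative sketch, not the glibc implementation: VL, whilelo(), ld1b(), st1b() and copy_small() are hypothetical names that imitate the instructions of the same name, and VL is fixed at 16 bytes here, whereas real hardware reports its vector length through cntb.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector length in bytes (an assumption for this model). */
#define VL 16

/* Model of `whilelo p.b, base, count`: lane i is active if base + i < count. */
static void whilelo(uint8_t pred[VL], size_t base, size_t count)
{
    for (size_t i = 0; i < VL; i++)
        pred[i] = (uint8_t)(base + i < count);
}

/* Model of `ld1b z.b, p/z, [src, n, mul vl]`: inactive lanes are zeroed
   and their memory is never touched. */
static void ld1b(uint8_t z[VL], const uint8_t pred[VL],
                 const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < VL; i++)
        z[i] = pred[i] ? src[n * VL + i] : 0;
}

/* Model of `st1b z.b, p, [dst, n, mul vl]`: only active lanes are written. */
static void st1b(const uint8_t z[VL], const uint8_t pred[VL],
                 uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < VL; i++)
        if (pred[i])
            dst[n * VL + i] = z[i];
}

/* Copy count bytes if count <= 2 * VL and return 1; otherwise return 0,
   meaning the caller would branch to a larger-copy path. */
static int copy_small(uint8_t *dst, const uint8_t *src, size_t count)
{
    uint8_t p0[VL], p1[VL], z0[VL], z1[VL];

    if (count > 2 * VL)
        return 0;

    whilelo(p0, 0, count);   /* lanes covering bytes 0 .. VL-1 */
    whilelo(p1, VL, count);  /* lanes covering bytes VL .. 2*VL-1 */

    /* Both loads complete before either store, so overlapping buffers
       are handled the same way for memcpy and memmove. */
    ld1b(z0, p0, src, 0);
    ld1b(z1, p1, src, 1);
    st1b(z0, p0, dst, 0);
    st1b(z1, p1, dst, 1);
    return 1;
}
```

In this model a 20-byte copy activates all 16 lanes of p0 and the first 4 lanes of p1, so no byte past the end of either buffer is read or written and no branch on the exact size is needed.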