From patchwork Wed Oct 12 15:19:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 58712
To: 'GNU C Library'
Subject: [PATCH] aarch64: Use memcpy_simd as the default memcpy
Date: Wed, 12 Oct 2022 15:19:36 +0000
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra

Since __memcpy_simd is the fastest memcpy on almost all cores, use it by default if SVE is not available. Passes regress, OK for commit?

diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S index 98d4e2c0e202eca13e1fd19ad8046cf61ad280ff..7b396b202fabf01b6ff2adc71a1038148e0b1054 100644 --- a/sysdeps/aarch64/memcpy.S +++ b/sysdeps/aarch64/memcpy.S @@ -1,4 +1,5 @@ -/* Copyright (C) 2012-2022 Free Software Foundation, Inc. +/* Generic optimized memcpy using SIMD. + Copyright (C) 2012-2022 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -20,7 +21,7 @@ /* Assumptions: * - * ARMv8-a, AArch64, unaligned accesses. + * ARMv8-a, AArch64, Advanced SIMD, unaligned accesses.
* */ @@ -36,21 +37,18 @@ #define B_l x8 #define B_lw w8 #define B_h x9 -#define C_l x10 #define C_lw w10 -#define C_h x11 -#define D_l x12 -#define D_h x13 -#define E_l x14 -#define E_h x15 -#define F_l x16 -#define F_h x17 -#define G_l count -#define G_h dst -#define H_l src -#define H_h srcend #define tmp1 x14 +#define A_q q0 +#define B_q q1 +#define C_q q2 +#define D_q q3 +#define E_q q4 +#define F_q q5 +#define G_q q6 +#define H_q q7 + #ifndef MEMMOVE # define MEMMOVE memmove #endif @@ -69,10 +67,9 @@ Large copies use a software pipelined loop processing 64 bytes per iteration. The destination pointer is 16-byte aligned to minimize unaligned accesses. The loop tail is handled by always copying 64 bytes - from the end. -*/ + from the end. */ -ENTRY_ALIGN (MEMCPY, 6) +ENTRY (MEMCPY) PTR_ARG (0) PTR_ARG (1) SIZE_ARG (2) @@ -87,10 +84,10 @@ ENTRY_ALIGN (MEMCPY, 6) /* Small copies: 0..32 bytes. */ cmp count, 16 b.lo L(copy16) - ldp A_l, A_h, [src] - ldp D_l, D_h, [srcend, -16] - stp A_l, A_h, [dstin] - stp D_l, D_h, [dstend, -16] + ldr A_q, [src] + ldr B_q, [srcend, -16] + str A_q, [dstin] + str B_q, [dstend, -16] ret /* Copy 8-15 bytes. */ @@ -102,7 +99,6 @@ L(copy16): str A_h, [dstend, -8] ret - .p2align 3 /* Copy 4-7 bytes. */ L(copy8): tbz count, 2, L(copy4) @@ -128,87 +124,69 @@ L(copy0): .p2align 4 /* Medium copies: 33..128 bytes. */ L(copy32_128): - ldp A_l, A_h, [src] - ldp B_l, B_h, [src, 16] - ldp C_l, C_h, [srcend, -32] - ldp D_l, D_h, [srcend, -16] + ldp A_q, B_q, [src] + ldp C_q, D_q, [srcend, -32] cmp count, 64 b.hi L(copy128) - stp A_l, A_h, [dstin] - stp B_l, B_h, [dstin, 16] - stp C_l, C_h, [dstend, -32] - stp D_l, D_h, [dstend, -16] + stp A_q, B_q, [dstin] + stp C_q, D_q, [dstend, -32] ret .p2align 4 /* Copy 65..128 bytes. 
*/ L(copy128): - ldp E_l, E_h, [src, 32] - ldp F_l, F_h, [src, 48] + ldp E_q, F_q, [src, 32] cmp count, 96 b.ls L(copy96) - ldp G_l, G_h, [srcend, -64] - ldp H_l, H_h, [srcend, -48] - stp G_l, G_h, [dstend, -64] - stp H_l, H_h, [dstend, -48] + ldp G_q, H_q, [srcend, -64] + stp G_q, H_q, [dstend, -64] L(copy96): - stp A_l, A_h, [dstin] - stp B_l, B_h, [dstin, 16] - stp E_l, E_h, [dstin, 32] - stp F_l, F_h, [dstin, 48] - stp C_l, C_h, [dstend, -32] - stp D_l, D_h, [dstend, -16] + stp A_q, B_q, [dstin] + stp E_q, F_q, [dstin, 32] + stp C_q, D_q, [dstend, -32] ret - .p2align 4 + /* Align loop64 below to 16 bytes. */ + nop + /* Copy more than 128 bytes. */ L(copy_long): - /* Copy 16 bytes and then align dst to 16-byte alignment. */ - ldp D_l, D_h, [src] - and tmp1, dstin, 15 - bic dst, dstin, 15 - sub src, src, tmp1 + /* Copy 16 bytes and then align src to 16-byte alignment. */ + ldr D_q, [src] + and tmp1, src, 15 + bic src, src, 15 + sub dst, dstin, tmp1 add count, count, tmp1 /* Count is now 16 too large. */ - ldp A_l, A_h, [src, 16] - stp D_l, D_h, [dstin] - ldp B_l, B_h, [src, 32] - ldp C_l, C_h, [src, 48] - ldp D_l, D_h, [src, 64]! + ldp A_q, B_q, [src, 16] + str D_q, [dstin] + ldp C_q, D_q, [src, 48] subs count, count, 128 + 16 /* Test and readjust count. */ b.ls L(copy64_from_end) - L(loop64): - stp A_l, A_h, [dst, 16] - ldp A_l, A_h, [src, 16] - stp B_l, B_h, [dst, 32] - ldp B_l, B_h, [src, 32] - stp C_l, C_h, [dst, 48] - ldp C_l, C_h, [src, 48] - stp D_l, D_h, [dst, 64]! - ldp D_l, D_h, [src, 64]! + stp A_q, B_q, [dst, 16] + ldp A_q, B_q, [src, 80] + stp C_q, D_q, [dst, 48] + ldp C_q, D_q, [src, 112] + add src, src, 64 + add dst, dst, 64 subs count, count, 64 b.hi L(loop64) /* Write the last iteration and copy 64 bytes from the end. 
*/ L(copy64_from_end): - ldp E_l, E_h, [srcend, -64] - stp A_l, A_h, [dst, 16] - ldp A_l, A_h, [srcend, -48] - stp B_l, B_h, [dst, 32] - ldp B_l, B_h, [srcend, -32] - stp C_l, C_h, [dst, 48] - ldp C_l, C_h, [srcend, -16] - stp D_l, D_h, [dst, 64] - stp E_l, E_h, [dstend, -64] - stp A_l, A_h, [dstend, -48] - stp B_l, B_h, [dstend, -32] - stp C_l, C_h, [dstend, -16] + ldp E_q, F_q, [srcend, -64] + stp A_q, B_q, [dst, 16] + ldp A_q, B_q, [srcend, -32] + stp C_q, D_q, [dst, 48] + stp E_q, F_q, [dstend, -64] + stp A_q, B_q, [dstend, -32] ret END (MEMCPY) libc_hidden_builtin_def (MEMCPY) -ENTRY_ALIGN (MEMMOVE, 4) + +ENTRY (MEMMOVE) PTR_ARG (0) PTR_ARG (1) SIZE_ARG (2) @@ -220,64 +198,56 @@ ENTRY_ALIGN (MEMMOVE, 4) cmp count, 32 b.hi L(copy32_128) - /* Small copies: 0..32 bytes. */ + /* Small moves: 0..32 bytes. */ cmp count, 16 b.lo L(copy16) - ldp A_l, A_h, [src] - ldp D_l, D_h, [srcend, -16] - stp A_l, A_h, [dstin] - stp D_l, D_h, [dstend, -16] + ldr A_q, [src] + ldr B_q, [srcend, -16] + str A_q, [dstin] + str B_q, [dstend, -16] ret - .p2align 4 L(move_long): /* Only use backward copy if there is an overlap. */ sub tmp1, dstin, src - cbz tmp1, L(copy0) + cbz tmp1, L(move0) cmp tmp1, count b.hs L(copy_long) /* Large backwards copy for overlapping copies. - Copy 16 bytes and then align dst to 16-byte alignment. */ - ldp D_l, D_h, [srcend, -16] - and tmp1, dstend, 15 - sub srcend, srcend, tmp1 + Copy 16 bytes and then align srcend to 16-byte alignment. */ +L(copy_long_backwards): + ldr D_q, [srcend, -16] + and tmp1, srcend, 15 + bic srcend, srcend, 15 sub count, count, tmp1 - ldp A_l, A_h, [srcend, -16] - stp D_l, D_h, [dstend, -16] - ldp B_l, B_h, [srcend, -32] - ldp C_l, C_h, [srcend, -48] - ldp D_l, D_h, [srcend, -64]! 
+ ldp A_q, B_q, [srcend, -32] + str D_q, [dstend, -16] + ldp C_q, D_q, [srcend, -64] sub dstend, dstend, tmp1 subs count, count, 128 b.ls L(copy64_from_start) L(loop64_backwards): - stp A_l, A_h, [dstend, -16] - ldp A_l, A_h, [srcend, -16] - stp B_l, B_h, [dstend, -32] - ldp B_l, B_h, [srcend, -32] - stp C_l, C_h, [dstend, -48] - ldp C_l, C_h, [srcend, -48] - stp D_l, D_h, [dstend, -64]! - ldp D_l, D_h, [srcend, -64]! + str B_q, [dstend, -16] + str A_q, [dstend, -32] + ldp A_q, B_q, [srcend, -96] + str D_q, [dstend, -48] + str C_q, [dstend, -64]! + ldp C_q, D_q, [srcend, -128] + sub srcend, srcend, 64 subs count, count, 64 b.hi L(loop64_backwards) /* Write the last iteration and copy 64 bytes from the start. */ L(copy64_from_start): - ldp G_l, G_h, [src, 48] - stp A_l, A_h, [dstend, -16] - ldp A_l, A_h, [src, 32] - stp B_l, B_h, [dstend, -32] - ldp B_l, B_h, [src, 16] - stp C_l, C_h, [dstend, -48] - ldp C_l, C_h, [src] - stp D_l, D_h, [dstend, -64] - stp G_l, G_h, [dstin, 48] - stp A_l, A_h, [dstin, 32] - stp B_l, B_h, [dstin, 16] - stp C_l, C_h, [dstin] + ldp E_q, F_q, [src, 32] + stp A_q, B_q, [dstend, -32] + ldp A_q, B_q, [src] + stp C_q, D_q, [dstend, -64] + stp E_q, F_q, [dstin, 32] + stp A_q, B_q, [dstin] +L(move0): ret END (MEMMOVE) diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile index bc5cde8add07b908178fb0271decc27f728f7a2e..7f2d85b0e5acc0a694e91b17fbccc0dba0ea339d 100644 --- a/sysdeps/aarch64/multiarch/Makefile +++ b/sysdeps/aarch64/multiarch/Makefile @@ -3,7 +3,6 @@ sysdep_routines += \ memchr_generic \ memchr_nosimd \ memcpy_a64fx \ - memcpy_advsimd \ memcpy_generic \ memcpy_sve \ memcpy_thunderx \ diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c index 9c2542de38fb109b7c6f1db4aacee3a6b544fa3f..e7c4dcc0ed5a68ecd8dacc06256d0749b76912cb 100644 --- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c +++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c @@ -36,7 +36,6 @@ 
__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, memcpy, IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_thunderx) IFUNC_IMPL_ADD (array, i, memcpy, !bti, __memcpy_thunderx2) - IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_simd) #if HAVE_AARCH64_SVE_ASM IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_a64fx) IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_sve) @@ -45,7 +44,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL (i, name, memmove, IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_thunderx) IFUNC_IMPL_ADD (array, i, memmove, !bti, __memmove_thunderx2) - IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_simd) #if HAVE_AARCH64_SVE_ASM IFUNC_IMPL_ADD (array, i, memmove, sve, __memmove_a64fx) IFUNC_IMPL_ADD (array, i, memmove, sve, __memmove_sve) diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c index 5006b0594a476bcc149f2ae022bea50379d04908..1e08ce852e68409fd0eeb975edab77ebe8da8635 100644 --- a/sysdeps/aarch64/multiarch/memcpy.c +++ b/sysdeps/aarch64/multiarch/memcpy.c @@ -29,7 +29,6 @@ extern __typeof (__redirect_memcpy) __libc_memcpy; extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden; -extern __typeof (__redirect_memcpy) __memcpy_simd attribute_hidden; extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden; extern __typeof (__redirect_memcpy) __memcpy_thunderx2 attribute_hidden; extern __typeof (__redirect_memcpy) __memcpy_a64fx attribute_hidden; @@ -40,9 +39,6 @@ select_memcpy_ifunc (void) { INIT_ARCH (); - if (IS_NEOVERSE_N1 (midr) || IS_NEOVERSE_N2 (midr)) - return __memcpy_simd; - if (sve && HAVE_AARCH64_SVE_ASM) { if (IS_A64FX (midr)) diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S deleted file mode 100644 index fe9beaf5ead47268867bee98acad3b17c554656a..0000000000000000000000000000000000000000 --- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S +++ /dev/null 
@@ -1,248 +0,0 @@ -/* Generic optimized memcpy using SIMD. - Copyright (C) 2020-2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . */ - -#include - -/* Assumptions: - * - * ARMv8-a, AArch64, Advanced SIMD, unaligned accesses. - * - */ - -#define dstin x0 -#define src x1 -#define count x2 -#define dst x3 -#define srcend x4 -#define dstend x5 -#define A_l x6 -#define A_lw w6 -#define A_h x7 -#define B_l x8 -#define B_lw w8 -#define B_h x9 -#define C_lw w10 -#define tmp1 x14 - -#define A_q q0 -#define B_q q1 -#define C_q q2 -#define D_q q3 -#define E_q q4 -#define F_q q5 -#define G_q q6 -#define H_q q7 - - -/* This implementation supports both memcpy and memmove and shares most code. - It uses unaligned accesses and branchless sequences to keep the code small, - simple and improve performance. - - Copies are split into 3 main cases: small copies of up to 32 bytes, medium - copies of up to 128 bytes, and large copies. The overhead of the overlap - check in memmove is negligible since it is only required for large copies. - - Large copies use a software pipelined loop processing 64 bytes per - iteration. The destination pointer is 16-byte aligned to minimize - unaligned accesses. The loop tail is handled by always copying 64 bytes - from the end. 
*/ - -ENTRY (__memcpy_simd) - PTR_ARG (0) - PTR_ARG (1) - SIZE_ARG (2) - - add srcend, src, count - add dstend, dstin, count - cmp count, 128 - b.hi L(copy_long) - cmp count, 32 - b.hi L(copy32_128) - - /* Small copies: 0..32 bytes. */ - cmp count, 16 - b.lo L(copy16) - ldr A_q, [src] - ldr B_q, [srcend, -16] - str A_q, [dstin] - str B_q, [dstend, -16] - ret - - /* Copy 8-15 bytes. */ -L(copy16): - tbz count, 3, L(copy8) - ldr A_l, [src] - ldr A_h, [srcend, -8] - str A_l, [dstin] - str A_h, [dstend, -8] - ret - - /* Copy 4-7 bytes. */ -L(copy8): - tbz count, 2, L(copy4) - ldr A_lw, [src] - ldr B_lw, [srcend, -4] - str A_lw, [dstin] - str B_lw, [dstend, -4] - ret - - /* Copy 0..3 bytes using a branchless sequence. */ -L(copy4): - cbz count, L(copy0) - lsr tmp1, count, 1 - ldrb A_lw, [src] - ldrb C_lw, [srcend, -1] - ldrb B_lw, [src, tmp1] - strb A_lw, [dstin] - strb B_lw, [dstin, tmp1] - strb C_lw, [dstend, -1] -L(copy0): - ret - - .p2align 4 - /* Medium copies: 33..128 bytes. */ -L(copy32_128): - ldp A_q, B_q, [src] - ldp C_q, D_q, [srcend, -32] - cmp count, 64 - b.hi L(copy128) - stp A_q, B_q, [dstin] - stp C_q, D_q, [dstend, -32] - ret - - .p2align 4 - /* Copy 65..128 bytes. */ -L(copy128): - ldp E_q, F_q, [src, 32] - cmp count, 96 - b.ls L(copy96) - ldp G_q, H_q, [srcend, -64] - stp G_q, H_q, [dstend, -64] -L(copy96): - stp A_q, B_q, [dstin] - stp E_q, F_q, [dstin, 32] - stp C_q, D_q, [dstend, -32] - ret - - /* Align loop64 below to 16 bytes. */ - nop - - /* Copy more than 128 bytes. */ -L(copy_long): - /* Copy 16 bytes and then align src to 16-byte alignment. */ - ldr D_q, [src] - and tmp1, src, 15 - bic src, src, 15 - sub dst, dstin, tmp1 - add count, count, tmp1 /* Count is now 16 too large. */ - ldp A_q, B_q, [src, 16] - str D_q, [dstin] - ldp C_q, D_q, [src, 48] - subs count, count, 128 + 16 /* Test and readjust count. 
*/ - b.ls L(copy64_from_end) -L(loop64): - stp A_q, B_q, [dst, 16] - ldp A_q, B_q, [src, 80] - stp C_q, D_q, [dst, 48] - ldp C_q, D_q, [src, 112] - add src, src, 64 - add dst, dst, 64 - subs count, count, 64 - b.hi L(loop64) - - /* Write the last iteration and copy 64 bytes from the end. */ -L(copy64_from_end): - ldp E_q, F_q, [srcend, -64] - stp A_q, B_q, [dst, 16] - ldp A_q, B_q, [srcend, -32] - stp C_q, D_q, [dst, 48] - stp E_q, F_q, [dstend, -64] - stp A_q, B_q, [dstend, -32] - ret - -END (__memcpy_simd) -libc_hidden_builtin_def (__memcpy_simd) - - -ENTRY (__memmove_simd) - PTR_ARG (0) - PTR_ARG (1) - SIZE_ARG (2) - - add srcend, src, count - add dstend, dstin, count - cmp count, 128 - b.hi L(move_long) - cmp count, 32 - b.hi L(copy32_128) - - /* Small moves: 0..32 bytes. */ - cmp count, 16 - b.lo L(copy16) - ldr A_q, [src] - ldr B_q, [srcend, -16] - str A_q, [dstin] - str B_q, [dstend, -16] - ret - -L(move_long): - /* Only use backward copy if there is an overlap. */ - sub tmp1, dstin, src - cbz tmp1, L(move0) - cmp tmp1, count - b.hs L(copy_long) - - /* Large backwards copy for overlapping copies. - Copy 16 bytes and then align srcend to 16-byte alignment. */ -L(copy_long_backwards): - ldr D_q, [srcend, -16] - and tmp1, srcend, 15 - bic srcend, srcend, 15 - sub count, count, tmp1 - ldp A_q, B_q, [srcend, -32] - str D_q, [dstend, -16] - ldp C_q, D_q, [srcend, -64] - sub dstend, dstend, tmp1 - subs count, count, 128 - b.ls L(copy64_from_start) - -L(loop64_backwards): - str B_q, [dstend, -16] - str A_q, [dstend, -32] - ldp A_q, B_q, [srcend, -96] - str D_q, [dstend, -48] - str C_q, [dstend, -64]! - ldp C_q, D_q, [srcend, -128] - sub srcend, srcend, 64 - subs count, count, 64 - b.hi L(loop64_backwards) - - /* Write the last iteration and copy 64 bytes from the start. 
*/ -L(copy64_from_start): - ldp E_q, F_q, [src, 32] - stp A_q, B_q, [dstend, -32] - ldp A_q, B_q, [src] - stp C_q, D_q, [dstend, -64] - stp E_q, F_q, [dstin, 32] - stp A_q, B_q, [dstin] -L(move0): - ret - -END (__memmove_simd) -libc_hidden_builtin_def (__memmove_simd) diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c index 7dae8b7c956f9083d0896cc771cae79f4901581d..dbf1536525e614f72d3d74bb193015b303618357 100644 --- a/sysdeps/aarch64/multiarch/memmove.c +++ b/sysdeps/aarch64/multiarch/memmove.c @@ -29,7 +29,6 @@ extern __typeof (__redirect_memmove) __libc_memmove; extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden; -extern __typeof (__redirect_memmove) __memmove_simd attribute_hidden; extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden; extern __typeof (__redirect_memmove) __memmove_thunderx2 attribute_hidden; extern __typeof (__redirect_memmove) __memmove_a64fx attribute_hidden; @@ -40,9 +39,6 @@ select_memmove_ifunc (void) { INIT_ARCH (); - if (IS_NEOVERSE_N1 (midr) || IS_NEOVERSE_N2 (midr)) - return __memmove_simd; - if (sve && HAVE_AARCH64_SVE_ASM) { if (IS_A64FX (midr))
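
Note for reviewers: the small-copy paths in memcpy.S above (0..32 bytes) rely on reading both ends of the buffer before writing both ends, so that two possibly overlapping accesses cover every length in a size class without a loop. The C sketch below only illustrates that idea. `small_copy_sketch` is a hypothetical helper, not part of glibc or of this patch, and the fixed-size `memcpy` calls stand in for the `ldr`/`str` pairs of the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not glibc code): copy n bytes, 0 <= n <= 32.
   Every branch loads the head and the tail of the source first and only
   then stores them, so overlapping src/dst buffers are handled too.  */
static void
small_copy_sketch (unsigned char *dst, const unsigned char *src, size_t n)
{
  assert (n <= 32);

  if (n >= 16)
    {
      /* 16..32 bytes: two 16-byte chunks that may overlap in the middle
         (the assembly uses two Q-register loads and stores here).  */
      unsigned char head[16], tail[16];
      memcpy (head, src, 16);
      memcpy (tail, src + n - 16, 16);
      memcpy (dst, head, 16);
      memcpy (dst + n - 16, tail, 16);
    }
  else if (n & 8)
    {
      /* 8..15 bytes: two 8-byte chunks.  */
      uint64_t head, tail;
      memcpy (&head, src, 8);
      memcpy (&tail, src + n - 8, 8);
      memcpy (dst, &head, 8);
      memcpy (dst + n - 8, &tail, 8);
    }
  else if (n & 4)
    {
      /* 4..7 bytes: two 4-byte chunks.  */
      uint32_t head, tail;
      memcpy (&head, src, 4);
      memcpy (&tail, src + n - 4, 4);
      memcpy (dst, &head, 4);
      memcpy (dst + n - 4, &tail, 4);
    }
  else if (n != 0)
    {
      /* 1..3 bytes: first, middle and last byte.  For n == 1 or n == 2
         some of these stores land on the same location.  */
      size_t mid = n >> 1;
      unsigned char a = src[0], b = src[mid], c = src[n - 1];
      dst[0] = a;
      dst[mid] = b;
      dst[n - 1] = c;
    }
}
```

Calling `small_copy_sketch (dst, src, n)` for any `n` up to 32 should give the same result as `memmove (dst, src, n)`. The sketch makes no claim about performance; it is only meant to make the size-class structure easier to follow.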