From patchwork Wed Oct 12 15:19:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
X-Patchwork-Id: 58712
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 3C88C3860C3C
	for <patchwork@sourceware.org>; Wed, 12 Oct 2022 15:20:23 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3C88C3860C3C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1665588023;
	bh=r6S5NdVJzogUpC6SP9OLAhzVtaSwfGHL1P7HXqLcQ+w=;
	h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=QL8VO82vQYDA2/EN90dLEsfkcwXIQRvW1R7FmFML2MUMyU/3QZs83G4oN9XfykUiM
	 kv8iYZ5NE2iqMOnTXaU7UMX1LtKEgg/YNIRhBoY7iGRm2b017ZDkw0FxGBgtfSXSxz
	 2xwl56jZneDba8imjiq0VW3V8R3W2+0o149fUTPM=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from EUR02-VE1-obe.outbound.protection.outlook.com
 (mail-eopbgr20048.outbound.protection.outlook.com [40.107.2.48])
 by sourceware.org (Postfix) with ESMTPS id 8D2C93857372
 for <libc-alpha@sourceware.org>; Wed, 12 Oct 2022 15:19:51 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8D2C93857372
ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass;
 b=mrZ76uELbX+2/xTGPZwg+iYhoDFHhOY+Xm2ZUBdR7Xdx/GVx1LiVwVKrYNnAeYbzysVOfQ5WEZQziysSbncPSR/QQLduYXrb5z0wpKRoRLrW6BEYGMl+aozqZsrooZsWjwKkrMkrcnoZBSu4hhyQeWuq/nAo0zQKkM9661MbGVcoumkjue9XkoNqf/mmPUqv0VKv+rcnaU+5RXPLmZboyv8+ZTn4U06zw6LrRigbVt0BT46SRYVm/KmDxUtpGwzFNs8QqFnq05GMEhsQ7y2e6ZtCopXPbDWBs1wzY+heGcigPtH3m8a1B5/9PYSLzfzwGee/WQuQwib9LZGnYNt39A==
ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=r6S5NdVJzogUpC6SP9OLAhzVtaSwfGHL1P7HXqLcQ+w=;
 b=Wue68+kWFbJlObEPq3BApTGWesFnhxyUg0hWfcEqTBYjjRdRaBqxZNCEkaPmNSRRK7XH6WA3n4cTVLPXOxkHqyJViqBAuMwTINGlfcBbo6TIUP8MehisDCsJ/uUCZwE83Hu9/OpJgaRnOGRJSLhAX5XthVI4a6nYcXhNwIsH9RQzqOb7hH1UtTGW86Il7E5Zz4pfdi+/UD+GjsQjnICn515UlIk+cp3sQ8X4t3zlCP/ngyKbqHhC2qvmiqzRUPseEpJtwdtpZemMb6i8YiIjlzyxiu+v0oZR5THwhPEY3pDSmDiELsiu1I1lORvhUEWckf8Yj+oI9iHw4eFKAuCWog==
ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is
 63.35.35.123) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com;
 dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com;
 dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0
 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com]
 dmarc=[1,1,header.from=arm.com])
Received: from DU2PR04CA0238.eurprd04.prod.outlook.com (2603:10a6:10:2b1::33)
 by GV2PR08MB9421.eurprd08.prod.outlook.com (2603:10a6:150:dd::17)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5709.21; Wed, 12 Oct
 2022 15:19:47 +0000
Received: from DBAEUR03FT007.eop-EUR03.prod.protection.outlook.com
 (2603:10a6:10:2b1:cafe::15) by DU2PR04CA0238.outlook.office365.com
 (2603:10a6:10:2b1::33) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5709.15 via Frontend
 Transport; Wed, 12 Oct 2022 15:19:47 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123)
 smtp.mailfrom=arm.com; dkim=pass (signature was verified)
 header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com;
Received-SPF: Pass (protection.outlook.com: domain of arm.com designates
 63.35.35.123 as permitted sender) receiver=protection.outlook.com;
 client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com;
 pr=C
Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by
 DBAEUR03FT007.mail.protection.outlook.com (100.127.142.161) with
 Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.5709.10 via Frontend Transport; Wed, 12 Oct 2022 15:19:47 +0000
Received: ("Tessian outbound 86cf7f935b1b:v128");
 Wed, 12 Oct 2022 15:19:47 +0000
X-CheckRecipientChecked: true
X-CR-MTA-CID: 3b396379b909cbe1
X-CR-MTA-TID: 64aa7808
Received: from 473b6ecb1fa4.1
 by 64aa7808-outbound-1.mta.getcheckrecipient.com id
 DD0EAC17-1E23-47A0-A65B-93B2DED53CEF.1;
 Wed, 12 Oct 2022 15:19:40 +0000
Received: from EUR05-VI1-obe.outbound.protection.outlook.com
 by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id
 473b6ecb1fa4.1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384);
 Wed, 12 Oct 2022 15:19:40 +0000
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=BfD6cqnk2FNBdTQ7t7MdWnQnD8vSGh732Nrl9dGRQi/nj7FSopX/T5mojTS1OLq1XK05RjU4ygG3Gj+YEWf2OKDknKlnLEZYZoHeTC47lpjrY1betf36S8xdi68gjeJN50IWRSCc1jrjZsucGCxxIGgZkRVGDUC8EANDRnRS4AgQsTXR4aN0tGax6rMhv533iY0YwC5rAzv4/9bgZCH3iKsawlKy0Koi5utTphuxaJWgYuqM8n4hi8O/VxOb7CmgmFnHbYBdvpQhh/yPAkD4tJcJMtVbBF4Qh72dbck7Ve8v9inZsiI3CSGvbF52aAu2eWsG+kzWczum3Wh9T6SKxg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=r6S5NdVJzogUpC6SP9OLAhzVtaSwfGHL1P7HXqLcQ+w=;
 b=hYiWXIwy1l3XUEvnI8eSxE6rifMELaAg86ExhqgfpkPdgsbbp5QrhusQSPltfLQmfVNsE41rseAk1OragcCzC3nSxe7XjH7H3W312Rp8x7Ea30b8vzYqz7GeC43bVX7/qVSrHZHglIECYRzgvpTRp/HuFJY0zsJXEtnOONzuffGL9Iw+E/OQVslDaxNf3tF+gO6BBrdnwYo6wMzAkCooxPFnlEdB6UuD47dEBmS928e58MtDBmxoerGUxdX8OT4Ndu1ErvSi8kC9aJ+vXvzh4v9YIP4X7xIXFTNFGrJTDWxJ8xp4jyjNEJ4ltn5HzEh+Q1CniR42zomzmMbI5duQbA==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass
 header.d=arm.com; arc=none
Received: from AS4PR08MB7901.eurprd08.prod.outlook.com (2603:10a6:20b:51c::16)
 by DB8PR08MB5514.eurprd08.prod.outlook.com (2603:10a6:10:fa::9) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5709.21; Wed, 12 Oct
 2022 15:19:36 +0000
Received: from AS4PR08MB7901.eurprd08.prod.outlook.com
 ([fe80::3991:ebed:c15b:de1e]) by AS4PR08MB7901.eurprd08.prod.outlook.com
 ([fe80::3991:ebed:c15b:de1e%5]) with mapi id 15.20.5709.015; Wed, 12 Oct 2022
 15:19:36 +0000
To: 'GNU C Library' <libc-alpha@sourceware.org>
Subject: [PATCH] aarch64: Use memcpy_simd as the default memcpy
Thread-Topic: [PATCH] aarch64: Use memcpy_simd as the default memcpy
Thread-Index: AQHY3k3KrH6nKvBA1ku53UaRqmMnDQ==
Date: Wed, 12 Oct 2022 15:19:36 +0000
Message-ID: 
 <AS4PR08MB79015B70DC5F74EB452222B183229@AS4PR08MB7901.eurprd08.prod.outlook.com>
Accept-Language: en-GB, en-US
Content-Language: en-GB
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
msip_labels: 
Authentication-Results-Original: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=arm.com;
x-ms-traffictypediagnostic: 
 AS4PR08MB7901:EE_|DB8PR08MB5514:EE_|DBAEUR03FT007:EE_|GV2PR08MB9421:EE_
X-MS-Office365-Filtering-Correlation-Id: 97d46221-1e43-4397-5a98-08daac6532e7
x-checkrecipientrouted: true
nodisclaimer: true
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam-Untrusted: BCL:0;
X-Microsoft-Antispam-Message-Info-Original: 
 Mvbi5QGgKuT37aowVXUwjKilIjknUUJe0XAkgXPObiW20oMo881yLIKpjvUJvR1sBhFTK136cseHv8aok1jp1rHI6s7Wn4XwDK9tAk49h74JPTPRHTOtH+tv2Mcl3dMr1JkCmEuQrAuiyvvu6HBWf5+iVK9SStoFZPy5UU2M4+Uod7AGQpus1qF7uKTx7ZrILjvxiM/+/4jYTZ6CTOuczuRphBQ9ebwtnxuGr1dYWJm5wsRmUGApULEU5Air1hv9sDvse2jaRq5LOh9CMGaBIVuHsRHOYQDObulek3547dS9PqMJHxnt2LQm+OtcVc02DlSW5EnfJxY8V7Ts3YlMZMcvh7XiR1FeUcmqynWH3wx88Q3jf0kz4gX+0yO1gAfl9U+2V/1L5wDFDDYuaygjIBxXhrgKabm0xh0FVcKOkcqRrXegvH8Q5r18CMqu70iQiPj02I3rs95hS87ysis5qbUAkVrLMwsWitM+ud210y5sUnoIP7JrbdfaPXOut+W90Y5/I5TVLpO4IwkIGLYbQX2yu5PnOspTCJEj/lMNttZ7sGyt+CgV2L9juXzcRiK3smhws/CDiCAwCZtYbNr1KvZMuaRzXg0BTn2ct2DxpQyX97JfOMrRLg4OCVJOowOw53SjTSRhLNcH+/3rdWAk3UXxFN85HKoPXCNRDUW8wgjjfXrR/2fxpS0/HviDOgZGJkKOSS/4bH2LqGFHaJnQHES/yLNodwEZ+D8uO6QI9yoy9KzMQdTmhVPcDdcdOoFJ+eky0hrG++FBtUhp0tCyDjeyPEswFZQW5YnU5qpmtvZUwK/kM5hS6QcrA3dmodM5
X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en;
 SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:AS4PR08MB7901.eurprd08.prod.outlook.com;
 PTR:; CAT:NONE;
 SFS:(13230022)(4636009)(39860400002)(346002)(366004)(136003)(396003)(376002)(451199015)(38100700002)(122000001)(86362001)(91956017)(7696005)(9686003)(6506007)(64756008)(66446008)(66476007)(66556008)(66946007)(76116006)(8676002)(478600001)(26005)(71200400001)(316002)(6916009)(2906002)(41300700001)(186003)(5660300002)(52536014)(83380400001)(30864003)(8936002)(38070700005)(33656002)(55016003)(2004002)(579004);
 DIR:OUT; SFP:1101;
MIME-Version: 1.0
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8PR08MB5514
Original-Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=arm.com;
X-EOPAttributedMessage: 0
X-MS-Exchange-Transport-CrossTenantHeadersStripped: 
 DBAEUR03FT007.eop-EUR03.prod.protection.outlook.com
X-MS-PublicTrafficType: Email
X-MS-Office365-Filtering-Correlation-Id-Prvs: 
 b021fea5-0f27-45e7-54f3-08daac652ca1
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info: 
 BKDU1GGfizHDseFQaQkhLbJZD/tReekvvjBvzwL1FkeSh8UeshmOhgMnZQdevMpMjJ9Vrnf0KibCrvmM6mxmVB0AQdlK5UrQ3oM8r8aEueDgEpVGYRbTyb73PwTHGOK0zDG1cPcYwd8Vd7ZWUYu3r2OSqRq4DJPYHiCOxntlimylR4fZvx9yYBXS5eGdBCS3cyPKyJHTsitG06ZpERyPPYemVsswZOwm48ksikxYhhkPlpG92JM5op5A4skPFfKw5y9Pti5iqXo62ZhFu8qOzsQVOeHSxD9cKidqQblSOrPdrVTnr/zUp9kMyte9IQlzeMnLfJJpkIOWL68k4EsbgifHYst9epPHTCVRJ5ETU7cY7ds2m7q5CR3KzI7MDfgb2OP9KgRp9zSQ+5MW36v2/CbEsutcyFyu8ck3O4igJAhlADSeHfAYmuKm/z4ySq7cXYt1XKdbh/iICZJoUw3VatmBYqkFLIKUSH4Uv+eoOwhrlJMvr5lUF9qibkOvixr7G808Jm9NazlK+EuGGc2NijljQ3jn4FatGIF8cUBh/QIZ7oH8A7/VKq+5I83JsAR8PG9ZTgOmgWg/kBC6qZ2nB3hft7BHujiDRmvGuKubg1pbTNjdMQi7W1cYFj9JdRjmeGsg3RH4wQzvRZSQkriw3269kKWic96pHqqFNV0ElawM49aBXFWIVWkbUGq4Jkx99UoHIbylw28cfIq+94EIrudl6H729OLIfzjwAbrjuDqjKEhl+fmR8zPnb0LJOtmb3N7TDDPLTpFcy3hOj12E9E7NMsU7lGj8HL24IrU3t2xcle1wSSoYR2BT4aCIkOGw
X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:;
 IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com;
 PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE;
 SFS:(13230022)(4636009)(39860400002)(376002)(346002)(136003)(396003)(451199015)(40470700004)(36840700001)(46966006)(316002)(26005)(9686003)(2906002)(70586007)(52536014)(336012)(70206006)(40460700003)(5660300002)(36860700001)(47076005)(33656002)(86362001)(8936002)(41300700001)(40480700001)(55016003)(8676002)(83380400001)(30864003)(186003)(82740400003)(82310400005)(478600001)(6506007)(81166007)(6916009)(7696005)(356005)(2004002);
 DIR:OUT; SFP:1101;
X-OriginatorOrg: arm.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 12 Oct 2022 15:19:47.1887 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 97d46221-1e43-4397-5a98-08daac6532e7
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
 TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123];
 Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-AuthSource: 
 DBAEUR03FT007.eop-EUR03.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: GV2PR08MB9421
X-Spam-Status: No, score=-10.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_LOTSOFHASH,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SCC_10_SHORT_WORD_LINES,
 SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_NONE, TXREP,
 UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
 <libc-alpha@sourceware.org>
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reply-To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

Since __memcpy_simd is the fastest memcpy on almost all cores, use it by default
if SVE is not available.

Passes regress, OK for commit?

diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 98d4e2c0e202eca13e1fd19ad8046cf61ad280ff..7b396b202fabf01b6ff2adc71a1038148e0b1054 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -1,4 +1,5 @@
-/* Copyright (C) 2012-2022 Free Software Foundation, Inc.
+/* Generic optimized memcpy using SIMD.
+   Copyright (C) 2012-2022 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
 
@@ -20,7 +21,7 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64, unaligned accesses.
+ * ARMv8-a, AArch64, Advanced SIMD, unaligned accesses.
  *
  */
 
@@ -36,21 +37,18 @@
 #define B_l	x8
 #define B_lw	w8
 #define B_h	x9
-#define C_l	x10
 #define C_lw	w10
-#define C_h	x11
-#define D_l	x12
-#define D_h	x13
-#define E_l	x14
-#define E_h	x15
-#define F_l	x16
-#define F_h	x17
-#define G_l	count
-#define G_h	dst
-#define H_l	src
-#define H_h	srcend
 #define tmp1	x14
 
+#define A_q	q0
+#define B_q	q1
+#define C_q	q2
+#define D_q	q3
+#define E_q	q4
+#define F_q	q5
+#define G_q	q6
+#define H_q	q7
+
 #ifndef MEMMOVE
 # define MEMMOVE memmove
 #endif
@@ -69,10 +67,9 @@
    Large copies use a software pipelined loop processing 64 bytes per
    iteration.  The destination pointer is 16-byte aligned to minimize
    unaligned accesses.  The loop tail is handled by always copying 64 bytes
-   from the end.
-*/
+   from the end.  */
 
-ENTRY_ALIGN (MEMCPY, 6)
+ENTRY (MEMCPY)
 	PTR_ARG (0)
 	PTR_ARG (1)
 	SIZE_ARG (2)
@@ -87,10 +84,10 @@ ENTRY_ALIGN (MEMCPY, 6)
 	/* Small copies: 0..32 bytes.  */
 	cmp	count, 16
 	b.lo	L(copy16)
-	ldp	A_l, A_h, [src]
-	ldp	D_l, D_h, [srcend, -16]
-	stp	A_l, A_h, [dstin]
-	stp	D_l, D_h, [dstend, -16]
+	ldr	A_q, [src]
+	ldr	B_q, [srcend, -16]
+	str	A_q, [dstin]
+	str	B_q, [dstend, -16]
 	ret
 
 	/* Copy 8-15 bytes.  */
@@ -102,7 +99,6 @@ L(copy16):
 	str	A_h, [dstend, -8]
 	ret
 
-	.p2align 3
 	/* Copy 4-7 bytes.  */
 L(copy8):
 	tbz	count, 2, L(copy4)
@@ -128,87 +124,69 @@ L(copy0):
 	.p2align 4
 	/* Medium copies: 33..128 bytes.  */
 L(copy32_128):
-	ldp	A_l, A_h, [src]
-	ldp	B_l, B_h, [src, 16]
-	ldp	C_l, C_h, [srcend, -32]
-	ldp	D_l, D_h, [srcend, -16]
+	ldp	A_q, B_q, [src]
+	ldp	C_q, D_q, [srcend, -32]
 	cmp	count, 64
 	b.hi	L(copy128)
-	stp	A_l, A_h, [dstin]
-	stp	B_l, B_h, [dstin, 16]
-	stp	C_l, C_h, [dstend, -32]
-	stp	D_l, D_h, [dstend, -16]
+	stp	A_q, B_q, [dstin]
+	stp	C_q, D_q, [dstend, -32]
 	ret
 
 	.p2align 4
 	/* Copy 65..128 bytes.  */
 L(copy128):
-	ldp	E_l, E_h, [src, 32]
-	ldp	F_l, F_h, [src, 48]
+	ldp	E_q, F_q, [src, 32]
 	cmp	count, 96
 	b.ls	L(copy96)
-	ldp	G_l, G_h, [srcend, -64]
-	ldp	H_l, H_h, [srcend, -48]
-	stp	G_l, G_h, [dstend, -64]
-	stp	H_l, H_h, [dstend, -48]
+	ldp	G_q, H_q, [srcend, -64]
+	stp	G_q, H_q, [dstend, -64]
 L(copy96):
-	stp	A_l, A_h, [dstin]
-	stp	B_l, B_h, [dstin, 16]
-	stp	E_l, E_h, [dstin, 32]
-	stp	F_l, F_h, [dstin, 48]
-	stp	C_l, C_h, [dstend, -32]
-	stp	D_l, D_h, [dstend, -16]
+	stp	A_q, B_q, [dstin]
+	stp	E_q, F_q, [dstin, 32]
+	stp	C_q, D_q, [dstend, -32]
 	ret
 
-	.p2align 4
+	/* Align loop64 below to 16 bytes.  */
+	nop
+
 	/* Copy more than 128 bytes.  */
 L(copy_long):
-	/* Copy 16 bytes and then align dst to 16-byte alignment.  */
-	ldp	D_l, D_h, [src]
-	and	tmp1, dstin, 15
-	bic	dst, dstin, 15
-	sub	src, src, tmp1
+	/* Copy 16 bytes and then align src to 16-byte alignment.  */
+	ldr	D_q, [src]
+	and	tmp1, src, 15
+	bic	src, src, 15
+	sub	dst, dstin, tmp1
 	add	count, count, tmp1	/* Count is now 16 too large.  */
-	ldp	A_l, A_h, [src, 16]
-	stp	D_l, D_h, [dstin]
-	ldp	B_l, B_h, [src, 32]
-	ldp	C_l, C_h, [src, 48]
-	ldp	D_l, D_h, [src, 64]!
+	ldp	A_q, B_q, [src, 16]
+	str	D_q, [dstin]
+	ldp	C_q, D_q, [src, 48]
 	subs	count, count, 128 + 16	/* Test and readjust count.  */
 	b.ls	L(copy64_from_end)
-
 L(loop64):
-	stp	A_l, A_h, [dst, 16]
-	ldp	A_l, A_h, [src, 16]
-	stp	B_l, B_h, [dst, 32]
-	ldp	B_l, B_h, [src, 32]
-	stp	C_l, C_h, [dst, 48]
-	ldp	C_l, C_h, [src, 48]
-	stp	D_l, D_h, [dst, 64]!
-	ldp	D_l, D_h, [src, 64]!
+	stp	A_q, B_q, [dst, 16]
+	ldp	A_q, B_q, [src, 80]
+	stp	C_q, D_q, [dst, 48]
+	ldp	C_q, D_q, [src, 112]
+	add	src, src, 64
+	add	dst, dst, 64
 	subs	count, count, 64
 	b.hi	L(loop64)
 
 	/* Write the last iteration and copy 64 bytes from the end.  */
 L(copy64_from_end):
-	ldp	E_l, E_h, [srcend, -64]
-	stp	A_l, A_h, [dst, 16]
-	ldp	A_l, A_h, [srcend, -48]
-	stp	B_l, B_h, [dst, 32]
-	ldp	B_l, B_h, [srcend, -32]
-	stp	C_l, C_h, [dst, 48]
-	ldp	C_l, C_h, [srcend, -16]
-	stp	D_l, D_h, [dst, 64]
-	stp	E_l, E_h, [dstend, -64]
-	stp	A_l, A_h, [dstend, -48]
-	stp	B_l, B_h, [dstend, -32]
-	stp	C_l, C_h, [dstend, -16]
+	ldp	E_q, F_q, [srcend, -64]
+	stp	A_q, B_q, [dst, 16]
+	ldp	A_q, B_q, [srcend, -32]
+	stp	C_q, D_q, [dst, 48]
+	stp	E_q, F_q, [dstend, -64]
+	stp	A_q, B_q, [dstend, -32]
 	ret
 
 END (MEMCPY)
 libc_hidden_builtin_def (MEMCPY)
 
-ENTRY_ALIGN (MEMMOVE, 4)
+
+ENTRY (MEMMOVE)
 	PTR_ARG (0)
 	PTR_ARG (1)
 	SIZE_ARG (2)
@@ -220,64 +198,56 @@ ENTRY_ALIGN (MEMMOVE, 4)
 	cmp	count, 32
 	b.hi	L(copy32_128)
 
-	/* Small copies: 0..32 bytes.  */
+	/* Small moves: 0..32 bytes.  */
 	cmp	count, 16
 	b.lo	L(copy16)
-	ldp	A_l, A_h, [src]
-	ldp	D_l, D_h, [srcend, -16]
-	stp	A_l, A_h, [dstin]
-	stp	D_l, D_h, [dstend, -16]
+	ldr	A_q, [src]
+	ldr	B_q, [srcend, -16]
+	str	A_q, [dstin]
+	str	B_q, [dstend, -16]
 	ret
 
-	.p2align 4
 L(move_long):
 	/* Only use backward copy if there is an overlap.  */
 	sub	tmp1, dstin, src
-	cbz	tmp1, L(copy0)
+	cbz	tmp1, L(move0)
 	cmp	tmp1, count
 	b.hs	L(copy_long)
 
 	/* Large backwards copy for overlapping copies.
-	   Copy 16 bytes and then align dst to 16-byte alignment.  */
-	ldp	D_l, D_h, [srcend, -16]
-	and	tmp1, dstend, 15
-	sub	srcend, srcend, tmp1
+	   Copy 16 bytes and then align srcend to 16-byte alignment.  */
+L(copy_long_backwards):
+	ldr	D_q, [srcend, -16]
+	and	tmp1, srcend, 15
+	bic	srcend, srcend, 15
 	sub	count, count, tmp1
-	ldp	A_l, A_h, [srcend, -16]
-	stp	D_l, D_h, [dstend, -16]
-	ldp	B_l, B_h, [srcend, -32]
-	ldp	C_l, C_h, [srcend, -48]
-	ldp	D_l, D_h, [srcend, -64]!
+	ldp	A_q, B_q, [srcend, -32]
+	str	D_q, [dstend, -16]
+	ldp	C_q, D_q, [srcend, -64]
 	sub	dstend, dstend, tmp1
 	subs	count, count, 128
 	b.ls	L(copy64_from_start)
 
 L(loop64_backwards):
-	stp	A_l, A_h, [dstend, -16]
-	ldp	A_l, A_h, [srcend, -16]
-	stp	B_l, B_h, [dstend, -32]
-	ldp	B_l, B_h, [srcend, -32]
-	stp	C_l, C_h, [dstend, -48]
-	ldp	C_l, C_h, [srcend, -48]
-	stp	D_l, D_h, [dstend, -64]!
-	ldp	D_l, D_h, [srcend, -64]!
+	str	B_q, [dstend, -16]
+	str	A_q, [dstend, -32]
+	ldp	A_q, B_q, [srcend, -96]
+	str	D_q, [dstend, -48]
+	str	C_q, [dstend, -64]!
+	ldp	C_q, D_q, [srcend, -128]
+	sub	srcend, srcend, 64
 	subs	count, count, 64
 	b.hi	L(loop64_backwards)
 
 	/* Write the last iteration and copy 64 bytes from the start.  */
 L(copy64_from_start):
-	ldp	G_l, G_h, [src, 48]
-	stp	A_l, A_h, [dstend, -16]
-	ldp	A_l, A_h, [src, 32]
-	stp	B_l, B_h, [dstend, -32]
-	ldp	B_l, B_h, [src, 16]
-	stp	C_l, C_h, [dstend, -48]
-	ldp	C_l, C_h, [src]
-	stp	D_l, D_h, [dstend, -64]
-	stp	G_l, G_h, [dstin, 48]
-	stp	A_l, A_h, [dstin, 32]
-	stp	B_l, B_h, [dstin, 16]
-	stp	C_l, C_h, [dstin]
+	ldp	E_q, F_q, [src, 32]
+	stp	A_q, B_q, [dstend, -32]
+	ldp	A_q, B_q, [src]
+	stp	C_q, D_q, [dstend, -64]
+	stp	E_q, F_q, [dstin, 32]
+	stp	A_q, B_q, [dstin]
+L(move0):
 	ret
 
 END (MEMMOVE)
diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index bc5cde8add07b908178fb0271decc27f728f7a2e..7f2d85b0e5acc0a694e91b17fbccc0dba0ea339d 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -3,7 +3,6 @@ sysdep_routines += \
   memchr_generic \
   memchr_nosimd \
   memcpy_a64fx \
-  memcpy_advsimd \
   memcpy_generic \
   memcpy_sve \
   memcpy_thunderx \
diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index 9c2542de38fb109b7c6f1db4aacee3a6b544fa3f..e7c4dcc0ed5a68ecd8dacc06256d0749b76912cb 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -36,7 +36,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, memcpy,
 	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_thunderx)
 	      IFUNC_IMPL_ADD (array, i, memcpy, !bti, __memcpy_thunderx2)
-	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_simd)
 #if HAVE_AARCH64_SVE_ASM
 	      IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_a64fx)
 	      IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_sve)
@@ -45,7 +44,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, memmove,
 	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_thunderx)
 	      IFUNC_IMPL_ADD (array, i, memmove, !bti, __memmove_thunderx2)
-	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_simd)
 #if HAVE_AARCH64_SVE_ASM
 	      IFUNC_IMPL_ADD (array, i, memmove, sve, __memmove_a64fx)
 	      IFUNC_IMPL_ADD (array, i, memmove, sve, __memmove_sve)
diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index 5006b0594a476bcc149f2ae022bea50379d04908..1e08ce852e68409fd0eeb975edab77ebe8da8635 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -29,7 +29,6 @@
 extern __typeof (__redirect_memcpy) __libc_memcpy;
 
 extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
-extern __typeof (__redirect_memcpy) __memcpy_simd attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_thunderx2 attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_a64fx attribute_hidden;
@@ -40,9 +39,6 @@ select_memcpy_ifunc (void)
 {
   INIT_ARCH ();
 
-  if (IS_NEOVERSE_N1 (midr) || IS_NEOVERSE_N2 (midr))
-    return __memcpy_simd;
-
   if (sve && HAVE_AARCH64_SVE_ASM)
     {
       if (IS_A64FX (midr))
diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
deleted file mode 100644
index fe9beaf5ead47268867bee98acad3b17c554656a..0000000000000000000000000000000000000000
--- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S
+++ /dev/null
@@ -1,248 +0,0 @@
-/* Generic optimized memcpy using SIMD.
-   Copyright (C) 2020-2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library.  If not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-/* Assumptions:
- *
- * ARMv8-a, AArch64, Advanced SIMD, unaligned accesses.
- *
- */
-
-#define dstin	x0
-#define src	x1
-#define count	x2
-#define dst	x3
-#define srcend	x4
-#define dstend	x5
-#define A_l	x6
-#define A_lw	w6
-#define A_h	x7
-#define B_l	x8
-#define B_lw	w8
-#define B_h	x9
-#define C_lw	w10
-#define tmp1	x14
-
-#define A_q	q0
-#define B_q	q1
-#define C_q	q2
-#define D_q	q3
-#define E_q	q4
-#define F_q	q5
-#define G_q	q6
-#define H_q	q7
-
-
-/* This implementation supports both memcpy and memmove and shares most code.
-   It uses unaligned accesses and branchless sequences to keep the code small,
-   simple and improve performance.
-
-   Copies are split into 3 main cases: small copies of up to 32 bytes, medium
-   copies of up to 128 bytes, and large copies.  The overhead of the overlap
-   check in memmove is negligible since it is only required for large copies.
-
-   Large copies use a software pipelined loop processing 64 bytes per
-   iteration.  The destination pointer is 16-byte aligned to minimize
-   unaligned accesses.  The loop tail is handled by always copying 64 bytes
-   from the end.  */
-
-ENTRY (__memcpy_simd)
-	PTR_ARG (0)
-	PTR_ARG (1)
-	SIZE_ARG (2)
-
-	add	srcend, src, count
-	add	dstend, dstin, count
-	cmp	count, 128
-	b.hi	L(copy_long)
-	cmp	count, 32
-	b.hi	L(copy32_128)
-
-	/* Small copies: 0..32 bytes.  */
-	cmp	count, 16
-	b.lo	L(copy16)
-	ldr	A_q, [src]
-	ldr	B_q, [srcend, -16]
-	str	A_q, [dstin]
-	str	B_q, [dstend, -16]
-	ret
-
-	/* Copy 8-15 bytes.  */
-L(copy16):
-	tbz	count, 3, L(copy8)
-	ldr	A_l, [src]
-	ldr	A_h, [srcend, -8]
-	str	A_l, [dstin]
-	str	A_h, [dstend, -8]
-	ret
-
-	/* Copy 4-7 bytes.  */
-L(copy8):
-	tbz	count, 2, L(copy4)
-	ldr	A_lw, [src]
-	ldr	B_lw, [srcend, -4]
-	str	A_lw, [dstin]
-	str	B_lw, [dstend, -4]
-	ret
-
-	/* Copy 0..3 bytes using a branchless sequence.  */
-L(copy4):
-	cbz	count, L(copy0)
-	lsr	tmp1, count, 1
-	ldrb	A_lw, [src]
-	ldrb	C_lw, [srcend, -1]
-	ldrb	B_lw, [src, tmp1]
-	strb	A_lw, [dstin]
-	strb	B_lw, [dstin, tmp1]
-	strb	C_lw, [dstend, -1]
-L(copy0):
-	ret
-
-	.p2align 4
-	/* Medium copies: 33..128 bytes.  */
-L(copy32_128):
-	ldp	A_q, B_q, [src]
-	ldp	C_q, D_q, [srcend, -32]
-	cmp	count, 64
-	b.hi	L(copy128)
-	stp	A_q, B_q, [dstin]
-	stp	C_q, D_q, [dstend, -32]
-	ret
-
-	.p2align 4
-	/* Copy 65..128 bytes.  */
-L(copy128):
-	ldp	E_q, F_q, [src, 32]
-	cmp	count, 96
-	b.ls	L(copy96)
-	ldp	G_q, H_q, [srcend, -64]
-	stp	G_q, H_q, [dstend, -64]
-L(copy96):
-	stp	A_q, B_q, [dstin]
-	stp	E_q, F_q, [dstin, 32]
-	stp	C_q, D_q, [dstend, -32]
-	ret
-
-	/* Align loop64 below to 16 bytes.  */
-	nop
-
-	/* Copy more than 128 bytes.  */
-L(copy_long):
-	/* Copy 16 bytes and then align src to 16-byte alignment.  */
-	ldr	D_q, [src]
-	and	tmp1, src, 15
-	bic	src, src, 15
-	sub	dst, dstin, tmp1
-	add	count, count, tmp1	/* Count is now 16 too large.  */
-	ldp	A_q, B_q, [src, 16]
-	str	D_q, [dstin]
-	ldp	C_q, D_q, [src, 48]
-	subs	count, count, 128 + 16	/* Test and readjust count.  */
-	b.ls	L(copy64_from_end)
-L(loop64):
-	stp	A_q, B_q, [dst, 16]
-	ldp	A_q, B_q, [src, 80]
-	stp	C_q, D_q, [dst, 48]
-	ldp	C_q, D_q, [src, 112]
-	add	src, src, 64
-	add	dst, dst, 64
-	subs	count, count, 64
-	b.hi	L(loop64)
-
-	/* Write the last iteration and copy 64 bytes from the end.  */
-L(copy64_from_end):
-	ldp	E_q, F_q, [srcend, -64]
-	stp	A_q, B_q, [dst, 16]
-	ldp	A_q, B_q, [srcend, -32]
-	stp	C_q, D_q, [dst, 48]
-	stp	E_q, F_q, [dstend, -64]
-	stp	A_q, B_q, [dstend, -32]
-	ret
-
-END (__memcpy_simd)
-libc_hidden_builtin_def (__memcpy_simd)
-
-
-ENTRY (__memmove_simd)
-	PTR_ARG (0)
-	PTR_ARG (1)
-	SIZE_ARG (2)
-
-	add	srcend, src, count
-	add	dstend, dstin, count
-	cmp	count, 128
-	b.hi	L(move_long)
-	cmp	count, 32
-	b.hi	L(copy32_128)
-
-	/* Small moves: 0..32 bytes.  */
-	cmp	count, 16
-	b.lo	L(copy16)
-	ldr	A_q, [src]
-	ldr	B_q, [srcend, -16]
-	str	A_q, [dstin]
-	str	B_q, [dstend, -16]
-	ret
-
-L(move_long):
-	/* Only use backward copy if there is an overlap.  */
-	sub	tmp1, dstin, src
-	cbz	tmp1, L(move0)
-	cmp	tmp1, count
-	b.hs	L(copy_long)
-
-	/* Large backwards copy for overlapping copies.
-	   Copy 16 bytes and then align srcend to 16-byte alignment.  */
-L(copy_long_backwards):
-	ldr	D_q, [srcend, -16]
-	and	tmp1, srcend, 15
-	bic	srcend, srcend, 15
-	sub	count, count, tmp1
-	ldp	A_q, B_q, [srcend, -32]
-	str	D_q, [dstend, -16]
-	ldp	C_q, D_q, [srcend, -64]
-	sub	dstend, dstend, tmp1
-	subs	count, count, 128
-	b.ls	L(copy64_from_start)
-
-L(loop64_backwards):
-	str	B_q, [dstend, -16]
-	str	A_q, [dstend, -32]
-	ldp	A_q, B_q, [srcend, -96]
-	str	D_q, [dstend, -48]
-	str	C_q, [dstend, -64]!
-	ldp	C_q, D_q, [srcend, -128]
-	sub	srcend, srcend, 64
-	subs	count, count, 64
-	b.hi	L(loop64_backwards)
-
-	/* Write the last iteration and copy 64 bytes from the start.  */
-L(copy64_from_start):
-	ldp	E_q, F_q, [src, 32]
-	stp	A_q, B_q, [dstend, -32]
-	ldp	A_q, B_q, [src]
-	stp	C_q, D_q, [dstend, -64]
-	stp	E_q, F_q, [dstin, 32]
-	stp	A_q, B_q, [dstin]
-L(move0):
-	ret
-
-END (__memmove_simd)
-libc_hidden_builtin_def (__memmove_simd)
diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
index 7dae8b7c956f9083d0896cc771cae79f4901581d..dbf1536525e614f72d3d74bb193015b303618357 100644
--- a/sysdeps/aarch64/multiarch/memmove.c
+++ b/sysdeps/aarch64/multiarch/memmove.c
@@ -29,7 +29,6 @@
 extern __typeof (__redirect_memmove) __libc_memmove;
 
 extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
-extern __typeof (__redirect_memmove) __memmove_simd attribute_hidden;
 extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
 extern __typeof (__redirect_memmove) __memmove_thunderx2 attribute_hidden;
 extern __typeof (__redirect_memmove) __memmove_a64fx attribute_hidden;
@@ -40,9 +39,6 @@ select_memmove_ifunc (void)
 {
   INIT_ARCH ();
 
-  if (IS_NEOVERSE_N1 (midr) || IS_NEOVERSE_N2 (midr))
-    return __memmove_simd;
-
   if (sve && HAVE_AARCH64_SVE_ASM)
     {
       if (IS_A64FX (midr))