From patchwork Tue Mar  5 17:02:57 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella <adhemerval.zanella@linaro.org>
X-Patchwork-Id: 86832
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 4A0723858016
	for <patchwork@sourceware.org>; Tue,  5 Mar 2024 17:04:00 +0000 (GMT)
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mail-pg1-x534.google.com (mail-pg1-x534.google.com
 [IPv6:2607:f8b0:4864:20::534])
 by sourceware.org (Postfix) with ESMTPS id AE7053858426
 for <libc-alpha@sourceware.org>; Tue,  5 Mar 2024 17:03:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AE7053858426
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org AE7053858426
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=2607:f8b0:4864:20::534
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709658193; cv=none;
 b=jB5CdWAoxfplnyxb3bwb4ufyvivd8on7odzP3Tnymr6PiTM1HF/h1DPO5CzgdV4ew8mVENJuY8iD5y+yC7H7yNuGetdutymk5f501Tz/tmVa2UVsutkffWbZ5dn5W0G1ciHss/XEEUV0W1vri/IXF3+hbgjmOYRCYC09flXMgGQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1709658193; c=relaxed/simple;
 bh=dbYUGwwsHJpn/JilY/XEfcw3D1McKxt4sv2ycqWfArs=;
 h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version;
 b=LQNszGYM+LV7+HTVFubczNL0EUWmCXZXhN80JMFo0255YxxmlooqEVKq9CqzwawMr6vuBWWKoa/NmDurnQzPt/U6yMwlpH0Ohc7IR4ZmW76gNTxqWoKBnljZ0CCrvkDMPmWwWx5TCMbmS+BLFL8DhDr4h+bovj36JRhScOJA+Fs=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-pg1-x534.google.com with SMTP id
 41be03b00d2f7-5c66b093b86so5438757a12.0
 for <libc-alpha@sourceware.org>; Tue, 05 Mar 2024 09:03:08 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=linaro.org; s=google; t=1709658183; x=1710262983; darn=sourceware.org;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:from:to:cc:subject:date:message-id:reply-to;
 bh=dBBpZYlv0He0NxaSQ0JvDk8vCfWDcR0G+e02fuRtEpU=;
 b=jBWUiqz4Q9IsU9uEUBm7ogMXZmlN5ACobbfuWV8J2fdc7zY4BNvwbvVwKO3jKPYyfG
 OSMhS4RHwTeVrxpD6TYMEqcbdx8oFo52crHV5xP+fuoDBLT6fLvnPKAoFJh1mgtYf40N
 dBHjNdcEDVaGGNum1sts+s+OwRcTJFKCwSVg25jJ7PHOCmwxXzMZ2WtmPmG6ccmciAMt
 MFfiSrr/ml5G1DOLa4w63IDCcDfxNbo1apZelqr+vDdggf+JcZ3XAUjCZdeG6iUY9rfw
 bfdBdgTe9NtmuOgHnPVn7xzfp99uYjT8FvZTxUZ7dbxy/WoMTvITNJ82O/mwwHYWKPSM
 2zTA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1709658183; x=1710262983;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=dBBpZYlv0He0NxaSQ0JvDk8vCfWDcR0G+e02fuRtEpU=;
 b=DwYu9yY7SXbVytTNpqH6uFtoAP6hdzNftkGppvErwvJxo1x7d3k517xIb5xdGTihls
 RVUMlPCN5xYhkv+MIatLnwv6srwRcT/74VvmaVUPa39gIr2K/GuIaZStvASCB+6tHHoG
 AKNG2BLVNr7vfRbYC8wF1NB0w4H4pKd9IXQlsLXWmziU8dsGegPpyDrLs/dFQoewrThW
 XEAsZGkBdcc+zw2++ON/cbZb9z5kJNWNL2pqYeaVzLA0LBlF64r49dbCkjHWnT7mhHa+
 qpIouUhXmT8W7OEdS336OrzEB4SSACP+NhmX0Wudn9jP1MD62SzWdQ6UoG6d16iLikWC
 0LoQ==
X-Gm-Message-State: AOJu0YxUeMfXKGS9NF8hObyUkKY8cwEnZLp9WPV7Oj1DSlH/QwS3hq1P
 wW+tehq2RnS8T2ljgDnv3r8VuVP7l+++oT4vf9x/Z26W3Cl9UqAPZA47PSEGsppPf/7V6EApUpy
 b
X-Google-Smtp-Source: 
 AGHT+IGDa3Jb9amfGW+pIoF0BtG8I9Wczgh3Dmc6wDoQ+M14ICax5skz0u/gjF1WkYGkzuF8ePzVYw==
X-Received: by 2002:a17:90a:b107:b0:29a:f199:1647 with SMTP id
 z7-20020a17090ab10700b0029af1991647mr4105728pjq.1.1709658182319;
 Tue, 05 Mar 2024 09:03:02 -0800 (PST)
Received: from mandiga.. ([2804:1b3:a7c1:ec17:6677:fbdd:5bee:239d])
 by smtp.gmail.com with ESMTPSA id
 k76-20020a633d4f000000b005dc8702f0a9sm9333205pga.1.2024.03.05.09.03.00
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 05 Mar 2024 09:03:01 -0800 (PST)
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Cc: vineetg@rivosinc.com, Evan Green <evan@rivosinc.com>, palmer@rivosinc.com,
 slewis@rivosinc.com, Andreas Schwab <schwab@linux-m68k.org>
Subject: [PATCH v2] riscv: Fix alignment-ignorant memcpy implementation
Date: Tue,  5 Mar 2024 14:02:57 -0300
Message-Id: <20240305170257.2639072-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,
 RCVD_IN_DNSWL_NONE, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org

The memcpy optimization (commit 587a1290a1af7bee6db) has a series
of mistakes:

  - The implementation is wrong: the chunk size is miscalculated,
    leading to invalid memory accesses.

  - It enables ifunc support by default, so --disable-multi-arch does
    not work as expected for riscv.

  - It mixes Linux-specific files (the memcpy ifunc selection, which
    requires the vDSO/syscall mechanism) with generic support (the
    memcpy optimization itself).

  - There is no __libc_ifunc_impl_list, so the tests only check the
    selected implementation instead of every implementation the
    system supports.
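
For reference, the ifunc resolver and __libc_ifunc_impl_list in this
patch gate __memcpy_noalignment on the same predicate.  A standalone
sketch of that check follows.  The SKETCH_* constants are assumed to
mirror RISCV_HWPROBE_MISALIGNED_FAST and RISCV_HWPROBE_MISALIGNED_MASK
from the Linux <asm/hwprobe.h> header, and use_noalignment_sketch is a
hypothetical helper, not a function in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed copies of the Linux uapi values; check <asm/hwprobe.h>.  */
#define SKETCH_MISALIGNED_FAST (3 << 0)
#define SKETCH_MISALIGNED_MASK (7 << 0)

/* PROBE_RET is the hwprobe return value (0 on success) and CPUPERF0
   the value reported for RISCV_HWPROBE_KEY_CPUPERF_0.  Returns true
   when the unaligned-friendly memcpy would be eligible.  */
static bool
use_noalignment_sketch (int probe_ret, uint64_t cpuperf0)
{
  return probe_ret == 0
	 && (cpuperf0 & SKETCH_MISALIGNED_MASK) == SKETCH_MISALIGNED_FAST;
}
```

Any probe failure, or any misaligned-access class other than "fast",
falls back to __memcpy_generic.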

This patch also simplifies the required bits to enable ifunc: there
is no need for memcopy.h, nor to add Linux-specific files.

The __memcpy_noalignment tail handling now uses a branchless strategy
similar to aarch64 (overlapping 32-bit copies for sizes 4..7 and byte
copies for sizes 1..3).
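
The tail strategy can be sketched in C as below.  This is only an
illustration of the overlapping-copy idea, not the assembly in the
patch; tail_copy_sketch is a hypothetical name, and the fixed-size
memcpy calls stand in for the lw/sw pairs so the sketch stays valid C
on strict-alignment targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy LEN bytes, 0 <= LEN <= 7, without looping over the bytes.  */
static void
tail_copy_sketch (unsigned char *dst, const unsigned char *src, size_t len)
{
  if (len & 4)
    {
      /* 4..7 bytes: copy the first and the last 32-bit word.  The two
	 words overlap whenever LEN < 8.  */
      uint32_t head, tail;
      memcpy (&head, src, 4);
      memcpy (&tail, src + len - 4, 4);
      memcpy (dst, &head, 4);
      memcpy (dst + len - 4, &tail, 4);
      return;
    }

  if (len == 0)
    return;

  /* 1..3 bytes: copy the first, the middle (LEN / 2), and the last
     byte.  Some of these coincide for LEN 1 and 2.  */
  unsigned char first = src[0];
  unsigned char last = src[len - 1];
  unsigned char mid = src[len >> 1];
  dst[0] = first;
  dst[len - 1] = last;
  dst[len >> 1] = mid;
}
```

All loads are done before the stores in each case, matching the order
used in the assembly tail.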

Checked on riscv64 and riscv32 by explicitly enabling the function
in __libc_ifunc_impl_list under qemu-system.

Changes from v1:
* Implement the memcpy in assembly to correctly handle RISC-V
  strict-alignment.
Reviewed-by: Evan Green <evan@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 sysdeps/riscv/memcpy_noalignment.S            | 136 ---------------
 .../multiarch}/memcpy-generic.c               |   8 +-
 sysdeps/riscv/multiarch/memcpy_noalignment.S  | 162 ++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/Makefile        |   9 -
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       |   1 +
 .../sysv/linux/riscv/include/sys/hwprobe.h    |   8 +
 .../unix/sysv/linux/riscv/multiarch/Makefile  |   9 +
 .../linux/riscv/multiarch/ifunc-impl-list.c}  |  33 +++-
 .../sysv/linux/riscv/multiarch}/memcpy.c      |  16 +-
 9 files changed, 215 insertions(+), 167 deletions(-)
 delete mode 100644 sysdeps/riscv/memcpy_noalignment.S
 rename sysdeps/{unix/sysv/linux/riscv => riscv/multiarch}/memcpy-generic.c (87%)
 create mode 100644 sysdeps/riscv/multiarch/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/include/sys/hwprobe.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
 rename sysdeps/{riscv/memcopy.h => unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c} (51%)
 rename sysdeps/{riscv => unix/sysv/linux/riscv/multiarch}/memcpy.c (90%)

diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S
deleted file mode 100644
index 621f8d028f..0000000000
--- a/sysdeps/riscv/memcpy_noalignment.S
+++ /dev/null
@@ -1,136 +0,0 @@
-/* memcpy for RISC-V, ignoring buffer alignment
-   Copyright (C) 2024 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library.  If not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-#include <sys/asm.h>
-
-/* void *memcpy(void *, const void *, size_t) */
-ENTRY (__memcpy_noalignment)
-	move t6, a0  /* Preserve return value */
-
-	/* Bail if 0 */
-	beqz a2, 7f
-
-	/* Jump to byte copy if size < SZREG */
-	li a4, SZREG
-	bltu a2, a4, 5f
-
-	/* Round down to the nearest "page" size */
-	andi a4, a2, ~((16*SZREG)-1)
-	beqz a4, 2f
-	add a3, a1, a4
-
-	/* Copy the first word to get dest word aligned */
-	andi a5, t6, SZREG-1
-	beqz a5, 1f
-	REG_L a6, (a1)
-	REG_S a6, (t6)
-
-	/* Align dst up to a word, move src and size as well. */
-	addi t6, t6, SZREG-1
-	andi t6, t6, ~(SZREG-1)
-	sub a5, t6, a0
-	add a1, a1, a5
-	sub a2, a2, a5
-
-	/* Recompute page count */
-	andi a4, a2, ~((16*SZREG)-1)
-	beqz a4, 2f
-
-1:
-	/* Copy "pages" (chunks of 16 registers) */
-	REG_L a4,       0(a1)
-	REG_L a5,   SZREG(a1)
-	REG_L a6, 2*SZREG(a1)
-	REG_L a7, 3*SZREG(a1)
-	REG_L t0, 4*SZREG(a1)
-	REG_L t1, 5*SZREG(a1)
-	REG_L t2, 6*SZREG(a1)
-	REG_L t3, 7*SZREG(a1)
-	REG_L t4, 8*SZREG(a1)
-	REG_L t5, 9*SZREG(a1)
-	REG_S a4,       0(t6)
-	REG_S a5,   SZREG(t6)
-	REG_S a6, 2*SZREG(t6)
-	REG_S a7, 3*SZREG(t6)
-	REG_S t0, 4*SZREG(t6)
-	REG_S t1, 5*SZREG(t6)
-	REG_S t2, 6*SZREG(t6)
-	REG_S t3, 7*SZREG(t6)
-	REG_S t4, 8*SZREG(t6)
-	REG_S t5, 9*SZREG(t6)
-	REG_L a4, 10*SZREG(a1)
-	REG_L a5, 11*SZREG(a1)
-	REG_L a6, 12*SZREG(a1)
-	REG_L a7, 13*SZREG(a1)
-	REG_L t0, 14*SZREG(a1)
-	REG_L t1, 15*SZREG(a1)
-	addi a1, a1, 16*SZREG
-	REG_S a4, 10*SZREG(t6)
-	REG_S a5, 11*SZREG(t6)
-	REG_S a6, 12*SZREG(t6)
-	REG_S a7, 13*SZREG(t6)
-	REG_S t0, 14*SZREG(t6)
-	REG_S t1, 15*SZREG(t6)
-	addi t6, t6, 16*SZREG
-	bltu a1, a3, 1b
-	andi a2, a2, (16*SZREG)-1  /* Update count */
-
-2:
-	/* Remainder is smaller than a page, compute native word count */
-	beqz a2, 7f
-	andi a5, a2, ~(SZREG-1)
-	andi a2, a2, (SZREG-1)
-	add a3, a1, a5
-	/* Jump directly to last word if no words. */
-	beqz a5, 4f
-
-3:
-	/* Use single native register copy */
-	REG_L a4, 0(a1)
-	addi a1, a1, SZREG
-	REG_S a4, 0(t6)
-	addi t6, t6, SZREG
-	bltu a1, a3, 3b
-
-	/* Jump directly out if no more bytes */
-	beqz a2, 7f
-
-4:
-	/* Copy the last word unaligned */
-	add a3, a1, a2
-	add a4, t6, a2
-	REG_L a5, -SZREG(a3)
-	REG_S a5, -SZREG(a4)
-	ret
-
-5:
-	/* Copy bytes when the total copy is <SZREG */
-	add a3, a1, a2
-
-6:
-	lb a4, 0(a1)
-	addi a1, a1, 1
-	sb a4, 0(t6)
-	addi t6, t6, 1
-	bltu a1, a3, 6b
-
-7:
-	ret
-
-END (__memcpy_noalignment)
diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c b/sysdeps/riscv/multiarch/memcpy-generic.c
similarity index 87%
rename from sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
rename to sysdeps/riscv/multiarch/memcpy-generic.c
index f06f4bda15..4235d33859 100644
--- a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
+++ b/sysdeps/riscv/multiarch/memcpy-generic.c
@@ -18,7 +18,9 @@
 
 #include <string.h>
 
-extern __typeof (memcpy) __memcpy_generic;
-hidden_proto (__memcpy_generic)
-
+#if IS_IN(libc)
+# define MEMCPY __memcpy_generic
+# undef libc_hidden_builtin_def
+# define libc_hidden_builtin_def(x)
+#endif
 #include <string/memcpy.c>
diff --git a/sysdeps/riscv/multiarch/memcpy_noalignment.S b/sysdeps/riscv/multiarch/memcpy_noalignment.S
new file mode 100644
index 0000000000..fa39be21d6
--- /dev/null
+++ b/sysdeps/riscv/multiarch/memcpy_noalignment.S
@@ -0,0 +1,162 @@
+/* memcpy for RISC-V, ignoring buffer alignment
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+/* memcpy optimization for CPUs with fast unaligned support
+   (RISCV_HWPROBE_MISALIGNED_FAST).
+
+   Copies are split into 3 main cases: small copies up to SZREG, copies up to
+   BLOCK_SIZE (128 for 64 bits, 64 for 32 bits), and copies larger than BLOCK_SIZE.
+
+   Large copies use a software pipelined loop processing BLOCK_SIZE bytes per
+   iteration.  The destination pointer is SZREG-byte aligned to minimize store
+   unaligned accesses.
+
+   The tail is handled with branchless copies.  */
+
+#define BLOCK_SIZE (16 * SZREG)
+
+	.attribute unaligned_access, 1
+ENTRY (__memcpy_noalignment)
+	beq	a2, zero, L(ret)
+
+	/* if LEN < SZREG jump to tail handling.  */
+	li	a5, SZREG-1
+	mv	a6, a0
+	bleu	a2, a5, L(tail)
+
+	/* Copy the first word, align DEST to word, and adjust DEST/SRC/LEN
+	   based on the amount adjusted to align DEST.  */
+	REG_L	a3, 0(a1)
+	andi	a5, a0, SZREG-1
+	addi	a2, a2, -SZREG
+	li	a4, SZREG
+	sub	a4, a4, a5
+	REG_S	a3, 0(a0)
+	add	a2, a5, a2
+
+	/* If LEN < BLOCK_SIZE jump to word copy.  */
+	li	a3, BLOCK_SIZE-1
+	add	a5, a0, a4
+	add	a1, a1, a4
+	bleu	a2, a3, L(word_copy_adjust)
+	addi	a7, a2, -BLOCK_SIZE
+	andi	a7, a7, -BLOCK_SIZE
+	addi	a7, a7, BLOCK_SIZE
+	add	a3, a5, a7
+	mv	a4, a1
+L(block_copy):
+	REG_L	a6,          0(a4)
+	REG_L	t0,      SZREG(a4)
+	REG_L	t1,  (2*SZREG)(a4)
+	REG_L	t2,  (3*SZREG)(a4)
+	REG_L	t3,  (4*SZREG)(a4)
+	REG_L	t4,  (5*SZREG)(a4)
+	REG_L	t5,  (6*SZREG)(a4)
+	REG_L	t6,  (7*SZREG)(a4)
+	REG_S	a6,          0(a5)
+	REG_S	t0,      SZREG(a5)
+	REG_S	t1,  (2*SZREG)(a5)
+	REG_S	t2,  (3*SZREG)(a5)
+	REG_S	t3,  (4*SZREG)(a5)
+	REG_S	t4,  (5*SZREG)(a5)
+	REG_S	t5,  (6*SZREG)(a5)
+	REG_S	t6,  (7*SZREG)(a5)
+	REG_L	a6,  (8*SZREG)(a4)
+	REG_L	t0,  (9*SZREG)(a4)
+	REG_L	t1, (10*SZREG)(a4)
+	REG_L	t2, (11*SZREG)(a4)
+	REG_L	t3, (12*SZREG)(a4)
+	REG_L	t4, (13*SZREG)(a4)
+	REG_L	t5, (14*SZREG)(a4)
+	REG_L	t6, (15*SZREG)(a4)
+	addi	a4, a4, BLOCK_SIZE
+	REG_S	a6,  (8*SZREG)(a5)
+	REG_S	t0,  (9*SZREG)(a5)
+	REG_S	t1, (10*SZREG)(a5)
+	REG_S	t2, (11*SZREG)(a5)
+	REG_S	t3, (12*SZREG)(a5)
+	REG_S	t4, (13*SZREG)(a5)
+	REG_S	t5, (14*SZREG)(a5)
+	REG_S	t6, (15*SZREG)(a5)
+	addi	a5, a5, BLOCK_SIZE
+	bne	a5, a3, L(block_copy)
+	add	a1, a1, a7
+	andi	a2, a2, BLOCK_SIZE-1
+
+	/* 0 <= a2/LEN  < BLOCK_SIZE.  */
+L(word_copy):
+	li	a5, SZREG-1
+	/* if LEN < SZREG jump to tail handling.  */
+	bleu	a2, a5, L(tail_adjust)
+	addi	a7, a2, -SZREG
+	andi	a7, a7, -SZREG
+	addi	a7, a7, SZREG
+	add	a6, a3, a7
+	mv	a5, a1
+L(word_copy_loop):
+	REG_L	a4, 0(a5)
+	addi	a3, a3, SZREG
+	addi	a5, a5, SZREG
+	REG_S	a4, -SZREG(a3)
+	bne	a3, a6, L(word_copy_loop)
+	add	a1, a1, a7
+	andi	a2, a2, SZREG-1
+
+	/* Copy the last word unaligned.  */
+	add	a3, a1, a2
+	add	a4, a6, a2
+	REG_L	t0, -SZREG(a3)
+	REG_S	t0, -SZREG(a4)
+	ret
+
+L(tail):
+	/* Copy 4-7 bytes.  */
+	andi	a5, a2, 4
+	add	a3, a1, a2
+	add	a4, a6, a2
+	beq	a5, zero, L(copy_0_3)
+	lw	t0, 0(a1)
+	lw	t1, -4(a3)
+	sw	t0, 0(a6)
+	sw	t1, -4(a4)
+	ret
+
+	/* Copy 0-3 bytes.  */
+L(copy_0_3):
+	beq	a2, zero, L(ret)
+	srli    a2, a2, 1
+	add     t4, a1, a2
+	add     t5, a6, a2
+	lbu     t0, 0(a1)
+	lbu     t1, -1(a3)
+	lbu     t2, 0(t4)
+	sb      t0, 0(a6)
+	sb      t1, -1(a4)
+	sb      t2, 0(t5)
+L(ret):
+	ret
+L(tail_adjust):
+	mv	a6, a3
+	j	L(tail)
+L(word_copy_adjust):
+	mv	a3, a5
+	j	L(word_copy)
+END (__memcpy_noalignment)
diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 398ff7418b..04abf226ad 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -15,15 +15,6 @@ ifeq ($(subdir),stdlib)
 gen-as-const-headers += ucontext_i.sym
 endif
 
-ifeq ($(subdir),string)
-sysdep_routines += \
-  memcpy \
-  memcpy-generic \
-  memcpy_noalignment \
-  # sysdep_routines
-
-endif
-
 abi-variants := ilp32 ilp32d lp64 lp64d
 
 ifeq (,$(filter $(default-abi),$(abi-variants)))
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
index e64c159eb3..9159045478 100644
--- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -34,3 +34,4 @@ int __riscv_hwprobe (struct riscv_hwprobe *pairs, size_t pair_count,
   /* Negate negative errno values to match pthreads API. */
   return -r;
 }
+libc_hidden_def (__riscv_hwprobe)
diff --git a/sysdeps/unix/sysv/linux/riscv/include/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/include/sys/hwprobe.h
new file mode 100644
index 0000000000..cce91c1b53
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/include/sys/hwprobe.h
@@ -0,0 +1,8 @@
+#ifndef _SYS_HWPROBE_H
+# include_next <sys/hwprobe.h>
+
+#ifndef _ISOMAC
+libc_hidden_proto (__riscv_hwprobe)
+#endif
+
+#endif
diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
new file mode 100644
index 0000000000..fcef5659d4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
@@ -0,0 +1,9 @@
+ifeq ($(subdir),string)
+sysdep_routines += \
+  memcpy \
+  memcpy-generic \
+  memcpy_noalignment \
+  # sysdep_routines
+
+CFLAGS-memcpy_noalignment.c += -mno-strict-align
+endif
diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
similarity index 51%
rename from sysdeps/riscv/memcopy.h
rename to sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
index 27675964b0..9f806d7a9e 100644
--- a/sysdeps/riscv/memcopy.h
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
@@ -1,4 +1,4 @@
-/* memcopy.h -- definitions for memory copy functions. RISC-V version.
+/* Enumerate available IFUNC implementations of a function.  RISCV version.
    Copyright (C) 2024 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,11 +16,28 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <sysdeps/generic/memcopy.h>
+#include <ifunc-impl-list.h>
+#include <string.h>
+#include <sys/hwprobe.h>
 
-/* Redefine the generic memcpy implementation to __memcpy_generic, so
-   the memcpy ifunc can select between generic and special versions.
-   In rtld, don't bother with all the ifunciness. */
-#if IS_IN (libc)
-#define MEMCPY __memcpy_generic
-#endif
+size_t
+__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
+			size_t max)
+{
+  size_t i = max;
+
+  bool fast_unaligned = false;
+
+  struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 };
+  if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0
+      && (pair.value & RISCV_HWPROBE_MISALIGNED_MASK)
+          == RISCV_HWPROBE_MISALIGNED_FAST)
+    fast_unaligned = true;
+
+  IFUNC_IMPL (i, name, memcpy,
+	      IFUNC_IMPL_ADD (array, i, memcpy, fast_unaligned,
+			      __memcpy_noalignment)
+	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
+
+  return 0;
+}
diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/unix/sysv/linux/riscv/multiarch/memcpy.c
similarity index 90%
rename from sysdeps/riscv/memcpy.c
rename to sysdeps/unix/sysv/linux/riscv/multiarch/memcpy.c
index 20f9548c44..51d8ace858 100644
--- a/sysdeps/riscv/memcpy.c
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/memcpy.c
@@ -28,8 +28,6 @@
 # include <riscv-ifunc.h>
 # include <sys/hwprobe.h>
 
-# define INIT_ARCH()
-
 extern __typeof (__redirect_memcpy) __libc_memcpy;
 
 extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
@@ -38,14 +36,9 @@ extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden;
 static inline __typeof (__redirect_memcpy) *
 select_memcpy_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func)
 {
-  unsigned long long int value;
-
-  INIT_ARCH ();
-
-  if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &value) != 0)
-    return __memcpy_generic;
-
-  if ((value & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST)
+  unsigned long long int v;
+  if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &v) == 0
+      && (v & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST)
     return __memcpy_noalignment;
 
   return __memcpy_generic;
@@ -59,5 +52,6 @@ strong_alias (__libc_memcpy, memcpy);
 __hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy)
   __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy);
 # endif
-
+#else
+# include <string/memcpy.c>
 #endif