From patchwork Thu Dec 11 13:27:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pincheng Wang X-Patchwork-Id: 126382 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from vm01.sourceware.org (localhost [127.0.0.1]) by sourceware.org (Postfix) with ESMTP id 907DB4BA2E31 for ; Thu, 11 Dec 2025 13:27:53 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 907DB4BA2E31 X-Original-To: newlib@sourceware.org Delivered-To: newlib@sourceware.org Received: from cstnet.cn (smtp81.cstnet.cn [159.226.251.81]) by sourceware.org (Postfix) with ESMTPS id D976C4BA2E05 for ; Thu, 11 Dec 2025 13:27:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D976C4BA2E05 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=isrc.iscas.ac.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=isrc.iscas.ac.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D976C4BA2E05 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=159.226.251.81 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1765459655; cv=none; b=cDdI3Nzy3hfp7NwkS43XDWWZcmafpJrZA3N3ExxKMdkEdC73VsK4pwshLUVSYCDlNcKt0J+Q2elZFTAp+5Jmj0p4tI9hrJE4EUHC2LLedgOwOQF2n8bxFuMXZgyOxqk4EEccUPUV2qaIMBNQX6apQ0/CDhuUHr5g1e0k4ffLV+E= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1765459655; c=relaxed/simple; bh=XdJY/RWookAfk6QerbbKLDzvL1Y76D4S5cF/6EJD9ao=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=cn3knoeYaGcxGRp+90OJ//dsD/m81yD9LEvV0CIWMjMdxpQ+2I5gJuhk/ROYwFf7Wazs0EtD2oJ9iTPkamLLjXNku2vqMcUxNvFphulLN0Je3PWQwxYNJ5i2zWLS+7GZxqU4oULIkK8zptnGnGHvHaxDIpcOm96v2yqW7MHxC1o= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D976C4BA2E05 Received: from ROG.lan (unknown [120.227.57.121]) by APP-03 (Coremail) with SMTP id rQCowAAHJdzBxjpp_klOAA--.24119S3; Thu, 11 Dec 2025 21:27:29 +0800 (CST) From: Pincheng Wang To: newlib@sourceware.org Cc: pincheng.plct@isrc.iscas.ac.cn Subject: [PATCH 1/1] riscv: add vectorized memset, memcpy and memmove Date: Thu, 11 Dec 2025 21:27:25 +0800 Message-Id: <20251211132725.435742-2-pincheng.plct@isrc.iscas.ac.cn> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20251211132725.435742-1-pincheng.plct@isrc.iscas.ac.cn> References: <20251211132725.435742-1-pincheng.plct@isrc.iscas.ac.cn> MIME-Version: 1.0 X-CM-TRANSID: rQCowAAHJdzBxjpp_klOAA--.24119S3 X-Coremail-Antispam: 1UD129KBjvJXoWxuFWrWw18AFy8Aw1ktF1rXrb_yoW7WFy5pF 4UGFWrtw1ftrn3ArZ3XF1rZw43Xry8W3W3GFW5Aa1UtFZ8GayfKrZ0ya1ay3WFqrZ29r4f Wa1xAry5uw45ZrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUU9I14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jr4l82xGYIkIc2 x26xkF7I0E14v26r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE 2Ix0cI8IcVAFwI0_Jr0_JF4l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Jr0_Gr1l84ACjc xK6I8E87Iv67AKxVWUJVW8JwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UM2AIxVAI cxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14 v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IY c2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7MxAIw28IcxkI7VAKI48JMxC20s026x CaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_ JrWlx4CE17CEb7AF67AKxVWUXVWUAwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r 1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY6xAIw20EY4v20xvaj40_ Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVWUJVW8Jb IYCTnIWIevJa73UjIFyTuYvjfU5VbkUUUUU X-Originating-IP: [120.227.57.121] X-CM-SenderInfo: pslquxhhqjh1xofwqxxvufhxpvfd2hldfou0/ X-Spam-Status: No, score=-10.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_VALIDITY_RPBL_BLOCKED, RCVD_IN_VALIDITY_SAFE_BLOCKED, SPF_HELO_PASS, SPF_PASS, TXREP, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on sourceware.org X-BeenThere: newlib@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Newlib mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: newlib-bounces~patchwork=sourceware.org@sourceware.org The vector implementations use m8 register grouping and process data in vector-length chunks, providing significant performance improvements on RVV-capable hardware. Use conditional compilation to fallback to scalar implementation when __riscv_v is not available, maintaining compatibility with non-vector RISC-V systems. Signed-off-by: Pincheng Wang --- newlib/libc/machine/riscv/memcpy-asm.S | 23 +++++++++++- newlib/libc/machine/riscv/memcpy.c | 2 +- newlib/libc/machine/riscv/memmove-asm.S | 47 ++++++++++++++++++++++++- newlib/libc/machine/riscv/memmove.c | 2 +- newlib/libc/machine/riscv/memset.S | 22 ++++++++++++ 5 files changed, 92 insertions(+), 4 deletions(-) diff --git a/newlib/libc/machine/riscv/memcpy-asm.S b/newlib/libc/machine/riscv/memcpy-asm.S index 2771285f9..9d1d2d4bd 100644 --- a/newlib/libc/machine/riscv/memcpy-asm.S +++ b/newlib/libc/machine/riscv/memcpy-asm.S @@ -9,11 +9,11 @@ http://www.opensource.org/licenses. */ -#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) .text .global memcpy .type memcpy, @function memcpy: +#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) mv a3, a0 beqz a2, 2f @@ -29,4 +29,25 @@ memcpy: ret .size memcpy, .-memcpy +#elif defined(__riscv_v) + .option push + .option arch, +v + mv t0, a0 /* running dst */ + mv t1, a1 /* running src */ + beqz a2, .Ldone_copy /* n == 0 then return */ + +.Lbulk_copy: + vsetvli t2, a2, e8, m8, ta, ma /* t2 = vl (bytes) */ + vle8.v v0, (t1) + vse8.v v0, (t0) + add t0, t0, t2 + add t1, t1, t2 + sub a2, a2, t2 + bnez a2, .Lbulk_copy + /* fallthrough */ + +.Ldone_copy: + ret +.size memcpy, .-memcpy +.option pop #endif diff --git a/newlib/libc/machine/riscv/memcpy.c b/newlib/libc/machine/riscv/memcpy.c index a27e0ecb1..cd58c30a5 100644 --- a/newlib/libc/machine/riscv/memcpy.c +++ b/newlib/libc/machine/riscv/memcpy.c @@ -10,7 +10,7 @@ http://www.opensource.org/licenses. */ -#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) +#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) || defined(__riscv_v) // memcpy defined in memcpy-asm.S #else diff --git a/newlib/libc/machine/riscv/memmove-asm.S b/newlib/libc/machine/riscv/memmove-asm.S index 061472ca2..5cc2e5143 100644 --- a/newlib/libc/machine/riscv/memmove-asm.S +++ b/newlib/libc/machine/riscv/memmove-asm.S @@ -9,11 +9,11 @@ http://www.opensource.org/licenses. */ -#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) .text .global memmove .type memmove, @function memmove: +#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) beqz a2, .Ldone /* in case there are 0 bytes to be copied, return immediately */ mv a4, a0 /* copy the destination address over to a4, since memmove should return that address in a0 at the end */ @@ -37,4 +37,49 @@ memmove: ret .size memmove, .-memmove +#elif defined(__riscv_v) + .option push + .option arch, +v + beqz a2, .Ldone_move /* n == 0 */ + beq a0, a1, .Ldone_move /* dst == src */ + + /* overlap check */ + bgeu a1, a0, .Lforward_move /* src >= dst then forward move*/ + + sub t2, a0, a1 /* t2 = dst - src */ + bgeu t2, a2, .Lforward_move /* no overlap then forward move */ + + /* backward move */ + add t0, a0, a2 /* running dst_end */ + add t1, a1, a2 /* running src_end */ + +.Lbackward_loop: + vsetvli t3, a2, e8, m8, ta, ma /* t3 = vl (bytes) */ + sub t0, t0, t3 + sub t1, t1, t3 + vle8.v v0, (t1) + vse8.v v0, (t0) + sub a2, a2, t3 + bnez a2, .Lbackward_loop + j .Ldone_move + +/* forward move, same as memcpy */ +.Lforward_move: + mv t0, a0 /* running dst */ + mv t1, a1 /* running src */ + +.Lforward_loop: + vsetvli t3, a2, e8, m8, ta, ma + vle8.v v0, (t1) + vse8.v v0, (t0) + add t0, t0, t3 + add t1, t1, t3 + sub a2, a2, t3 + bnez a2, .Lforward_loop + /* fallthrough */ + +.Ldone_move: + ret +.size memmove, .-memmove +.option pop #endif diff --git a/newlib/libc/machine/riscv/memmove.c b/newlib/libc/machine/riscv/memmove.c index 209a75c69..67ce08b02 100644 --- a/newlib/libc/machine/riscv/memmove.c +++ b/newlib/libc/machine/riscv/memmove.c @@ -10,7 +10,7 @@ http://www.opensource.org/licenses. */ -#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) +#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) || defined(__riscv_v) /* memmove defined in memmove-asm.S */ #else diff --git a/newlib/libc/machine/riscv/memset.S b/newlib/libc/machine/riscv/memset.S index 533f66758..80f43fbaf 100644 --- a/newlib/libc/machine/riscv/memset.S +++ b/newlib/libc/machine/riscv/memset.S @@ -63,6 +63,28 @@ memset: .Ldone: ret +#elif defined(__riscv_v) + .option push + .option arch, +v + mv t0, a0 /* running dst; keep a0 as return */ + beqz a2, .Ldone_vect /* n == 0 then return */ + + /* Broadcast fill byte once. */ + vsetvli t1, zero, e8, m8, ta, ma + vmv.v.x v0, a1 + +.Lbulk_vect: + vsetvli t1, a2, e8, m8, ta, ma /* t1 = vl (bytes) */ + vse8.v v0, (t0) + add t0, t0, t1 + sub a2, a2, t1 + bnez a2, .Lbulk_vect + /* fallthrough */ + +.Ldone_vect: + ret + .option pop + #else li REG_TABLE, BYTE_TBL_SZ mv a3, a0