From patchwork Thu Mar 5 06:19:47 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pincheng Wang X-Patchwork-Id: 131123 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from vm01.sourceware.org (localhost [127.0.0.1]) by sourceware.org (Postfix) with ESMTP id 329094BA23D7 for ; Thu, 5 Mar 2026 06:21:38 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 329094BA23D7 X-Original-To: newlib@sourceware.org Delivered-To: newlib@sourceware.org Received: from cstnet.cn (smtp81.cstnet.cn [159.226.251.81]) by sourceware.org (Postfix) with ESMTPS id 6EF6C4BA2E16 for ; Thu, 5 Mar 2026 06:20:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6EF6C4BA2E16 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=isrc.iscas.ac.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=isrc.iscas.ac.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6EF6C4BA2E16 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=159.226.251.81 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1772691625; cv=none; b=ulEl5lCpRCjayWKMH0onZeyDr3UD3nWTfLU32Dk4OWO4u1oJ5PgKDChJIxHOx8rj2y93X1o7jzBbsfXkMFirSucsxNYb/CMrWtVXMI6sRsfbxpShQc7A2uSKWq/3SBT785gRxnU49CHDUOkz5JkvWRbucrCaL1LBoqLslC02a9Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1772691625; c=relaxed/simple; bh=kWJOF3Mnt/5KG6v2xhYyWhmLVv50Es83QCKg4wJhO4E=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=vESOaSNxpi8UMXBXfw9iNZS48+jmum9wbbVnVyB7LzERtOR7fkhjfS7ach13M+mht6S+x1XViisgn+JXgJ9xPrR1VycWx3RU5FMhYAwHcFS0XLSnSTib+pCqE868KQJrOctGr3l7IKmyc3ubRTS5hDJwYdb4IhJZ2s3JVTTZC9E= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6EF6C4BA2E16 Received: from localhost.localdomain (unknown [36.148.251.141]) by APP-03 (Coremail) with SMTP id rQCowAB3lt2ZIKlpn5bdCQ--.195S3; Thu, 05 Mar 2026 14:20:11 +0800 (CST) From: Pincheng Wang To: newlib@sourceware.org Cc: pincheng.plct@isrc.iscas.ac.cn, kito.cheng@gmail.com Subject: [PATCH v3 1/1] riscv: add vectorized memset, memcpy and memmove Date: Thu, 5 Mar 2026 14:19:47 +0800 Message-Id: <20260305061947.31797-2-pincheng.plct@isrc.iscas.ac.cn> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20260305061947.31797-1-pincheng.plct@isrc.iscas.ac.cn> References: <20260305061947.31797-1-pincheng.plct@isrc.iscas.ac.cn> MIME-Version: 1.0 X-CM-TRANSID: rQCowAB3lt2ZIKlpn5bdCQ--.195S3 X-Coremail-Antispam: 1UD129KBjvJXoW3Xw43CryUGw18tr18tr45Awb_yoWxAw47pF 4UGFWIy34ftrn3XrZIq3WrZwsxXry8WF13GFW3ZayUAFs8Wa95KFZ0ya1ay3Wvqr929w4x uwn7Cr15uw45ZFDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUU9S14x267AKxVW8JVW5JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jr4l82xGYIkIc2 x26xkF7I0E14v26r1Y6r1xM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_Jr0_JF4l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Jr0_Gr1l84 ACjcxK6I8E87Iv67AKxVWUJVW8JwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_Jr0_Gr1le2I2 62IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcV AFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8JwACjcxG 0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1l42xK82IYc2Ij64vIr41l4I8I3I 0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWU GVWUWwC2zVAF1VAY17CE14v26r1Y6r17MIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI 0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE42xK8VAvwI8IcIk0 rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r1j6r 4UYxBIdaVFxhVjvjDU0xZFpf9x0JUHHq7UUUUU= X-Originating-IP: [36.148.251.141] X-CM-SenderInfo: pslquxhhqjh1xofwqxxvufhxpvfd2hldfou0/ X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_DNSWL_BLOCKED, RCVD_IN_VALIDITY_RPBL_BLOCKED, RCVD_IN_VALIDITY_SAFE_BLOCKED, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on sourceware.org X-BeenThere: newlib@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Newlib mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: newlib-bounces~patchwork=sourceware.org@sourceware.org The vector implementations use m8 register grouping and process data in vector-length chunks, providing significant performance improvements on RVV-capable hardware. Use conditional compilation to fallback to scalar implementations when __riscv_vector is not available, maintaining compatibility with non-vector RISC-V systems. Signed-off-by: Pincheng Wang --- newlib/libc/machine/riscv/memcpy-asm.S | 52 ++++++++++++++ newlib/libc/machine/riscv/memcpy.c | 2 +- newlib/libc/machine/riscv/memmove-asm.S | 93 +++++++++++++++++++++++++ newlib/libc/machine/riscv/memmove.c | 2 +- newlib/libc/machine/riscv/memset.S | 18 +++++ 5 files changed, 165 insertions(+), 2 deletions(-) diff --git a/newlib/libc/machine/riscv/memcpy-asm.S b/newlib/libc/machine/riscv/memcpy-asm.S index 2771285f9..352d1e618 100644 --- a/newlib/libc/machine/riscv/memcpy-asm.S +++ b/newlib/libc/machine/riscv/memcpy-asm.S @@ -29,4 +29,56 @@ memcpy: ret .size memcpy, .-memcpy +#elif defined(__riscv_vector) +.text +.option push +.option arch, +zve32x +.global memcpy +.type memcpy, @function +memcpy: + mv t0, a0 /* t0 = running dst */ + mv t1, a1 /* t1 = running src */ + beqz a2, .Ldone /* if n == 0, return */ + + /* Align dst to SZREG: skip when __riscv_misaligned_fast, else align */ +#ifndef __riscv_misaligned_fast + /* process small data directly with vectors, no alignment optimization */ + li t3, 32 + bltu a2, t3, .Lbulk_copy +#if __riscv_xlen == 64 + andi t2, t0, 7 /* t2 = dst & 7 */ + beqz t2, .Lbulk_copy /* already aligned to 8 bytes */ + li t4, 8 + sub t2, t4, t2 /* pad = 8 - (dst & 7) */ +#else + andi t2, t0, 3 /* t2 = dst & 3 */ + beqz t2, .Lbulk_copy /* already aligned to 4 bytes */ + li t4, 4 + sub t2, t4, t2 /* pad = 4 - (dst & 3) */ +#endif + /* copy prologue using vectors */ + vsetvli t3, t2, e8, m8, ta, ma + vle8.v v0, (t1) + vse8.v v0, (t0) + add t0, t0, t3 + add t1, t1, t3 + sub a2, a2, t3 + beqz a2, .Ldone +#endif + +.Lbulk_copy: + vsetvli t2, a2, e8, m8, ta, ma + vle8.v v0, (t1) + vse8.v v0, (t0) + add t0, t0, t2 + add t1, t1, t2 + sub a2, a2, t2 + bnez a2, .Lbulk_copy + /* fallthrough */ + +.Ldone: + ret + + .size memcpy, .-memcpy + .option pop #endif diff --git a/newlib/libc/machine/riscv/memcpy.c b/newlib/libc/machine/riscv/memcpy.c index a27e0ecb1..7fa0ff804 100644 --- a/newlib/libc/machine/riscv/memcpy.c +++ b/newlib/libc/machine/riscv/memcpy.c @@ -10,7 +10,7 @@ http://www.opensource.org/licenses. */ -#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) +#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) || defined(__riscv_vector) // memcpy defined in memcpy-asm.S #else diff --git a/newlib/libc/machine/riscv/memmove-asm.S b/newlib/libc/machine/riscv/memmove-asm.S index 061472ca2..c25796d89 100644 --- a/newlib/libc/machine/riscv/memmove-asm.S +++ b/newlib/libc/machine/riscv/memmove-asm.S @@ -37,4 +37,97 @@ memmove: ret .size memmove, .-memmove +#elif defined(__riscv_vector) +.text +.global memmove +.type memmove, @function +.option push +.option arch, +zve32x +memmove: + beqz a2, .Ldone_move /* n == 0 */ + beq a0, a1, .Ldone_move /* dst == src */ + + /* overlap check */ + bgeu a1, a0, .Lforward_move /* src >= dst then forward move */ + sub t2, a0, a1 /* t2 = dst - src */ + bgeu t2, a2, .Lforward_move /* no overlap then forward move */ + + /* backward move */ + add t0, a0, a2 /* running dst_end */ + add t1, a1, a2 /* running src_end */ + /* Align dst_end to SZREG: skip when __riscv_misaligned_fast, else align */ +#ifndef __riscv_misaligned_fast + /* process small data directly with vectors, no alignment optimization */ + li t3, 32 + bltu a2, t3, .Lbackward_loop + +#if __riscv_xlen == 64 + andi t2, t0, 7 /* misalignment = dst_end & 7 */ +#else + andi t2, t0, 3 /* misalignment = dst_end & 3 */ +#endif + beqz t2, .Lbackward_aligned /* already aligned */ + /* copy tail bytes to reach aligned dst_end */ + vsetvli t3, t2, e8, m8, ta, ma + sub t0, t0, t3 + sub t1, t1, t3 + vle8.v v0, (t1) + vse8.v v0, (t0) + sub a2, a2, t3 +.Lbackward_aligned: +#endif +.Lbackward_loop: + vsetvli t3, a2, e8, m8, ta, ma /* t3 = vl (bytes) */ + sub t0, t0, t3 + sub t1, t1, t3 + vle8.v v0, (t1) + vse8.v v0, (t0) + sub a2, a2, t3 + bnez a2, .Lbackward_loop + ret + + /* forward move, same as memcpy */ +.Lforward_move: + mv t0, a0 /* running dst */ + mv t1, a1 /* running src */ + /* Align dst to SZREG: skip when __riscv_misaligned_fast, else align */ +#ifndef __riscv_misaligned_fast + /* process small data directly with vectors, no alignment optimization */ + li t3, 32 + bltu a2, t3, .Lforward_loop + +#if __riscv_xlen == 64 + andi t2, t0, 7 /* t2 = dst & 7 */ + beqz t2, .Lforward_aligned /* already aligned to 8 bytes */ + li t4, 8 + sub t2, t4, t2 /* pad = 8 - (dst & 7) */ +#else + andi t2, t0, 3 /* t2 = dst & 3 */ + beqz t2, .Lforward_aligned /* already aligned to 4 bytes */ + li t4, 4 + sub t2, t4, t2 /* pad = 4 - (dst & 3) */ +#endif + /* copy prologue using vectors */ + vsetvli t3, t2, e8, m8, ta, ma + vle8.v v0, (t1) + vse8.v v0, (t0) + add t0, t0, t3 + add t1, t1, t3 + sub a2, a2, t3 +.Lforward_aligned: +#endif +.Lforward_loop: + vsetvli t3, a2, e8, m8, ta, ma + vle8.v v0, (t1) + vse8.v v0, (t0) + add t0, t0, t3 + add t1, t1, t3 + sub a2, a2, t3 + bnez a2, .Lforward_loop + /* fallthrough */ + +.Ldone_move: + ret + .size memmove, .-memmove + .option pop #endif diff --git a/newlib/libc/machine/riscv/memmove.c b/newlib/libc/machine/riscv/memmove.c index 209a75c69..691774e2e 100644 --- a/newlib/libc/machine/riscv/memmove.c +++ b/newlib/libc/machine/riscv/memmove.c @@ -10,7 +10,7 @@ http://www.opensource.org/licenses. */ -#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) +#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) || defined(__riscv_vector) /* memmove defined in memmove-asm.S */ #else diff --git a/newlib/libc/machine/riscv/memset.S b/newlib/libc/machine/riscv/memset.S index 533f66758..000121e68 100644 --- a/newlib/libc/machine/riscv/memset.S +++ b/newlib/libc/machine/riscv/memset.S @@ -62,7 +62,25 @@ memset: .Ldone: ret +#elif defined(__riscv_vector) +.option push +.option arch, +zve32x + mv t0, a0 /* running dst; keep a0 as return */ + beqz a2, .Ldone_set /* n == 0 then return */ + + /* Broadcast fill byte once. */ + vsetvli t1, zero, e8, m8, ta, ma + vmv.v.x v0, a1 +.Lbulk_set: + vsetvli t1, a2, e8, m8, ta, ma /* t1 = vl (bytes) */ + vse8.v v0, (t0) + add t0, t0, t1 + sub a2, a2, t1 + bnez a2, .Lbulk_set +.Ldone_set: + ret + .option pop #else li REG_TABLE, BYTE_TBL_SZ mv a3, a0