From patchwork Wed Feb 1 09:52:32 2023
X-Patchwork-Submitter: Sergei Lewis
X-Patchwork-Id: 64074
From: Sergei Lewis
To: libc-alpha@sourceware.org
Cc: Sergei Lewis
Subject: [PATCH 2/2] riscv: vectorised mem* and str* functions
Date: Wed, 1 Feb 2023 09:52:32 +0000
Message-Id: <20230201095232.15942-2-slewis@rivosinc.com>
In-Reply-To: <20230201095232.15942-1-slewis@rivosinc.com>
References: <20230201095232.15942-1-slewis@rivosinc.com>
X-Mailer: git-send-email 2.34.1
Initial implementations of memchr, memcmp, memcpy, memmove, memset, strchr,
strcmp, strcpy, strlen, strncmp, strncpy, strnlen, strrchr, strspn targeting
the riscv "V" extension, version 1.0.

The vectorised implementations assume a VLEN of at least 128 bits and at
least 32 vector registers (as mandated by the "V" extension spec). They also
assume that VLENB is a power of two which is no larger than the page size,
and (as vectorised code in glibc for other platforms does) that it is safe
to read past null terminators / buffer ends provided one does not cross a
page boundary.

Signed-off-by: Sergei Lewis
---
 sysdeps/riscv/rv64/rvv/Implies     |   2 +
 sysdeps/riscv/rv64/rvv/memchr.S    | 127 +++++++++++++++++++
 sysdeps/riscv/rv64/rvv/memcmp.S    |  93 ++++++++++++++
 sysdeps/riscv/rv64/rvv/memcpy.S    | 154 +++++++++++++++++++++++
 sysdeps/riscv/rv64/rvv/memmove.c   |  22 ++++
 sysdeps/riscv/rv64/rvv/memset.S    |  89 ++++++++++++++
 sysdeps/riscv/rv64/rvv/strchr.S    |  92 ++++++++++++++
 sysdeps/riscv/rv64/rvv/strchrnul.c |  22 ++++
 sysdeps/riscv/rv64/rvv/strcmp.S    | 108 +++++++++++++++++
 sysdeps/riscv/rv64/rvv/strcpy.S    |  72 +++++++++++
 sysdeps/riscv/rv64/rvv/strcspn.c   |  22 ++++
 sysdeps/riscv/rv64/rvv/strlen.S    |  67 ++++++++++
 sysdeps/riscv/rv64/rvv/strncmp.S   | 104 ++++++++++++++++
 sysdeps/riscv/rv64/rvv/strncpy.S   |  96 +++++++++++++++
 sysdeps/riscv/rv64/rvv/strnlen.S   |  81 +++++++++++++
 sysdeps/riscv/rv64/rvv/strrchr.S   |  88 ++++++++++++++
 sysdeps/riscv/rv64/rvv/strspn.S    | 189 +++++++++++++++++++++++++++++
 17 files changed, 1428 insertions(+)
 create mode 100644 sysdeps/riscv/rv64/rvv/Implies
 create mode 100644 sysdeps/riscv/rv64/rvv/memchr.S
 create mode 100644 sysdeps/riscv/rv64/rvv/memcmp.S
 create mode 100644 sysdeps/riscv/rv64/rvv/memcpy.S
 create mode 100644 sysdeps/riscv/rv64/rvv/memmove.c
 create mode 100644 sysdeps/riscv/rv64/rvv/memset.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strchr.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strchrnul.c
 create mode 100644 sysdeps/riscv/rv64/rvv/strcmp.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strcpy.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strcspn.c
 create mode 100644 sysdeps/riscv/rv64/rvv/strlen.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strncmp.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strncpy.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strnlen.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strrchr.S
 create mode 100644 sysdeps/riscv/rv64/rvv/strspn.S

diff --git a/sysdeps/riscv/rv64/rvv/Implies b/sysdeps/riscv/rv64/rvv/Implies
new file mode 100644
index 0000000000..b07b4cb906
--- /dev/null
+++ b/sysdeps/riscv/rv64/rvv/Implies
@@ -0,0 +1,2 @@
+riscv/rv64/rvd
+
diff --git a/sysdeps/riscv/rv64/rvv/memchr.S b/sysdeps/riscv/rv64/rvv/memchr.S
new file mode 100644
index 0000000000..a7e32b8f25
--- /dev/null
+++ b/sysdeps/riscv/rv64/rvv/memchr.S
@@ -0,0 +1,127 @@
+
+/* Copyright (C) 2012-2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +#include + + +/* Optimised memchr for riscv with vector extension + * Assumptions: + * - cpu becomes bandwidth limited at or before + * 2 vector register sized read/write operations + * + 2 scalar operations + * + conditional branch + */ + +.globl memchr +.type memchr,@function + +.align 2 +memchr: + beqz a2, .Lnot_found + csrr t1, vlenb + bgeu a2, t1, .Lvector_path /* only use vector path if we're scanning + at least vlenb bytes */ + +#ifndef __riscv_strict_align + li a3, 8 + blt a2, a3, .Lbytewise + + li t1, 0x0101010101010101 + slli a4, t1, 7 /* a4 = 0x8080808080808080 */ + mul t2, a1, t1 /* entirety of t2 is now repeats of target character; + assume mul is at worst no worse than 3*(shift+OR), + otherwise do that instead */ + +/* + * strategy: + * t4 = ((*a0) ^ t2) + * - now t4 contains zero bytes if and only if next word of memory + * had target character at those positions + * + * t4 = ((t4-0x0101010101010101) & ~t4) & 0x8080808080808080 + * - all nonzero bytes of t4 become 0; zero bytes become 0x80 + * + * if t4 is nonzero, find the index of the byte within it, add to a0 and return + * otherwise, loop + */ + +1: + ld t4, (a0) /* t4 = load next 8 bytes */ + xor t4, t4, t2 + sub t5, t4, t1 + not t4, t4 + and t4, t5, t4 + and t4, t4, a4 + bnez t4, .Lbytewise /* could use ctzw, mod+lookup or just binary chop + to locate byte of interest in t4 but profiling + shows these approaches are at best no better */ + addi a2, a2, -8 + addi a0, a0, 8 + bgeu a2, a3, 1b + beqz a2, .Lnot_found +#endif // __riscv_strict_align + +/* too little data for a dword. mask calculation and branch mispredict costs + make checking a word not worthwhile. degrade to bytewise search. */ + +.Lbytewise: + add t2, a0, a2 + +1: + lb t1, (a0) + beq t1, a1, .Lfound + addi a0, a0, 1 + blt a0, t2, 1b + +.Lnot_found: + mv a0, zero +.Lfound: + ret + +.Lvector_path: + vsetvli t2, a2, e8, m2, ta, ma + +1: + vle8.v v2, (a0) + vmseq.vx v0, v2, a1 + vfirst.m t3, v0 + bgez t3, .Lvec_found + add a0, a0, t2 + sub a2, a2, t2 + bge a2, t2, 1b + bnez a2, 2f + mv a0, zero + ret + +2: + vsetvli t2, a2, e8, m2, ta, ma + vle8.v v2, (a0) + vmseq.vx v0, v2, a1 + vfirst.m t3, v0 + bgez t3, .Lvec_found + mv a0, zero + ret + +.Lvec_found: + add a0, a0, t3 + ret + +.size memchr, .-memchr +libc_hidden_builtin_def (memchr) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/memcmp.S b/sysdeps/riscv/rv64/rvv/memcmp.S new file mode 100644 index 0000000000..a945753a5f --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/memcmp.S @@ -0,0 +1,93 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +#include + +/* Optimised memcmp for riscv with vector extension + */ + +.globl memcmp +.type memcmp,@function + +.align 2 + +memcmp: + mv t2, zero + beqz a2, .Ldone + + li t1, 5 /* scalar path cheaper for 1-4 elts */ + bltu a2, t1, .Lscalar + + /* main loop, vlenb*2 elts at a time */ + vsetvli t1, a2, e8, m2, ta, ma + +1: + vle8.v v2, (a0) /* load elts */ + vle8.v v4, (a1) + vmsne.vv v0, v2, v4 /* compare */ + vfirst.m t3, v0 + bgez t3, .Lvec_diff /* found a difference ? */ + add a0, a0, t1 /* not yet, advance everything */ + add a1, a1, t1 + sub a2, a2, t1 + bgeu a2, t1, 1b + + bnez a2, .Ltail + mv a0, zero + ret + +.Ltail: + /* handle tail. we know a2 < vlenb*2 so just load and compare the lot */ + vsetvli t1, a2, e8, m2, ta, ma + vle8.v v2, (a0) + vle8.v v4, (a1) + vmsne.vv v0, v2, v4 + vfirst.m t3, v0 + bgez t3, .Lvec_diff + mv a0, zero /* no diff found */ + ret + +.Lvec_diff: /* v2, v4 differ at elt t3 */ + add a0, a0, t3 + add a1, a1, t3 + lbu t0, (a0) + lbu t1, (a1) + sub a0, t0, t1 + ret + +.Lscalar: + add t3, a0, a2 + +1: + lbu t0, (a0) + lbu t1, (a1) + sub t2, t0, t1 + bnez t2, .Ldone + addi a0, a0, 1 + addi a1, a1, 1 + bltu a0, t3, 1b + +.Ldone: + mv a0, t2 + ret + + +.size memcmp, .-memcmp +libc_hidden_builtin_def (memcmp) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/memcpy.S b/sysdeps/riscv/rv64/rvv/memcpy.S new file mode 100644 index 0000000000..7b37ec285d --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/memcpy.S @@ -0,0 +1,154 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +#include + +/* Optimised memcpy and memmove for riscv with vector extension + */ + +.globl memcpy +.type memcpy,@function +.globl memmove +.type memmove,@function + +.align 2 +memmove: + bge a0, a1, .Lmemcpy_rev + +memcpy: +.Lmemcpy_fwd: + mv t0, a0 /* t0 = preserve a0 so we can return it */ + csrr t2, vlenb /* t2 = number of bytes per vectorised copy op */ + slli t5, t2, 1 /* t5 = number of bytes per loop */ + addi t3, t5, -1 /* generate mask */ + not t4, t3 + and t4, a2, t4 /* t4 = bytes copied in vectorised pass */ + + beqz t4, .Lscalar_fwd /* size too small for even one pass? */ + + and a2, a2, t3 /* a2 = bytes still left to copy after pass */ + add t4, t4, a1 /* t4 = src at end of vectorised pass */ + +1: + vl2r.v v2, (a1) /* load, advance source */ + add a1, a1, t5 + vs2r.v v2, (t0) /* store, advance dest */ + add t0, t0, t5 + bltu a1, t4, 1b /* src at end? 
*/ + + bltu a2, t2, .Lscalar_fwd /* should we do one more vec load/store? */ + vl1r.v v2, (a1) + sub a2, a2, t2 + add a1, a1, t2 + vs1r.v v2, (t0) + add t0, t0, t2 + +.Lscalar_fwd: + bnez a2, .Lnobail +.Lbail: + ret +.Lnobail: + +#ifndef __riscv_strict_align + addi t2, zero, 4 + bltu a2, t2, .Lsingle_bytes +1: + lw t3, 0(a1) + addi a1, a1, 4 + sw t3, 0(t0) + addi t0, t0, 4 + addi a2, a2, -4 + bgeu a2, t2, 1b +#endif // __riscv_strict_align + +.Lsingle_bytes: + beqz a2, .Lbail + add a2, a2, a1 /* a2 = src + remaining size */ +1: + lb t1, 0(a1) + sb t1, 0(t0) + addi a1, a1, 1 + addi t0, t0, 1 + bltu a1, a2, 1b + ret +.size memcpy, .-memcpy + + +.Lmemcpy_rev: + beq a0, a1, .Lmemcpy_rev_done + add t0, a0, a2 /* t0 = dest so we can return a0=dest later */ + add t6, a1, a2 /* dest and src both point to byte */ + /* immediately after end of buffer */ + + csrr t2, vlenb /* t2 = number of bytes per pass */ + slli t5, t2, 1 /* t5 = number of bytes per entire loop */ + addi t3, t5, -1 /* t3 = (bytes per loop) mask */ + not t4, t3 /* generate mask for bytes processed by loop */ + and t4, a2, t4 /* t4 = bytes copied in vectorised pass */ + + beqz t4, .Lscalar_rev /* size too small for even one pass? */ + + and a2, a2, t3 /* a2 = bytes still left to copy after pass */ + sub t4, t6, t4 /* t4 = src at end of vectorised pass */ + +1: + sub t6, t6, t5 + sub t0, t0, t5 + vl2r.v v2, (t6) /* load, advance source */ + vs2r.v v2, (t0) /* store, advance dest */ + bgtu t6, t4, 1b /* src at end? */ + + bltu a2, t2, .Lscalar_rev /* should we do one more vec load/store? */ + sub t6, t6, t2 + sub t0, t0, t2 + sub a2, a2, t2 + vl1r.v v2, (t6) + vs1r.v v2, (t0) + +.Lscalar_rev: +#ifndef __riscv_strict_align + beqz a2, .Lbail + + addi t2, zero, 4 + bltu a2, t2, 2f +1: + addi t6, t6, -4 + addi t0, t0, -4 + addi a2, a2, -4 + lw t3, 0(t6) + sw t3, 0(t0) + bgeu a2, t2, 1b +2: +#endif // __riscv_strict_align + + beqz a2, .Lbail +1: + addi t6, t6, -1 + addi t0, t0, -1 + lb t1, 0(t6) + sb t1, 0(t0) + bgtu t0, a0, 1b + +.Lmemcpy_rev_done: + ret + +.size memmove, .-memmove +libc_hidden_builtin_def (memcpy) +libc_hidden_builtin_def (memmove) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/memmove.c b/sysdeps/riscv/rv64/rvv/memmove.c new file mode 100644 index 0000000000..47734854f9 --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/memmove.c @@ -0,0 +1,22 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +/* memmove is implemented in memcpy.S + */ diff --git a/sysdeps/riscv/rv64/rvv/memset.S b/sysdeps/riscv/rv64/rvv/memset.S new file mode 100644 index 0000000000..6f82c542b1 --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/memset.S @@ -0,0 +1,89 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +#include + + +/* Optimised memset for riscv with vector extension + */ + +.globl memset +.type memset,@function + +.align 2 +memset: + mv t0, a0 /* t0 = dest so we can return a0 later */ + vsetvli t2, a2, e8, m2, ta, ma /* t2 = elts per copy */ + beqz t2, .Lscalar + + vmv.v.x v2, a1 /* splat value across v2 */ + + slli t3, t2, 1 + bgtu t3, a2, .Lsinglestore + +1: + vse8.v v2, (t0) + add t0, t0, t2 + vse8.v v2, (t0) + add t0, t0, t2 + sub a2, a2, t3 + bgeu a2, t3, 1b + bgeu a2, t2, .Lsinglestore + bnez a2, .Lscalar + +.Lbail: + ret + +.Lsinglestore: + bgtu t2, a2, .Lscalar + vse8.v v2, (t0) + add t0, t0, t2 + sub a2, a2, t2 + +.Lscalar: + beqz a2, .Lbail + +#ifndef __riscv_strict_align + slli t2, a1, 8 + or a1, a1, t2 + slli t2, a1, 16 + or a1, a1, t2 + + addi t2, zero, 4 + bltu a2, t2, 2f + +1: + sw a1, 0(t0) + addi t0, t0, 4 + addi a2, a2, -4 + bgeu a2, t2, 1b +2: + beqz a2, .Lbail +#endif // __riscv_strict_align + + add a2, a2, t0 +1: + sb a1, 0(t0) + addi t0, t0, 1 + bltu t0, a2, 1b + ret + +.size memset, .-memset +libc_hidden_builtin_def (memset) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strchr.S b/sysdeps/riscv/rv64/rvv/strchr.S new file mode 100644 index 0000000000..0b37174c55 --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strchr.S @@ -0,0 +1,92 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + + +#include + +.globl strchr +.type strchr,@function + +.globl __strchrnul +.type __strchrnul,@function + +/* + * optimized strchr for riscv with vector extension + * assumptions: + * - vlenb is a power of 2 + * - page size >= 2*vlenb + */ + +.align 2 +__strchrnul: + li t5, -1 + j 1f + +strchr: + mv t5, zero +1: csrr t1, vlenb /* find vlenb*2 */ + add t1, t1, t1 + addi t2, t1, -1 /* mask off unaligned part of pointer */ + and t2, a0, t2 + beqz t2, .Laligned + + sub t2, t1, t2 /* search however many bytes + are needed to align the pointer */ + vsetvli t2, t2, e8, m2, ta, mu + + vle8.v v2, (a0) /* load data into v2(,v3) */ + vmseq.vx v4, v2, zero + vfirst.m t4, v4 + vmsbf.m v0, v4 + vmseq.vx v0, v2, a1, v0.t + vfirst.m t3, v0 + bgez t3, .Lfound + bgez t4, .Lbufferend + add a0, a0, t2 + +.Laligned: + vsetvli zero, t1, e8, m2, ta, mu + li t4, -1 + +1: + vle8.v v2, (a0) + vmseq.vx v4, v2, zero + vfirst.m t4, v4 + vmsbf.m v0, v4 + vmseq.vx v0, v2, a1, v0.t + vfirst.m t3, v0 + bgez t3, .Lfound + bgez t4, .Lbufferend + add a0, a0, t1 + j 1b + +.Lfound: /* found the target at a0+t3 */ + add a0, a0, t3 + ret + +.Lbufferend: + add a0, a0, t4 + and a0, a0, t5 + ret + +.size strchr, .-strchr +.size __strchrnul, .-__strchrnul + +libc_hidden_builtin_def (strchr) +weak_alias (__strchrnul, strchrnul) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strchrnul.c b/sysdeps/riscv/rv64/rvv/strchrnul.c new file mode 100644 index 0000000000..259da80358 --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strchrnul.c @@ -0,0 +1,22 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +/* strchrnul is implemented in strchr.S + */ diff --git a/sysdeps/riscv/rv64/rvv/strcmp.S b/sysdeps/riscv/rv64/rvv/strcmp.S new file mode 100644 index 0000000000..4a219221ac --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strcmp.S @@ -0,0 +1,108 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + + +#include + +.globl strcmp +.type strcmp,@function + +.align 2 + +/* most of the time, one or both sides is unaligned and their alignments differ + * we need to check for a null terminator before crossing a page boundary + * strategy: + * - for each side, calculate masks for alignment and (vlenb * 2) - alignment + * - while no difference encountered: + * - for each side: + * - load bytes to end of next vlenb*2 block + * - check for null terminator + * - if no terminator, load bytes to fill rest of register + * - compare sides + */ + +strcmp: + csrr t1, vlenb /* find vlenb*2 */ + add t1, t1, t1 + vsetvli zero, t1, e8, m2, ta, mu + vid.v v30 + addi t2, t1, -1 /* mask for unaligned part of ptr */ + and t6, a0, t2 /* unaligned part of lhs */ + and t5, a1, t2 /* unaligned part of rhs */ + sub t6, t1, t6 /* safe number of lhs bytes to read */ + sub t5, t1, t5 /* same, rhs */ + vmsltu.vx v28, v30, t6 /* v28 = mask for first half of lhs load */ + vmsltu.vx v26, v30, t5 /* v26 = mask for first half of rhs load */ + vmv.v.x v16, zero + vmv.v.x v18, zero + +1: vmv.v.v v0, v28 /* lhs mask */ + vle8.v v2, (a0), v0.t /* masked load from lhs */ + vmseq.vx v16, v2, zero, v0.t /* check loaded bytes for null */ + vmv.v.v v0, v26 /* rhs mask */ + vfirst.m t2, v16 /* get lhs check result */ + bgez t2, .Ltail /* bail if we can't safely check rest */ + vle8.v v4, (a1), v0.t /* masked load from rhs */ + vmseq.vx v18, v4, zero, v0.t /* check partial rhs for null */ + vmnot.m v0, v28 /* mask for rest of lhs */ + vfirst.m t3, v18 /* get check result */ + bltz t3, 2f /* test it */ + /* we see null terminator */ + bge t3, t6, .Ltail /* have enough bytes for vector cmp? */ + + vmsleu.vx v0, v30, t3 /* select rest + null */ + vmsne.vv v0, v2, v4, v0.t /* compare */ + vfirst.m t3, v0 + bgez t3, 3f + mv a0, zero /* no difference */ + ret +3: add a0, a0, t3 + add a1, a1, t3 + lbu t0, (a0) + lbu t1, (a1) +.Ldiff: + sub a0, t0, t1 + ret + + /* ...no null terminator */ +2: vle8.v v2, (a0), v0.t /* load rest of lhs */ + vmnot.m v0, v26 /* mask for rest of rhs */ + vle8.v v4, (a1), v0.t /* load rest of rhs */ + vmsne.vv v0, v2, v4 /* compare */ + add a0, a0, t1 /* advance ptrs */ + vfirst.m t3, v0 + add a1, a1, t1 + bltz t3, 1b + + sub t3, t3, t1 /* found difference but we've already advanced a0 and a1 */ + j 3b + +.Ltail: + lbu t0, (a0) + lbu t1, (a1) + bne t0, t1, .Ldiff + addi a0, a0, 1 + addi a1, a1, 1 + bnez t0, .Ltail + mv a0, zero + ret + + +.size strcmp, .-strcmp +libc_hidden_builtin_def (strcmp) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strcpy.S b/sysdeps/riscv/rv64/rvv/strcpy.S new file mode 100644 index 0000000000..b21909d66f --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strcpy.S @@ -0,0 +1,72 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + + +#include + +.globl strcpy +.type strcpy,@function + +/* + * optimized strcpy for riscv with vector extension + * assumptions: + * - vlenb is a power of 2 + * - page size >= 2*vlenb + */ + +.align 2 +strcpy: + mv t0, a0 /* copy dest so we can return it */ + + csrr t1, vlenb /* find vlenb*2 */ + add t1, t1, t1 + + addi t2, t1, -1 /* mask unaligned part of ptr */ + and t2, a1, t2 + beqz t2, .Laligned + + sub t2, t1, t2 /* search enough to align ptr */ + vsetvli t2, t2, e8, m2, tu, mu + vle8.v v2, (a1) + vmseq.vx v4, v2, zero + vmsif.m v0, v4 /* copy but not past null */ + vfirst.m t3, v4 + vse8.v v2, (t0), v0.t + bgez t3, .Ldone + add t0, t0, t2 + add a1, a1, t2 + +.Laligned: + vsetvli zero, t1, e8, m2, ta, mu /* now do 2*vlenb bytes per pass */ + +1: + vle8.v v2, (a1) + add a1, a1, t1 + vmseq.vx v4, v2, zero + vmsif.m v0, v4 + vfirst.m t3, v4 + vse8.v v2, (t0), v0.t + add t0, t0, t1 + bltz t3, 1b + +.Ldone: + ret + +.size strcpy, .-strcpy +libc_hidden_builtin_def (strcpy) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strcspn.c b/sysdeps/riscv/rv64/rvv/strcspn.c new file mode 100644 index 0000000000..f0595a72fb --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strcspn.c @@ -0,0 +1,22 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +/* strcspn is implemented in strspn.S + */ diff --git a/sysdeps/riscv/rv64/rvv/strlen.S b/sysdeps/riscv/rv64/rvv/strlen.S new file mode 100644 index 0000000000..c77d500693 --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strlen.S @@ -0,0 +1,67 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + + +#include + +.globl strlen +.type strlen,@function + +/* + * optimized strlen for riscv with vector extension + * assumptions: + * - vlenb is a power of 2 + * - page size >= 2*vlenb + */ + +.align 2 +strlen: + mv t4, a0 /* copy of buffer start */ + csrr t1, vlenb /* find vlenb*2 */ + add t1, t1, t1 + addi t2, t1, -1 /* mask off unaligned part of ptr */ + and t2, a0, t2 + beqz t2, .Laligned + + sub t2, t1, t2 /* search fwd to align ptr */ + vsetvli t2, t2, e8, m2, ta, ma + vle8.v v2, (a0) + vmseq.vx v0, v2, zero + vfirst.m t3, v0 + bgez t3, .Lfound + add a0, a0, t2 + +.Laligned: + vsetvli zero, t1, e8, m2, ta, ma /* search 2*vlenb bytes per pass */ + add t4, t4, t1 + +1: + vle8.v v2, (a0) + add a0, a0, t1 + vmseq.vx v0, v2, zero + vfirst.m t3, v0 + bltz t3, 1b + +.Lfound: /* found the 0; subtract */ + sub a0, a0, t4 /* buffer start from current ptr */ + add a0, a0, t3 /* and add offset into fetched */ + ret /* data to get length */ + +.size strlen, .-strlen +libc_hidden_builtin_def (strlen) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strncmp.S b/sysdeps/riscv/rv64/rvv/strncmp.S new file mode 100644 index 0000000000..863e5cb525 --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strncmp.S @@ -0,0 +1,104 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + + +#include + +.globl strncmp +.type strncmp,@function + +.align 2 + +/* as strcmp, but with added checks on a2 (max count) + */ + +strncmp: + csrr t1, vlenb /* find vlenb*2 */ + add t1, t1, t1 + blt a2, t1, .Ltail /* degrade if max < vlenb*2 */ + vsetvli zero, t1, e8, m2, ta, mu + vid.v v30 + addi t2, t1, -1 /* mask unaligned part of ptr */ + and t6, a0, t2 /* unaligned part of lhs */ + and t5, a1, t2 /* unaligned part of rhs */ + sub t6, t1, t6 /* safe count to read from lhs */ + sub t5, t1, t5 /* same, rhs */ + vmsltu.vx v28, v30, t6 /* mask for first part of lhs */ + vmsltu.vx v26, v30, t5 /* mask for first part of rhs */ + vmv.v.x v16, zero + vmv.v.x v18, zero + + +1: blt a2, t1, .Ltail + vmv.v.v v0, v28 /* lhs mask */ + vle8.v v2, (a0), v0.t /* masked load from lhs */ + vmseq.vx v16, v2, zero, v0.t /* check loaded bytes for null */ + vmv.v.v v0, v26 /* rhs mask */ + vfirst.m t2, v16 /* get lhs check result */ + bgez t2, .Ltail /* can we safely check rest */ + vle8.v v4, (a1), v0.t /* masked load from rhs */ + vmseq.vx v18, v4, zero, v0.t /* check partial rhs */ + vmnot.m v0, v28 /* mask for rest of lhs */ + vfirst.m t3, v18 /* get check result */ + bltz t3, 2f /* test it */ + bge t3, t6, .Ltail + + vmsleu.vx v0, v30, t3 /* select rest of string + null */ + vmsne.vv v0, v2, v4, v0.t /* compare */ + vfirst.m t3, v0 + bgez t3, 3f + mv a0, zero + ret +3: add a0, a0, t3 + add a1, a1, t3 + lbu t0, (a0) + lbu t1, (a1) +.Ldiff: + sub a0, t0, t1 + ret + + /* ...no null terminator in first part of lhs or rhs */ +2: vle8.v v2, (a0), v0.t /* load rest of lhs */ + vmnot.m v0, v26 /* mask for rest of rhs */ + vle8.v v4, (a1), v0.t /* load rest of rhs */ + vmsne.vv v0, v2, v4 /* compare */ + add a0, a0, t1 /* advance ptrs */ + vfirst.m t3, v0 + add a1, a1, t1 + sub a2, a2, t1 + bltz t3, 1b + + sub t3, t3, t1 /* found a diff but we've already advanced a0 and a1 */ + j 3b + +.Ltail: + beqz a2, 1f + addi a2, a2, -1 + lbu t0, (a0) + lbu t1, (a1) + bne t0, t1, .Ldiff + addi a0, a0, 1 + addi a1, a1, 1 + bnez t0, .Ltail +1: mv a0, zero + ret + + +.size strncmp, .-strncmp +libc_hidden_builtin_def (strncmp) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strncpy.S b/sysdeps/riscv/rv64/rvv/strncpy.S new file mode 100644 index 0000000000..8b3a1e545c --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strncpy.S @@ -0,0 +1,96 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + + +#include + +.globl strncpy +.type strncpy,@function + +/* + * optimized strcpy for riscv with vector extension + * assumptions: + * - vlenb is a power of 2 + * - page size >= 2*vlenb + */ + +.align 2 +strncpy: + mv t0, a0 /* need to return dest so copy */ + + csrr t1, vlenb /* find vlenb*2 */ + add t1, t1, t1 + + addi t2, t1, -1 /* mask off unaligned part of ptr */ + and t2, a1, t2 + beqz t2, .Laligned + + sub t2, t1, t2 /* search to align the pointer */ + vsetvli zero, t2, e8, m2, tu, mu + vle8.v v2, (a1) + vmseq.vx v4, v2, zero + vmsif.m v0, v4 /* copy to dest */ + vfirst.m t3, v4 + bgeu t2, a2, .Ldest_full + vse8.v v2, (t0), v0.t + bgez t3, .Lterminator_found + add t0, t0, t2 + add a1, a1, t2 + sub a2, a2, t2 + beqz a2, .Ldone + +.Laligned: + vsetvli zero, t1, e8, m2, ta, mu /* now do 2*vlenb bytes per pass */ + +1: + vle8.v v2, (a1) + add a1, a1, t1 + vmseq.vx v4, v2, zero + vmsif.m v0, v4 + vfirst.m t3, v4 + bgeu t1, a2, .Ldest_full + vse8.v v2, (t0), v0.t + add t0, t0, t1 + sub a2, a2, t1 + bltz t3, 1b + sub t0, t0, t1 + +.Lterminator_found: + addi sp, sp, -16 + sd ra, 0(sp) + sd a0, 8(sp) + add a0, t0, t3 + mv a1, zero + sub a2, a2, t3 + jal ra, memset + ld ra, 0(sp) + ld a0, 8(sp) + addi sp, sp, 16 +.Ldone: + ret + +.Ldest_full: + vid.v v6 + vmsltu.vx v4, v6, a2 + vmand.mm v0, v0, v4 + vse8.v v2, (t0), v0.t + ret + +.size strncpy, .-strncpy +libc_hidden_builtin_def (strncpy) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strnlen.S b/sysdeps/riscv/rv64/rvv/strnlen.S new file mode 100644 index 0000000000..6d7ee65c7a --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strnlen.S @@ -0,0 +1,81 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +#include + +.globl __strnlen +.type __strnlen,@function + +/* vector optimized strnlen + * assume it's safe to read to the end of the page + * containing either a null terminator or the last byte of the count or both, + * but not past it + * assume page size >= vlenb*2 + */ + +.align 2 +__strnlen: + mv t4, a0 /* stash a copy of start for later */ + beqz a1, .LzeroCount + + csrr t1, vlenb /* find vlenb*2 */ + add t1, t1, t1 + addi t2, t1, -1 /* mask off unaligned part of ptr */ + and t2, a1, a0 + beqz t2, .Laligned + + sub t2, t1, t2 /* search to align pointer to t1 */ + bgeu t2, a1, 2f /* check it's safe */ + mv t2, a1 /* it's not! 
look as far as permitted */ +2: vsetvli t2, t2, e8, m2, ta, ma + vle8.v v2, (a0) + vmseq.vx v0, v2, zero + vfirst.m t3, v0 + bgez t3, .Lfound + add a0, a0, t2 + sub a1, a1, t2 + bltu a1, t1, .LreachedCount + +.Laligned: + vsetvli zero, t1, e8, m2, ta, ma /* do 2*vlenb bytes per pass */ + +1: vle8.v v2, (a0) + sub a1, a1, t1 + vmseq.vx v0, v2, zero + vfirst.m t3, v0 + bgez t3, .Lfound + add a0, a0, t1 + bgeu a1, t1, 1b +.LreachedCount: + mv t2, a1 /* in case 0 < a1 < t1 */ + bnez a1, 2b /* if so, still t2 bytes to check, all safe */ +.LzeroCount: + sub a0, a0, t4 + ret + +.Lfound: /* found the 0; subtract buffer start from current pointer */ + add a0, a0, t3 /* and add offset into fetched data */ + sub a0, a0, t4 + ret + +.size __strnlen, .-__strnlen +weak_alias (__strnlen, strnlen) +libc_hidden_builtin_def (__strnlen) +libc_hidden_builtin_def (strnlen) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strrchr.S b/sysdeps/riscv/rv64/rvv/strrchr.S new file mode 100644 index 0000000000..4bef8a3b9c --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strrchr.S @@ -0,0 +1,88 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +#include + +.globl strrchr +.type strrchr,@function + +/* + * optimized strrchr for riscv with vector extension + * assumptions: + * - vlenb is a power of 2 + * - page size >= 2*vlenb + */ + +.align 2 +strrchr: + mv t5, a0 /* stash buffer ptr somewhere safe */ + mv a0, zero /* result is nullptr unless we find better below */ + + csrr t1, vlenb /* determine vlenb*2 */ + add t1, t1, t1 + addi t2, t1, -1 /* mask off unaligned part of ptr */ + and t2, t5, t2 + beqz t2, .Laligned + + sub t2, t1, t2 /* search to align ptr to 2*vlenb */ + vsetvli t2, t2, e8, m2, ta, mu + + vle8.v v2, (t5) /* load data into v2(,v3) */ + vmseq.vx v4, v2, zero /* check for null terminator */ + vfirst.m t4, v4 /* grab its position, if any */ + vmsbf.m v0, v4 /* select valid chars */ + vmseq.vx v0, v2, a1, v0.t /* search for candidate byte */ + vfirst.m t3, v0 /* grab its position, if any */ + bltz t3, 2f /* did we find a candidate? */ + +3: add a0, t3, t5 /* we did! grab the address */ + vmsof.m v1, v0 /* there might be more than one */ + vmandn.mm v0, v0, v1 /* so clear the one we just found */ + vfirst.m t3, v0 /* is there another? */ + bgez t3, 3b + +2: bgez t4, .Ldone /* did we see a null terminator? 
*/ + add t5, t5, t2 + +.Laligned: + vsetvli zero, t1, e8, m2, ta, mu /* now do 2*vlenb bytes per pass */ + +1: vle8.v v2, (t5) + vmseq.vx v4, v2, zero + vfirst.m t4, v4 + vmsbf.m v0, v4 + vmseq.vx v0, v2, a1, v0.t + vfirst.m t3, v0 + bltz t3, 2f + +3: add a0, t3, t5 + vmsof.m v1, v0 + vmandn.mm v0, v0, v1 + vfirst.m t3, v0 + bgez t3, 3b + +2: add t5, t5, t1 + bltz t4, 1b + +.Ldone: + ret + +.size strrchr, .-strrchr +libc_hidden_builtin_def (strrchr) \ No newline at end of file diff --git a/sysdeps/riscv/rv64/rvv/strspn.S b/sysdeps/riscv/rv64/rvv/strspn.S new file mode 100644 index 0000000000..2b9af5cc2d --- /dev/null +++ b/sysdeps/riscv/rv64/rvv/strspn.S @@ -0,0 +1,189 @@ + +/* Copyright (C) 2012-2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + + +#include + +.globl strspn +.type strspn,@function + +.globl strcspn +.type strcspn,@function + +/* + * optimized strspn / strcspn for riscv with vector extension + * assumptions: + * - vlenb is a power of 2 + * - page size >= 32 + * strategy: + * - build a 256-bit table on the stack, where each elt is zero + * if encountering it should terminate computation and nonzero otherwise + * - use vectorised lookups into this to check 2*vlen elts at a time; + * this code is identical for strspan and strcspan and can be shared + * + * note that while V mandates at least 128 bit wide regs, + * we are building a 256 bit lookup table + * therefore we use either LMUL=1 or 2 depending on what the target supports + * therefore we only use even vector register numbers, + * so everything still works if we go with LMUL=2 + */ + +# ----------------------------- + +.align 2 + +strspn: + lbu t0, 0(a1) + bnez t0, .Lbuild_table + mv a0, zero + ret + +.Lbuild_table: + mv a6, a0 /* store incoming a0 */ + li t1, 32 /* want to deal with 256 bits at a time, so 32 bytes */ + + vsetvli zero, t1, e8, m1, tu, mu +#if __riscv_v_min_vlen < 256 + /* we want to build a 256-bit table, so use vlenb*2, + * m2 if regs are 128 bits wide or vlenb, m1 if >= 256 + * 'V' extension specifies a minimum vlen of 128 so this should cover + * all cases; we can skip the check if we know vlen >= 256 at compile time + */ + csrr t2, vlenb + bgeu t2, t1, 1f + vsetvli zero, t1, e8, m2, tu, mu +1: +#endif // __riscv_v_min_vlen + + /* read one char from the charset at a time and write the correct bit + * in the lookup table; we could do SIMD iff we ever get an extension + * that provides some way of scattering bytes into a reg group + */ + vmv.v.x v16, zero /* clear out table */ + vmv.v.x v8, zero /* clear out v8 */ + li t3, 1 + vmv.s.x v8, t3 /* v8 now all zeroes except bottom byte */ + +1: vmv.v.x v2, zero /* clear out v2 */ + addi a1, a1, 1 /* advance charset ptr */ + srli t2, t0, 3 /* divide the byte we read earlier by 8 */ + vslideup.vx v2, v8, t2 /* v2 now 1 in the correct byte 0 elsewhere */ + vsll.vx v2, v2, t0 /* v2 now 1 in the 
correct bit, 0 elsewhere */ + vor.vv v16, v16, v2 /* or it in */ + lbu t0, 0(a1) /* fetch next bute */ + bnez t0, 1b /* if it's null, go round again */ + +/* + * Table is now built in v16. + * Strategy: + * - fetch next t1 bytes from memory + * - vrgather on their values divided by 8 to get relevant bytes of table + * - shift right to get the correct bit into bit 1 + * - and with 1, compare with expected terminator value, then check mask + * to see if we've found a terminator + * + * Before we can begin, a0 needs to be t1-aligned, so that when we fetch + * the next t1 bytes - any of which may be the null terminator - + * we do not cross a page boundary and read unmapped memory. Therefore + * we have one read of however many bytes are needed to align a0, + * before the main loop. + */ + +.Lscan_table: + vmv.v.x v8, t3 /* v8 now t1 bytes of 0x01 */ + + and t2, a0, t1 /* mask to align to t1 */ + beqz t2, 2f /* or skip if we're already aligned */ + sub t2, t1, t2 /* t2 now bytes to read to align to t1 */ + + vid.v v2 /* build mask instead of changing vl */ + vmsltu.vx v0, v2, t2 /* so we don't need to track LMUL */ + + vle8.v v2, (a0), v0.t /* load next bytes from input */ + vsrl.vi v4, v2, 3 /* divide by 8 */ + vrgather.vv v6, v16, v4 /* corresponding bytes of bit table */ + vsrl.vv v6, v6, v2 /* shift correct bits to lsb */ + vand.vv v6, v6, v8 /* and with 1 to complete the lookups */ + vmseq.vx v4, v6, zero, v0.t /* check to see if any 0s are present */ + vfirst.m t0, v4 /* index of the first 0, if any */ + bgez t0, .Lscan_end /* if we found one, stop */ + add a0, a0, t2 /* advance by number of bytes we read */ + +2: add a6, a6, t1 /* we'll advance a0 before the exit check */ +1: vle8.v v2, (a0) /* as above but unmasked so t1 elts per pass */ + add a0, a0, t1 + + vsrl.vi v4, v2, 3 + vrgather.vv v6, v16, v4 + vsrl.vv v6, v6, v2 + vand.vv v6, v6, v8 + + vmseq.vx v4, v6, zero + vfirst.m t0, v4 + bltz t0, 1b + +.Lscan_end: + add a0, a0, t0 /* calculate offset to terminating byte */ + sub a0, a0, a6 + ret +.size strspn, .-strspn + +/* strcspn + * + * table build exactly as for strspn, except: + * - the lookup table starts with all bits except bit 0 of byte 0 set + * - we clear the corresponding bit for each byte in the charset + * once table is built, we can reuse the scan code directly + */ + +strcspn: + lbu t0, 0(a1) + beqz t0, strlen /* no rejections -> prefix is whole string */ + + mv a6, a0 + li t1, 32 + + vsetvli zero, t1, e8, m1, tu, mu +#if __riscv_v_min_vlen < 256 + csrr t2, vlenb + bgeu t2, t1, 1f + vsetvli zero, t1, e8, m2, tu, mu +1: +#endif // __riscv_v_min_vlen + + vmv.v.x v8, zero + li t3, 1 /* all bits clear except bit 0 of byte 0 */ + vmv.s.x v8, t3 + vnot.v v16, v8 /* v16 is the inverse of that */ + li t4, -1 + +1: vmv.v.x v2, zero + addi a1, a1, 1 /* advance charset ptr */ + srli t2, t0, 3 /* select correct bit in v2 */ + vslideup.vx v2, v8, t2 + vsll.vx v2, v2, t0 + vnot.v v2, v2 /* invert */ + vand.vv v16, v16, v2 /* clear the relevant bit of table */ + lbu t0, 0(a1) + bnez t0, 1b + j .Lscan_table +.size strcspn, .-strcspn + +libc_hidden_builtin_def (strspn) +libc_hidden_builtin_def (strcspn) \ No newline at end of file
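
The scalar fallback in memchr.S relies on the classic word-at-a-time
zero-byte test described in its comments. As background, here is a minimal
C sketch of the same idea using 64-bit loads; the function name is
illustrative and not part of the patch:

#include <stdint.h>
#include <string.h>

/* Sketch of the scalar word-at-a-time search used by memchr.S when
   misaligned loads are permitted.  After XORing a word with a splatted
   copy of the target byte, any byte that matched becomes zero, and
   (w - 0x0101...) & ~w & 0x8080... sets bit 7 of exactly those bytes.  */
static const void *memchr_word_sketch (const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  const uint64_t ones = 0x0101010101010101ULL;
  const uint64_t highs = ones << 7;             /* 0x8080808080808080 */
  uint64_t target = (unsigned char) c * ones;   /* splat c across the word */

  while (n >= 8)
    {
      uint64_t w;
      memcpy (&w, p, 8);                        /* unaligned-safe load */
      w ^= target;                              /* matching bytes become 0 */
      if ((w - ones) & ~w & highs)              /* any zero byte present?  */
        break;                                  /* drop to the byte scan */
      p += 8;
      n -= 8;
    }

  for (; n != 0; p++, n--)                      /* locate the exact byte */
    if (*p == (unsigned char) c)
      return p;
  return NULL;
}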
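
The vector path of memchr.S can also be read through the RVV C intrinsics.
The sketch below is my own rendering against the __riscv_-prefixed RVV 1.0
intrinsics API, and it handles the tail by simply shrinking vl, which is
not exactly how the assembly's separate tail block works; treat it as an
approximation rather than the patch's method:

#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

/* Approximate intrinsics rendering of the vector loop in memchr.S.
   e8/m2 matches the vsetvli in the assembly; the mask type for
   SEW=8, LMUL=2 is vbool4_t.  */
static const void *memchr_rvv_sketch (const void *s, int c, size_t n)
{
  const uint8_t *p = s;
  while (n > 0)
    {
      size_t vl = __riscv_vsetvl_e8m2 (n);              /* elements this pass */
      vuint8m2_t v = __riscv_vle8_v_u8m2 (p, vl);       /* load vl bytes */
      vbool4_t m = __riscv_vmseq_vx_u8m2_b4 (v, (uint8_t) c, vl);
      long idx = __riscv_vfirst_m_b4 (m, vl);           /* first match or -1 */
      if (idx >= 0)
        return p + idx;
      p += vl;
      n -= vl;
    }
  return NULL;
}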
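
memcpy.S provides memmove by picking the copy direction from the pointer
order (bge a0, a1, .Lmemcpy_rev), so overlapping buffers need no temporary.
A minimal C model of that decision, ignoring the vector and word-sized
passes (the name is illustrative only):

#include <stdint.h>
#include <stddef.h>

/* Minimal model of the direction choice: when the destination is at or
   above the source, copy backwards so overlapping source bytes are read
   before they are overwritten; otherwise copy forwards.  */
static void *memmove_sketch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if ((uintptr_t) d < (uintptr_t) s)
    for (size_t i = 0; i < n; i++)   /* forward pass, same as memcpy */
      d[i] = s[i];
  else
    while (n--)                      /* backward pass (.Lmemcpy_rev) */
      d[n] = s[n];
  return dst;
}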
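
The scalar word loop in memset.S replicates the fill byte across a 32-bit
word with two shift-and-or steps before storing four bytes at a time. In C
the replication step is:

#include <stdint.h>

/* Model of the byte splat in memset.S's scalar tail: two shift+or steps
   repeat the fill byte in every byte of a 32-bit word.  */
static uint32_t splat_byte (uint8_t c)
{
  uint32_t v = c;
  v |= v << 8;     /* c repeated in the low two bytes */
  v |= v << 16;    /* c repeated in all four bytes */
  return v;
}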
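
The str* routines (strchr, strcpy, strlen, strnlen, strrchr, strcmp,
strncmp) all depend on the commit-message assumption that reads may run
past the terminator as long as they stay within the page. They guarantee
this by first consuming only the bytes up to the next 2*vlenb boundary and
then iterating in aligned 2*vlenb blocks, since 2*vlenb is assumed to
divide the page size. The arithmetic of that first, shorter read in plain C
(the helper name is hypothetical):

#include <stdint.h>
#include <stddef.h>

/* If every full load is `block` bytes (block = 2*vlenb, a power of two no
   larger than the page size) and starts on a block-aligned address, it can
   never cross a page boundary, so over-reading past the terminator is safe.
   Only the first, shorter load has to be sized to reach that boundary.  */
static size_t bytes_to_next_block (const void *p, size_t block)
{
  size_t partial = (uintptr_t) p & (block - 1);   /* distance past last boundary */
  return partial ? block - partial : 0;           /* 0 when already aligned */
}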
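
strspn.S builds a 256-bit membership table on the stack and then tests
2*vlenb input bytes per pass with vrgather lookups; strcspn shares the scan
loop and only complements the table. A scalar C model of the table build
and scan (illustrative only, not the vectorised algorithm itself):

#include <stddef.h>

/* Scalar model of the bit table used by strspn.S: bit c is set iff byte
   value c is in the accept set.  The terminator's bit stays clear, so the
   scan stops at the end of the string at the latest.  strcspn uses the
   complement of the reject set instead, again leaving bit 0 of byte 0 clear.  */
static size_t strspn_table_sketch (const char *s, const char *accept)
{
  unsigned char table[32] = { 0 };                  /* 256 bits */

  for (const unsigned char *p = (const unsigned char *) accept; *p; p++)
    table[*p >> 3] |= (unsigned char) (1u << (*p & 7));

  const unsigned char *q = (const unsigned char *) s;
  size_t i = 0;
  while ((table[q[i] >> 3] >> (q[i] & 7)) & 1)      /* byte still in the set?  */
    i++;
  return i;                       /* length of the initial accept-only prefix */
}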