From patchwork Fri Sep 8 09:33:54 2023
X-Patchwork-Submitter: dengjianbo
X-Patchwork-Id: 75522
From: dengjianbo
To: libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, caiyinyu@loongson.cn,
 xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo
Subject: [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned,
 unaligned, lsx, lasx}
Date: Fri, 8 Sep 2023 17:33:54 +0800
Message-Id: <20230908093357.3119822-2-dengjianbo@loongson.cn>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20230908093357.3119822-1-dengjianbo@loongson.cn>
References: <20230908093357.3119822-1-dengjianbo@loongson.cn>

According to the glibc strcpy microbenchmark test results (the benchmark was
changed to use generic_strcpy instead of strlen + memcpy as the baseline),
compared with generic_strcpy this implementation reduces the runtime as
follows:

Name              Percent of runtime reduced
strcpy-aligned    10%-45%
strcpy-unaligned  10%-49%
strcpy-lsx        20%-80%
strcpy-lasx       15%-86%

Compared with the aligned version, the unaligned version performs better when
src and dest cannot both be aligned to 8 bytes.
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   4 +
 .../lp64/multiarch/ifunc-impl-list.c          |   9 +
 .../loongarch/lp64/multiarch/strcpy-aligned.S | 185 ++++++++++++++++
 .../loongarch/lp64/multiarch/strcpy-lasx.S    | 208 ++++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S | 197 +++++++++++++++++
 .../lp64/multiarch/strcpy-unaligned.S         | 131 +++++++++++
 sysdeps/loongarch/lp64/multiarch/strcpy.c     |  35 +++
 7 files changed, 769 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy.c

diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index 360a6718c0..f05685ceec 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -16,6 +16,10 @@ sysdep_routines += \
   strcmp-lsx \
   strncmp-aligned \
   strncmp-lsx \
+  strcpy-aligned \
+  strcpy-unaligned \
+  strcpy-lsx \
+  strcpy-lasx \
   memcpy-aligned \
   memcpy-unaligned \
   memmove-unaligned \
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index e397d58c9d..b556bacbd1 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -76,6 +76,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_aligned)
 	      )

+  IFUNC_IMPL (i, name, strcpy,
+#if !defined __loongarch_soft_float
+	      IFUNC_IMPL_ADD (array, i, strcpy, SUPPORT_LASX, __strcpy_lasx)
+	      IFUNC_IMPL_ADD (array, i, strcpy, SUPPORT_LSX, __strcpy_lsx)
+#endif
+	      IFUNC_IMPL_ADD (array, i, strcpy, SUPPORT_UAL, __strcpy_unaligned)
+	      IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_aligned)
+	      )
+
   IFUNC_IMPL (i, name, memcpy,
 #if !defined __loongarch_soft_float
 	      IFUNC_IMPL_ADD (array, i, memcpy, SUPPORT_LASX, __memcpy_lasx)
diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S b/sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
new file mode 100644
index 0000000000..d5926e5e11
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
@@ -0,0 +1,185 @@
+/* Optimized strcpy aligned implementation using basic LoongArch instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) +# define STRCPY __strcpy_aligned +#else +# define STRCPY strcpy +#endif + +LEAF(STRCPY, 6) + andi a3, a0, 0x7 + move a2, a0 + beqz a3, L(dest_align) + sub.d a5, a1, a3 + addi.d a5, a5, 8 + +L(make_dest_align): + ld.b t0, a1, 0 + addi.d a1, a1, 1 + st.b t0, a2, 0 + beqz t0, L(al_out) + + addi.d a2, a2, 1 + bne a1, a5, L(make_dest_align) + +L(dest_align): + andi a4, a1, 7 + bstrins.d a1, zero, 2, 0 + + lu12i.w t5, 0x1010 + ld.d t0, a1, 0 + ori t5, t5, 0x101 + bstrins.d t5, t5, 63, 32 + + slli.d t6, t5, 0x7 + bnez a4, L(unalign) + sub.d t1, t0, t5 + andn t2, t6, t0 + + and t3, t1, t2 + bnez t3, L(al_end) + +L(al_loop): + st.d t0, a2, 0 + ld.d t0, a1, 8 + + addi.d a1, a1, 8 + addi.d a2, a2, 8 + sub.d t1, t0, t5 + andn t2, t6, t0 + + and t3, t1, t2 + beqz t3, L(al_loop) + +L(al_end): + ctz.d t1, t3 + srli.d t1, t1, 3 + addi.d t1, t1, 1 + + andi a3, t1, 8 + andi a4, t1, 4 + andi a5, t1, 2 + andi a6, t1, 1 + +L(al_end_8): + beqz a3, L(al_end_4) + st.d t0, a2, 0 + jr ra +L(al_end_4): + beqz a4, L(al_end_2) + st.w t0, a2, 0 + addi.d a2, a2, 4 + srli.d t0, t0, 32 +L(al_end_2): + beqz a5, L(al_end_1) + st.h t0, a2, 0 + addi.d a2, a2, 2 + srli.d t0, t0, 16 +L(al_end_1): + beqz a6, L(al_out) + st.b t0, a2, 0 +L(al_out): + jr ra + +L(unalign): + slli.d a5, a4, 3 + li.d t1, -1 + sub.d a6, zero, a5 + + srl.d a7, t0, a5 + sll.d t7, t1, a6 + + or t0, a7, t7 + sub.d t1, t0, t5 + andn t2, t6, t0 + and t3, t1, t2 + + bnez t3, L(un_end) + + ld.d t4, a1, 8 + + sub.d t1, t4, t5 + andn t2, t6, t4 + sll.d t0, t4, a6 + and t3, t1, t2 + + or t0, t0, a7 + bnez t3, L(un_end_with_remaining) + +L(un_loop): + srl.d a7, t4, a5 + + ld.d t4, a1, 16 + addi.d a1, a1, 8 + + st.d t0, a2, 0 + addi.d a2, a2, 8 + + sub.d t1, t4, t5 + andn t2, t6, t4 + sll.d t0, t4, a6 + and t3, t1, t2 + + or t0, t0, a7 + beqz t3, L(un_loop) + +L(un_end_with_remaining): + ctz.d t1, t3 + srli.d t1, t1, 3 + addi.d t1, t1, 1 + sub.d t1, t1, a4 + + blt t1, zero, L(un_end_less_8) + st.d t0, a2, 0 + addi.d a2, a2, 8 + beqz t1, L(un_out) + srl.d t0, t4, a5 + b L(un_end_less_8) + +L(un_end): + ctz.d t1, t3 + srli.d t1, t1, 3 + addi.d t1, t1, 1 + +L(un_end_less_8): + andi a4, t1, 4 + andi a5, t1, 2 + andi a6, t1, 1 +L(un_end_4): + beqz a4, L(un_end_2) + st.w t0, a2, 0 + addi.d a2, a2, 4 + srli.d t0, t0, 32 +L(un_end_2): + beqz a5, L(un_end_1) + st.h t0, a2, 0 + addi.d a2, a2, 2 + srli.d t0, t0, 16 +L(un_end_1): + beqz a6, L(un_out) + st.b t0, a2, 0 +L(un_out): + jr ra +END(STRCPY) + +libc_hidden_builtin_def (STRCPY) diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S b/sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S new file mode 100644 index 0000000000..d928db5b91 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S @@ -0,0 +1,208 @@ +/* Optimized strcpy implementation using LoongArch LASX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +#define STRCPY __strcpy_lasx + +LEAF(STRCPY, 6) + ori t8, zero, 0xfe0 + andi t0, a1, 0xfff + li.d t7, -1 + move a2, a0 + + bltu t8, t0, L(page_cross_start) +L(start_entry): + xvld xr0, a1, 0 + li.d t0, 32 + andi t1, a2, 0x1f + + xvsetanyeqz.b fcc0, xr0 + sub.d t0, t0, t1 + bcnez fcc0, L(end) + add.d a1, a1, t0 + + xvst xr0, a2, 0 + andi a3, a1, 0x1f + add.d a2, a2, t0 + bnez a3, L(unaligned) + + + xvld xr0, a1, 0 + xvsetanyeqz.b fcc0, xr0 + bcnez fcc0, L(al_end) +L(al_loop): + xvst xr0, a2, 0 + + xvld xr0, a1, 32 + addi.d a2, a2, 32 + addi.d a1, a1, 32 + xvsetanyeqz.b fcc0, xr0 + + bceqz fcc0, L(al_loop) +L(al_end): + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + + movfr2gr.s t0, fa0 + cto.w t0, t0 + add.d a1, a1, t0 + xvld xr0, a1, -31 + + + add.d a2, a2, t0 + xvst xr0, a2, -31 + jr ra + nop + +L(page_cross_start): + move a4, a1 + bstrins.d a4, zero, 4, 0 + xvld xr0, a4, 0 + xvmsknz.b xr0, xr0 + + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + sra.w t0, t0, a1 + + beq t0, t7, L(start_entry) + b L(tail) +L(unaligned): + andi t0, a1, 0xfff + bltu t8, t0, L(un_page_cross) + + +L(un_start_entry): + xvld xr0, a1, 0 + xvsetanyeqz.b fcc0, xr0 + bcnez fcc0, L(un_end) + addi.d a1, a1, 32 + +L(un_loop): + xvst xr0, a2, 0 + andi t0, a1, 0xfff + addi.d a2, a2, 32 + bltu t8, t0, L(page_cross_loop) + +L(un_loop_entry): + xvld xr0, a1, 0 + addi.d a1, a1, 32 + xvsetanyeqz.b fcc0, xr0 + bceqz fcc0, L(un_loop) + + addi.d a1, a1, -32 +L(un_end): + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + + + movfr2gr.s t0, fa0 +L(un_tail): + cto.w t0, t0 + add.d a1, a1, t0 + xvld xr0, a1, -31 + + add.d a2, a2, t0 + xvst xr0, a2, -31 + jr ra +L(un_page_cross): + sub.d a4, a1, a3 + + xvld xr0, a4, 0 + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + + movfr2gr.s t0, fa0 + sra.w t0, t0, a1 + beq t0, t7, L(un_start_entry) + b L(un_tail) + + +L(page_cross_loop): + sub.d a4, a1, a3 + xvld xr0, a4, 0 + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + sra.w t0, t0, a1 + beq t0, t7, L(un_loop_entry) + + b L(un_tail) +L(end): + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + + movfr2gr.s t0, fa0 +L(tail): + cto.w t0, t0 + add.d a4, a2, t0 + add.d a5, a1, t0 + +L(less_32): + srli.d t1, t0, 4 + beqz t1, L(less_16) + vld vr0, a1, 0 + vld vr1, a5, -15 + + vst vr0, a2, 0 + vst vr1, a4, -15 + jr ra +L(less_16): + srli.d t1, t0, 3 + + beqz t1, L(less_8) + ld.d t2, a1, 0 + ld.d t3, a5, -7 + st.d t2, a2, 0 + + st.d t3, a4, -7 + jr ra +L(less_8): + li.d t1, 3 + bltu t0, t1, L(less_4) + + ld.w t2, a1, 0 + ld.w t3, a5, -3 + st.w t2, a2, 0 + st.w t3, a4, -3 + + jr ra +L(less_4): + srli.d t1, t0, 2 + bgeu t1, t0, L(zero_byte) + ld.h t2, a1, 0 + + st.h t2, a2, 0 +L(zero_byte): + st.b zero, a4, 0 + jr ra +END(STRCPY) + +libc_hidden_builtin_def (STRCPY) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S b/sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S new file mode 100644 index 0000000000..7a17af12a3 --- 
/dev/null +++ b/sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S @@ -0,0 +1,197 @@ +/* Optimized strcpy implementation using LoongArch LSX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define STRCPY __strcpy_lsx + +LEAF(STRCPY, 6) + pcalau12i t0, %pc_hi20(L(INDEX)) + andi a4, a1, 0xf + vld vr1, t0, %pc_lo12(L(INDEX)) + move a2, a0 + + beqz a4, L(load_start) + xor t0, a1, a4 + vld vr0, t0, 0 + vreplgr2vr.b vr2, a4 + + vadd.b vr2, vr2, vr1 + vshuf.b vr0, vr2, vr0, vr2 + vsetanyeqz.b fcc0, vr0 + bcnez fcc0, L(end) + +L(load_start): + vld vr0, a1, 0 + li.d t1, 16 + andi a3, a2, 0xf + vsetanyeqz.b fcc0, vr0 + + + sub.d t0, t1, a3 + bcnez fcc0, L(end) + add.d a1, a1, t0 + vst vr0, a2, 0 + + andi a3, a1, 0xf + add.d a2, a2, t0 + bnez a3, L(unaligned) + vld vr0, a1, 0 + + vsetanyeqz.b fcc0, vr0 + bcnez fcc0, L(al_end) +L(al_loop): + vst vr0, a2, 0 + vld vr0, a1, 16 + + addi.d a2, a2, 16 + addi.d a1, a1, 16 + vsetanyeqz.b fcc0, vr0 + bceqz fcc0, L(al_loop) + + +L(al_end): + vmsknz.b vr1, vr0 + movfr2gr.s t0, fa1 + cto.w t0, t0 + add.d a1, a1, t0 + + vld vr0, a1, -15 + add.d a2, a2, t0 + vst vr0, a2, -15 + jr ra + +L(end): + vmsknz.b vr1, vr0 + movfr2gr.s t0, fa1 + cto.w t0, t0 + addi.d t0, t0, 1 + +L(end_16): + andi t1, t0, 16 + beqz t1, L(end_8) + vst vr0, a2, 0 + jr ra + + +L(end_8): + andi t2, t0, 8 + andi t3, t0, 4 + andi t4, t0, 2 + andi t5, t0, 1 + + beqz t2, L(end_4) + vstelm.d vr0, a2, 0, 0 + addi.d a2, a2, 8 + vbsrl.v vr0, vr0, 8 + +L(end_4): + beqz t3, L(end_2) + vstelm.w vr0, a2, 0, 0 + addi.d a2, a2, 4 + vbsrl.v vr0, vr0, 4 + +L(end_2): + beqz t4, L(end_1) + vstelm.h vr0, a2, 0, 0 + addi.d a2, a2, 2 + vbsrl.v vr0, vr0, 2 + + +L(end_1): + beqz t5, L(out) + vstelm.b vr0, a2, 0, 0 +L(out): + jr ra + nop + +L(unaligned): + bstrins.d a1, zero, 3, 0 + vld vr2, a1, 0 + vreplgr2vr.b vr3, a3 + vslt.b vr4, vr1, vr3 + + vor.v vr0, vr2, vr4 + vsetanyeqz.b fcc0, vr0 + bcnez fcc0, L(un_first_end) + vld vr0, a1, 16 + + vadd.b vr3, vr3, vr1 + vshuf.b vr4, vr0, vr2, vr3 + vsetanyeqz.b fcc0, vr0 + bcnez fcc0, L(un_end) + + + vor.v vr2, vr0, vr0 + addi.d a1, a1, 16 +L(un_loop): + vld vr0, a1, 16 + vst vr4, a2, 0 + + addi.d a2, a2, 16 + vshuf.b vr4, vr0, vr2, vr3 + vsetanyeqz.b fcc0, vr0 + bcnez fcc0, L(un_end) + + vld vr2, a1, 32 + vst vr4, a2, 0 + addi.d a1, a1, 32 + addi.d a2, a2, 16 + + vshuf.b vr4, vr2, vr0, vr3 + vsetanyeqz.b fcc0, vr2 + bceqz fcc0, L(un_loop) + vor.v vr0, vr2, vr2 + + + addi.d a1, a1, -16 +L(un_end): + vsetanyeqz.b fcc0, vr4 + bcnez fcc0, 1f + vst vr4, a2, 0 + +1: + vmsknz.b vr1, vr0 + movfr2gr.s t0, fa1 + cto.w t0, t0 + add.d a1, a1, t0 + + vld vr0, a1, 1 + add.d a2, a2, t0 + sub.d a2, a2, a3 + vst vr0, a2, 1 + + jr ra +L(un_first_end): + addi.d a2, a2, -16 + addi.d a1, a1, -16 + b 1b +END(STRCPY) + 
+ .section .rodata.cst16,"M",@progbits,16 + .align 4 +L(INDEX): + .dword 0x0706050403020100 + .dword 0x0f0e0d0c0b0a0908 + +libc_hidden_builtin_def (STRCPY) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S b/sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S new file mode 100644 index 0000000000..12e79f2ac0 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S @@ -0,0 +1,131 @@ +/* Optimized strcpy unaligned implementation using basic LoongArch instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) + +# define STRCPY __strcpy_unaligned + +LEAF(STRCPY, 4) + move t8, a0 + lu12i.w t5, 0x01010 + lu12i.w t6, 0x7f7f7 + ori t5, t5, 0x101 + + ori t6, t6, 0xf7f + bstrins.d t5, t5, 63, 32 + bstrins.d t6, t6, 63, 32 + andi a3, a1, 0x7 + + beqz a3, L(strcpy_loop_aligned_1) + b L(strcpy_mutual_align) +L(strcpy_loop_aligned): + st.d t0, a0, 0 + addi.d a0, a0, 8 + +L(strcpy_loop_aligned_1): + ld.d t0, a1, 0 + addi.d a1, a1, 8 +L(strcpy_start_realigned): + sub.d a4, t0, t5 + or a5, t0, t6 + + andn t2, a4, a5 + beqz t2, L(strcpy_loop_aligned) +L(strcpy_end): + ctz.d t7, t2 + srli.d t7, t7, 3 + addi.d t7, t7, 1 + +L(strcpy_end_8): + andi a4, t7, 0x8 + beqz a4, L(strcpy_end_4) + st.d t0, a0, 0 + move a0, t8 + jr ra + +L(strcpy_end_4): + andi a4, t7, 0x4 + beqz a4, L(strcpy_end_2) + st.w t0, a0, 0 + srli.d t0, t0, 32 + addi.d a0, a0, 4 + +L(strcpy_end_2): + andi a4, t7, 0x2 + beqz a4, L(strcpy_end_1) + st.h t0, a0, 0 + srli.d t0, t0, 16 + addi.d a0, a0, 2 + +L(strcpy_end_1): + andi a4, t7, 0x1 + beqz a4, L(strcpy_end_ret) + st.b t0, a0, 0 + +L(strcpy_end_ret): + move a0, t8 + jr ra + + +L(strcpy_mutual_align): + li.w a5, 0xff8 + andi a4, a1, 0xff8 + beq a4, a5, L(strcpy_page_cross) + +L(strcpy_page_cross_ok): + ld.d t0, a1, 0 + sub.d a4, t0, t5 + or a5, t0, t6 + andn t2, a4, a5 + bnez t2, L(strcpy_end) + +L(strcpy_mutual_align_finish): + li.w a4, 8 + st.d t0, a0, 0 + sub.d a4, a4, a3 + add.d a1, a1, a4 + add.d a0, a0, a4 + + b L(strcpy_loop_aligned_1) + +L(strcpy_page_cross): + li.w a4, 0x7 + andn a6, a1, a4 + ld.d t0, a6, 0 + li.w a7, -1 + + slli.d a5, a3, 3 + srl.d a7, a7, a5 + srl.d t0, t0, a5 + nor a7, a7, zero + + or t0, t0, a7 + sub.d a4, t0, t5 + or a5, t0, t6 + andn t2, a4, a5 + beqz t2, L(strcpy_page_cross_ok) + + b L(strcpy_end) +END(STRCPY) + +libc_hidden_builtin_def (STRCPY) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy.c b/sysdeps/loongarch/lp64/multiarch/strcpy.c new file mode 100644 index 0000000000..46afd068f9 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/strcpy.c @@ -0,0 +1,35 @@ +/* Multiple versions of strcpy. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define strcpy __redirect_strcpy
+# include <string.h>
+# undef strcpy
+
+# define SYMBOL_NAME strcpy
+# include "ifunc-lasx.h"
+
+libc_ifunc_redirected (__redirect_strcpy, strcpy, IFUNC_SELECTOR ());
+
+# ifdef SHARED
+__hidden_ver1 (strcpy, __GI_strcpy, __redirect_strcpy)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (strcpy);
+# endif
+#endif

From patchwork Fri Sep 8 09:33:55 2023
X-Patchwork-Submitter: dengjianbo
X-Patchwork-Id: 75524
From: dengjianbo
To: libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, caiyinyu@loongson.cn,
 xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo
Subject: [PATCH 2/4] LoongArch: Add ifunc support for stpcpy{aligned, lsx, lasx}
Date: Fri, 8 Sep 2023 17:33:55 +0800
Message-Id: <20230908093357.3119822-3-dengjianbo@loongson.cn>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20230908093357.3119822-1-dengjianbo@loongson.cn>
References: <20230908093357.3119822-1-dengjianbo@loongson.cn>

According to the glibc stpcpy microbenchmark test results (the benchmark was
changed to use generic_stpcpy instead of strlen + memcpy as the baseline),
this implementation reduces the runtime as follows:

Name              Percent of runtime reduced
stpcpy-lasx       10%-87%
stpcpy-lsx        10%-80%
stpcpy-aligned    5%-45%
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   3 +
 .../lp64/multiarch/ifunc-impl-list.c          |   8 +
 .../loongarch/lp64/multiarch/ifunc-stpcpy.h   |  40 ++++
 .../loongarch/lp64/multiarch/stpcpy-aligned.S | 191 ++++++++++++++++
 .../loongarch/lp64/multiarch/stpcpy-lasx.S    | 208 ++++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S | 206 +++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/stpcpy.c     |  42 ++++
 7 files changed, 698 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy.c

diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index f05685ceec..f95eb5c4fe 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -20,6 +20,9 @@ sysdep_routines += \
   strcpy-unaligned \
   strcpy-lsx \
   strcpy-lasx \
+  stpcpy-aligned \
+  stpcpy-lsx \
+  stpcpy-lasx \
   memcpy-aligned \
   memcpy-unaligned \
   memmove-unaligned \
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index b556bacbd1..539aa681f9 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -85,6 +85,14 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_aligned)
 	      )

+  IFUNC_IMPL (i, name, stpcpy,
+#if !defined __loongarch_soft_float
+	      IFUNC_IMPL_ADD (array, i, stpcpy, SUPPORT_LASX, __stpcpy_lasx)
+	      IFUNC_IMPL_ADD (array, i, stpcpy, SUPPORT_LSX, __stpcpy_lsx)
+#endif
+	      IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_aligned)
+	      )
+
   IFUNC_IMPL (i, name, memcpy,
 #if !defined __loongarch_soft_float
 	      IFUNC_IMPL_ADD (array, i, memcpy, SUPPORT_LASX, __memcpy_lasx)
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h b/sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h
new file mode 100644
index
0000000000..3827ec5a7e --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h @@ -0,0 +1,40 @@ +/* Common definition for stpcpy ifunc selections. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include + +#if !defined __loongarch_soft_float +extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden; +extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden; +#endif +extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden; + +static inline void * +IFUNC_SELECTOR (void) +{ +#if !defined __loongarch_soft_float + if (SUPPORT_LASX) + return OPTIMIZE (lasx); + else if (SUPPORT_LSX) + return OPTIMIZE (lsx); + else +#endif + return OPTIMIZE (aligned); +} diff --git a/sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S b/sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S new file mode 100644 index 0000000000..1520597b91 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S @@ -0,0 +1,191 @@ +/* Optimized stpcpy aligned implementation using basic LoongArch instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + +#include +#include +#include + +#if IS_IN (libc) +# define STPCPY_NAME __stpcpy_aligned +#else +# define STPCPY_NAME __stpcpy +#endif + +LEAF(STPCPY_NAME, 6) + andi a3, a0, 0x7 + beqz a3, L(dest_align) + sub.d a5, a1, a3 + addi.d a5, a5, 8 + +L(make_dest_align): + ld.b t0, a1, 0 + addi.d a1, a1, 1 + st.b t0, a0, 0 + addi.d a0, a0, 1 + + beqz t0, L(al_out) + bne a1, a5, L(make_dest_align) + +L(dest_align): + andi a4, a1, 7 + bstrins.d a1, zero, 2, 0 + + lu12i.w t5, 0x1010 + ld.d t0, a1, 0 + ori t5, t5, 0x101 + bstrins.d t5, t5, 63, 32 + + slli.d t6, t5, 0x7 + bnez a4, L(unalign) + sub.d t1, t0, t5 + andn t2, t6, t0 + + and t3, t1, t2 + bnez t3, L(al_end) + +L(al_loop): + st.d t0, a0, 0 + ld.d t0, a1, 8 + + addi.d a1, a1, 8 + addi.d a0, a0, 8 + sub.d t1, t0, t5 + andn t2, t6, t0 + + and t3, t1, t2 + beqz t3, L(al_loop) + +L(al_end): + ctz.d t1, t3 + srli.d t1, t1, 3 + addi.d t1, t1, 1 + + andi a3, t1, 8 + andi a4, t1, 4 + andi a5, t1, 2 + andi a6, t1, 1 + +L(al_end_8): + beqz a3, L(al_end_4) + st.d t0, a0, 0 + addi.d a0, a0, 7 + jr ra +L(al_end_4): + beqz a4, L(al_end_2) + st.w t0, a0, 0 + addi.d a0, a0, 4 + srli.d t0, t0, 32 +L(al_end_2): + beqz a5, L(al_end_1) + st.h t0, a0, 0 + addi.d a0, a0, 2 + srli.d t0, t0, 16 +L(al_end_1): + beqz a6, L(al_out) + st.b t0, a0, 0 + addi.d a0, a0, 1 +L(al_out): + addi.d a0, a0, -1 + jr ra + +L(unalign): + slli.d a5, a4, 3 + li.d t1, -1 + sub.d a6, zero, a5 + + srl.d a7, t0, a5 + sll.d t7, t1, a6 + + or t0, a7, t7 + sub.d t1, t0, t5 + andn t2, t6, t0 + and t3, t1, t2 + + bnez t3, L(un_end) + + ld.d t4, a1, 8 + addi.d a1, a1, 8 + + sub.d t1, t4, t5 + andn t2, t6, t4 + sll.d t0, t4, a6 + and t3, t1, t2 + + or t0, t0, a7 + bnez t3, L(un_end_with_remaining) + +L(un_loop): + srl.d a7, t4, a5 + + ld.d t4, a1, 8 + addi.d a1, a1, 8 + + st.d t0, a0, 0 + addi.d a0, a0, 8 + + sub.d t1, t4, t5 + andn t2, t6, t4 + sll.d t0, t4, a6 + and t3, t1, t2 + + or t0, t0, a7 + beqz t3, L(un_loop) + +L(un_end_with_remaining): + ctz.d t1, t3 + srli.d t1, t1, 3 + addi.d t1, t1, 1 + sub.d t1, t1, a4 + + blt t1, zero, L(un_end_less_8) + st.d t0, a0, 0 + addi.d a0, a0, 8 + beqz t1, L(un_out) + srl.d t0, t4, a5 + b L(un_end_less_8) + +L(un_end): + ctz.d t1, t3 + srli.d t1, t1, 3 + addi.d t1, t1, 1 + +L(un_end_less_8): + andi a4, t1, 4 + andi a5, t1, 2 + andi a6, t1, 1 +L(un_end_4): + beqz a4, L(un_end_2) + st.w t0, a0, 0 + addi.d a0, a0, 4 + srli.d t0, t0, 32 +L(un_end_2): + beqz a5, L(un_end_1) + st.h t0, a0, 0 + addi.d a0, a0, 2 + srli.d t0, t0, 16 +L(un_end_1): + beqz a6, L(un_out) + st.b t0, a0, 0 + addi.d a0, a0, 1 +L(un_out): + addi.d a0, a0, -1 + jr ra + +END(STPCPY_NAME) + +libc_hidden_builtin_def (STPCPY_NAME) diff --git a/sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S b/sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S new file mode 100644 index 0000000000..c21b132239 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S @@ -0,0 +1,208 @@ +/* Optimized stpcpy implementation using LoongArch LASX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define STPCPY __stpcpy_lasx + +LEAF(STPCPY, 6) + ori t8, zero, 0xfe0 + andi t0, a1, 0xfff + li.d t7, -1 + move a2, a0 + + bltu t8, t0, L(page_cross_start) +L(start_entry): + xvld xr0, a1, 0 + li.d t0, 32 + andi t1, a2, 0x1f + + xvsetanyeqz.b fcc0, xr0 + sub.d t0, t0, t1 + bcnez fcc0, L(end) + add.d a1, a1, t0 + + xvst xr0, a2, 0 + andi a3, a1, 0x1f + add.d a2, a2, t0 + bnez a3, L(unaligned) + + + xvld xr0, a1, 0 + xvsetanyeqz.b fcc0, xr0 + bcnez fcc0, L(al_end) +L(al_loop): + xvst xr0, a2, 0 + + xvld xr0, a1, 32 + addi.d a2, a2, 32 + addi.d a1, a1, 32 + xvsetanyeqz.b fcc0, xr0 + + bceqz fcc0, L(al_loop) +L(al_end): + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + + movfr2gr.s t0, fa0 + cto.w t0, t0 + add.d a1, a1, t0 + xvld xr0, a1, -31 + + + add.d a0, a2, t0 + xvst xr0, a0, -31 + jr ra + nop + +L(page_cross_start): + move a4, a1 + bstrins.d a4, zero, 4, 0 + xvld xr0, a4, 0 + xvmsknz.b xr0, xr0 + + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + sra.w t0, t0, a1 + + beq t0, t7, L(start_entry) + b L(tail) +L(unaligned): + andi t0, a1, 0xfff + bltu t8, t0, L(un_page_cross) + + +L(un_start_entry): + xvld xr0, a1, 0 + xvsetanyeqz.b fcc0, xr0 + bcnez fcc0, L(un_end) + addi.d a1, a1, 32 + +L(un_loop): + xvst xr0, a2, 0 + andi t0, a1, 0xfff + addi.d a2, a2, 32 + bltu t8, t0, L(page_cross_loop) + +L(un_loop_entry): + xvld xr0, a1, 0 + addi.d a1, a1, 32 + xvsetanyeqz.b fcc0, xr0 + bceqz fcc0, L(un_loop) + + addi.d a1, a1, -32 +L(un_end): + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + + + movfr2gr.s t0, fa0 +L(un_tail): + cto.w t0, t0 + add.d a1, a1, t0 + xvld xr0, a1, -31 + + add.d a0, a2, t0 + xvst xr0, a0, -31 + jr ra +L(un_page_cross): + sub.d a4, a1, a3 + + xvld xr0, a4, 0 + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + + movfr2gr.s t0, fa0 + sra.w t0, t0, a1 + beq t0, t7, L(un_start_entry) + b L(un_tail) + + +L(page_cross_loop): + sub.d a4, a1, a3 + xvld xr0, a4, 0 + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + sra.w t0, t0, a1 + beq t0, t7, L(un_loop_entry) + + b L(un_tail) +L(end): + xvmsknz.b xr0, xr0 + xvpickve.w xr1, xr0, 4 + vilvl.h vr0, vr1, vr0 + + movfr2gr.s t0, fa0 +L(tail): + cto.w t0, t0 + add.d a0, a2, t0 + add.d a5, a1, t0 + +L(less_32): + srli.d t1, t0, 4 + beqz t1, L(less_16) + vld vr0, a1, 0 + vld vr1, a5, -15 + + vst vr0, a2, 0 + vst vr1, a0, -15 + jr ra +L(less_16): + srli.d t1, t0, 3 + + beqz t1, L(less_8) + ld.d t2, a1, 0 + ld.d t3, a5, -7 + st.d t2, a2, 0 + + st.d t3, a0, -7 + jr ra +L(less_8): + li.d t1, 3 + bltu t0, t1, L(less_4) + + ld.w t2, a1, 0 + ld.w t3, a5, -3 + st.w t2, a2, 0 + st.w t3, a0, -3 + + jr ra +L(less_4): + srli.d t1, t0, 2 + bgeu t1, t0, L(zero_byte) + ld.h t2, a1, 0 + + st.h t2, a2, 0 +L(zero_byte): + st.b zero, a0, 0 + jr ra +END(STPCPY) + +libc_hidden_builtin_def (STPCPY) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S b/sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S new file mode 100644 index 0000000000..34ceadee66 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S @@ -0,0 +1,206 @@ +/* Optimized stpcpy implementation using LoongArch LSX instructions. 
+ Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define STPCPY __stpcpy_lsx + +LEAF(STPCPY, 6) + pcalau12i t0, %pc_hi20(L(INDEX)) + andi a4, a1, 0xf + vld vr1, t0, %pc_lo12(L(INDEX)) + beqz a4, L(load_start) + + xor t0, a1, a4 + vld vr0, t0, 0 + vreplgr2vr.b vr2, a4 + vadd.b vr2, vr2, vr1 + + vshuf.b vr0, vr2, vr0, vr2 + vsetanyeqz.b fcc0, vr0 + bcnez fcc0, L(end) +L(load_start): + vld vr0, a1, 0 + + li.d t1, 16 + andi a3, a0, 0xf + vsetanyeqz.b fcc0, vr0 + sub.d t0, t1, a3 + + + bcnez fcc0, L(end) + add.d a1, a1, t0 + vst vr0, a0, 0 + add.d a0, a0, t0 + + bne a3, a4, L(unaligned) + vld vr0, a1, 0 + vsetanyeqz.b fcc0, vr0 + bcnez fcc0, L(al_end) + +L(al_loop): + vst vr0, a0, 0 + vld vr0, a1, 16 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + + vsetanyeqz.b fcc0, vr0 + bceqz fcc0, L(al_loop) +L(al_end): + vmsknz.b vr1, vr0 + movfr2gr.s t0, fa1 + + + cto.w t0, t0 + add.d a1, a1, t0 + vld vr0, a1, -15 + add.d a0, a0, t0 + + vst vr0, a0, -15 + jr ra + nop + nop + +L(end): + vseqi.b vr1, vr0, 0 + vfrstpi.b vr1, vr1, 0 + vpickve2gr.bu t0, vr1, 0 + addi.d t0, t0, 1 + +L(end_16): + andi t1, t0, 16 + beqz t1, L(end_8) + vst vr0, a0, 0 + addi.d a0, a0, 15 + + + jr ra +L(end_8): + andi t2, t0, 8 + andi t3, t0, 4 + andi t4, t0, 2 + + andi t5, t0, 1 + beqz t2, L(end_4) + vstelm.d vr0, a0, 0, 0 + addi.d a0, a0, 8 + + vbsrl.v vr0, vr0, 8 +L(end_4): + beqz t3, L(end_2) + vstelm.w vr0, a0, 0, 0 + addi.d a0, a0, 4 + + vbsrl.v vr0, vr0, 4 +L(end_2): + beqz t4, L(end_1) + vstelm.h vr0, a0, 0, 0 + addi.d a0, a0, 2 + + + vbsrl.v vr0, vr0, 2 +L(end_1): + beqz t5, L(out) + vstelm.b vr0, a0, 0, 0 + addi.d a0, a0, 1 + +L(out): + addi.d a0, a0, -1 + jr ra + nop + nop + +L(unaligned): + andi a3, a1, 0xf + bstrins.d a1, zero, 3, 0 + vld vr2, a1, 0 + vreplgr2vr.b vr3, a3 + + vslt.b vr4, vr1, vr3 + vor.v vr0, vr2, vr4 + vsetanyeqz.b fcc0, vr0 + bcnez fcc0, L(un_first_end) + + + vld vr0, a1, 16 + vadd.b vr3, vr3, vr1 + vshuf.b vr4, vr0, vr2, vr3 + vsetanyeqz.b fcc0, vr0 + + bcnez fcc0, L(un_end) + vor.v vr2, vr0, vr0 + addi.d a1, a1, 16 +L(un_loop): + vld vr0, a1, 16 + + vst vr4, a0, 0 + addi.d a0, a0, 16 + vshuf.b vr4, vr0, vr2, vr3 + vsetanyeqz.b fcc0, vr0 + + bcnez fcc0, L(un_end) + vld vr2, a1, 32 + vst vr4, a0, 0 + addi.d a1, a1, 32 + + + addi.d a0, a0, 16 + vshuf.b vr4, vr2, vr0, vr3 + vsetanyeqz.b fcc0, vr2 + bceqz fcc0, L(un_loop) + + vor.v vr0, vr2, vr2 + addi.d a1, a1, -16 +L(un_end): + vsetanyeqz.b fcc0, vr4 + bcnez fcc0, 1f + + vst vr4, a0, 0 +1: + vmsknz.b vr1, vr0 + movfr2gr.s t0, fa1 + cto.w t0, t0 + + add.d a1, a1, t0 + vld vr0, a1, 1 + add.d a0, a0, t0 + sub.d a0, a0, a3 + + + vst vr0, a0, 1 + addi.d a0, a0, 16 + jr ra +L(un_first_end): + addi.d a0, a0, -16 + + addi.d a1, a1, -16 + b 1b +END(STPCPY) + + .section 
.rodata.cst16,"M",@progbits,16 + .align 4 +L(INDEX): + .dword 0x0706050403020100 + .dword 0x0f0e0d0c0b0a0908 + +libc_hidden_builtin_def (STPCPY) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/stpcpy.c b/sysdeps/loongarch/lp64/multiarch/stpcpy.c new file mode 100644 index 0000000000..62115e4055 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/stpcpy.c @@ -0,0 +1,42 @@ +/* Multiple versions of stpcpy. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2017-2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Define multiple versions only for the definition in libc. */ +#if IS_IN (libc) +# define stpcpy __redirect_stpcpy +# define __stpcpy __redirect___stpcpy +# define NO_MEMPCPY_STPCPY_REDIRECT +# define __NO_STRING_INLINES +# include +# undef stpcpy +# undef __stpcpy + +# define SYMBOL_NAME stpcpy +# include "ifunc-stpcpy.h" + +libc_ifunc_redirected (__redirect_stpcpy, __stpcpy, IFUNC_SELECTOR ()); + +weak_alias (__stpcpy, stpcpy) +# ifdef SHARED +__hidden_ver1 (__stpcpy, __GI___stpcpy, __redirect___stpcpy) + __attribute__ ((visibility ("hidden"))) __attribute_copy__ (stpcpy); +__hidden_ver1 (stpcpy, __GI_stpcpy, __redirect_stpcpy) + __attribute__ ((visibility ("hidden"))) __attribute_copy__ (stpcpy); +# endif +#endif From patchwork Fri Sep 8 09:33:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: dengjianbo X-Patchwork-Id: 75525 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8B71D385701B for ; Fri, 8 Sep 2023 09:35:13 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 751223858433 for ; Fri, 8 Sep 2023 09:34:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 751223858433 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [10.2.5.5]) by gateway (Coremail) with SMTP id _____8Dxg_CM6vpkBBMiAA--.2694S3; Fri, 08 Sep 2023 17:34:04 +0800 (CST) Received: from 5.5.5 (unknown [10.2.5.5]) by localhost.localdomain (Coremail) with SMTP id AQAAf8DxzM6H6vpkfDJyAA--.10780S5; Fri, 08 Sep 2023 17:34:03 +0800 (CST) From: dengjianbo To: libc-alpha@sourceware.org Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo Subject: [PATCH 3/4] LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx} Date: Fri, 8 Sep 2023 17:33:56 +0800 Message-Id: <20230908093357.3119822-4-dengjianbo@loongson.cn> X-Mailer: 
git-send-email 2.31.1
In-Reply-To: <20230908093357.3119822-1-dengjianbo@loongson.cn>
References: <20230908093357.3119822-1-dengjianbo@loongson.cn>

According to the glibc strrchr microbenchmark test results, this
implementation reduces the runtime as follows:

Name              Percent of runtime reduced
strrchr-lasx      10%-50%
strrchr-lsx       0%-50%
strrchr-aligned   5%-50%

Generic strrchr is implemented as strlen + memrchr, so the lasx version is
compared against a generic strrchr built from strlen-lasx + memrchr-lasx, the
lsx version against one built from strlen-lsx + memrchr-lsx, and the aligned
version against one built from strlen-aligned + the generic memrchr.
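For reference, the generic strrchr mentioned above is simply a composition of
strlen and memrchr, and the comparison pairs each vectorized strrchr with the
same composition built from the correspondingly vectorized strlen and memrchr.
A minimal C sketch of that baseline composition follows; the name
generic_strrchr is illustrative, not a glibc-internal symbol:

#define _GNU_SOURCE   /* memrchr is a GNU extension.  */
#include <string.h>

/* Find the string length once, then search backwards for the last
   occurrence of C.  Scanning strlen (s) + 1 bytes also covers the
   terminating NUL, so strrchr (s, '\0') works as specified.  */
static char *
generic_strrchr (const char *s, int c)
{
  return (char *) memrchr (s, c, strlen (s) + 1);
}

int
main (void)
{
  const char *path = "/usr/lib64/libc.so.6";
  char *last_slash = generic_strrchr (path, '/');
  return (last_slash != NULL && strcmp (last_slash, "/libc.so.6") == 0) ? 0 : 1;
}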
--- sysdeps/loongarch/lp64/multiarch/Makefile | 3 + .../lp64/multiarch/ifunc-impl-list.c | 8 + .../loongarch/lp64/multiarch/ifunc-strrchr.h | 41 ++++ .../lp64/multiarch/strrchr-aligned.S | 170 +++++++++++++++++ .../loongarch/lp64/multiarch/strrchr-lasx.S | 176 ++++++++++++++++++ .../loongarch/lp64/multiarch/strrchr-lsx.S | 144 ++++++++++++++ sysdeps/loongarch/lp64/multiarch/strrchr.c | 36 ++++ 7 files changed, 578 insertions(+) create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr.c diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile index f95eb5c4fe..23041fd727 100644 --- a/sysdeps/loongarch/lp64/multiarch/Makefile +++ b/sysdeps/loongarch/lp64/multiarch/Makefile @@ -23,6 +23,9 @@ sysdep_routines += \ stpcpy-aligned \ stpcpy-lsx \ stpcpy-lasx \ + strrchr-aligned \ + strrchr-lsx \ + strrchr-lasx \ memcpy-aligned \ memcpy-unaligned \ memmove-unaligned \ diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c index 539aa681f9..ceab78dbfe 100644 --- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c @@ -93,6 +93,14 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_aligned) ) + IFUNC_IMPL (i, name, strrchr, +#if !defined __loongarch_soft_float + IFUNC_IMPL_ADD (array, i, strrchr, SUPPORT_LASX, __strrchr_lasx) + IFUNC_IMPL_ADD (array, i, strrchr, SUPPORT_LSX, __strrchr_lsx) +#endif + IFUNC_IMPL_ADD (array, i, strrchr, 1, __strrchr_aligned) + ) + IFUNC_IMPL (i, name, memcpy, #if !defined __loongarch_soft_float IFUNC_IMPL_ADD (array, i, memcpy, SUPPORT_LASX, __memcpy_lasx) diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h b/sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h new file mode 100644 index 0000000000..bbb34089ef --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h @@ -0,0 +1,41 @@ +/* Common definition for strrchr ifunc selections. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include + +#if !defined __loongarch_soft_float +extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden; +extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden; +#endif + +extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden; + +static inline void * +IFUNC_SELECTOR (void) +{ +#if !defined __loongarch_soft_float + if (SUPPORT_LASX) + return OPTIMIZE (lasx); + else if (SUPPORT_LSX) + return OPTIMIZE (lsx); + else +#endif + return OPTIMIZE (aligned); +} diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S b/sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S new file mode 100644 index 0000000000..a73deb7840 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S @@ -0,0 +1,170 @@ +/* Optimized strrchr implementation using basic LoongArch instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) +# define STRRCHR __strrchr_aligned +#else +# define STRRCHR strrchr +#endif + +LEAF(STRRCHR, 6) + slli.d t0, a0, 3 + bstrins.d a0, zero, 2, 0 + lu12i.w a2, 0x01010 + ld.d t2, a0, 0 + + andi a1, a1, 0xff + ori a2, a2, 0x101 + li.d t3, -1 + bstrins.d a2, a2, 63, 32 + + sll.d t5, t3, t0 + slli.d a3, a2, 7 + orn t4, t2, t5 + mul.d a1, a1, a2 + + sub.d t0, t4, a2 + andn t1, a3, t4 + and t1, t0, t1 + beqz t1, L(find_tail) + + + ctz.d t0, t1 + orn t0, zero, t0 + xor t2, t4, a1 + srl.d t0, t3, t0 + + orn t2, t2, t0 + orn t2, t2, t5 + revb.d t2, t2 + sub.d t1, t2, a2 + + andn t0, a3, t2 + and t1, t0, t1 + ctz.d t0, t1 + srli.d t0, t0, 3 + + addi.d a0, a0, 7 + sub.d a0, a0, t0 + maskeqz a0, a0, t1 + jr ra + + +L(find_tail): + addi.d a4, a0, 8 + addi.d a0, a0, 8 +L(loop_ascii): + ld.d t2, a0, 0 + sub.d t1, t2, a2 + + and t0, t1, a3 + bnez t0, L(more_check) + ld.d t2, a0, 8 + sub.d t1, t2, a2 + + and t0, t1, a3 + addi.d a0, a0, 16 + beqz t0, L(loop_ascii) + addi.d a0, a0, -8 + +L(more_check): + andn t0, a3, t2 + and t1, t1, t0 + bnez t1, L(tail) + addi.d a0, a0, 8 + + +L(loop_nonascii): + ld.d t2, a0, 0 + sub.d t1, t2, a2 + andn t0, a3, t2 + and t1, t0, t1 + + bnez t1, L(tail) + ld.d t2, a0, 8 + addi.d a0, a0, 16 + sub.d t1, t2, a2 + + andn t0, a3, t2 + and t1, t0, t1 + beqz t1, L(loop_nonascii) + addi.d a0, a0, -8 + +L(tail): + ctz.d t0, t1 + orn t0, zero, t0 + xor t2, t2, a1 + srl.d t0, t3, t0 + + + orn t2, t2, t0 + revb.d t2, t2 + sub.d t1, t2, a2 + andn t0, a3, t2 + + and t1, t0, t1 + bnez t1, L(count_pos) +L(find_loop): + beq a0, a4, L(find_end) + ld.d t2, a0, -8 + + addi.d a0, a0, -8 + xor t2, t2, a1 + sub.d t1, t2, a2 + andn t0, a3, t2 + + and t1, t0, t1 + beqz t1, L(find_loop) + revb.d t2, t2 + sub.d t1, t2, a2 + + + andn t0, a3, t2 + and t1, t0, t1 +L(count_pos): + ctz.d t0, t1 + addi.d a0, a0, 7 + + srli.d t0, t0, 3 + sub.d a0, a0, t0 + jr ra + nop + +L(find_end): + xor t2, t4, a1 + orn t2, t2, 
t5 + revb.d t2, t2 + sub.d t1, t2, a2 + + + andn t0, a3, t2 + and t1, t0, t1 + ctz.d t0, t1 + srli.d t0, t0, 3 + + addi.d a0, a4, -1 + sub.d a0, a0, t0 + maskeqz a0, a0, t1 + jr ra +END(STRRCHR) + +libc_hidden_builtin_def(STRRCHR) diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S new file mode 100644 index 0000000000..5a6e22979a --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S @@ -0,0 +1,176 @@ +/* Optimized strrchr implementation using LoongArch LASX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +#define STRRCHR __strrchr_lasx + +LEAF(STRRCHR, 6) + move a2, a0 + bstrins.d a0, zero, 5, 0 + xvld xr0, a0, 0 + xvld xr1, a0, 32 + + li.d t2, -1 + xvreplgr2vr.b xr4, a1 + xvmsknz.b xr2, xr0 + xvmsknz.b xr3, xr1 + + xvpickve.w xr5, xr2, 4 + xvpickve.w xr6, xr3, 4 + vilvl.h vr2, vr5, vr2 + vilvl.h vr3, vr6, vr3 + + vilvl.w vr2, vr3, vr2 + movfr2gr.d t0, fa2 + sra.d t0, t0, a2 + beq t0, t2, L(find_tail) + + + xvseq.b xr2, xr0, xr4 + xvseq.b xr3, xr1, xr4 + xvmsknz.b xr2, xr2 + xvmsknz.b xr3, xr3 + + xvpickve.w xr4, xr2, 4 + xvpickve.w xr5, xr3, 4 + vilvl.h vr2, vr4, vr2 + vilvl.h vr3, vr5, vr3 + + vilvl.w vr1, vr3, vr2 + slli.d t3, t2, 1 + movfr2gr.d t1, fa1 + cto.d t0, t0 + + srl.d t1, t1, a2 + sll.d t3, t3, t0 + addi.d a0, a2, 63 + andn t1, t1, t3 + + + clz.d t0, t1 + sub.d a0, a0, t0 + maskeqz a0, a0, t1 + jr ra + + .align 5 +L(find_tail): + addi.d a3, a0, 64 +L(loop): + xvld xr2, a0, 64 + xvld xr3, a0, 96 + addi.d a0, a0, 64 + + xvmin.bu xr5, xr2, xr3 + xvsetanyeqz.b fcc0, xr5 + bceqz fcc0, L(loop) + xvmsknz.b xr5, xr2 + + + xvmsknz.b xr6, xr3 + xvpickve.w xr7, xr5, 4 + xvpickve.w xr8, xr6, 4 + vilvl.h vr5, vr7, vr5 + + vilvl.h vr6, vr8, vr6 + xvseq.b xr2, xr2, xr4 + xvseq.b xr3, xr3, xr4 + xvmsknz.b xr2, xr2 + + xvmsknz.b xr3, xr3 + xvpickve.w xr7, xr2, 4 + xvpickve.w xr8, xr3, 4 + vilvl.h vr2, vr7, vr2 + + vilvl.h vr3, vr8, vr3 + vilvl.w vr5, vr6, vr5 + vilvl.w vr2, vr3, vr2 + movfr2gr.d t0, fa5 + + + movfr2gr.d t1, fa2 + slli.d t3, t2, 1 + cto.d t0, t0 + sll.d t3, t3, t0 + + andn t1, t1, t3 + beqz t1, L(find_loop) + clz.d t0, t1 + addi.d a0, a0, 63 + + sub.d a0, a0, t0 + jr ra +L(find_loop): + beq a0, a3, L(find_end) + xvld xr2, a0, -64 + + xvld xr3, a0, -32 + addi.d a0, a0, -64 + xvseq.b xr2, xr2, xr4 + xvseq.b xr3, xr3, xr4 + + + xvmax.bu xr5, xr2, xr3 + xvseteqz.v fcc0, xr5 + bcnez fcc0, L(find_loop) + xvmsknz.b xr0, xr2 + + xvmsknz.b xr1, xr3 + xvpickve.w xr2, xr0, 4 + xvpickve.w xr3, xr1, 4 + vilvl.h vr0, vr2, vr0 + + vilvl.h vr1, vr3, vr1 + vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 + addi.d a0, a0, 63 + + clz.d t0, t0 + sub.d a0, a0, t0 + jr ra + nop + + +L(find_end): + xvseq.b xr2, xr0, xr4 + xvseq.b xr3, xr1, xr4 + 
diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S
new file mode 100644
index 0000000000..5a6e22979a
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S
@@ -0,0 +1,176 @@
+/* Optimized strrchr implementation using LoongArch LASX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+#define STRRCHR __strrchr_lasx
+
+LEAF(STRRCHR, 6)
+	move		a2, a0
+	bstrins.d	a0, zero, 5, 0
+	xvld		xr0, a0, 0
+	xvld		xr1, a0, 32
+
+	li.d		t2, -1
+	xvreplgr2vr.b	xr4, a1
+	xvmsknz.b	xr2, xr0
+	xvmsknz.b	xr3, xr1
+
+	xvpickve.w	xr5, xr2, 4
+	xvpickve.w	xr6, xr3, 4
+	vilvl.h		vr2, vr5, vr2
+	vilvl.h		vr3, vr6, vr3
+
+	vilvl.w		vr2, vr3, vr2
+	movfr2gr.d	t0, fa2
+	sra.d		t0, t0, a2
+	beq		t0, t2, L(find_tail)
+
+
+	xvseq.b		xr2, xr0, xr4
+	xvseq.b		xr3, xr1, xr4
+	xvmsknz.b	xr2, xr2
+	xvmsknz.b	xr3, xr3
+
+	xvpickve.w	xr4, xr2, 4
+	xvpickve.w	xr5, xr3, 4
+	vilvl.h		vr2, vr4, vr2
+	vilvl.h		vr3, vr5, vr3
+
+	vilvl.w		vr1, vr3, vr2
+	slli.d		t3, t2, 1
+	movfr2gr.d	t1, fa1
+	cto.d		t0, t0
+
+	srl.d		t1, t1, a2
+	sll.d		t3, t3, t0
+	addi.d		a0, a2, 63
+	andn		t1, t1, t3
+
+
+	clz.d		t0, t1
+	sub.d		a0, a0, t0
+	maskeqz		a0, a0, t1
+	jr		ra
+
+	.align 5
+L(find_tail):
+	addi.d		a3, a0, 64
+L(loop):
+	xvld		xr2, a0, 64
+	xvld		xr3, a0, 96
+	addi.d		a0, a0, 64
+
+	xvmin.bu	xr5, xr2, xr3
+	xvsetanyeqz.b	fcc0, xr5
+	bceqz		fcc0, L(loop)
+	xvmsknz.b	xr5, xr2
+
+
+	xvmsknz.b	xr6, xr3
+	xvpickve.w	xr7, xr5, 4
+	xvpickve.w	xr8, xr6, 4
+	vilvl.h		vr5, vr7, vr5
+
+	vilvl.h		vr6, vr8, vr6
+	xvseq.b		xr2, xr2, xr4
+	xvseq.b		xr3, xr3, xr4
+	xvmsknz.b	xr2, xr2
+
+	xvmsknz.b	xr3, xr3
+	xvpickve.w	xr7, xr2, 4
+	xvpickve.w	xr8, xr3, 4
+	vilvl.h		vr2, vr7, vr2
+
+	vilvl.h		vr3, vr8, vr3
+	vilvl.w		vr5, vr6, vr5
+	vilvl.w		vr2, vr3, vr2
+	movfr2gr.d	t0, fa5
+
+
+	movfr2gr.d	t1, fa2
+	slli.d		t3, t2, 1
+	cto.d		t0, t0
+	sll.d		t3, t3, t0
+
+	andn		t1, t1, t3
+	beqz		t1, L(find_loop)
+	clz.d		t0, t1
+	addi.d		a0, a0, 63
+
+	sub.d		a0, a0, t0
+	jr		ra
+L(find_loop):
+	beq		a0, a3, L(find_end)
+	xvld		xr2, a0, -64
+
+	xvld		xr3, a0, -32
+	addi.d		a0, a0, -64
+	xvseq.b		xr2, xr2, xr4
+	xvseq.b		xr3, xr3, xr4
+
+
+	xvmax.bu	xr5, xr2, xr3
+	xvseteqz.v	fcc0, xr5
+	bcnez		fcc0, L(find_loop)
+	xvmsknz.b	xr0, xr2
+
+	xvmsknz.b	xr1, xr3
+	xvpickve.w	xr2, xr0, 4
+	xvpickve.w	xr3, xr1, 4
+	vilvl.h		vr0, vr2, vr0
+
+	vilvl.h		vr1, vr3, vr1
+	vilvl.w		vr0, vr1, vr0
+	movfr2gr.d	t0, fa0
+	addi.d		a0, a0, 63
+
+	clz.d		t0, t0
+	sub.d		a0, a0, t0
+	jr		ra
+	nop
+
+
+L(find_end):
+	xvseq.b		xr2, xr0, xr4
+	xvseq.b		xr3, xr1, xr4
+	xvmsknz.b	xr2, xr2
+	xvmsknz.b	xr3, xr3
+
+	xvpickve.w	xr4, xr2, 4
+	xvpickve.w	xr5, xr3, 4
+	vilvl.h		vr2, vr4, vr2
+	vilvl.h		vr3, vr5, vr3
+
+	vilvl.w		vr1, vr3, vr2
+	movfr2gr.d	t1, fa1
+	addi.d		a0, a2, 63
+	srl.d		t1, t1, a2
+
+	clz.d		t0, t1
+	sub.d		a0, a0, t0
+	maskeqz		a0, a0, t1
+	jr		ra
+END(STRRCHR)
+
+libc_hidden_builtin_def(STRRCHR)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S
new file mode 100644
index 0000000000..8f2fd22e50
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S
@@ -0,0 +1,144 @@
+/* Optimized strrchr implementation using LoongArch LSX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+#define STRRCHR __strrchr_lsx
+
+LEAF(STRRCHR, 6)
+	move		a2, a0
+	bstrins.d	a0, zero, 4, 0
+	vld		vr0, a0, 0
+	vld		vr1, a0, 16
+
+	li.d		t2, -1
+	vreplgr2vr.b	vr4, a1
+	vmsknz.b	vr2, vr0
+	vmsknz.b	vr3, vr1
+
+	vilvl.h		vr2, vr3, vr2
+	movfr2gr.s	t0, fa2
+	sra.w		t0, t0, a2
+	beq		t0, t2, L(find_tail)
+
+	vseq.b		vr2, vr0, vr4
+	vseq.b		vr3, vr1, vr4
+	vmsknz.b	vr2, vr2
+	vmsknz.b	vr3, vr3
+
+
+	vilvl.h		vr1, vr3, vr2
+	slli.d		t3, t2, 1
+	movfr2gr.s	t1, fa1
+	cto.w		t0, t0
+
+	srl.w		t1, t1, a2
+	sll.d		t3, t3, t0
+	addi.d		a0, a2, 31
+	andn		t1, t1, t3
+
+	clz.w		t0, t1
+	sub.d		a0, a0, t0
+	maskeqz		a0, a0, t1
+	jr		ra
+
+	.align 5
+L(find_tail):
+	addi.d		a3, a0, 32
+L(loop):
+	vld		vr2, a0, 32
+	vld		vr3, a0, 48
+	addi.d		a0, a0, 32
+
+	vmin.bu		vr5, vr2, vr3
+	vsetanyeqz.b	fcc0, vr5
+	bceqz		fcc0, L(loop)
+	vmsknz.b	vr5, vr2
+
+	vmsknz.b	vr6, vr3
+	vilvl.h		vr5, vr6, vr5
+	vseq.b		vr2, vr2, vr4
+	vseq.b		vr3, vr3, vr4
+
+	vmsknz.b	vr2, vr2
+	vmsknz.b	vr3, vr3
+	vilvl.h		vr2, vr3, vr2
+	movfr2gr.s	t0, fa5
+
+
+	movfr2gr.s	t1, fa2
+	slli.d		t3, t2, 1
+	cto.w		t0, t0
+	sll.d		t3, t3, t0
+
+	andn		t1, t1, t3
+	beqz		t1, L(find_loop)
+	clz.w		t0, t1
+	addi.d		a0, a0, 31
+
+	sub.d		a0, a0, t0
+	jr		ra
+L(find_loop):
+	beq		a0, a3, L(find_end)
+	vld		vr2, a0, -32
+
+	vld		vr3, a0, -16
+	addi.d		a0, a0, -32
+	vseq.b		vr2, vr2, vr4
+	vseq.b		vr3, vr3, vr4
+
+
+	vmax.bu		vr5, vr2, vr3
+	vseteqz.v	fcc0, vr5
+	bcnez		fcc0, L(find_loop)
+	vmsknz.b	vr0, vr2
+
+	vmsknz.b	vr1, vr3
+	vilvl.h		vr0, vr1, vr0
+	movfr2gr.s	t0, fa0
+	addi.d		a0, a0, 31
+
+	clz.w		t0, t0
+	sub.d		a0, a0, t0
+	jr		ra
+	nop
+
+L(find_end):
+	vseq.b		vr2, vr0, vr4
+	vseq.b		vr3, vr1, vr4
+	vmsknz.b	vr2, vr2
+	vmsknz.b	vr3, vr3
+
+
+	vilvl.h		vr1, vr3, vr2
+	movfr2gr.s	t1, fa1
+	addi.d		a0, a2, 31
+	srl.w		t1, t1, a2
+
+	clz.w		t0, t1
+	sub.d		a0, a0, t0
+	maskeqz		a0, a0, t1
+	jr		ra
+END(STRRCHR)
+
+libc_hidden_builtin_def(STRRCHR)
+#endif
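Both vector versions follow the same block-wise plan: scan forward 32 bytes (LSX) or 64 bytes (LASX) at a time until the block containing the terminating NUL is found, then walk backwards block by block looking for the search character, and finally resolve the highest matching byte inside that block from the extracted compare masks.  The following is a portable C sketch of that strategy under those assumptions; it uses byte loops where the assembly uses compares plus mask extraction, and it is meant only to explain the control flow, not to match the code above instruction for instruction.

/* Portable sketch of the block-wise strrchr strategy (illustrative).  */
#include <stddef.h>

#define BLOCK 32   /* 32 bytes per block for LSX, 64 for LASX */

static char *
strrchr_blockwise (const char *s, int c)
{
  unsigned char ch = (unsigned char) c;
  size_t nul = 0;

  /* Step 1: find the terminating NUL (the assembly tests a whole block
     at once with vsetanyeqz.b / xvsetanyeqz.b).  */
  while (s[nul] != '\0')
    nul++;
  if (ch == '\0')
    return (char *) s + nul;

  /* Step 2: search backwards one block at a time; the first block found
     this way that contains CH holds the final occurrence.  */
  size_t base = nul - nul % BLOCK;
  for (;;)
    {
      size_t hi = base + BLOCK - 1;
      if (hi > nul)
        hi = nul;                       /* clamp to the end of the string */
      for (size_t i = hi + 1; i-- > base; )
        if ((unsigned char) s[i] == ch)
          return (char *) s + i;
      if (base == 0)
        return NULL;
      base -= BLOCK;
    }
}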
diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr.c b/sysdeps/loongarch/lp64/multiarch/strrchr.c
new file mode 100644
index 0000000000..d9c9f660a0
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strrchr.c
@@ -0,0 +1,36 @@
+/* Multiple versions of strrchr.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define strrchr __redirect_strrchr
+# include <string.h>
+# undef strrchr
+
+# define SYMBOL_NAME strrchr
+# include "ifunc-strrchr.h"
+
+libc_ifunc_redirected (__redirect_strrchr, strrchr, IFUNC_SELECTOR ());
+weak_alias (strrchr, rindex)
+# ifdef SHARED
+__hidden_ver1 (strrchr, __GI_strrchr, __redirect_strrchr)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (strrchr);
+# endif
+
+#endif
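The libc_ifunc_redirected machinery above ultimately emits a GNU indirect function: a resolver runs at relocation time and returns the address of the implementation to use.  For readers less familiar with IFUNCs, here is the same idea expressed with the plain GCC/Clang ifunc attribute on an ELF target, outside of glibc's macros.  All of the names (my_strrchr, the two backends, the resolver) are made up for illustration, and a real resolver must be careful not to call code that itself still needs relocation.

/* Illustrative sketch of an IFUNC resolver using the toolchain-level
   attribute; not the glibc mechanism itself.  */
#include <string.h>

static char *
strrchr_generic (const char *s, int c)
{
  return strrchr (s, c);                /* stand-in "aligned" backend */
}

static char *
strrchr_fancy (const char *s, int c)
{
  return strrchr (s, c);                /* stand-in "vector" backend */
}

/* Resolver: runs once, at symbol resolution time.  */
static void *
my_strrchr_resolver (void)
{
  int have_vector_unit = 0;             /* e.g. probe AT_HWCAP here */
  return have_vector_unit ? (void *) strrchr_fancy
                          : (void *) strrchr_generic;
}

char *my_strrchr (const char *s, int c)
  __attribute__ ((ifunc ("my_strrchr_resolver")));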
From patchwork Fri Sep 8 09:33:57 2023
X-Patchwork-Submitter: dengjianbo
X-Patchwork-Id: 75523
From: dengjianbo
To: libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, caiyinyu@loongson.cn,
 xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo
Subject: [PATCH 4/4] LoongArch: Change to put magic number to .rodata section
Date: Fri, 8 Sep 2023 17:33:57 +0800
Message-Id: <20230908093357.3119822-5-dengjianbo@loongson.cn>
In-Reply-To: <20230908093357.3119822-1-dengjianbo@loongson.cn>
References: <20230908093357.3119822-1-dengjianbo@loongson.cn>

Put the magic numbers into the .rodata section in memmove-lsx, and use
pcalau12i together with %pc_lo12 in vld to load them.
---
 .../loongarch/lp64/multiarch/memmove-lsx.S    | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S b/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
index 8a9367708d..5eb819ef74 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
@@ -209,13 +209,10 @@ L(al_less_16):
 	nop


-L(magic_num):
-	.dword 0x0706050403020100
-	.dword 0x0f0e0d0c0b0a0908
 L(unaligned):
-	pcaddi		t2, -4
+	pcalau12i	t2, %pc_hi20(L(INDEX))
 	bstrins.d	a1, zero, 3, 0
-	vld		vr8, t2, 0
+	vld		vr8, t2, %pc_lo12(L(INDEX))
 	vld		vr0, a1, 0

 	vld		vr1, a1, 16
@@ -413,13 +410,10 @@ L(back_al_less_16):
 	vst		vr1, a0, 0
 	jr		ra

-L(magic_num_2):
-	.dword 0x0706050403020100
-	.dword 0x0f0e0d0c0b0a0908
 L(back_unaligned):
-	pcaddi		t2, -4
+	pcalau12i	t2, %pc_hi20(L(INDEX))
 	bstrins.d	a4, zero, 3, 0
-	vld		vr8, t2, 0
+	vld		vr8, t2, %pc_lo12(L(INDEX))
 	vld		vr0, a4, 0

 	vld		vr1, a4, -16
@@ -529,6 +523,12 @@ L(back_un_less_16):
 	jr		ra
 END(MEMMOVE_NAME)

+	.section .rodata.cst16,"M",@progbits,16
+	.align 4
+L(INDEX):
+	.dword 0x0706050403020100
+	.dword 0x0f0e0d0c0b0a0908
+
 libc_hidden_builtin_def (MEMCPY_NAME)
 libc_hidden_builtin_def (MEMMOVE_NAME)
 #endif
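For context, the 16-byte constant being moved is an index table (byte values 0x00..0x0f), which the unaligned paths appear to use as a byte-shuffle control vector for vr8; placing it in a mergeable .rodata.cst16 section and addressing it with the standard pcalau12i + %pc_lo12 pair avoids keeping data in the middle of the instruction stream, which the old pcaddi trick required.  A rough C-level equivalent is sketched below under those assumptions: a compiler puts a constant like this in .rodata and emits the same PC-relative addressing pair for it on LoongArch; the shuffle16 helper only illustrates what an index table of this shape means and is not taken from the patch.

/* Illustrative sketch of the relocated constant and its use.  */
#include <stdint.h>

static const uint64_t shuffle_index[2] __attribute__ ((aligned (16))) =
{
  0x0706050403020100ULL,        /* byte indices 0..7  */
  0x0f0e0d0c0b0a0908ULL         /* byte indices 8..15 */
};

/* Pick 16 bytes out of a wider window according to the index table,
   which is the general shape of what a byte shuffle does with such a
   control vector.  */
static inline void
shuffle16 (uint8_t *dst, const uint8_t window[32])
{
  const uint8_t *idx = (const uint8_t *) shuffle_index;
  for (int i = 0; i < 16; i++)
    dst[i] = window[idx[i]];
}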