From patchwork Fri Sep 8 09:33:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: dengjianbo
X-Patchwork-Id: 56096
From: dengjianbo
To: libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo
Subject: [PATCH 0/4] LoongArch: Add ifunc support for str{cpy, rchr},
Date: Fri, 8 Sep 2023 17:33:53 +0800
Message-Id: <20230908093357.3119822-1-dengjianbo@loongson.cn>
X-Mailer: git-send-email 2.31.1
List-Id: Libc-alpha mailing list

This patch series adds multiple versions of strcpy, stpcpy, and strrchr, implemented with basic LoongArch instructions, LSX instructions, and LASX instructions. Although these implementations regress in a few cases, the overall performance gains are significant. See:

https://github.com/jiadengx/glibc_test/blob/main/bench/strcpy_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/stpcpy_compare.out

In these benchmarks, the results are compared with the generic strcpy and stpcpy, not with strlen + memcpy.

The generic strrchr is implemented as strlen + memrchr, so each new strrchr variant is compared with a generic strrchr built from the matching helpers:

  strrchr-lasx     vs. generic strrchr built from strlen-lasx and memrchr-lasx
  strrchr-lsx      vs. generic strrchr built from strlen-lsx and memrchr-lsx
  strrchr-aligned  vs. generic strrchr built from strlen-aligned and memrchr-generic

https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_lasx_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_lsx_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_aligned_compare.out

In the data, a positive value in parentheses means that our implementation took less time (a performance improvement); a negative value means that it took more time (a performance regression).

The following table summarizes the performance relative to the generic version in the glibc microbenchmark:

name              time reduction
strcpy-aligned    10%-45%
strcpy-unaligned  10%-49%; compared with the aligned version, the
                  unaligned version performs better when src and dest
                  cannot both be aligned to 8 bytes
strcpy-lsx        20%-80%
strcpy-lasx       15%-86%
stpcpy-lasx       10%-87%
stpcpy-lsx        10%-80%
stpcpy-aligned    5%-45%
strrchr-lasx      10%-50%
strrchr-lsx       0%-50%
strrchr-aligned   5%-50%

dengjianbo (4):
  LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx}
  LoongArch: Add ifunc support for stpcpy{aligned, lsx, lasx}
  LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx}
  LoongArch: Change to put magic number to .rodata section

 sysdeps/loongarch/lp64/multiarch/Makefile     |  10 +
 .../lp64/multiarch/ifunc-impl-list.c          |  25 +++
 .../loongarch/lp64/multiarch/ifunc-stpcpy.h   |  40 ++++
 .../loongarch/lp64/multiarch/ifunc-strrchr.h  |  41 ++++
 .../loongarch/lp64/multiarch/memmove-lsx.S    |  20 +-
 .../loongarch/lp64/multiarch/stpcpy-aligned.S | 191 ++++++++++++++++
 .../loongarch/lp64/multiarch/stpcpy-lasx.S    | 208 ++++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S | 206 +++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/stpcpy.c     |  42 ++++
 .../loongarch/lp64/multiarch/strcpy-aligned.S | 185 ++++++++++++++++
 .../loongarch/lp64/multiarch/strcpy-lasx.S    | 208 ++++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S | 197 +++++++++++++++++
 .../lp64/multiarch/strcpy-unaligned.S         | 131 +++++++++++
 sysdeps/loongarch/lp64/multiarch/strcpy.c     |  35 +++
 .../lp64/multiarch/strrchr-aligned.S          | 170 ++++++++++++++
 .../loongarch/lp64/multiarch/strrchr-lasx.S   | 176 +++++++++++++++
 .../loongarch/lp64/multiarch/strrchr-lsx.S    | 144 ++++++++++++
 sysdeps/loongarch/lp64/multiarch/strrchr.c    |  36 +++
 18 files changed, 2055 insertions(+), 10 deletions(-)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr.c
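
For readers unfamiliar with the strrchr baseline mentioned in the cover letter (strlen + memrchr), here is a minimal C sketch of that construction. It is not the glibc source and not code from this series: sketch_memrchr and sketch_generic_strrchr are hypothetical names, and the plain byte loop is an assumed stand-in for whichever memrchr variant the benchmark actually links in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for memrchr (a GNU extension):
   scan the first N bytes of S backwards and return a pointer to
   the last byte equal to C, or NULL if there is none.  */
static void *
sketch_memrchr (const void *s, int c, size_t n)
{
  const unsigned char *p = (const unsigned char *) s + n;

  while (n-- > 0)
    if (*--p == (unsigned char) c)
      return (void *) p;
  return NULL;
}

/* Hypothetical name.  strrchr expressed as strlen + memrchr: find
   the string length first, then search backwards from the end.
   The length passed on is strlen + 1 so that the terminating NUL
   is part of the search and C == '\0' finds the terminator, as
   strrchr requires.  */
static char *
sketch_generic_strrchr (const char *s, int c)
{
  return sketch_memrchr (s, c, strlen (s) + 1);
}
```

With this sketch, sketch_generic_strrchr ("a/b/c", '/') returns a pointer to "/c", and a character that does not occur in the string yields NULL. The construction reads the string twice (once forward for the length, once backward for the match), whereas a dedicated strrchr can find the last match in a single pass.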