Message ID | 20230415112340.38431-1-xry111@xry111.site |
---|---|
Headers |
To: libc-alpha@sourceware.org Cc: caiyinyu <caiyinyu@loongson.cn>, Wang Xuerui <i@xen0n.name>, Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>, Xi Ruoyao <xry111@xry111.site> Subject: [PATCH 0/5] LoongArch: Multiarch string and memory copy routines for unaligned access Date: Sat, 15 Apr 2023 19:23:35 +0800 Message-Id: <20230415112340.38431-1-xry111@xry111.site> X-Mailer: git-send-email 2.40.0 List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> From: Xi Ruoyao via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Xi Ruoyao <xry111@xry111.site> |
Series |
LoongArch: Multiarch string and memory copy routines for unaligned access
|
|
Message
Xi Ruoyao
April 15, 2023, 11:23 a.m. UTC
LoongArch CPUs may have hardware unaligned access support. Among the
LoongArch CPUs launched so far, those branded as Loongson-3 (for desktops
or servers) have hardware unaligned access support, but those branded as
Loongson-2 (for embedded or industrial applications) do not.

On Linux, unaligned access support is indicated by a HWCAP bit provided
by the kernel. So we can multiarch stpcpy and memcpy with ifunc to take
advantage of it on CPUs with unaligned access support.

On a Loongson-3A5000HV CPU running at 2.5GHz, "make bench" shows that
these changes improve performance:

- https://www.linuxfromscratch.org/~xry111/loongarch-ual-bench/bench-stpcpy-summary.txt
- https://www.linuxfromscratch.org/~xry111/loongarch-ual-bench/bench-memcpy-summary.txt

Xi Ruoyao (5):
  LoongArch: Add bits/hwcap.h for Linux
  LoongArch: Add LOONGARCH_HAVE_UAL macro
  string: stpcpy.c: Only alias __stpcpy to stpcpy if STPCPY undefined
  LoongArch: Multiarch stpcpy for unaligned access
  LoongArch: Multiarch memcpy for unaligned access

 string/stpcpy.c                               |  3 ++
 sysdeps/loongarch/loongarch-features.h        | 26 ++++++++++
 sysdeps/loongarch/multiarch/Makefile          |  6 +++
 sysdeps/loongarch/multiarch/memcpy-generic.c  | 27 ++++++++++
 sysdeps/loongarch/multiarch/memcpy-ual.c      | 50 +++++++++++++++++++
 sysdeps/loongarch/multiarch/memcpy.c          | 39 +++++++++++++++
 sysdeps/loongarch/multiarch/stpcpy-generic.c  | 25 ++++++++++
 sysdeps/loongarch/multiarch/stpcpy-ual.c      | 43 ++++++++++++++++
 sysdeps/loongarch/multiarch/stpcpy.c          | 37 ++++++++++++++
 .../loongarch/multiarch/wordcopy-ual-inline.c | 31 ++++++++++++
 .../unix/sysv/linux/loongarch/bits/hwcap.h    | 37 ++++++++++++++
 .../sysv/linux/loongarch/loongarch-features.h | 30 +++++++++++
 sysdeps/unix/sysv/linux/loongarch/sysdep.h    |  1 +
 13 files changed, 355 insertions(+)
 create mode 100644 sysdeps/loongarch/loongarch-features.h
 create mode 100644 sysdeps/loongarch/multiarch/Makefile
 create mode 100644 sysdeps/loongarch/multiarch/memcpy-generic.c
 create mode 100644 sysdeps/loongarch/multiarch/memcpy-ual.c
 create mode 100644 sysdeps/loongarch/multiarch/memcpy.c
 create mode 100644 sysdeps/loongarch/multiarch/stpcpy-generic.c
 create mode 100644 sysdeps/loongarch/multiarch/stpcpy-ual.c
 create mode 100644 sysdeps/loongarch/multiarch/stpcpy.c
 create mode 100644 sysdeps/loongarch/multiarch/wordcopy-ual-inline.c
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/bits/hwcap.h
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/loongarch-features.h
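
For readers unfamiliar with the approach, the selection logic described in the cover letter can be sketched roughly as follows. This is not the code from the series: every demo_* name is hypothetical, the bit value of DEMO_HWCAP_UAL is assumed to mirror HWCAP_LOONGARCH_UAL from the Linux uapi headers, and the selector is an ordinary function taking the HWCAP word as a parameter rather than a real ifunc resolver, so that it can be exercised on any machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors HWCAP_LOONGARCH_UAL in the Linux uapi headers.
   The DEMO_ prefix marks it as illustrative, not the series' macro.  */
#define DEMO_HWCAP_UAL (1UL << 2)

typedef void *(*demo_memcpy_fn) (void *, const void *, size_t);

/* Baseline: byte-by-byte copy, valid for any alignment.  */
void *
demo_memcpy_generic (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

/* Word-at-a-time copy.  The fixed-size memcpy calls express possibly
   unaligned 8-byte loads and stores in portable C; a compiler may lower
   each to a single instruction where the hardware permits it.  */
void *
demo_memcpy_ual (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n >= sizeof (uint64_t))
    {
      uint64_t w;
      memcpy (&w, s, sizeof w);
      memcpy (d, &w, sizeof w);
      d += sizeof w;
      s += sizeof w;
      n -= sizeof w;
    }
  /* Copy the remaining tail (fewer than 8 bytes) one byte at a time.  */
  while (n--)
    *d++ = *s++;
  return dst;
}

/* Pick an implementation from the HWCAP word, as an ifunc resolver
   would do once at symbol resolution time.  */
demo_memcpy_fn
demo_select_memcpy (unsigned long hwcap)
{
  return (hwcap & DEMO_HWCAP_UAL) ? demo_memcpy_ual : demo_memcpy_generic;
}
```

In a real program on Linux the HWCAP word could be obtained with getauxval (AT_HWCAP) from <sys/auxv.h>; passing it in explicitly here keeps the sketch testable on hardware where that bit has a different meaning.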
Comments
We are preparing a series of patches that includes ifunc support (aligned, unaligned, and vectorized assembly implementations) for the str/mem functions, tunable functionality, and a vectorized _dl_runtime_resolve. However, we are not currently able to submit them to the upstream community. We may consider publishing them on GitHub in the future, like gcc and binutils. We will temporarily keep your patches.