[0/5] LoongArch: Multiarch string and memory copy routines for unaligned access

Message ID 20230415112340.38431-1-xry111@xry111.site

Message

Xi Ruoyao April 15, 2023, 11:23 a.m. UTC
Some LoongArch CPUs support unaligned memory access in hardware.  Among
the LoongArch CPUs released so far, those branded as Loongson-3 (for
desktops and servers) have this support, but those branded as Loongson-2
(for embedded and industrial applications) do not.

On Linux, the kernel reports unaligned access support through a HWCAP
bit.  We can therefore provide multiarch variants of stpcpy and memcpy,
selected with ifunc, to take advantage of CPUs that support unaligned
access.
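
As a rough illustration of the selection idea (not the code in this
series), the sketch below picks between two copy routines from a hwcap
value.  The names copy_generic, copy_ual and select_copy are
hypothetical, a plain function pointer stands in for glibc's ifunc
machinery, and the fallback value of HWCAP_LOONGARCH_UAL is an
assumption based on the Linux uapi header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: bit value as in the Linux uapi <asm/hwcap.h> for
   LoongArch; defined here only so the sketch builds anywhere.  */
#ifndef HWCAP_LOONGARCH_UAL
# define HWCAP_LOONGARCH_UAL (1UL << 2)
#endif

typedef void *(*copy_fn) (void *, const void *, size_t);

/* Hypothetical generic variant: copies one byte at a time, so it never
   performs a multi-byte access at a misaligned address.  */
static void *
copy_generic (void *dst, const void *src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

/* Hypothetical UAL variant: moves 8 bytes per step regardless of
   alignment.  The fixed-size memcpy calls are the portable way to write
   an unaligned load/store; a compiler may lower each one to a single
   instruction when the target permits unaligned access.  */
static void *
copy_ual (void *dst, const void *src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
  unsigned char *d = dst;
  const unsigned char *s = src;
  for (; n >= sizeof (uint64_t); n -= sizeof (uint64_t))
    {
      uint64_t w;
      memcpy (&w, s, sizeof w);
      memcpy (d, &w, sizeof w);
      s += sizeof w;
      d += sizeof w;
    }
  while (n--)   /* Tail of fewer than 8 bytes.  */
    *d++ = *s++;
  return dst;
}

/* Stand-in for an ifunc resolver: choose the variant once, based on
   whether the UAL bit is set in the given hwcap value.  */
static copy_fn
select_copy (unsigned long hwcap)
{
  return (hwcap & HWCAP_LOONGARCH_UAL) ? copy_ual : copy_generic;
}
```

On Linux the hwcap value could be obtained with getauxval (AT_HWCAP)
from <sys/auxv.h>; with a real ifunc the choice is made once when the
symbol is resolved rather than through an explicit pointer.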

On a Loongson-3A5000HV CPU running at 2.5 GHz, "make bench" shows that
these changes improve performance:

- https://www.linuxfromscratch.org/~xry111/loongarch-ual-bench/bench-stpcpy-summary.txt
- https://www.linuxfromscratch.org/~xry111/loongarch-ual-bench/bench-memcpy-summary.txt

Xi Ruoyao (5):
  LoongArch: Add bits/hwcap.h for Linux
  LoongArch: Add LOONGARCH_HAVE_UAL macro
  string: stpcpy.c: Only alias __stpcpy to stpcpy if STPCPY undefined
  LoongArch: Multiarch stpcpy for unaligned access
  LoongArch: Multiarch memcpy for unaligned access

 string/stpcpy.c                               |  3 ++
 sysdeps/loongarch/loongarch-features.h        | 26 ++++++++++
 sysdeps/loongarch/multiarch/Makefile          |  6 +++
 sysdeps/loongarch/multiarch/memcpy-generic.c  | 27 ++++++++++
 sysdeps/loongarch/multiarch/memcpy-ual.c      | 50 +++++++++++++++++++
 sysdeps/loongarch/multiarch/memcpy.c          | 39 +++++++++++++++
 sysdeps/loongarch/multiarch/stpcpy-generic.c  | 25 ++++++++++
 sysdeps/loongarch/multiarch/stpcpy-ual.c      | 43 ++++++++++++++++
 sysdeps/loongarch/multiarch/stpcpy.c          | 37 ++++++++++++++
 .../loongarch/multiarch/wordcopy-ual-inline.c | 31 ++++++++++++
 .../unix/sysv/linux/loongarch/bits/hwcap.h    | 37 ++++++++++++++
 .../sysv/linux/loongarch/loongarch-features.h | 30 +++++++++++
 sysdeps/unix/sysv/linux/loongarch/sysdep.h    |  1 +
 13 files changed, 355 insertions(+)
 create mode 100644 sysdeps/loongarch/loongarch-features.h
 create mode 100644 sysdeps/loongarch/multiarch/Makefile
 create mode 100644 sysdeps/loongarch/multiarch/memcpy-generic.c
 create mode 100644 sysdeps/loongarch/multiarch/memcpy-ual.c
 create mode 100644 sysdeps/loongarch/multiarch/memcpy.c
 create mode 100644 sysdeps/loongarch/multiarch/stpcpy-generic.c
 create mode 100644 sysdeps/loongarch/multiarch/stpcpy-ual.c
 create mode 100644 sysdeps/loongarch/multiarch/stpcpy.c
 create mode 100644 sysdeps/loongarch/multiarch/wordcopy-ual-inline.c
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/bits/hwcap.h
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/loongarch-features.h
  

Comments

caiyinyu April 18, 2023, 3:01 a.m. UTC | #1
We are preparing a series of patches that adds ifunc support
(aligned, unaligned, and vectorized assembly implementations) for the
str/mem functions, tunables support, and a vectorized
_dl_runtime_resolve.  However, we are not currently able to submit them
upstream.  We may consider publishing them on GitHub in the future, as
with gcc and binutils.

We will temporarily keep your patches.
