[0/4] riscv: Vectorized mem*/str* function

Message ID 20230301153247.1499566-1-vincent.chen@sifive.com

Vincent Chen March 1, 2023, 3:32 p.m. UTC
  This patch series proposes implementations of memchr, memcmp, memcpy, memmove,
memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp,
strncpy and strnlen that leverage the RISC-V V extension (RVV), version
1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These
routines are from https://github.com/sifive/sifive-libc, which we agree
to contribute to the Free Software Foundation. With regard to
IFUNC, some details concerning `hwcap` are still under discussion in the
community. For the purposes of reviewing this patch, we have temporarily
opted to select the RVV routines at compile time. Once the `hwcap` mechanism is
ready, we’ll rebase on it.

These routines assume VLEN is at least 32 bits, as is required by all
currently defined vector extensions, and they support arbitrarily large
VLEN. All implementations work for both RV32 and RV64 platforms, and
make no assumptions about page size.
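
To illustrate why one loop can serve any VLEN, here is a minimal sketch
in portable C that models the strip-mining pattern. It is not the code
in this series: `model_vsetvl_e8` and `model_memcpy` are hypothetical
names, the inner byte loop only stands in for a vector load/store pair,
and the vl computation is a simplification of what `vsetvli` may return.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of vsetvli with SEW=8: the number of bytes handled in
   one iteration is min(avl, VLMAX), where VLMAX = VLEN/8 * LMUL.
   vlen_bits and lmul are parameters of this model only. */
static size_t model_vsetvl_e8(size_t avl, size_t vlen_bits, size_t lmul)
{
    assert(vlen_bits >= 32 && lmul >= 1 && lmul <= 8);
    size_t vlmax = vlen_bits / 8 * lmul;
    return avl < vlmax ? avl : vlmax;
}

/* Strip-mined copy: the loop never hard-codes a vector width, so the same
   code is correct for VLEN=32 and for arbitrarily large VLEN. */
static void *model_memcpy(void *dst, const void *src, size_t n,
                          size_t vlen_bits, size_t lmul)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n > 0) {
        size_t vl = model_vsetvl_e8(n, vlen_bits, lmul);
        for (size_t i = 0; i < vl; i++)   /* stands in for a vector load + store */
            d[i] = s[i];
        d += vl;
        s += vl;
        n -= vl;
    }
    return dst;
}
```

In this model, a larger VLEN or LMUL only changes how many iterations
the loop takes, not its result.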

The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code
size, while the `str*` (unknown-length) routines use LMUL=1 instead.
A larger LMUL would still reduce dynamic code size for the latter
routines, but it would also increase the cost of the remainder/tail loop:
more data is loaded and more comparisons are performed past the `\0`.
This overhead is particularly pronounced for short strings.
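
The tail cost described above can be estimated with a rough model in C.
This is a sketch under stated assumptions, not a measurement: it assumes
SEW=8 and that every iteration loads and compares a full VLMAX bytes
(no trimming by a fault-only-first load). The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes held by one vector register group for SEW=8. */
static size_t model_vlmax_e8(size_t vlen_bits, size_t lmul)
{
    assert(vlen_bits >= 32 && lmul >= 1 && lmul <= 8);
    return vlen_bits / 8 * lmul;
}

/* Bytes loaded and compared beyond the terminating '\0' when a string of
   `len` characters is scanned in full VLMAX-byte chunks. */
static size_t model_bytes_past_nul(size_t len, size_t vlen_bits, size_t lmul)
{
    size_t vlmax = model_vlmax_e8(vlen_bits, lmul);
    size_t needed = len + 1;                       /* characters plus '\0' */
    size_t chunks = (needed + vlmax - 1) / vlmax;  /* round up */
    return chunks * vlmax - needed;
}
```

Under these assumptions, a 5-character string with VLEN=128 costs 10
bytes past the terminator at LMUL=1 and 122 bytes at LMUL=8.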

Measured performance improvements of the vectorized ("rvv")
implementations vs. the existing Glibc ("scalar") implementations are as
follows:
memchr: 85% time savings (i.e., if scalar is 100ms, then rvv is 15ms)
memcmp: 55%
memcpy: 88%
memmove: 80%
memset: 88%
strcmp: 85%
strlen: 70%
strcat: 53%
strchr: 85%
strcpy: 70%
strncmp: 90%
strncat: 50%
strncpy: 60%
strnlen: 80%
The above data were collected on a SiFive X280 (FPGA simulation) across a
wide range of problem sizes.
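
For readers who prefer speedup factors, the "time savings" figures above
convert as follows. This small C sketch only restates the arithmetic of
the memchr example (100ms scalar, 15ms rvv); the function names are
hypothetical.

```c
#include <assert.h>

/* Time taken by the rvv routine, given the scalar time and the
   percentage of time saved. */
static double rvv_time_ms(double scalar_ms, double savings_pct)
{
    assert(savings_pct >= 0.0 && savings_pct < 100.0);
    return scalar_ms * (100.0 - savings_pct) / 100.0;
}

/* Equivalent speedup factor: scalar time divided by rvv time. */
static double speedup_factor(double savings_pct)
{
    assert(savings_pct >= 0.0 && savings_pct < 100.0);
    return 100.0 / (100.0 - savings_pct);
}
```

By this conversion, 50% savings is a 2x speedup, 85% is roughly 6.7x,
and 90% is 10x.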


Jerry Shih (2):
  riscv: vectorized mem* functions
  riscv: vectorized str* functions

Nick Knight (1):
  riscv: vectorized strchr and strnlen functions

Vincent Chen (1):
  riscv: Enabling vectorized mem*/str* functions in build time

 scripts/build-many-glibcs.py   |  10 +++
 sysdeps/riscv/preconfigure     |  19 +++++
 sysdeps/riscv/preconfigure.ac  |  18 +++++
 sysdeps/riscv/rv32/rvv/Implies |   2 +
 sysdeps/riscv/rv64/rvv/Implies |   2 +
 sysdeps/riscv/rvv/memchr.S     |  63 +++++++++++++++
 sysdeps/riscv/rvv/memcmp.S     |  75 ++++++++++++++++++
 sysdeps/riscv/rvv/memcpy.S     |  51 +++++++++++++
 sysdeps/riscv/rvv/memmove.S    |  72 ++++++++++++++++++
 sysdeps/riscv/rvv/memset.S     |  51 +++++++++++++
 sysdeps/riscv/rvv/strcat.S     |  72 ++++++++++++++++++
 sysdeps/riscv/rvv/strchr.S     |  53 +++++++++++++
 sysdeps/riscv/rvv/strcmp.S     | 135 +++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strcpy.S     |  56 ++++++++++++++
 sysdeps/riscv/rvv/strlen.S     |  54 +++++++++++++
 sysdeps/riscv/rvv/strncat.S    |  83 ++++++++++++++++++++
 sysdeps/riscv/rvv/strncmp.S    |  85 +++++++++++++++++++++
 sysdeps/riscv/rvv/strncpy.S    |  86 +++++++++++++++++++++
 sysdeps/riscv/rvv/strnlen.S    |  56 ++++++++++++++
 19 files changed, 1043 insertions(+)
 create mode 100644 sysdeps/riscv/rv32/rvv/Implies
 create mode 100644 sysdeps/riscv/rv64/rvv/Implies
 create mode 100644 sysdeps/riscv/rvv/memchr.S
 create mode 100644 sysdeps/riscv/rvv/memcmp.S
 create mode 100644 sysdeps/riscv/rvv/memcpy.S
 create mode 100644 sysdeps/riscv/rvv/memmove.S
 create mode 100644 sysdeps/riscv/rvv/memset.S
 create mode 100644 sysdeps/riscv/rvv/strcat.S
 create mode 100644 sysdeps/riscv/rvv/strchr.S
 create mode 100644 sysdeps/riscv/rvv/strcmp.S
 create mode 100644 sysdeps/riscv/rvv/strcpy.S
 create mode 100644 sysdeps/riscv/rvv/strlen.S
 create mode 100644 sysdeps/riscv/rvv/strncat.S
 create mode 100644 sysdeps/riscv/rvv/strncmp.S
 create mode 100644 sysdeps/riscv/rvv/strncpy.S
 create mode 100644 sysdeps/riscv/rvv/strnlen.S