Message ID | 20230207001618.458947-1-christoph.muellner@vrull.eu |
---|---|
Headers | From: Christoph Muellner <christoph.muellner@vrull.eu> To: libc-alpha@sourceware.org, Palmer Dabbelt <palmer@dabbelt.com>, Darius Rad <darius@bluespec.com>, Andrew Waterman <andrew@sifive.com>, DJ Delorie <dj@redhat.com>, Vineet Gupta <vineetg@rivosinc.com>, Kito Cheng <kito.cheng@sifive.com>, Jeff Law <jeffreyalaw@gmail.com>, Philipp Tomsich <philipp.tomsich@vrull.eu>, Heiko Stuebner <heiko.stuebner@vrull.eu> Cc: Christoph Müllner <christoph.muellner@vrull.eu> Subject: [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Date: Tue, 7 Feb 2023 01:15:59 +0100 Message-Id: <20230207001618.458947-1-christoph.muellner@vrull.eu> X-Mailer: git-send-email 2.39.1 List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> |
Series |
riscv: ifunc support with optimized mem*/str*/cpu_relax routines
|
|
Message
Christoph Müllner
Feb. 7, 2023, 12:15 a.m. UTC
From: Christoph Müllner <christoph.muellner@vrull.eu>
This RFC series introduces ifunc support for RISC-V and adds
optimized routines of memset(), memcpy()/memmove(), strlen(),
strcmp(), strncmp(), and cpu_relax().
The ifunc mechanism decides based on the following hart features:
- Available extensions
- Cache block size
- Fast unaligned accesses
Since we don't have an interface to get this information from the
kernel (at the moment), this patch uses environment variables instead,
which is also why this patch should not be considered for upstream
inclusion and is explicitly tagged as RFC.
The environment variables are:
- RISCV_RT_MARCH (e.g. "rv64gc_zicboz")
- RISCV_RT_CBOZ_BLOCKSIZE (e.g. "64")
- RISCV_RT_CBOM_BLOCKSIZE (e.g. "64")
- RISCV_RT_FAST_UNALIGNED (e.g. "1")
The environment variables are looked up and parsed early during
startup, where other architectures query similar properties from
the kernel or the CPU.
The ifunc implementation can use test macros to select a matching
implementation (e.g. HAVE_RV(zbb) or HAVE_FAST_UNALIGNED()).
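As an illustration of the selection step described above, here is a minimal sketch of how an ISA string such as "rv64gc_zicboz" could be checked for an extension and used to pick a routine. It is not the parser from this series (that lives in hart-features.c); the names `sketch_isa_has_ext` and `sketch_select_strlen` are hypothetical, and the sketch assumes a simplified grammar: an "rv32"/"rv64" prefix, single-letter extensions in the first token, and multi-letter extensions separated by underscores. It does not expand "g" into its member extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if the ISA string MARCH
   (e.g. "rv64gc_zicboz") names the extension EXT, else 0.
   Single-letter extensions are searched in the first token after the
   "rv32"/"rv64" prefix; multi-letter extensions must match a later
   '_'-separated token exactly.  */
int
sketch_isa_has_ext (const char *march, const char *ext)
{
  assert (march != NULL && ext != NULL);
  size_t ext_len = strlen (ext);
  if (ext_len == 0)
    return 0;
  if (strncmp (march, "rv32", 4) != 0 && strncmp (march, "rv64", 4) != 0)
    return 0;

  const char *tok = march + 4;
  int first = 1;
  while (1)
    {
      size_t tok_len = strcspn (tok, "_");
      if (first && ext_len == 1)
        {
          /* Single letters are packed together, e.g. "gc".  */
          if (memchr (tok, ext[0], tok_len) != NULL)
            return 1;
        }
      else if (!first && tok_len == ext_len
               && memcmp (tok, ext, ext_len) == 0)
        return 1;
      if (tok[tok_len] == '\0')
        return 0;
      tok += tok_len + 1;
      first = 0;
    }
}

/* Hypothetical resolver: name the strlen variant that would be chosen.
   A real ifunc resolver returns a function pointer instead.  */
const char *
sketch_select_strlen (const char *march)
{
  if (march != NULL && sketch_isa_has_ext (march, "zbb"))
    return "strlen_zbb";
  return "strlen_generic";
}
```

A real resolver runs very early during startup, so it cannot rely on the usual libc facilities; this sketch ignores that constraint and only shows the decision logic.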
The following optimized routines exist:
- memset
- memcpy/memmove
- strlen
- strcmp
- strncmp
- cpu_relax
The following optimizations have been applied:
- excessive loop unrolling
- Zbb's orc.b instruction
- Zbb's ctz instruction
- Zicboz/Zic64b ability to clear a cache block in memory
- Fast unaligned accesses (but with keeping exception guarantees intact)
- Fast overlapping accesses
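To make the orc.b/ctz items in the list above concrete, the following sketch emulates both instructions in portable C and combines them to locate the first zero byte in a 64-bit word, which is the core step of a word-at-a-time strlen. It assumes a little-endian word layout; all function names are hypothetical, and this is not the assembly from the series, only a model of the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Portable emulation of Zbb's orc.b: every byte of X that has any bit
   set becomes 0xff, every zero byte stays 0x00.  */
uint64_t
sketch_orc_b (uint64_t x)
{
  uint64_t result = 0;
  for (int i = 0; i < 8; i++)
    if (((x >> (8 * i)) & 0xff) != 0)
      result |= (uint64_t) 0xff << (8 * i);
  return result;
}

/* Count trailing zero bits of a non-zero value; stands in for ctz.  */
int
sketch_ctz64 (uint64_t x)
{
  assert (x != 0);
  int n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Index (0..7) of the first zero byte in a little-endian WORD,
   or -1 if the word contains no zero byte.  */
int
sketch_first_zero_byte (uint64_t word)
{
  /* After inversion, zero bytes of WORD show up as 0xff.  */
  uint64_t zero_mask = ~sketch_orc_b (word);
  if (zero_mask == 0)
    return -1;
  return sketch_ctz64 (zero_mask) / 8;
}
```

With hardware support, each of the two helpers collapses into a single instruction, which is where the gain over a byte-by-byte loop comes from.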
The patchset was developed more than a year ago and has been tested as part
of a vendor SDK since then. One of the areas where this patchset
was used is benchmarking (e.g. SPEC CPU2017).
The optimized string functions have been tested with the glibc tests
for that purpose.
The first patch of the series does not strictly belong to this series,
but was required to build and test SPEC CPU2017 benchmarks.
To build a cross-toolchain that includes these patches,
the riscv-gnu-toolchain or any other cross-toolchain
builder can be used.
Christoph Müllner (19):
Inhibit early libcalls before ifunc support is ready
riscv: LEAF: Use C_LABEL() to construct the asm name for a C symbol
riscv: Add ENTRY_ALIGN() macro
riscv: Add hart feature run-time detection framework
riscv: Introduction of ISA extensions
riscv: Adding ISA string parser for environment variables
riscv: hart-features: Add fast_unaligned property
riscv: Add (empty) ifunc framework
riscv: Add ifunc support for memset
riscv: Add accelerated memset routines for RV64
riscv: Add ifunc support for memcpy/memmove
riscv: Add accelerated memcpy/memmove routines for RV64
riscv: Add ifunc support for strlen
riscv: Add accelerated strlen routine
riscv: Add ifunc support for strcmp
riscv: Add accelerated strcmp routines
riscv: Add ifunc support for strncmp
riscv: Add an optimized strncmp routine
riscv: Add __riscv_cpu_relax() to allow yielding in busy loops
csu/libc-start.c | 1 +
elf/dl-support.c | 1 +
sysdeps/riscv/dl-machine.h | 13 +
sysdeps/riscv/ldsodefs.h | 1 +
sysdeps/riscv/multiarch/Makefile | 24 +
sysdeps/riscv/multiarch/cpu_relax.c | 36 ++
sysdeps/riscv/multiarch/cpu_relax_impl.S | 40 ++
sysdeps/riscv/multiarch/ifunc-impl-list.c | 70 +++
sysdeps/riscv/multiarch/init-arch.h | 24 +
sysdeps/riscv/multiarch/memcpy.c | 49 ++
sysdeps/riscv/multiarch/memcpy_generic.c | 32 ++
.../riscv/multiarch/memcpy_rv64_unaligned.S | 475 ++++++++++++++++++
sysdeps/riscv/multiarch/memmove.c | 49 ++
sysdeps/riscv/multiarch/memmove_generic.c | 32 ++
sysdeps/riscv/multiarch/memset.c | 52 ++
sysdeps/riscv/multiarch/memset_generic.c | 32 ++
.../riscv/multiarch/memset_rv64_unaligned.S | 31 ++
.../multiarch/memset_rv64_unaligned_cboz64.S | 217 ++++++++
sysdeps/riscv/multiarch/strcmp.c | 47 ++
sysdeps/riscv/multiarch/strcmp_generic.c | 32 ++
sysdeps/riscv/multiarch/strcmp_zbb.S | 104 ++++
.../riscv/multiarch/strcmp_zbb_unaligned.S | 213 ++++++++
sysdeps/riscv/multiarch/strlen.c | 44 ++
sysdeps/riscv/multiarch/strlen_generic.c | 32 ++
sysdeps/riscv/multiarch/strlen_zbb.S | 105 ++++
sysdeps/riscv/multiarch/strncmp.c | 44 ++
sysdeps/riscv/multiarch/strncmp_generic.c | 32 ++
sysdeps/riscv/multiarch/strncmp_zbb.S | 119 +++++
sysdeps/riscv/sys/asm.h | 14 +-
.../unix/sysv/linux/riscv/atomic-machine.h | 3 +
sysdeps/unix/sysv/linux/riscv/dl-procinfo.c | 62 +++
sysdeps/unix/sysv/linux/riscv/dl-procinfo.h | 46 ++
sysdeps/unix/sysv/linux/riscv/hart-features.c | 356 +++++++++++++
sysdeps/unix/sysv/linux/riscv/hart-features.h | 58 +++
.../unix/sysv/linux/riscv/isa-extensions.def | 72 +++
sysdeps/unix/sysv/linux/riscv/libc-start.c | 29 ++
.../unix/sysv/linux/riscv/macro-for-each.h | 24 +
37 files changed, 2610 insertions(+), 5 deletions(-)
create mode 100644 sysdeps/riscv/multiarch/Makefile
create mode 100644 sysdeps/riscv/multiarch/cpu_relax.c
create mode 100644 sysdeps/riscv/multiarch/cpu_relax_impl.S
create mode 100644 sysdeps/riscv/multiarch/ifunc-impl-list.c
create mode 100644 sysdeps/riscv/multiarch/init-arch.h
create mode 100644 sysdeps/riscv/multiarch/memcpy.c
create mode 100644 sysdeps/riscv/multiarch/memcpy_generic.c
create mode 100644 sysdeps/riscv/multiarch/memcpy_rv64_unaligned.S
create mode 100644 sysdeps/riscv/multiarch/memmove.c
create mode 100644 sysdeps/riscv/multiarch/memmove_generic.c
create mode 100644 sysdeps/riscv/multiarch/memset.c
create mode 100644 sysdeps/riscv/multiarch/memset_generic.c
create mode 100644 sysdeps/riscv/multiarch/memset_rv64_unaligned.S
create mode 100644 sysdeps/riscv/multiarch/memset_rv64_unaligned_cboz64.S
create mode 100644 sysdeps/riscv/multiarch/strcmp.c
create mode 100644 sysdeps/riscv/multiarch/strcmp_generic.c
create mode 100644 sysdeps/riscv/multiarch/strcmp_zbb.S
create mode 100644 sysdeps/riscv/multiarch/strcmp_zbb_unaligned.S
create mode 100644 sysdeps/riscv/multiarch/strlen.c
create mode 100644 sysdeps/riscv/multiarch/strlen_generic.c
create mode 100644 sysdeps/riscv/multiarch/strlen_zbb.S
create mode 100644 sysdeps/riscv/multiarch/strncmp.c
create mode 100644 sysdeps/riscv/multiarch/strncmp_generic.c
create mode 100644 sysdeps/riscv/multiarch/strncmp_zbb.S
create mode 100644 sysdeps/unix/sysv/linux/riscv/dl-procinfo.c
create mode 100644 sysdeps/unix/sysv/linux/riscv/dl-procinfo.h
create mode 100644 sysdeps/unix/sysv/linux/riscv/hart-features.c
create mode 100644 sysdeps/unix/sysv/linux/riscv/hart-features.h
create mode 100644 sysdeps/unix/sysv/linux/riscv/isa-extensions.def
create mode 100644 sysdeps/unix/sysv/linux/riscv/libc-start.c
create mode 100644 sysdeps/unix/sysv/linux/riscv/macro-for-each.h
Comments
nit: copyright year should be 2023
On 06/02/23 21:15, Christoph Muellner wrote:
> The environment variables are looked up and parsed early during
> startup, where other architectures query similar properties from
> the kernel or the CPU.
> The ifunc implementation can use test macros to select a matching
> implementation (e.g. HAVE_RV(zbb) or HAVE_FAST_UNALIGNED()).

So now we have 3 different proposed mechanisms to provide implementation runtime
selection on riscv:

1. The sysdep mechanism to select optimized routines based on compiler/ABI
   done at build time. It is the current mechanism and it is also used
   on rvv routines [1].

2. An ifunc one using a new riscv syscall to query the kernel for the required
   information.

3. Another ifunc one using a riscv-specific environment variable.

Although all of them are interchangeable in the sense that they can be used
independently, RISCV is following MIPS in having countless minor ABI variants
because of exactly these available permutations. This incurs extra
maintenance, extra documentation, extra testing, etc.

So I would like you, the RISCV arch maintainers, to first figure out which
scheme you want to focus on, instead of trying to push on multiple fronts with
different ad-hoc schemes.

The first scheme, which is the oldest one and is used by architectures like
arm, powerpc, mips, etc., is the sysdep one, where you select the variant at
build time. It has the advantage of needing no extra runtime cost or probing,
and a slight code size reduction. However, it ties the result to the ABI used
to build glibc, which means you need multiple libc builds if you are targeting
different chips/ABIs.

I recall that Red Hat and SuSE used to provide specialized glibc builds for
POWER machines to try to leverage new chip optimizations (libm showed some
gain, especially back when ISA 2.05 added rounding instructions, and ISA 2.07
GPR to FP special register). But I also recall that this was deprecated in
favor of using ifunc to optimize only the functions that do show a performance
improvement, since each glibc build variant required all the validation steps.

And that's why aarch64 and x86_64 initially followed the path of avoiding the
sysdeps folder, having a minimal default implementation that works on the
minimum supported ISA and providing any optimized variant through ifunc.

And that is what I suggest you do for *rvv*. You can follow x86_64/s390 and
add an extra optimization to only build certain variants if the ISA is high
enough (for instance, if you are targeting rbb, use it as the default). It
requires a *lot* of boilerplate code, as you can see in the recent x86_64-vX
code; but it should integrate better with current ldconfig and newer RPM
support (and I expect other package managers to follow as well).

And it leads us to *how* to select the ABI variants. I am sorry, but I will
*block the new environment variables* as is.
You might rework it through glibc hardware tunables [2]; nevertheless, I
*strongly* suggest that you *first* figure out the kernel interface before
starting to work on providing the optimized routines in glibc. The glibc
tunable might then work as a way to tune/test/filter the mechanism already in
place; ideally it should not rely on user intervention by default.

It was not clear from the 'hardware probing user interface' thread [3] why the
current Linux auxv advertisement mechanism is not sufficient for this specific
interface (maybe you want something more generic, like a cpuid-like
interface). It works for aarch64 and powerpc, so I am not sure why RISCV can't
start using it.

[1] https://sourceware.org/pipermail/libc-alpha/2023-February/145102.html
[2] https://www.gnu.org/software/libc/manual/html_node/Hardware-Capability-Tunables.html
[3] https://yhbt.net/lore/all/20221013163551.6775-1-palmer@rivosinc.com/

> The following optimized routines exist:
> - memset

It seems that the main gain here is unaligned access, loop unrolling, and the
cache clear instruction. Unfortunately the current implementation does not
provide support for any of this; however, I wonder if we could parametrize the
generic implementation to allow at least some support for fast unaligned
memory accesses (we can factor in cache clear as well). I am working on
refactoring memcpy, memmove, memset, and memcmp to get rid of old code and
allow us to work toward it.

> - memcpy/memmove

The generic implementation already does some loop unrolling, so I wonder if we
can improve the generic implementation by adding a switch to assume unaligned
access (so there is no need to use the two load/merge strategy). One advantage
that is not easily reproducible in C is to branch to memcpy from memmove if
the copy should be done forwards. This is not easily done in the generic
implementation because we can't simply call memcpy in such a case (since
source and destination can overlap and it might call a memcpy routine that
does not support it).
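The forward/backward decision discussed for memmove can be sketched in C as follows. This is only an illustration with hypothetical names, using byte loops instead of optimized copies: memmove may take the forward path whenever the destination is below the source or the ranges do not overlap, and a single unsigned comparison covers both cases. The forward path here is a local routine that is known to tolerate overlap, which is exactly the guarantee a generic memmove cannot assume about an arbitrary memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward byte copy; safe when DST is below SRC or the ranges are
   disjoint.  */
void
sketch_copy_forward (unsigned char *dst, const unsigned char *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}

/* Backward byte copy; safe when DST is above SRC and they overlap.  */
void
sketch_copy_backward (unsigned char *dst, const unsigned char *src, size_t n)
{
  while (n-- > 0)
    dst[n] = src[n];
}

/* Hypothetical memmove: pick the copy direction from the addresses.  */
void *
sketch_memmove (void *dst, const void *src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
  uintptr_t d = (uintptr_t) dst;
  uintptr_t s = (uintptr_t) src;
  /* Unsigned wrap-around: d - s >= n holds when DST is below SRC or at
     least N bytes above it, i.e. whenever a forward copy is safe.  */
  if (d - s >= n)
    sketch_copy_forward (dst, src, n);
  else
    sketch_copy_backward (dst, src, n);
  return dst;
}
```

In assembly the forward case can simply jump into the memcpy body, because the author controls both routines and knows the memcpy in question copies in ascending order.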

My approach in my generic refactor is just to remove the wordcopy and make
memcpy and memmove use the same strategy, but with different code.

> - strlen

The optimized routine seems quite similar to the generic one I installed
recently [4], which should use both cbz and orc.b with RISCV hooks [5].

[4] https://sourceware.org/git/?p=glibc.git;a=commit;h=350d8d13661a863e6b189f02d876fa265fe71302
[5] https://sourceware.org/git/?p=glibc.git;a=commit;h=25788431c0f5264c4830415de0cdd4d9926cbad9

> - strcmp
> - strncmp

The current generic implementations [6][7] now have a small advantage where
unaligned inputs are also improved by first aligning one input and operating
with a double load and merge comparison.

[6] https://sourceware.org/git/?p=glibc.git;a=commit;h=30cf54bf3072be942847400c1669bcd63aab039e
[7] https://sourceware.org/git/?p=glibc.git;a=commit;h=367c31b5d61164db97834917f5487094ebef2f58
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
> So now we have 3 different proposed mechanisms to provide implementation runtime
> selection on riscv:
>
> 1. The sysdep mechanism to select optimized routines based on compiler/ABI
>    done at build time. It is the current mechanism and it is also used
>    on rvv routines [1].
>
> 2. An ifunc one using a new riscv syscall to query the kernel for the required
>    information.
>
> 3. Another ifunc one using a riscv-specific environment variable.

I'm also going to oppose #3 on principle. We've been removing the use
of environment variables for tuning, in favor of tunables.

If we have a way to auto-detect the best implementation without relying
on the user, that's my preference. Users are unreliable and require
documentation. The compiler likely doesn't have access to the
hardware[*], so must rely on the user. Thus, my preference is #2 - the
kernel has access to the hardware and its device tree, and can tell the
userspace what capabilities are available.

I would not be opposed to a tunable that overrides the autodetection; we
have something similar for x86. But the default is (and should be)
"works basically correctly without user intervention".

[*] you can run gcc on the "right" hardware, but typically we
build-once-run-everywhere.
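For reference, this is roughly what querying kernel-provided capabilities through the existing auxiliary vector looks like from user space. `getauxval` and `AT_HWCAP` are real glibc/Linux interfaces; the helper names are hypothetical. The sketch assumes the Linux/RISC-V convention that bit N of AT_HWCAP stands for the single-letter extension 'a' + N. That convention has no obvious slot for multi-letter extensions such as Zbb or Zicboz, or for cache block sizes, which appears to be one reason a richer interface was under discussion.

```c
#include <assert.h>
#include <ctype.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (glibc on Linux) */

/* Hypothetical helper: test whether the bit for a single-letter
   extension (e.g. 'c' or 'v') is set in an AT_HWCAP-style bitmask,
   assuming bit N corresponds to the letter 'a' + N.  */
int
sketch_hwcap_has_letter (unsigned long hwcap, char letter)
{
  assert (isalpha ((unsigned char) letter));
  int bit = tolower ((unsigned char) letter) - 'a';
  return (int) ((hwcap >> bit) & 1UL);
}

/* Read the hardware capability word the kernel passed to this process.
   The meaning of the bits is architecture-specific.  */
unsigned long
sketch_query_hwcap (void)
{
  return getauxval (AT_HWCAP);
}
```

On a RISC-V system, `sketch_hwcap_has_letter (sketch_query_hwcap (), 'c')` would then report whether the compressed extension is advertised; on other architectures the bits mean something else entirely.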
On Tue, 7 Feb 2023 at 18:16, DJ Delorie <dj@redhat.com> wrote:
>
> Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
> > So now we have 3 different proposed mechanisms to provide implementation runtime
> > selection on riscv:
> >
> > 1. The sysdep mechanism to select optimized routines based on compiler/ABI
> >    done at build time. It is the current mechanism and it is also used
> >    on rvv routines [1].
> >
> > 2. An ifunc one using a new riscv syscall to query the kernel for the required
> >    information.
> >
> > 3. Another ifunc one using a riscv-specific environment variable.
>
> I'm also going to oppose #3 on principle. We've been removing the use
> of environment variables for tuning, in favor of tunables.

You may have missed the essential part of the commit message:
> > Since we don't have an interface to get this information from the
> > kernel (at the moment), this patch uses environment variables instead,
> > which is also why this patch should not be considered for upstream
> > inclusion and is explicitly tagged as RFC.

So this patch has always been a stand-in until option #2 is ready.
I am strongly opinionated towards a mechanism that uses existing
mechanisms in the ELF auxiliary vector to pass information, and tries
to avoid the introduction of a new arch-specific syscall, if possible.
Philipp Tomsich <philipp.tomsich@vrull.eu> writes:
> So this patch has always been a stand-in until option #2 is ready.
> I am strongly opinionated towards a mechanism that uses existing
> mechanisms in the ELF auxiliary vector to pass information, and tries
> to avoid the introduction of a new arch-specific syscall, if possible.

If the patch were converted to use tunables, it could be more than a
stand-in. It's the environment variable itself I'm opposed to.
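To show what moving from dedicated environment variables to tunables would mean in practice, here is a small sketch that looks up one entry in a GLIBC_TUNABLES-style string, which is a colon-separated list of name=value pairs. The parser is a stand-alone illustration and not glibc's internal tunables code; the function name `sketch_tunable_get` and the tunable name `glibc.cpu.riscv_fast_unaligned` used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find NAME in a "a=1:b=2" style TUNABLES string
   and copy its value into OUT (OUT_SIZE bytes, NUL-terminated).
   Returns 1 on success, 0 if NAME is absent or the value does not fit.  */
int
sketch_tunable_get (const char *tunables, const char *name,
                    char *out, size_t out_size)
{
  assert (tunables != NULL && name != NULL && out != NULL && out_size > 0);
  size_t name_len = strlen (name);
  const char *p = tunables;
  while (*p != '\0')
    {
      /* Length of the current name=value entry.  */
      size_t entry_len = strcspn (p, ":");
      if (entry_len > name_len && p[name_len] == '='
          && strncmp (p, name, name_len) == 0)
        {
          size_t val_len = entry_len - name_len - 1;
          if (val_len >= out_size)
            return 0;
          memcpy (out, p + name_len + 1, val_len);
          out[val_len] = '\0';
          return 1;
        }
      p += entry_len;
      if (*p == ':')
        p++;
    }
  return 0;
}
```

Under this scheme a user override would look like `GLIBC_TUNABLES=glibc.cpu.riscv_fast_unaligned=1` (again, a made-up name), and it would only adjust what auto-detection has already found rather than being the sole source of information.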
On Tue, Feb 7, 2023 at 10:14 PM DJ Delorie <dj@redhat.com> wrote:
>
> Philipp Tomsich <philipp.tomsich@vrull.eu> writes:
> > So this patch has always been a stand-in until option #2 is ready.
> > I am strongly opinionated towards a mechanism that uses existing
> > mechanisms in the ELF auxiliary vector to pass information, and tries
> > to avoid the introduction of a new arch-specific syscall, if possible.
>
> If the patch were converted to use tunables, it could be more than a
> stand-in. It's the environment variable itself I'm opposed to.

Thanks DJ and Adhemerval for your valuable input!

As said in the cover letter, the environment variable approach was not
meant to be merged but represents a starting point for discussions.
It is not what we want, but it serves as a dirty placeholder that allows
development of optimized routines.

The IFUNC support and the kernel-userspace API were discussed multiple
times in the past (e.g. at LPC2021 and LPC2022). There are different
opinions on the approaches, so the whole process regularly gets stuck.
The one point on which most (if not all) in the RISC-V community agree
is that we don't want a compile-time-only approach. Patches that rely on
a compile-time-only approach are most likely written that way because of
the absence of ifunc support.

Meanwhile, we have heard that multiple vendors are working on their own
solutions downstream, which results in duplicated work and not
necessarily in a common solution upstream.

This patchset was sent out to move the discussion from the idea level to
actual code, which can be reviewed, criticized, tested, improved, and
reused (to establish common ground for RISC-V vendors).

Both of your comments show how we can move this patchset forward:
- eliminate the first patch
  See also: https://sourceware.org/bugzilla/show_bug.cgi?id=30095
- work on the kernel-userspace interface to query capabilities
- use tunables instead of env variables
- look at how we can reuse more of the generic implementation to
  minimize ASM code with little or no benefit

I fully agree with all the mentioned points.

Thanks,
Christoph