From patchwork Tue Feb 7 00:15:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Christoph Müllner
X-Patchwork-Id: 55465
From: Christoph Muellner
To: libc-alpha@sourceware.org, Palmer Dabbelt, Darius Rad, Andrew Waterman, DJ Delorie, Vineet Gupta, Kito Cheng, Jeff Law, Philipp Tomsich, Heiko Stuebner
Cc: Christoph Müllner
Subject: [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines
Date: Tue, 7 Feb 2023 01:15:59 +0100
Message-Id: <20230207001618.458947-1-christoph.muellner@vrull.eu>
X-Mailer: git-send-email 2.39.1
List-Id: Libc-alpha mailing list

From: Christoph Müllner

This RFC series introduces ifunc support for RISC-V and adds optimized
routines for memset(), memcpy()/memmove(), strlen(), strcmp(), strncmp(),
and cpu_relax().

The ifunc mechanism selects an implementation based on the following
hart features:
- Available extensions
- Cache block size
- Fast unaligned accesses

Since we don't have an interface to get this information from the kernel
(at the moment), this series uses environment variables instead. That is
also why it should not be considered for upstream inclusion and is
explicitly tagged as RFC.

The environment variables are:
- RISCV_RT_MARCH (e.g. "rv64gc_zicboz")
- RISCV_RT_CBOZ_BLOCKSIZE (e.g. "64")
- RISCV_RT_CBOM_BLOCKSIZE (e.g. "64")
- RISCV_RT_FAST_UNALIGNED (e.g. "1")

The environment variables are looked up and parsed early during startup,
at the point where other architectures query similar properties from the
kernel or the CPU. The ifunc implementation can use test macros to select
a matching implementation (e.g. HAVE_RV(zbb) or HAVE_FAST_UNALIGNED()).

The following optimized routines exist:
- memset
- memcpy/memmove
- strlen
- strcmp
- strncmp
- cpu_relax

The following optimizations have been applied:
- excessive loop unrolling
- Zbb's orc.b instruction
- Zbb's ctz instruction
- Zicboz/Zic64b ability to clear a cache block in memory
- Fast unaligned accesses (while keeping exception guarantees intact)
- Fast overlapping accesses

The patch set was developed more than a year ago and has been tested as
part of a vendor SDK since then. One of the areas where this patch set
was used is benchmarking (e.g. SPEC CPU2017). The optimized string
functions have been tested with the glibc tests for that purpose.

The first patch of the series does not strictly belong to this series,
but was required to build and test SPEC CPU2017 benchmarks.
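
For readers who want a feel for the environment-variable mechanism described above, here is a minimal sketch in plain C. It is not the code from this series: the struct layout, the helper names (march_has_ext, parse_hart_features, init_hart_features_from_env, select_memset_name), the macro definitions behind HAVE_RV() and HAVE_FAST_UNALIGNED(), and the selection order are all assumptions made for illustration. Only the environment variable names, the two macro names and the implementation names (taken from the file list) come from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature record; the real layout in hart-features.h
   is not shown in this cover letter.  */
struct hart_features
{
  bool have_zbb;
  bool have_zicboz;
  unsigned long cboz_blocksize;
  bool fast_unaligned;
};

static struct hart_features hart_features;

/* Assumed definitions of the test macros named in the cover letter.  */
#define HAVE_RV(ext) (hart_features.have_##ext)
#define HAVE_FAST_UNALIGNED() (hart_features.fast_unaligned)

/* Return true if EXT appears as an underscore-separated extension in
   the ISA string MARCH, e.g. "zicboz" in "rv64gc_zicboz".  Version
   suffixes and single-letter extensions are not handled here.  */
bool
march_has_ext (const char *march, const char *ext)
{
  assert (march != NULL && ext != NULL);
  size_t len = strlen (ext);
  const char *p = march;

  while ((p = strchr (p, '_')) != NULL)
    {
      p++; /* Skip the underscore.  */
      if (strncmp (p, ext, len) == 0 && (p[len] == '_' || p[len] == '\0'))
        return true;
    }
  return false;
}

/* Fill the global feature record from three strings; NULL means the
   corresponding variable is unset.  */
void
parse_hart_features (const char *march, const char *cboz_blocksize,
                     const char *fast_unaligned)
{
  memset (&hart_features, 0, sizeof hart_features);
  if (march != NULL)
    {
      hart_features.have_zbb = march_has_ext (march, "zbb");
      hart_features.have_zicboz = march_has_ext (march, "zicboz");
    }
  if (cboz_blocksize != NULL)
    hart_features.cboz_blocksize = strtoul (cboz_blocksize, NULL, 10);
  if (fast_unaligned != NULL)
    hart_features.fast_unaligned = strcmp (fast_unaligned, "1") == 0;
}

/* Read the variables listed in the cover letter.  */
void
init_hart_features_from_env (void)
{
  parse_hart_features (getenv ("RISCV_RT_MARCH"),
                       getenv ("RISCV_RT_CBOZ_BLOCKSIZE"),
                       getenv ("RISCV_RT_FAST_UNALIGNED"));
}

/* Return the name of the memset variant a resolver might pick.  The
   ordering of the checks is an assumption, not taken from the series.  */
const char *
select_memset_name (void)
{
  if (HAVE_RV (zicboz) && hart_features.cboz_blocksize == 64
      && HAVE_FAST_UNALIGNED ())
    return "memset_rv64_unaligned_cboz64";
  if (HAVE_FAST_UNALIGNED ())
    return "memset_rv64_unaligned";
  return "memset_generic";
}
```

A caller would run init_hart_features_from_env() once and then consult select_memset_name(). The sketch returns a name rather than a function pointer so that it stays independent of the real ifunc plumbing, which has to work before relocations are complete.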
To build a cross-toolchain that includes these patches, the
riscv-gnu-toolchain or any other cross-toolchain builder can be used.

Christoph Müllner (19):
  Inhibit early libcalls before ifunc support is ready
  riscv: LEAF: Use C_LABEL() to construct the asm name for a C symbol
  riscv: Add ENTRY_ALIGN() macro
  riscv: Add hart feature run-time detection framework
  riscv: Introduction of ISA extensions
  riscv: Adding ISA string parser for environment variables
  riscv: hart-features: Add fast_unaligned property
  riscv: Add (empty) ifunc framework
  riscv: Add ifunc support for memset
  riscv: Add accelerated memset routines for RV64
  riscv: Add ifunc support for memcpy/memmove
  riscv: Add accelerated memcpy/memmove routines for RV64
  riscv: Add ifunc support for strlen
  riscv: Add accelerated strlen routine
  riscv: Add ifunc support for strcmp
  riscv: Add accelerated strcmp routines
  riscv: Add ifunc support for strncmp
  riscv: Add an optimized strncmp routine
  riscv: Add __riscv_cpu_relax() to allow yielding in busy loops

 csu/libc-start.c                              |   1 +
 elf/dl-support.c                              |   1 +
 sysdeps/riscv/dl-machine.h                    |  13 +
 sysdeps/riscv/ldsodefs.h                      |   1 +
 sysdeps/riscv/multiarch/Makefile              |  24 +
 sysdeps/riscv/multiarch/cpu_relax.c           |  36 ++
 sysdeps/riscv/multiarch/cpu_relax_impl.S      |  40 ++
 sysdeps/riscv/multiarch/ifunc-impl-list.c     |  70 +++
 sysdeps/riscv/multiarch/init-arch.h           |  24 +
 sysdeps/riscv/multiarch/memcpy.c              |  49 ++
 sysdeps/riscv/multiarch/memcpy_generic.c      |  32 ++
 .../riscv/multiarch/memcpy_rv64_unaligned.S   | 475 ++++++++++++++++++
 sysdeps/riscv/multiarch/memmove.c             |  49 ++
 sysdeps/riscv/multiarch/memmove_generic.c     |  32 ++
 sysdeps/riscv/multiarch/memset.c              |  52 ++
 sysdeps/riscv/multiarch/memset_generic.c      |  32 ++
 .../riscv/multiarch/memset_rv64_unaligned.S   |  31 ++
 .../multiarch/memset_rv64_unaligned_cboz64.S  | 217 ++++++++
 sysdeps/riscv/multiarch/strcmp.c              |  47 ++
 sysdeps/riscv/multiarch/strcmp_generic.c      |  32 ++
 sysdeps/riscv/multiarch/strcmp_zbb.S          | 104 ++++
 .../riscv/multiarch/strcmp_zbb_unaligned.S    | 213 ++++++++
 sysdeps/riscv/multiarch/strlen.c              |  44 ++
 sysdeps/riscv/multiarch/strlen_generic.c      |  32 ++
 sysdeps/riscv/multiarch/strlen_zbb.S          | 105 ++++
 sysdeps/riscv/multiarch/strncmp.c             |  44 ++
 sysdeps/riscv/multiarch/strncmp_generic.c     |  32 ++
 sysdeps/riscv/multiarch/strncmp_zbb.S         | 119 +++++
 sysdeps/riscv/sys/asm.h                       |  14 +-
 .../unix/sysv/linux/riscv/atomic-machine.h    |   3 +
 sysdeps/unix/sysv/linux/riscv/dl-procinfo.c   |  62 +++
 sysdeps/unix/sysv/linux/riscv/dl-procinfo.h   |  46 ++
 sysdeps/unix/sysv/linux/riscv/hart-features.c | 356 +++++++++++++
 sysdeps/unix/sysv/linux/riscv/hart-features.h |  58 +++
 .../unix/sysv/linux/riscv/isa-extensions.def  |  72 +++
 sysdeps/unix/sysv/linux/riscv/libc-start.c    |  29 ++
 .../unix/sysv/linux/riscv/macro-for-each.h    |  24 +
 37 files changed, 2610 insertions(+), 5 deletions(-)
 create mode 100644 sysdeps/riscv/multiarch/Makefile
 create mode 100644 sysdeps/riscv/multiarch/cpu_relax.c
 create mode 100644 sysdeps/riscv/multiarch/cpu_relax_impl.S
 create mode 100644 sysdeps/riscv/multiarch/ifunc-impl-list.c
 create mode 100644 sysdeps/riscv/multiarch/init-arch.h
 create mode 100644 sysdeps/riscv/multiarch/memcpy.c
 create mode 100644 sysdeps/riscv/multiarch/memcpy_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memcpy_rv64_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/memmove.c
 create mode 100644 sysdeps/riscv/multiarch/memmove_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memset.c
 create mode 100644 sysdeps/riscv/multiarch/memset_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memset_rv64_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/memset_rv64_unaligned_cboz64.S
 create mode 100644 sysdeps/riscv/multiarch/strcmp.c
 create mode 100644 sysdeps/riscv/multiarch/strcmp_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strcmp_zbb.S
 create mode 100644 sysdeps/riscv/multiarch/strcmp_zbb_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/strlen.c
 create mode 100644 sysdeps/riscv/multiarch/strlen_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strlen_zbb.S
 create mode 100644 sysdeps/riscv/multiarch/strncmp.c
 create mode 100644 sysdeps/riscv/multiarch/strncmp_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strncmp_zbb.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/dl-procinfo.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/dl-procinfo.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hart-features.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hart-features.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/isa-extensions.def
 create mode 100644 sysdeps/unix/sysv/linux/riscv/libc-start.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/macro-for-each.h
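
Note on the Zbb items in the optimization list: the sketch below models in portable C how orc.b and ctz can be combined to find the terminating NUL of a string one 64-bit word at a time. It is an illustration of the general technique only, not a transcription of strlen_zbb.S. The function names (orc_b, ctz64, strlen_orcb_sketch) are hypothetical, the two instructions are emulated in software, and the code assumes a little-endian target. The word loads may read a few bytes past the terminator inside the same aligned 8-byte word, which is the usual assembly trick but is not strictly conforming C, so callers of this sketch should pass buffers padded to a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software model of Zbb's orc.b: each nonzero byte of X becomes 0xff,
   each zero byte stays 0x00.  */
uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Software model of ctz (count trailing zero bits) for nonzero X.  */
unsigned
ctz64 (uint64_t x)
{
  assert (x != 0);
  unsigned n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Word-at-a-time strlen sketch (little-endian assumed).  */
size_t
strlen_orcb_sketch (const char *s)
{
  const char *p = s;

  /* Go byte by byte until P is 8-byte aligned, so that the word loads
     below never straddle an aligned 8-byte boundary.  */
  while ((uintptr_t) p % 8 != 0)
    {
      if (*p == '\0')
        return (size_t) (p - s);
      p++;
    }

  for (;;)
    {
      uint64_t w;
      memcpy (&w, p, sizeof w);      /* Aligned 8-byte load.  */
      uint64_t nul = ~orc_b (w);     /* 0xff wherever W has a NUL byte.  */
      if (nul != 0)
        /* The lowest set bit marks the first NUL; divide by 8 to turn
           the bit index into a byte index.  */
        return (size_t) (p - s) + ctz64 (nul) / 8;
      p += 8;
    }
}
```

On hardware with Zbb, orc_b() and ctz64() would each be a single instruction, so the inner loop reduces to a load, orc.b, a NOT and a branch per eight bytes.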