Message ID: 20230504074851.38763-1-hau.hsu@sifive.com

Headers:
To: libc-alpha@sourceware.org
Cc: hau.hsu@sifive.com, kito.cheng@sifive.com, nick.knight@sifive.com, jerry.shih@sifive.com, vincent.chen@sifive.com, hongrong.hsu@sifive.com
Subject: [PATCH v3 0/5] riscv: Vectorized mem*/str* function
Date: Thu, 4 May 2023 15:48:46 +0800
Message-Id: <20230504074851.38763-1-hau.hsu@sifive.com>
From: Hau Hsu via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: Hau Hsu <hau.hsu@sifive.com>
Series: riscv: Vectorized mem*/str* function
Message
Hau Hsu
May 4, 2023, 7:48 a.m. UTC
This is the v3 patchset adding vectorized mem*/str* functions for RISC-V.

This patch proposes implementations of memchr, memcmp, memcpy, memmove, memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp, strncpy and strnlen that leverage the RISC-V V extension (RVV), version 1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These routines are from https://github.com/sifive/sifive-libc, which we have agreed to contribute to the Free Software Foundation. With regard to IFUNC, some details concerning `hwcap` are still under discussion in the community. For the purposes of reviewing this patch, we have temporarily opted for RVV delegation at compile time. Once the `hwcap` mechanism is ready, we'll rebase on it.

These routines assume VLEN is at least 32 bits, as is required by all currently defined vector extensions, and they support arbitrarily large VLEN. All implementations work for both RV32 and RV64 platforms, and make no assumptions about page size.

The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code size, while the `str*` (unknown-length) routines use LMUL=1 instead. Longer LMUL will still minimize dynamic code size for the latter routines, but it will also increase the cost of the remainder/tail loop: more data loaded and comparisons performed past the `\0`. This overhead will be particularly pronounced for smaller strings.

Measured performance improvements of the vectorized ("rvv") implementations vs. the existing glibc ("scalar") implementations are as follows:

  memchr:  85% time savings (i.e., if scalar is 100 ms, then rvv is 15 ms)
  memcmp:  55%
  memcpy:  88%
  memmove: 80%
  memset:  88%
  strcmp:  85%
  strlen:  70%
  strcat:  53%
  strchr:  85%
  strcpy:  70%
  strncmp: 90%
  strncat: 50%
  strncpy: 60%
  strnlen: 80%

The above data were collected on a SiFive X280 (FPGA simulation), across a wide range of problem sizes.
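To make the LMUL=8 choice described above concrete, the known-length mem* main loop boils down to something like the following hypothetical sketch. It is illustrative only and not taken from the patch itself; the register assignments simply follow the standard calling convention (a0 = dest, a1 = src, a2 = n):

        mv      a3, a0                    # keep dest for memcpy's return value
    1:
        vsetvli t0, a2, e8, m8, ta, ma    # vl = min(n, 8 vector registers of bytes)
        vle8.v  v0, (a1)                  # load vl bytes from src
        vse8.v  v0, (a0)                  # store vl bytes to dest
        add     a1, a1, t0                # advance src and dest, shrink the count
        add     a0, a0, t0
        sub     a2, a2, t0
        bnez    a2, 1b
        mv      a0, a3                    # return the original dest
        ret

With LMUL=8, a single vsetvli/load/store iteration can move up to eight vector registers' worth of bytes, which is what keeps the dynamic instruction count low for the known-length routines.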
v1: https://sourceware.org/pipermail/libc-alpha/2023-March/145976.html
 * add RISC-V vectorized mem*/str* functions

v2: https://sourceware.org/pipermail/libc-alpha/2023-April/147519.html
 * include the __memcmpeq function
 * set lmul=1 for memcmp for generality

v3:
 * remove "Contributed by" comments
 * fix license headers
 * avoid using camelcase variables
 * avoid using C99 one-line comments

Jerry Shih (2):
  riscv: vectorized mem* functions
  riscv: vectorized str* functions

Nick Knight (1):
  riscv: vectorized strchr and strnlen functions

Vincent Chen (1):
  riscv: Enabling vectorized mem*/str* functions in build time

Yun Hsiang (1):
  riscv: add vectorized __memcmpeq

 scripts/build-many-glibcs.py   | 10 ++++
 sysdeps/riscv/preconfigure     | 19 ++++++++
 sysdeps/riscv/preconfigure.ac  | 18 +++++++
 sysdeps/riscv/rv32/rvv/Implies |  2 +
 sysdeps/riscv/rv64/rvv/Implies |  2 +
 sysdeps/riscv/rvv/memchr.S     | 62 ++++++++++++++++++++++++
 sysdeps/riscv/rvv/memcmp.S     | 70 +++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memcmpeq.S   | 67 ++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memcpy.S     | 50 +++++++++++++++++++
 sysdeps/riscv/rvv/memmove.S    | 71 +++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memset.S     | 49 +++++++++++++++++++
 sysdeps/riscv/rvv/strcat.S     | 71 +++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strchr.S     | 62 ++++++++++++++++++++++++
 sysdeps/riscv/rvv/strcmp.S     | 88 ++++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strcpy.S     | 55 +++++++++++++++++++++
 sysdeps/riscv/rvv/strlen.S     | 53 ++++++++++++++++++++
 sysdeps/riscv/rvv/strncat.S    | 82 +++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strncmp.S    | 84 ++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strncpy.S    | 85 ++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strnlen.S    | 55 +++++++++++++++++++++
 20 files changed, 1055 insertions(+)
 create mode 100644 sysdeps/riscv/rv32/rvv/Implies
 create mode 100644 sysdeps/riscv/rv64/rvv/Implies
 create mode 100644 sysdeps/riscv/rvv/memchr.S
 create mode 100644 sysdeps/riscv/rvv/memcmp.S
 create mode 100644 sysdeps/riscv/rvv/memcmpeq.S
 create mode 100644 sysdeps/riscv/rvv/memcpy.S
 create mode 100644 sysdeps/riscv/rvv/memmove.S
 create mode 100644 sysdeps/riscv/rvv/memset.S
 create mode 100644 sysdeps/riscv/rvv/strcat.S
 create mode 100644 sysdeps/riscv/rvv/strchr.S
 create mode 100644 sysdeps/riscv/rvv/strcmp.S
 create mode 100644 sysdeps/riscv/rvv/strcpy.S
 create mode 100644 sysdeps/riscv/rvv/strlen.S
 create mode 100644 sysdeps/riscv/rvv/strncat.S
 create mode 100644 sysdeps/riscv/rvv/strncmp.S
 create mode 100644 sysdeps/riscv/rvv/strncpy.S
 create mode 100644 sysdeps/riscv/rvv/strnlen.S
Comments
On Thu, 04 May 2023 00:48:46 PDT (-0700), hau.hsu@sifive.com wrote:

> This is the v3 patchset adding vectorized mem*/str* functions for RISC-V.
>
> This patch proposes implementations of memchr, memcmp, memcpy, memmove, memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp, strncpy and strnlen that leverage the RISC-V V extension (RVV), version 1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These routines are from https://github.com/sifive/sifive-libc, which we have agreed to contribute to the Free Software Foundation. With regard to IFUNC, some details concerning `hwcap` are still under discussion in the community. For the purposes of reviewing this patch, we have temporarily opted for RVV delegation at compile time. Once the `hwcap` mechanism is ready, we'll rebase on it.

IMO it's fine to allow users to build a glibc that assumes the V extension, so we don't need to block this on having the dynamic probing working.

That said, we do need to get the Linux uABI sorted out, as right now we can't even turn on V for userspace.

> These routines assume VLEN is at least 32 bits, as is required by all currently defined vector extensions, and they support arbitrarily large VLEN. All implementations work for both RV32 and RV64 platforms, and make no assumptions about page size.
>
> The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code size, while the `str*` (unknown-length) routines use LMUL=1 instead. Longer LMUL will still minimize dynamic code size for the latter routines, but it will also increase the cost of the remainder/tail loop: more data loaded and comparisons performed past the `\0`. This overhead will be particularly pronounced for smaller strings.
>
> Measured performance improvements of the vectorized ("rvv") implementations vs. the existing glibc ("scalar") implementations are as

There have been a few of these posted, so I forget exactly where the reviews ended up, but at least one of the asks was to compare these against vectorized versions of the standard glibc routines.

> follows:
>   memchr:  85% time savings (i.e., if scalar is 100 ms, then rvv is 15 ms)
>   memcmp:  55%
>   memcpy:  88%
>   memmove: 80%
>   memset:  88%
>   strcmp:  85%
>   strlen:  70%
>   strcat:  53%
>   strchr:  85%
>   strcpy:  70%
>   strncmp: 90%
>   strncat: 50%
>   strncpy: 60%
>   strnlen: 80%
> The above data were collected on a SiFive X280 (FPGA simulation), across a wide range of problem sizes.

That's certainly a more realistic system than the QEMU results, but the general consensus has been that FPGA-based development systems don't count as hardware -- not so much because of the FPGA, but because we're looking for production systems. If there are real production systems running on FPGAs, that's a different story, but it looks like these are just pre-silicon development systems.
> On May 8, 2023, at 10:06 PM, Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Thu, 04 May 2023 00:48:46 PDT (-0700), hau.hsu@sifive.com wrote:
>> This is the v3 patchset adding vectorized mem*/str* functions for RISC-V.
>>
>> This patch proposes implementations of memchr, memcmp, memcpy, memmove, memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp, strncpy and strnlen that leverage the RISC-V V extension (RVV), version 1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These routines are from https://github.com/sifive/sifive-libc, which we have agreed to contribute to the Free Software Foundation. With regard to IFUNC, some details concerning `hwcap` are still under discussion in the community. For the purposes of reviewing this patch, we have temporarily opted for RVV delegation at compile time. Once the `hwcap` mechanism is ready, we'll rebase on it.
>
> IMO it's fine to allow users to build a glibc that assumes the V extension, so we don't need to block this on having the dynamic probing working.
>
> That said, we do need to get the Linux uABI sorted out, as right now we can't even turn on V for userspace.

Does this mean that our current implementation, which checks whether a user is building glibc with RVV compile flags, is acceptable, at least for now?

>> These routines assume VLEN is at least 32 bits, as is required by all currently defined vector extensions, and they support arbitrarily large VLEN. All implementations work for both RV32 and RV64 platforms, and make no assumptions about page size.
>>
>> The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code size, while the `str*` (unknown-length) routines use LMUL=1 instead. Longer LMUL will still minimize dynamic code size for the latter routines, but it will also increase the cost of the remainder/tail loop: more data loaded and comparisons performed past the `\0`. This overhead will be particularly pronounced for smaller strings.
>>
>> Measured performance improvements of the vectorized ("rvv") implementations vs. the existing glibc ("scalar") implementations are as
>
> There have been a few of these posted, so I forget exactly where the reviews ended up, but at least one of the asks was to compare these against vectorized versions of the standard glibc routines.

I guess you mean this thread?
https://sourceware.org/pipermail/libc-alpha/2023-April/147056.html

>> follows:
>>   memchr:  85% time savings (i.e., if scalar is 100 ms, then rvv is 15 ms)
>>   memcmp:  55%
>>   memcpy:  88%
>>   memmove: 80%
>>   memset:  88%
>>   strcmp:  85%
>>   strlen:  70%
>>   strcat:  53%
>>   strchr:  85%
>>   strcpy:  70%
>>   strncmp: 90%
>>   strncat: 50%
>>   strncpy: 60%
>>   strnlen: 80%
>> The above data were collected on a SiFive X280 (FPGA simulation), across a wide range of problem sizes.
>
> That's certainly a more realistic system than the QEMU results, but the general consensus has been that FPGA-based development systems don't count as hardware -- not so much because of the FPGA, but because we're looking for production systems. If there are real production systems running on FPGAs, that's a different story, but it looks like these are just pre-silicon development systems.

Yes, the FPGA environment is not a production system, but we currently don't have any RVV products in hand, nor similar simulation platforms, so this is the best benchmarking environment we have.
Yun Hsiang also ran benchmarks based on Sergei Lewis's commits in the same environment:
https://sourceware.org/pipermail/libc-alpha/2023-May/147821.html
Our implementations in this series have lower instruction/cycle counts in most cases.

While benchmarking Sergei Lewis's commits, Yun Hsiang encountered some errors. He helped debug the source code and pointed out some issues:
https://sourceware.org/pipermail/libc-alpha/2023-May/147820.html

We know that different uarch variants might prefer different code, but our implementation is more generic: it follows the RVV 1.0 spec and makes no other hardware assumptions. The benchmarking also shows good results compared with the default and the other proposed implementations.
Yes, I've been in email conversation with Yun and have an updated version of that patchset locally.

Since a key design goal of this approach is to explicitly align memory accesses, instead of using fault-only-first loads, in order to take advantage of uarch-specific fast paths, my plan here is to submit an ifunc-based version of the patch, gated behind suitable tests to make sure it is only enabled when it is not only safe but also outperforms more generic alternatives, once suitable hwcaps support is available to enable the appropriate checks to be written. There seems little point sending an update to the list until it can be gated in this manner, but I am certainly happy to do so if there is interest.
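For reference, the fault-only-first style of scan mentioned above looks roughly like the following hypothetical strlen inner loop (a sketch written against the RVV 1.0 spec, not code taken from either patchset). The vle8ff.v load clips vl at a faulting element, so the scan can run past the '\0' without making assumptions about page boundaries, which is the behaviour the explicit-alignment approach avoids relying on:

        mv       a3, a0                    # a0 = str on entry; remember the start
    1:
        vsetvli  t0, zero, e8, m1, ta, ma  # request up to VLEN/8 bytes at LMUL=1
        vle8ff.v v0, (a0)                  # fault-only-first load; vl is clipped at a fault
        csrr     t0, vl                    # number of bytes actually loaded
        vmseq.vi v8, v0, 0                 # mask of bytes equal to '\0'
        vfirst.m t1, v8                    # index of the first '\0', or -1 if none
        add      a0, a0, t0                # advance past the scanned chunk
        bltz     t1, 1b                    # no terminator yet: keep scanning
        sub      a0, a0, t0                # back up to the start of the last chunk
        add      a0, a0, t1                # address of the '\0'
        sub      a0, a0, a3                # length = '\0' address - string start
        ret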