From patchwork Tue Feb 7 00:16:17 2023
X-Patchwork-Submitter: Christoph Müllner
X-Patchwork-Id: 64399
From: Christoph Muellner
To: libc-alpha@sourceware.org, Palmer Dabbelt, Darius Rad, Andrew Waterman, DJ Delorie, Vineet Gupta, Kito Cheng, Jeff Law, Philipp Tomsich, Heiko Stuebner
Cc: Christoph Müllner
Subject: [RFC PATCH 18/19] riscv: Add an optimized strncmp routine
Date: Tue, 7 Feb 2023 01:16:17 +0100
Message-Id: <20230207001618.458947-19-christoph.muellner@vrull.eu>
In-Reply-To: <20230207001618.458947-1-christoph.muellner@vrull.eu>
References: <20230207001618.458947-1-christoph.muellner@vrull.eu>

From: Christoph Müllner

The implementation of strncmp() can be accelerated with the Zbb orc.b
instruction, which sets every byte of a register to 0xff if it is
nonzero and to 0x00 if it is zero, so a single comparison against -1
tells whether a word contains a NUL byte.  Add an optimized
implementation that compares the strings a word at a time when both
pointers are word-aligned, and uses a byte-wise loop for misaligned
strings and for the remaining tail.

Signed-off-by: Christoph Müllner
---
 sysdeps/riscv/multiarch/Makefile          |   3 +-
 sysdeps/riscv/multiarch/ifunc-impl-list.c |   1 +
 sysdeps/riscv/multiarch/strncmp.c         |   6 +-
 sysdeps/riscv/multiarch/strncmp_zbb.S     | 119 ++++++++++++++++++++++
 4 files changed, 127 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/riscv/multiarch/strncmp_zbb.S

diff --git a/sysdeps/riscv/multiarch/Makefile b/sysdeps/riscv/multiarch/Makefile
index 056ce2ffc0..9f22e31b99 100644
--- a/sysdeps/riscv/multiarch/Makefile
+++ b/sysdeps/riscv/multiarch/Makefile
@@ -14,5 +14,6 @@ sysdep_routines += \
 	strcmp_generic \
 	strcmp_zbb \
 	strcmp_zbb_unaligned \
-	strncmp_generic
+	strncmp_generic \
+	strncmp_zbb
 endif
diff --git a/sysdeps/riscv/multiarch/ifunc-impl-list.c b/sysdeps/riscv/multiarch/ifunc-impl-list.c
index eb37ed6017..82fd34d010 100644
--- a/sysdeps/riscv/multiarch/ifunc-impl-list.c
+++ b/sysdeps/riscv/multiarch/ifunc-impl-list.c
@@ -64,6 +64,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_generic))
 
   IFUNC_IMPL (i, name, strncmp,
+	      IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_zbb)
 	      IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_generic))
   return i;
 }
diff --git a/sysdeps/riscv/multiarch/strncmp.c b/sysdeps/riscv/multiarch/strncmp.c
index 970aeb8b85..5b0fe08e98 100644
--- a/sysdeps/riscv/multiarch/strncmp.c
+++ b/sysdeps/riscv/multiarch/strncmp.c
@@ -30,8 +30,12 @@
 extern __typeof (__redirect_strncmp) __libc_strncmp;
 
 extern __typeof (__redirect_strncmp) __strncmp_generic attribute_hidden;
+extern __typeof (__redirect_strncmp) __strncmp_zbb attribute_hidden;
 
-libc_ifunc (__libc_strncmp, __strncmp_generic);
+libc_ifunc (__libc_strncmp,
+	    HAVE_RV(zbb)
+	    ? __strncmp_zbb
+	    : __strncmp_generic);
 
 # undef strncmp
 strong_alias (__libc_strncmp, strncmp);
diff --git a/sysdeps/riscv/multiarch/strncmp_zbb.S b/sysdeps/riscv/multiarch/strncmp_zbb.S
new file mode 100644
index 0000000000..29cff30def
--- /dev/null
+++ b/sysdeps/riscv/multiarch/strncmp_zbb.S
@@ -0,0 +1,119 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   .  */
+
+#include
+#include
+
+/* Assumptions: rvi_zbb.  */
+
+#define src1		a0
+#define result		a0
+#define src2		a1
+#define len		a2
+#define data1		a2
+#define data2		a3
+#define align		a4
+#define data1_orcb	t0
+#define limit		t1
+#define fast_limit	t2
+#define m1		t3
+
+#if __riscv_xlen == 64
+# define REG_L	ld
+# define SZREG	8
+# define PTRLOG	3
+#else
+# define REG_L	lw
+# define SZREG	4
+# define PTRLOG	2
+#endif
+
+#ifndef STRNCMP
+# define STRNCMP __strncmp_zbb
+#endif
+
+.option push
+.option arch,+zbb
+
+ENTRY_ALIGN (STRNCMP, 6)
+	beqz	len, L(equal)
+	or	align, src1, src2
+	andi	align, align, SZREG-1
+	add	limit, src1, len
+	bnez	align, L(simpleloop)
+	li	m1, -1
+
+	/* Adjust limit for fast-path.  */
+	andi	fast_limit, limit, -SZREG
+
+	/* Main loop for aligned string.  */
+	.p2align 3
+L(loop):
+	bge	src1, fast_limit, L(simpleloop)
+	REG_L	data1, 0(src1)
+	REG_L	data2, 0(src2)
+	orc.b	data1_orcb, data1
+	bne	data1_orcb, m1, L(foundnull)
+	addi	src1, src1, SZREG
+	addi	src2, src2, SZREG
+	beq	data1, data2, L(loop)
+
+	/* Words don't match, and no null byte in the first
+	 * word. Get bytes in big-endian order and compare.  */
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	rev8	data1, data1
+	rev8	data2, data2
+#endif
+	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+	sltu	result, data1, data2
+	neg	result, result
+	ori	result, result, 1
+	ret
+
+L(foundnull):
+	/* Found a null byte.
+	 * If words don't match, fall back to simple loop.  */
+	bne	data1, data2, L(simpleloop)
+
+	/* Otherwise, strings are equal.  */
+	li	result, 0
+	ret
+
+	/* Simple loop for misaligned strings.  */
+	.p2align 3
+L(simpleloop):
+	bge	src1, limit, L(equal)
+	lbu	data1, 0(src1)
+	addi	src1, src1, 1
+	lbu	data2, 0(src2)
+	addi	src2, src2, 1
+	bne	data1, data2, L(sub)
+	bnez	data1, L(simpleloop)
+
+L(sub):
+	sub	result, data1, data2
+	ret
+
+L(equal):
+	li	result, 0
+	ret
+
+.option pop
+
+END (STRNCMP)
+libc_hidden_builtin_def (STRNCMP)
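
To make the control flow of strncmp_zbb.S easier to follow, here is a portable C model of it. It is a sketch under stated assumptions, not part of the patch: it models RV64 only (SZREG = 8), emulates orc.b and rev8 in software, and loads words in little-endian order as RISC-V does. The names orc_b, rev8, load_le and strncmp_model are hypothetical helpers, not glibc functions. Like the assembly, the word loop may read past the terminating NUL within an aligned word. The assembly can do that safely because an aligned load never crosses a page boundary, but in portable C such a read is only well-defined if the buffers extend that far, so callers of the model should pad them to a multiple of 8 bytes. One intentional difference: the model counts the bytes that remain instead of forming an end pointer, because src1 + len (the `add limit, src1, len` step) wraps around when n is close to SIZE_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for Zbb orc.b on RV64: every nonzero byte becomes
   0xff and every zero byte stays 0x00.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Software stand-in for rev8 on RV64: reverse the byte order.  */
static uint64_t
rev8 (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    r = (r << 8) | ((x >> (8 * i)) & 0xff);
  return r;
}

/* Little-endian 8-byte load, like REG_L (ld) on RISC-V: p[0] ends up
   in the least significant byte.  */
static uint64_t
load_le (const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | p[i];
  return v;
}

/* Hypothetical C model of __strncmp_zbb (RV64 only).  */
int
strncmp_model (const char *s1, const char *s2, size_t n)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;
  size_t left = n;  /* Bytes that may still be compared.  */

  /* Word loop only when both pointers are 8-byte aligned.  */
  if ((((uintptr_t) p1 | (uintptr_t) p2) & 7) == 0)
    {
      while (left >= 8)
        {
          uint64_t d1 = load_le (p1);
          uint64_t d2 = load_le (p2);
          if (orc_b (d1) != UINT64_MAX)
            {
              /* d1 holds a NUL.  Equal words mean equal strings;
                 otherwise let the byte loop find the first difference.  */
              if (d1 == d2)
                return 0;
              break;
            }
          if (d1 != d2)
            {
              /* Move the first byte in memory to the most significant
                 position so an unsigned compare orders the words the
                 same way as the strings.  */
              d1 = rev8 (d1);
              d2 = rev8 (d2);
              /* Branchless (d1 >= d2) ? 1 : -1, as with sltu/neg/ori.  */
              return -(int) (d1 < d2) | 1;
            }
          p1 += 8;
          p2 += 8;
          left -= 8;
        }
    }

  /* Byte loop: misaligned input, the tail, or a word holding a NUL.  */
  for (; left > 0; left--)
    {
      unsigned char c1 = *p1++;
      unsigned char c2 = *p2++;
      if (c1 != c2 || c1 == 0)
        return c1 - c2;
    }
  return 0;
}
```

The branchless return mirrors the sltu/neg/ori sequence in the patch: -(int) (d1 < d2) is either -1 or 0, and OR-ing in 1 turns that into -1 or 1. As in the assembly, the word path therefore returns exactly -1 or 1, while the byte path returns the difference of the first mismatching bytes; both satisfy the strncmp contract, which only specifies the sign.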