From patchwork Wed May 7 14:46:42 2025
X-Patchwork-Submitter: Eric Salem
X-Patchwork-Id: 111672
From: Eric Salem
To: newlib@sourceware.org, Kito Cheng
Subject: [PATCH v2 2/3] newlib: riscv: Optimize memchr() and memrchr()
Date: Wed, 7 May 2025 09:46:42 -0500
Message-ID: <569ea557b808a897dd75334f51d9fc827f13490b.1746628687.git.ericsalem@gmail.com>
X-Mailer: git-send-email 2.49.0
List-Id: Newlib mailing list

The RISC-V Zbb, Zbkb, and Zilsd extensions provide instructions optimized
for bit and load/store operations. Use them when available for the RISC-V
port.
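As background for the diff that follows, here is a minimal portable sketch of the word-at-a-time search that memchr() builds on: replicate the search byte across a word, XOR it with the loaded data, and look for a zero byte. It is not the patch's code. The names splat_byte, zero_byte_mask, and first_match_in_word are hypothetical, the word is fixed at 64 bits, and it is assembled byte by byte in little-endian order so the sketch does not depend on host alignment or endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replicate one byte into every byte of a 64-bit word (hypothetical
   stand-in for the patch's __libc_splat_byte).  */
static uint64_t splat_byte(unsigned char c)
{
  return (uint64_t) c * UINT64_C(0x0101010101010101);
}

/* Classic zero-byte test: the result is nonzero iff some byte of x is
   zero.  The lowest set bit always marks the lowest zero byte; bytes
   above it may be flagged spuriously because of borrow propagation.  */
static uint64_t zero_byte_mask(uint64_t x)
{
  return (x - UINT64_C(0x0101010101010101)) & ~x
         & UINT64_C(0x8080808080808080);
}

/* Return the offset (0..7) of the first byte equal to c in the 8 bytes
   at p, or -1 if none matches.  */
static int first_match_in_word(const unsigned char *p, unsigned char c)
{
  assert(p != NULL);

  /* Assemble a little-endian word without any aliasing or alignment
     assumptions.  */
  uint64_t word = 0;
  for (int i = 0; i < 8; i++)
    word |= (uint64_t) p[i] << (8 * i);

  /* Bytes equal to c become zero after the XOR.  */
  uint64_t m = zero_byte_mask(word ^ splat_byte(c));
  if (m == 0)
    return -1;

  /* Walk up from the lowest byte until its flag bit is set.  */
  int idx = 0;
  while (!(m & 0x80))
    {
      m >>= 8;
      idx++;
    }
  return idx;
}
```

Under these assumptions, first_match_in_word((const unsigned char *) "abcdefgh", 'd') yields 3. With Zbb, the final loop corresponds to a count-trailing-zeros on an exact per-byte mask followed by a shift right by 3.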
Reviewed-by: Christian Herber
Signed-off-by: Eric Salem
---
 newlib/libc/machine/riscv/memchr.c    | 137 ++++++++++++++++------
 newlib/libc/machine/riscv/memrchr.c   | 159 +++++++++++++++++++-------
 newlib/libc/machine/riscv/rv_string.h |  45 +++++++-
 newlib/libc/machine/riscv/xlenint.h   |   7 ++
 4 files changed, 265 insertions(+), 83 deletions(-)

diff --git a/newlib/libc/machine/riscv/memchr.c b/newlib/libc/machine/riscv/memchr.c
index 5c08c12813fe..f2824540a96b 100644
--- a/newlib/libc/machine/riscv/memchr.c
+++ b/newlib/libc/machine/riscv/memchr.c
@@ -29,10 +29,17 @@ QUICKREF
 	memchr ansi pure
 */
 
-#include <_ansi.h>
-#include
-#include
-#include "../../string/local.h"
+#include
+#include
+#include "rv_string.h"
+
+// Move size
+#if __riscv_zilsd
+#define MV_SZ 8
+#else
+#define MV_SZ SZREG
+#endif
+
 
 void *
 memchr (const void *src_void,
@@ -43,47 +50,101 @@ memchr (const void *src_void,
   unsigned char d = c;
 
 #if !defined(PREFER_SIZE_OVER_SPEED) && !defined(__OPTIMIZE_SIZE__)
-  unsigned long *asrc;
-  unsigned long mask;
-  unsigned int i;
+  size_t align = (uintptr_t) src & (MV_SZ - 1);
 
-  while (UNALIGNED_X(src))
+  if (align)
     {
-      if (!length--)
-        return NULL;
-      if (*src == d)
-        return (void *) src;
-      src++;
-    }
+      align = MV_SZ - align;
 
-  if (!TOO_SMALL_LITTLE_BLOCK(length))
-    {
-      /* If we get this far, we know that length is large and src is
-         word-aligned. */
-      /* The fast code reads the source one word at a time and only
-         performs the bytewise search on word-sized segments if they
-         contain the search character, which is detected by XORing
-         the word-sized segment with a word-sized block of the search
-         character and then detecting for the presence of NUL in the
-         result. */
-      asrc = (unsigned long *) src;
-      mask = d << 8 | d;
-      mask = mask << 16 | mask;
-      for (i = 32; i < sizeof(mask) * 8; i <<= 1)
-        mask = (mask << i) | mask;
-
-      while (!TOO_SMALL_LITTLE_BLOCK(length))
+      if (length < align) align = length;
+
+      switch (align)
         {
-          if (DETECT_CHAR(*asrc, mask))
-            break;
-          length -= LITTLE_BLOCK_SIZE;
-          asrc++;
+#if MV_SZ == 8
+        case 7:
+          if (*src++ == d) return (void *) (src - 1);
+        case 6:
+          if (*src++ == d) return (void *) (src - 1);
+        case 5:
+          if (*src++ == d) return (void *) (src - 1);
+        case 4:
+          if (*src++ == d) return (void *) (src - 1);
+#endif /* MV_SZ == 8 */
+        case 3:
+          if (*src++ == d) return (void *) (src - 1);
+        case 2:
+          if (*src++ == d) return (void *) (src - 1);
+        case 1:
+          if (*src++ == d) return (void *) (src - 1);
         }
 
-      /* If there are fewer than LITTLE_BLOCK_SIZE characters left,
-         then we resort to the bytewise loop. */
+      length -= align;
+    }
+
+  const unsigned char *end_addr = src + (length & ~(MV_SZ - 1));
 
-      src = (unsigned char *) asrc;
+  if (src < end_addr)
+    {
+      uintxlen_t mask = __libc_splat_byte(d);
+      uintlslen_t val;
+
+      do
+        {
+#if __riscv_zilsd
+          asm volatile ("ld %0, 0(%1)"
+                        : "=R" (val)
+                        : "r" (src)
+          );
+#else /* not riscv_zilsd */
+          val = *(uintxlen_t*) src;
+#endif /* __riscv_zilsd */
+          uintxlen_t word1 = val ^ mask;
+
+          if (__libc_detect_null(word1))
+            {
+#if __riscv_zbb
+              word1 = ~__LIBC_RISCV_ZBB_ORC_B(word1);
+              word1 = __LIBC_RISCV_ZBB_CNT_Z(word1);
+
+              return (void *) (src + (word1 >> 3));
+#else /* not __riscv_zbb */
+              if (*src++ == d) return (void *) (src - 1);
+              if (*src++ == d) return (void *) (src - 1);
+              if (*src++ == d) return (void *) (src - 1);
+#if __riscv_xlen == 64
+              if (*src++ == d) return (void *) (src - 1);
+              if (*src++ == d) return (void *) (src - 1);
+              if (*src++ == d) return (void *) (src - 1);
+              if (*src++ == d) return (void *) (src - 1);
+#endif /* __riscv_xlen == 64 */
+              return (void *) src;
+#endif /* __riscv_zbb */
+            }
+#if __riscv_zilsd
+          uintxlen_t word2 = (val >> 32);
+          word2 ^= mask;
+
+          if (__libc_detect_null(word2))
+            {
+              src += MV_SZ / 2;
+#if __riscv_zbb
+              word2 = ~__LIBC_RISCV_ZBB_ORC_B(word2);
+              word2 = __LIBC_RISCV_ZBB_CNT_Z(word2);
+
+              return (void *) (src + (word2 >> 3));
+#else /* not __riscv_zbb */
+              if (*src++ == d) return (void *) (src - 1);
+              if (*src++ == d) return (void *) (src - 1);
+              if (*src++ == d) return (void *) (src - 1);
+              return (void *) src;
+#endif /* __riscv_zbb */
+            }
+#endif /* __riscv_zilsd */
+
+          src += MV_SZ;
+        } while (src < end_addr);
+
+      length &= MV_SZ - 1;
     }
 
 #endif /* not PREFER_SIZE_OVER_SPEED */
diff --git a/newlib/libc/machine/riscv/memrchr.c b/newlib/libc/machine/riscv/memrchr.c
index 8d15ccb780ec..84f94b0c1bd3 100644
--- a/newlib/libc/machine/riscv/memrchr.c
+++ b/newlib/libc/machine/riscv/memrchr.c
@@ -29,61 +29,142 @@ QUICKREF
 	memrchr
 */
 
-#include <_ansi.h>
-#include
-#include
-#include "../../string/local.h"
+#include
+#include
+#include "rv_string.h"
+
+// Move size
+#if __riscv_zilsd
+#define MV_SZ 8
+
+// Offset is only 4 bytes for Zilsd/Zclsd since each register is 32 bits
+#define OFFSET 4
+#else
+#define MV_SZ SZREG
+#define OFFSET SZREG
+#endif
+
 
 void *
 memrchr (const void *src_void,
 	int c,
 	size_t length)
 {
-  const unsigned char *src = (const unsigned char *) src_void + length - 1;
+  const unsigned char *src = (const unsigned char *) src_void;
   unsigned char d = c;
 
+  if (length) src += length - 1;
+
 #if !defined(PREFER_SIZE_OVER_SPEED) && !defined(__OPTIMIZE_SIZE__)
-  unsigned long *asrc;
-  unsigned long mask;
-  unsigned int i;
 
-  while (UNALIGNED_X(src))
-    {
-      if (!length--)
-        return NULL;
-      if (*src == d)
-        return (void *) src;
-      src--;
-    }
+  /*
+    We add one to the address because even if an address is already aligned,
+    when loading words the bytes preceding this address are read, so check
+    the single byte.
 
-  if (!TOO_SMALL_LITTLE_BLOCK(length))
+    If the address has all the least significant bits set equaling MV_SZ - 1,
+    and has a length of at least MV_SZ, we can read a word starting from
+    src & ~(MV_SZ - 1) because no alignment is actually required
+  */
+  size_t align = (uintptr_t) (src + 1) & (MV_SZ - 1);
+
+  if (align)
     {
-      /* If we get this far, we know that length is large and src is
-         word-aligned. */
-      /* The fast code reads the source one word at a time and only
-         performs the bytewise search on word-sized segments if they
-         contain the search character, which is detected by XORing
-         the word-sized segment with a word-sized block of the search
-         character and then detecting for the presence of NUL in the
-         result. */
-      asrc = (unsigned long *) (src - LITTLE_BLOCK_SIZE + 1);
-      mask = d << 8 | d;
-      mask = mask << 16 | mask;
-      for (i = 32; i < sizeof(mask) * 8; i <<= 1)
-        mask = (mask << i) | mask;
-
-      while (!TOO_SMALL_LITTLE_BLOCK(length))
+      if (length < align) align = length;
+
+      switch (align)
         {
-          if (DETECT_CHAR(*asrc, mask))
-            break;
-          length -= LITTLE_BLOCK_SIZE;
-          asrc--;
+#if MV_SZ == 8
+        case 7:
+          if (*src-- == d) return (void *) (src + 1);
+        case 6:
+          if (*src-- == d) return (void *) (src + 1);
+        case 5:
+          if (*src-- == d) return (void *) (src + 1);
+        case 4:
+          if (*src-- == d) return (void *) (src + 1);
+#endif /* MV_SZ == 8 */
+        case 3:
+          if (*src-- == d) return (void *) (src + 1);
+        case 2:
+          if (*src-- == d) return (void *) (src + 1);
+        case 1:
+          if (*src-- == d) return (void *) (src + 1);
         }
 
-      /* If there are fewer than LITTLE_BLOCK_SIZE characters left,
-         then we resort to the bytewise loop. */
+      length -= align;
+    }
+
+  const unsigned char *end_addr = src - (length & ~(MV_SZ - 1));
 
-      src = (unsigned char *) asrc + LITTLE_BLOCK_SIZE - 1;
+  if (src > end_addr)
+    {
+      src -= MV_SZ - 1;
+
+      uintxlen_t mask = __libc_splat_byte(d);
+      uintlslen_t val;
+
+      do
+        {
+#if __riscv_zilsd
+          asm volatile ("ld %0, 0(%1)"
+                        : "=R" (val)
+                        : "r" (src)
+          );
+#else /* not riscv_zilsd */
+          val = *(uintxlen_t*) src;
+#endif /* __riscv_zilsd */
+
+#if __riscv_zilsd
+          uintxlen_t word2 = val >> 32;
+          word2 ^= mask;
+
+          if (__libc_detect_null(word2))
+            {
+#if __riscv_zbb
+              src += OFFSET;
+              word2 = ~__LIBC_RISCV_ZBB_ORC_B(word2);
+              word2 = __LIBC_RISCV_ZBB_CNT_Z_REV(word2);
+
+              return (void *) (src + OFFSET - 1 - (word2 >> 3));
+#else /* not __riscv_zbb */
+              src += MV_SZ - 1;
+              if (*src-- == d) return (void *) (src + 1);
+              if (*src-- == d) return (void *) (src + 1);
+              if (*src-- == d) return (void *) (src + 1);
+              return (void *) src;
+#endif /* __riscv_zbb */
+            }
+#endif /* __riscv_zilsd */
+          uintxlen_t word1 = val ^ mask;
+
+          if (__libc_detect_null(word1))
+            {
+#if __riscv_zbb
+              word1 = ~__LIBC_RISCV_ZBB_ORC_B(word1);
+              word1 = __LIBC_RISCV_ZBB_CNT_Z_REV(word1);
+
+              return (void *) (src + OFFSET - 1 - (word1 >> 3));
+#else /* not __riscv_zbb */
+              src += OFFSET - 1;
+              if (*src-- == d) return (void *) (src + 1);
+              if (*src-- == d) return (void *) (src + 1);
+              if (*src-- == d) return (void *) (src + 1);
+#if __riscv_xlen == 64
+              if (*src-- == d) return (void *) (src + 1);
+              if (*src-- == d) return (void *) (src + 1);
+              if (*src-- == d) return (void *) (src + 1);
+              if (*src-- == d) return (void *) (src + 1);
+#endif /* __riscv_xlen == 64 */
+              return (void *) src;
+#endif /* __riscv_zbb */
+            }
+
+          src -= MV_SZ;
+        } while (src > end_addr);
+
+      length &= MV_SZ - 1;
+      src = end_addr;
     }
 
 #endif /* not PREFER_SIZE_OVER_SPEED */
diff --git a/newlib/libc/machine/riscv/rv_string.h b/newlib/libc/machine/riscv/rv_string.h
index 7754303064c9..dc2a26daf1b2 100644
--- a/newlib/libc/machine/riscv/rv_string.h
+++ b/newlib/libc/machine/riscv/rv_string.h
@@ -20,20 +20,24 @@
 
   // Determine which intrinsics to use based on XLEN and endianness
   #if __riscv_xlen == 64
-    #define __LIBC_RISCV_ZBB_ORC_B(x) __riscv_orc_b_64(x)
+    #define __LIBC_RISCV_ZBB_ORC_B(x)       __riscv_orc_b_64(x)
 
     #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-      #define __LIBC_RISCV_ZBB_CNT_Z(x) __riscv_ctz_64(x)
+      #define __LIBC_RISCV_ZBB_CNT_Z(x)     __riscv_ctz_64(x)
+      #define __LIBC_RISCV_ZBB_CNT_Z_REV(x) __riscv_clz_64(x)
     #else
-      #define __LIBC_RISCV_ZBB_CNT_Z(x) __riscv_clz_64(x)
+      #define __LIBC_RISCV_ZBB_CNT_Z(x)     __riscv_clz_64(x)
+      #define __LIBC_RISCV_ZBB_CNT_Z_REV(x) __riscv_ctz_64(x)
     #endif
   #else
-    #define __LIBC_RISCV_ZBB_ORC_B(x) __riscv_orc_b_32(x)
+    #define __LIBC_RISCV_ZBB_ORC_B(x)       __riscv_orc_b_32(x)
 
     #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-      #define __LIBC_RISCV_ZBB_CNT_Z(x) __riscv_ctz_32(x)
+      #define __LIBC_RISCV_ZBB_CNT_Z(x)     __riscv_ctz_32(x)
+      #define __LIBC_RISCV_ZBB_CNT_Z_REV(x) __riscv_clz_32(x)
     #else
-      #define __LIBC_RISCV_ZBB_CNT_Z(x) __riscv_clz_32(x)
+      #define __LIBC_RISCV_ZBB_CNT_Z(x)     __riscv_clz_32(x)
+      #define __LIBC_RISCV_ZBB_CNT_Z_REV(x) __riscv_ctz_32(x)
     #endif
   #endif
 #endif
@@ -121,4 +125,33 @@ static __inline char *__libc_strcpy(char *dst, const char *src, bool ret_start)
 }
 
 
+static __inline uintxlen_t __libc_splat_byte(unsigned char c)
+{
+  uintxlen_t val;
+
+#if __riscv_zbkb
+  asm volatile ("packh %0, %1, %1"
+                : "=r" (val)
+                : "r" (c)
+  );
+#if __riscv_xlen == 64
+  asm volatile ("packw %0, %0, %0"
+                : "+r" (val)
+  );
+#endif /* __riscv_xlen == 64 */
+  asm volatile ("pack %0, %0, %0"
+                : "+r" (val)
+  );
+#else /* not __riscv_zbkb */
+  val = (c << 8) | c;
+  val = (val << 16) | val;
+#if __riscv_xlen == 64
+  val = (val << 32) | val;
+#endif /* __riscv_xlen == 64 */
+#endif /* __riscv_zbkb */
+
+  return val;
+}
+
+
 #endif /* _RV_STRING_H */
diff --git a/newlib/libc/machine/riscv/xlenint.h b/newlib/libc/machine/riscv/xlenint.h
index 86363a80655f..2d444ff9b80e 100644
--- a/newlib/libc/machine/riscv/xlenint.h
+++ b/newlib/libc/machine/riscv/xlenint.h
@@ -11,4 +11,11 @@ typedef uint32_t uintxlen_t;
 # error __riscv_xlen must equal 32 or 64
 #endif
 
+/* Load/Store length */
+#if __riscv_zilsd
+typedef uint64_t uintlslen_t;
+#else
+typedef uintxlen_t uintlslen_t;
+#endif
+
 #endif /* _XLENINT_H */
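
The Zbb return expression in memrchr(), src + OFFSET - 1 - (word >> 3), can be checked off-target with a small portable emulation. The sketch below is only an illustration under stated assumptions, not the patch's code: orc_b32, clz32, and last_match_in_word are hypothetical names, the word is fixed at 32 bits, and it is assembled in little-endian order. orc_b32 mimics what the Zbb orc.b instruction computes (each byte becomes 0xFF if any of its bits is set, 0x00 otherwise), so the inverted result flags exactly the bytes that matched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable emulation of Zbb orc.b on a 32-bit value: every nonzero
   byte becomes 0xFF, every zero byte stays 0x00.  */
static uint32_t orc_b32(uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= UINT32_C(0xff) << (8 * i);
  return r;
}

/* Count leading zero bits; the caller must pass a nonzero value.  */
static int clz32(uint32_t x)
{
  assert(x != 0);
  int n = 0;
  while (!(x & UINT32_C(0x80000000)))
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Return the offset (0..3) of the last byte equal to c in the 4 bytes
   at p, or -1 if none matches.  */
static int last_match_in_word(const unsigned char *p, unsigned char c)
{
  assert(p != NULL);

  /* Little-endian word: p[3] lands in the most significant byte.  */
  uint32_t word = (uint32_t) p[0] | ((uint32_t) p[1] << 8)
                  | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);

  /* Matching bytes become zero after the XOR, then 0xFF after ~orc.b.  */
  uint32_t x = word ^ ((uint32_t) c * UINT32_C(0x01010101));
  uint32_t zeros = ~orc_b32(x);
  if (zeros == 0)
    return -1;

  /* Leading zeros / 8 = number of non-matching bytes above the last
     match, mirroring OFFSET - 1 - (word >> 3) with OFFSET == 4.  */
  return 4 - 1 - (clz32(zeros) >> 3);
}
```

An exact per-byte mask matters for the reverse search: a borrow-based zero-byte test can flag bytes above the first zero byte spuriously, which is harmless when counting from the low end but would give wrong answers when counting from the high end.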