From patchwork Thu Feb 29 08:00:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Tirta Halim X-Patchwork-Id: 86568 X-Patchwork-Delegate: maxim.kuvyrkov@gmail.com Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 22DA63858C78 for ; Thu, 29 Feb 2024 08:05:37 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by sourceware.org (Postfix) with ESMTPS id 7771C3858CDA for ; Thu, 29 Feb 2024 08:05:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7771C3858CDA Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 7771C3858CDA Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::432 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709193912; cv=none; b=eETkKhL+wVwlWNfP/uGEF/kRCvmuJt+LBg9isRuZSYf2SCRcsJI/uetCLCwTPH3UFKzGCKsaQPjy37HtrxKoBMsPAPrsGnGKEK0/CfJax3v23ZLMtXAteKTEtMLE+asZiPbj5Q2ccPb/5OW6Io8SpYl+bxvVvZE4HKOgRUKZ6ck= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709193912; c=relaxed/simple; bh=pt+A5IAH1ihX+tF8VmFt9m+Fcv/YEGc1ov7mQeUOocc=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=mK32fVNabDZxkibacffXvLmxPMjdGBLJ8L77gsxySNtTWx6Kbp+ZTZnESg9bf4ipeOC8xKGD/GqOcpo2lgsYr3QY5yNDc3V+IB+YWIxFeEIZYlJGVmGuGlCLYuZAS54ntK1bUrtYDQjkrX2kfsQ5cflpzfvQ5DyaAlUYfz5B8IY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pf1-x432.google.com with SMTP id d2e1a72fcca58-6e5675f2ca2so363063b3a.0 for ; Thu, 29 Feb 2024 00:05:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709193908; x=1709798708; darn=sourceware.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=81O/XVS6vm3IpAb7JHUEkrJ6+Epanz99KB5jZKUcFQg=; b=EiU7i66RzJew5aPAaRftnjHwU3svP7RXHrDd2w/yJ29JNaSWJS3eMUfyISYJFS0hRo V7eYXTFqZxSAhlQbYPouj6CXzJGBk1GrLFRhPB7Nr3+345/Xwkw3oO0SdYFHzoPO/ikJ TN4YdXmV/4lW3ZjYriwYGJmnLqr/4lC3dM/tuROXKqcftJRe9DYsSOx9lbVCaeHakMjs RdDg2qCHk5XUfXoKcAElqjF87WC9TrrQpO+vkSYimTD2TRH1c8PE62nvGWMr3vjTpgfw cXruV4yV5bUkQsJ2EA1mI0l/6/w6Y8dFhB+a7orhqxRSRsmULRoKFwFBR4WJFSDY1hjE vpPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709193908; x=1709798708; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=81O/XVS6vm3IpAb7JHUEkrJ6+Epanz99KB5jZKUcFQg=; b=NdkL70CZtJ34/bi8ss2RV3CqJBP8/dHIGrqLBXEMSsCAMXEF2sgDsEmFZaHnBnny9q 3NCgEEHf8Rl3lBirGr3JRiKj8V4GGbKSHPbL9ivZBb4YsBxwCWW1bZo+wm0d0/19HdWs xC46DalrZs7+SoUhbPaysp2T9F2YCr0NiyMok2BKkE+5AJShp0tev4zDhl+zsoknqT0K 4uufex4ZKJeKSZOlvAFqqZrdE3vni7jvbj8qIPKEKXtJNE+5QRMJ0XMC9RhNjEmjnhSn HzhTgASOWXmx6ICNDvYUQk514C0oNU3SAIQk/PUTouZT4O/IjS7IcogU0e5xHJhrPRLj i2RA== X-Gm-Message-State: AOJu0YzpywD02SuuQPk94qWtyQNzXFWEYs3rXgJA0kUq4aFIWhakJhWq SZwBew982cF0wUIpcuzLj0Vnc5nmv6vmTrbKrtVFRhTkmXfEbs1dRjGQahlL+Yg= X-Google-Smtp-Source: AGHT+IHsYNG/zM9/TSDvBKFyBLLVve7j2Mql7s29REJgSYZ73WskmqIkwwjBz50wVd2oFDTiRciZ3g== X-Received: by 2002:a05:6a00:93a2:b0:6e4:c626:f905 with SMTP id ka34-20020a056a0093a200b006e4c626f905mr1687303pfb.16.1709193908237; Thu, 29 Feb 2024 00:05:08 -0800 (PST) Received: from localhost.localdomain ([2001:448a:20a0:2169:bb1d:a0c3:4a8c:8646]) by smtp.gmail.com with ESMTPSA id lp3-20020a056a003d4300b006e553f2b880sm663491pfb.211.2024.02.29.00.05.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 29 Feb 2024 00:05:07 -0800 (PST) From: James Tirta Halim To: libc-alpha@sourceware.org Cc: James Tirta Halim Subject: [PATCH] optimize memmem for short needles Date: Thu, 29 Feb 2024 15:00:14 +0700 Message-ID: <20240229080014.148663-1-tirtajames45@gmail.com> X-Mailer: git-send-email 2.44.0 MIME-Version: 1.0 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org The current implementation does not check for an early match with memchr before initializing the shift table because large shifts may be faster than memchr. However, for short shifts, memchr may be faster. find_edge_in_needle (taken from strstr_avx512) is used to find a rarer character for memchr to find. bench-memmem timings (Core i5 8400): memmem_new memmem memmem_new_use_first_char average: 1679.59 2502.39 ~1940.1 total: 440053 655625 ~508305 Passes test-memmem. --- string/memmem.c | 43 +++++++++++++++++++++++++++++++++---------- 1 file changed, 33 insertions(+), 10 deletions(-) diff --git a/string/memmem.c b/string/memmem.c index a4117f8e1e..772c795caf 100644 --- a/string/memmem.c +++ b/string/memmem.c @@ -38,6 +38,19 @@ in smaller shifts. */ #define hash2(p) (((size_t)(p)[0] - ((size_t)(p)[-1] << 3)) % sizeof (shift)) +/* Returns the first edge within the needle, returns original S + if no edge is found. Example: 'ab' is the first edge in 'aaaaaaaaaabaarddg'. + N must be > 0. */ +static inline unsigned char *__attribute__ ((always_inline)) +find_edge_in_needle (const unsigned char *s, size_t n) +{ + unsigned char *const ret = (unsigned char *) s; + for (; --n; ++s) + if (*(s + 1) != *s) + return (unsigned char *) s + 1; + return ret; +} + /* Fast memmem algorithm with guaranteed linear-time performance. Small needles up to size 2 use a dedicated linear search. Longer needles up to size 256 use a novel modified Horspool algorithm. It hashes pairs @@ -58,8 +71,6 @@ __memmem (const void *haystack, size_t hs_len, if (ne_len == 0) return (void *) hs; - if (ne_len == 1) - return (void *) memchr (hs, ne[0], hs_len); /* Ensure haystack length is >= needle length. */ if (hs_len < ne_len) @@ -67,17 +78,29 @@ __memmem (const void *haystack, size_t hs_len, const unsigned char *end = hs + hs_len - ne_len; - if (ne_len == 2) + /* Check for an early match before initializing the shift table. Only done + for short needles where memchr may be faster than shifting around with the + table. */ + if (ne_len <= sizeof (unsigned long) * 4) { - uint32_t nw = ne[0] << 16 | ne[1], hw = hs[0] << 16 | hs[1]; - for (hs++; hs <= end && hw != nw; ) - hw = hw << 16 | *++hs; - return hw == nw ? (void *)hs - 1 : NULL; + const unsigned int edge = find_edge_in_needle (ne, ne_len) - ne; + hs = memchr (hs + edge, *((char *) ne + edge), + (hs_len - edge) - (ne_len - edge) + 1); + if (hs == NULL || memcmp (hs -= edge, ne, ne_len) == 0) + return (void *) hs; + if (ne_len == 2) + { + uint32_t nw = ne[0] << 16 | ne[1], hw = hs[0] << 16 | hs[1]; + for (hs++; hs <= end && hw != nw;) + hw = hw << 16 | *++hs; + return hw == nw ? (void *) (hs - 1) : NULL; + } } - /* Use Two-Way algorithm for very long needles. */ - if (__builtin_expect (ne_len > 256, 0)) - return two_way_long_needle (hs, hs_len, ne, ne_len); + else if (__glibc_unlikely (ne_len > 256)) + { + return two_way_long_needle (hs, hs_len, ne, ne_len); + } uint8_t shift[256]; size_t tmp, shift1;