From patchwork Sat Dec 16 04:33:34 2023
X-Patchwork-Submitter: James Tirta Halim
X-Patchwork-Id: 82291
From: James Tirta Halim <tirtajames45@gmail.com>
To: carlos@redhat.com
Cc: goldstein.w.n@gmail.com, libc-alpha@sourceware.org, skpgkp2@gmail.com,
 tirtajames45@gmail.com
Subject: [PATCH] sysdeps/x86_64/multiarch/memmem-avx2.c: add memmem-avx2.c
Date: Sat, 16 Dec 2023 11:33:34 +0700
Message-ID: <20231216043334.72176-1-tirtajames45@gmail.com>
X-Mailer: git-send-email 2.43.0
List-Id: Libc-alpha mailing list

Find the rarest byte in NE. Do a naive loop until HS is aligned. Once
aligned, find the positions in HS that match both the rare byte and the
byte after it, shift back to the position in HS where NE should start,
and do a memcmp.
Average timings (Core i5 8400):

  __memmem_avx2  basic_memmem  twoway_memmem       memmem
    1342.942864   19100.87074    3335.335377  2745.971856

---
 sysdeps/x86_64/multiarch/memmem-avx2.c | 72 ++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 sysdeps/x86_64/multiarch/memmem-avx2.c

diff --git a/sysdeps/x86_64/multiarch/memmem-avx2.c b/sysdeps/x86_64/multiarch/memmem-avx2.c
new file mode 100644
index 0000000000..524d0fe45f
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memmem-avx2.c
@@ -0,0 +1,72 @@
+#include <immintrin.h>
+#include <stdint.h>
+#include <string.h>
+#include <libc-pointer-arith.h>
+
+static inline void *
+__find_rarest_byte (const void *ne, size_t n)
+{
+  /* Map each byte value to a frequency rank; a lower rank means the
+     byte occurs more rarely in typical data.  */
+  static const unsigned char rarebyte_table[256]
+    = { 0, 1, 13, 56, 59, 60, 61, 62, 63, 232, 248, 2, 158, 4, 5, 6,
+	7, 8, 9, 10, 14, 20, 26, 29, 37, 46, 52, 53, 54, 55, 57, 58,
+	255, 172, 242, 193, 162, 174, 178, 182, 218, 219, 212, 180, 249, 197, 221, 210,
+	253, 231, 230, 224, 225, 226, 227, 223, 222, 220, 176, 213, 184, 229, 188, 164,
+	159, 209, 181, 203, 189, 216, 196, 192, 185, 205, 161, 168, 215, 187, 211, 194,
+	195, 165, 206, 204, 214, 198, 173, 179, 175, 183, 167, 202, 239, 201, 160, 241,
+	163, 246, 233, 238, 240, 254, 237, 208, 234, 250, 169, 186, 236, 217, 245, 243,
+	228, 170, 247, 244, 251, 235, 199, 200, 252, 207, 177, 191, 171, 190, 166, 3,
+	140, 134, 124, 126, 86, 128, 95, 117, 114, 93, 81, 87, 132, 96, 112, 97,
+	103, 82, 139, 89, 98, 88, 119, 74, 156, 115, 104, 75, 120, 106, 76, 155,
+	90, 122, 107, 125, 152, 145, 136, 137, 101, 116, 102, 108, 99, 141, 77, 78,
+	118, 79, 109, 100, 150, 73, 94, 72, 121, 151, 113, 135, 110, 105, 83, 91,
+	11, 12, 64, 149, 146, 111, 65, 69, 66, 15, 16, 17, 18, 19, 130, 92,
+	144, 123, 21, 22, 23, 24, 131, 133, 127, 142, 25, 70, 129, 27, 28, 67,
+	153, 84, 143, 138, 147, 157, 148, 68, 71, 30, 31, 32, 33, 34, 35, 36,
+	154, 38, 39, 40, 41, 42, 80, 43, 44, 45, 47, 48, 85, 49, 50, 51 };
+  const unsigned char *rare = (const unsigned char *) ne;
+  const unsigned char *p = (const unsigned char *) ne;
+  int c_rare = rarebyte_table[*rare];
+  int c;
+  for (; n--; ++p)
+    {
+      c = rarebyte_table[*p];
+      if (c < c_rare)
+	{
+	  rare = p;
+	  c_rare = c;
+	}
+    }
+  return (void *) rare;
+}
+
+void *
+__memmem_avx2 (const void *hs, size_t hs_len, const void *ne, size_t ne_len)
+{
+  if (ne_len == 1)
+    return (void *) memchr (hs, *(unsigned char *) ne, hs_len);
+  if (__glibc_unlikely (ne_len == 0))
+    return (void *) hs;
+  if (__glibc_unlikely (hs_len < ne_len))
+    return NULL;
+  const unsigned char *h = (const unsigned char *) hs;
+  const unsigned char *const end = h + hs_len - ne_len;
+  size_t shift = PTR_DIFF (__find_rarest_byte (ne, ne_len), ne);
+  /* The vector loop also loads the byte after the rare byte, so the
+     rare byte must not be the last byte of NE.  */
+  if (shift == ne_len - 1)
+    --shift;
+  h += shift;
+  /* Naive loop until H is vector-aligned.  */
+  for (; !PTR_IS_ALIGNED (h, sizeof (__m256i)); ++h)
+    {
+      if (__glibc_unlikely (h - shift > end))
+	return NULL;
+      if (*h == *((unsigned char *) ne + shift)
+	  && !memcmp (h - shift, ne, ne_len))
+	return (void *) (h - shift);
+    }
+  const __m256i nv = _mm256_set1_epi8 (*((char *) ne + shift));
+  const __m256i nv1 = _mm256_set1_epi8 (*((char *) ne + shift + 1));
+  __m256i hv, hv1;
+  uint32_t i, hm0, hm1, m;
+  for (; h - shift <= end; h += sizeof (__m256i))
+    {
+      hv = _mm256_load_si256 ((const __m256i *) h);
+      hv1 = _mm256_loadu_si256 ((const __m256i *) (h + 1));
+      /* M has a bit set at each position where both the rare byte and
+	 the byte after it match.  */
+      hm0 = (uint32_t) _mm256_movemask_epi8 (_mm256_cmpeq_epi8 (hv, nv));
+      hm1 = (uint32_t) _mm256_movemask_epi8 (_mm256_cmpeq_epi8 (hv1, nv1));
+      m = hm0 & hm1;
+      while (m)
+	{
+	  i = _tzcnt_u32 (m);
+	  m = _blsr_u32 (m);
+	  if (__glibc_unlikely (h + i - shift > end))
+	    return NULL;
+	  if (!memcmp (h + i - shift, ne, ne_len))
+	    return (char *) h + i - shift;
+	}
+    }
+  return NULL;
+}