From patchwork Mon Dec 11 17:33:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Tirta Halim
X-Patchwork-Id: 81927
From: James Tirta Halim
To: libc-alpha@sourceware.org
Cc: James Tirta Halim
Subject: [PATCH] sysdeps/memmem-avx2.c: add memmem-avx2.c
Date: Tue, 12 Dec 2023 00:33:20 +0700
Message-ID: <20231211173320.702246-1-tirtajames45@gmail.com>
X-Mailer: git-send-email 2.43.0

---
 sysdeps/x86_64/multiarch/memmem-avx2.c | 55 ++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 sysdeps/x86_64/multiarch/memmem-avx2.c

diff --git a/sysdeps/x86_64/multiarch/memmem-avx2.c b/sysdeps/x86_64/multiarch/memmem-avx2.c
new file mode 100644
index 0000000000..b0cced73aa
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memmem-avx2.c
@@ -0,0 +1,55 @@
+#include
+#include
+#include
+#include
+
+void *
+__memmem_avx2 (const void *hs, size_t hs_len, const void *ne, size_t ne_len)
+{
+  if (ne_len == 1)
+    return (void *) memchr (hs, *(unsigned char *) ne, hs_len);
+  if (__glibc_unlikely (ne_len == 0))
+    return (void *) hs;
+  if (__glibc_unlikely (hs_len == ne_len))
+    return !memcmp (hs, ne, ne_len) ? (void *) hs : NULL;
+  if (__glibc_unlikely (hs_len < ne_len))
+    return NULL;
+  const __m256i nv = _mm256_set1_epi8 (*(char *) ne);
+  const unsigned char *h = (const unsigned char *) hs;
+  const unsigned char *n = (const unsigned char *) ne;
+  const unsigned char *const end = h + hs_len - ne_len;
+  const int c1 = *(n + 1);
+  n += 2, ne_len -= 2;
+  __m256i hv;
+  uint32_t i, m;
+  if (!PTR_IS_ALIGNED (h)) {
+    hv = _mm256_loadu_si256 ((const __m256i *) h);
+    m = (uint32_t) _mm256_movemask_epi8 (_mm256_cmpeq_epi8 (hv, nv));
+    for (; m; m = _blsr_u32 (m)) {
+      i = _tzcnt_u32 (m);
+      if (__glibc_unlikely (h + i > end))
+        return NULL;
+      if (*(h + i + 1) == c1 && !memcmp (h + i + 2, n, ne_len))
+        return (char *) h + i;
+    }
+    h += sizeof (__m256i);
+    if (__glibc_unlikely (h > end))
+      return NULL;
+    h = (const unsigned char *) PTR_ALIGN_UP (h, sizeof (__m256i));
+  }
+  for (;;) {
+    hv = _mm256_load_si256 ((const __m256i *) h);
+    m = (uint32_t) _mm256_movemask_epi8 (_mm256_cmpeq_epi8 (hv, nv));
+    for (; m; m = _blsr_u32 (m)) {
+      i = _tzcnt_u32 (m);
+      if (__glibc_unlikely (h + i > end))
+        return NULL;
+      if (*(h + i + 1) == c1 && !memcmp (h + i + 2, n, ne_len))
+        return (char *) h + i;
+    }
+    h += sizeof (__m256i);
+    if (__glibc_unlikely (h > end))
+      return NULL;
+  }
+  return NULL;
+}
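
The search strategy in the patch (build a bitmask of positions where the first needle byte matches, walk its set bits from lowest to highest, and verify each candidate with memcmp) can be sketched in portable scalar C, which may be handy as a reference when testing the AVX2 routine. This is only a sketch and not part of the patch: memmem_ref, match_mask and lowest_set_bit are hypothetical names, the scalar loops stand in for _mm256_movemask_epi8, _tzcnt_u32 and _blsr_u32, and unlike the patch it compares the whole needle in one memcmp and never reads past the end of the haystack.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: index of the lowest set bit of a nonzero mask,
   a portable stand-in for _tzcnt_u32.  */
static unsigned int
lowest_set_bit (uint32_t m)
{
  unsigned int i = 0;
  while ((m & 1) == 0)
    {
      m >>= 1;
      ++i;
    }
  return i;
}

/* Hypothetical helper: bit I of the result is set when block[I] == c.
   Scalar stand-in for the cmpeq + movemask pair; reads at most LEN
   bytes, and never more than 32.  */
static uint32_t
match_mask (const unsigned char *block, size_t len, unsigned char c)
{
  uint32_t m = 0;
  for (size_t i = 0; i < len && i < 32; ++i)
    if (block[i] == c)
      m |= (uint32_t) 1 << i;
  return m;
}

/* Scalar reference with the same contract as memmem.  */
static void *
memmem_ref (const void *hs, size_t hs_len, const void *ne, size_t ne_len)
{
  if (ne_len == 0)
    return (void *) hs;
  if (hs_len < ne_len)
    return NULL;
  const unsigned char *h = (const unsigned char *) hs;
  const unsigned char *n = (const unsigned char *) ne;
  /* Number of positions at which a match may start.  */
  size_t starts = hs_len - ne_len + 1;
  for (size_t off = 0; off < starts; off += 32)
    {
      size_t len = starts - off < 32 ? starts - off : 32;
      uint32_t m = match_mask (h + off, len, n[0]);
      /* m & (m - 1) clears the lowest set bit, as _blsr_u32 does.  */
      for (; m != 0; m &= m - 1)
        {
          const unsigned char *cand = h + off + lowest_set_bit (m);
          if (memcmp (cand, n, ne_len) == 0)
            return (void *) cand;
        }
    }
  return NULL;
}
```

Because the mask is built only over valid start positions, every candidate is in range and no separate comparison against an end pointer is needed. A test could run memmem_ref and __memmem_avx2 on the same inputs and compare the returned pointers.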