From patchwork Wed Feb 3 05:39:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: develop--- via Libc-alpha
X-Patchwork-Id: 41912
To: libc-alpha@sourceware.org
Subject: [PATCH v4 2/2] x86: Add additional benchmarks and tests for strchr
Date: Wed, 3 Feb 2021 00:39:00 -0500
Message-Id: <20210203053900.4125403-2-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20210203053900.4125403-1-goldstein.w.n@gmail.com>
References: <20210203053900.4125403-1-goldstein.w.n@gmail.com>
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: "goldstein.w.n--- via Libc-alpha"
From: develop--- via Libc-alpha
Reply-To: goldstein.w.n@gmail.com
Cc: goldstein.w.n@gmail.com

From: noah

This patch adds benchmarks and tests for a string size of 4096 and
several benchmarks for a string size of 256 with different alignments.
Signed-off-by: noah
---
Added 2 additional benchmark and test sizes:

4096: Just feels like a natural "large" size to test.

256 with multiple alignments: This essentially tests how expensive the
initial work prior to the 4x loop is, depending on the alignment.

Results from bench-strchr: All times are in seconds and are the median of
100 runs. Old is the current strchr-avx2.S implementation. New is this
patch.

Summary: New is definitely faster for medium -> large sizes. Once the 4x
loop is hit there is a 10%+ speedup and New always wins out. For smaller
sizes there is more variance as to which is faster, and the differences
are small. Generally the New version seems to win out. This is likely
because 0 - 31 sized strings are the fast path for New (no jmp). Also neat
is the significant performance improvement for alignments 96 and 112. This
is because the 5x vectors before the 4x loop really favor that alignment.

Benchmarking CPU:
Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

Win is the faster implementation; Dif is the winner's time minus the
loser's time.

size, algn, Old T , New T  -------- Win Dif
0   , 0   , 2.54  , 2.52   -------- New -0.02
1   , 0   , 2.57  , 2.52   -------- New -0.05
2   , 0   , 2.56  , 2.52   -------- New -0.04
3   , 0   , 2.58  , 2.54   -------- New -0.04
4   , 0   , 2.61  , 2.55   -------- New -0.06
5   , 0   , 2.65  , 2.62   -------- New -0.03
6   , 0   , 2.73  , 2.74   -------- Old -0.01
7   , 0   , 2.75  , 2.74   -------- New -0.01
8   , 0   , 2.62  , 2.6    -------- New -0.02
9   , 0   , 2.73  , 2.75   -------- Old -0.02
10  , 0   , 2.74  , 2.74   -------- Eq  N/A
11  , 0   , 2.76  , 2.72   -------- New -0.04
12  , 0   , 2.74  , 2.72   -------- New -0.02
13  , 0   , 2.75  , 2.72   -------- New -0.03
14  , 0   , 2.74  , 2.73   -------- New -0.01
15  , 0   , 2.74  , 2.73   -------- New -0.01
16  , 0   , 2.74  , 2.73   -------- New -0.01
17  , 0   , 2.74  , 2.74   -------- Eq  N/A
18  , 0   , 2.73  , 2.73   -------- Eq  N/A
19  , 0   , 2.73  , 2.73   -------- Eq  N/A
20  , 0   , 2.73  , 2.73   -------- Eq  N/A
21  , 0   , 2.73  , 2.72   -------- New -0.01
22  , 0   , 2.71  , 2.74   -------- Old -0.03
23  , 0   , 2.71  , 2.69   -------- New -0.02
24  , 0   , 2.68  , 2.67   -------- New -0.01
25  , 0   , 2.66  , 2.62   -------- New -0.04
26  , 0   , 2.64  , 2.62   -------- New -0.02
27  , 0   , 2.71  , 2.64   -------- New -0.07
28  , 0   , 2.67  , 2.69   -------- Old -0.02
29  , 0   , 2.72  , 2.72   -------- Eq  N/A
30  , 0   , 2.68  , 2.69   -------- Old -0.01
31  , 0   , 2.68  , 2.68   -------- Eq  N/A
32  , 0   , 3.51  , 3.52   -------- Old -0.01
32  , 1   , 3.52  , 3.51   -------- New -0.01
64  , 0   , 3.97  , 3.93   -------- New -0.04
64  , 2   , 3.95  , 3.9    -------- New -0.05
64  , 1   , 4.0   , 3.93   -------- New -0.07
64  , 3   , 3.97  , 3.88   -------- New -0.09
64  , 4   , 3.95  , 3.89   -------- New -0.06
64  , 5   , 3.94  , 3.9    -------- New -0.04
64  , 6   , 3.97  , 3.9    -------- New -0.07
64  , 7   , 3.97  , 3.91   -------- New -0.06
96  , 0   , 4.74  , 4.52   -------- New -0.22
128 , 0   , 5.29  , 5.19   -------- New -0.1
128 , 2   , 5.29  , 5.15   -------- New -0.14
128 , 3   , 5.31  , 5.22   -------- New -0.09
256 , 0   , 11.19 , 9.81   -------- New -1.38
256 , 3   , 11.19 , 9.84   -------- New -1.35
256 , 4   , 11.2  , 9.88   -------- New -1.32
256 , 16  , 11.21 , 9.79   -------- New -1.42
256 , 32  , 11.39 , 10.34  -------- New -1.05
256 , 48  , 11.88 , 10.56  -------- New -1.32
256 , 64  , 11.82 , 10.83  -------- New -0.99
256 , 80  , 11.85 , 10.86  -------- New -0.99
256 , 96  , 9.56  , 8.76   -------- New -0.8
256 , 112 , 9.55  , 8.9    -------- New -0.65
512 , 0   , 15.76 , 13.72  -------- New -2.04
512 , 4   , 15.72 , 13.74  -------- New -1.98
512 , 5   , 15.73 , 13.74  -------- New -1.99
1024, 0   , 24.85 , 21.33  -------- New -3.52
1024, 5   , 24.86 , 21.27  -------- New -3.59
1024, 6   , 24.87 , 21.32  -------- New -3.55
2048, 0   , 45.75 , 36.7   -------- New -9.05
2048, 6   , 43.91 , 35.42  -------- New -8.49
2048, 7   , 44.43 , 36.37  -------- New -8.06
4096, 0   , 96.94 , 81.34  -------- New -15.6
4096, 7   , 97.01 , 81.32  -------- New -15.69

 benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
 string/test-strchr.c      | 26 +++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
index bf493fe458..4ce2369d9b 100644
--- a/benchtests/bench-strchr.c
+++ b/benchtests/bench-strchr.c
@@ -100,7 +100,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
     return;
 
@@ -151,12 +151,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -169,12 +181,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);
diff --git a/string/test-strchr.c b/string/test-strchr.c
index 5b6022746c..2cf4ea2add 100644
--- a/string/test-strchr.c
+++ b/string/test-strchr.c
@@ -130,7 +130,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
     return;
 
@@ -259,12 +259,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -277,12 +289,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);