From patchwork Wed May 11 03:06:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 53773 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DC324385803D for ; Wed, 11 May 2022 03:10:40 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DC324385803D DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652238640; bh=xjFgK9x7aoibmKeBK1V0q16/tTBHTGwpNgrBDgvck8o=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=dssHQqv8UJHcnEAq6b9N5f9JbOPEUNguQB2i4wjkKMWAtXR90Tw0YiTo5T4g0kSgL 43Qhl5t9oL2hFkv/p9mZuAhyfmBUMppWy7xw+HfTkOuD+L+ndWUbU1hir9EHK8SWr7 FusgWqDagZb54XlhlEruJlRNYAITV8xafw3ujMKE= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-io1-xd2f.google.com (mail-io1-xd2f.google.com [IPv6:2607:f8b0:4864:20::d2f]) by sourceware.org (Postfix) with ESMTPS id C3FB63857367 for ; Wed, 11 May 2022 03:06:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C3FB63857367 Received: by mail-io1-xd2f.google.com with SMTP id i20so829793ion.0 for ; Tue, 10 May 2022 20:06:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xjFgK9x7aoibmKeBK1V0q16/tTBHTGwpNgrBDgvck8o=; b=l33PID19juA+230iUJrQhP/3GUtfuAZVg3FR/9PAJ4gu+5OHV8BVvSejhcwiTMKlXX nvgG9dNqP6eVHeLniU2lhiIVl5O5OpbkK8Or9HA7E6uWFJaIsrGSPi1MwQQbpwm+AoZo XoeA0uI0ndQ28OS2lvCNiqgTQNiW6g5FvzYnaoqPFkgXceVtX3b26XoEXHvqsP2qxW+J dKv91jX2+hyOEicz3OEsHTX/bhD85BJovjpnKkAUBTFownJP0mMoGlNmChNq/UL/B4t+ 60RX+1GmC4B6lXh7UaRHY+pGw+UfzRS2y3YLm4oGrfaa8VnvCAKXpVA30+EajrMqlFE8 dFEA== X-Gm-Message-State: AOAM533s0+bRQGwLmoZpFipcBKXrQ8jpdC21CtwgBVRE6lQvXTzfWjYR 7zO7B3/BMhs0OwCQJoGYrYOR69H8f0Y= X-Google-Smtp-Source: ABdhPJzXBRXjG5Elem8XXXYCZiTMkeL43Kpjw9a28GAUc5SN8/u+fPKedMtLxc/0J+uU9wWm+0e6sw== X-Received: by 2002:a05:6638:2116:b0:32b:7d73:67f1 with SMTP id n22-20020a056638211600b0032b7d7367f1mr11550662jaj.135.1652238408336; Tue, 10 May 2022 20:06:48 -0700 (PDT) Received: from noah-tgl.. (node-17-161.flex.volo.net. [76.191.17.161]) by smtp.gmail.com with ESMTPSA id h4-20020a92c264000000b002cf5aae6645sm324798ild.2.2022.05.10.20.06.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 May 2022 20:06:47 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Date: Tue, 10 May 2022 22:06:35 -0500 Message-Id: <20220511030635.154689-6-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220511030635.154689-1-goldstein.w.n@gmail.com> References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220511030635.154689-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Cc: Alexander Monakov Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" Unroll slightly and enforce good instruction scheduling. This improves performance on out-of-order machines. Note the unrolling allows for pipelined multiplies which helps a bit, but most of the gain is from enforcing better instruction scheduling for more ILP. Unrolling further started to induce slowdowns for sizes [0, 4] but can help the loop so if larger sizes are the target further unrolling can be beneficial. Results for _dl_new_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=30 runs Geometric of all benchmark New / Old: 0.674 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 2.865, 2.72, 1.053 fixed, 1, 3.567, 2.489, 1.433 fixed, 2, 2.577, 3.649, 0.706 fixed, 3, 3.644, 5.983, 0.609 fixed, 4, 4.211, 6.833, 0.616 fixed, 5, 4.741, 9.372, 0.506 fixed, 6, 5.415, 9.561, 0.566 fixed, 7, 6.649, 10.789, 0.616 fixed, 8, 8.081, 11.808, 0.684 fixed, 9, 8.427, 12.935, 0.651 fixed, 10, 8.673, 14.134, 0.614 fixed, 11, 10.69, 15.408, 0.694 fixed, 12, 10.789, 16.982, 0.635 fixed, 13, 12.169, 18.411, 0.661 fixed, 14, 12.659, 19.914, 0.636 fixed, 15, 13.526, 21.541, 0.628 fixed, 16, 14.211, 23.088, 0.616 fixed, 32, 29.412, 52.722, 0.558 fixed, 64, 65.41, 142.351, 0.459 fixed, 128, 138.505, 295.625, 0.469 fixed, 256, 291.707, 601.983, 0.485 random, 2, 12.698, 12.849, 0.988 random, 4, 16.065, 15.857, 1.013 random, 8, 19.564, 21.105, 0.927 random, 16, 23.919, 26.823, 0.892 random, 32, 31.987, 39.591, 0.808 random, 64, 49.282, 71.487, 0.689 random, 128, 82.23, 145.364, 0.566 random, 256, 152.209, 298.434, 0.51 Co-authored-by: Alexander Monakov --- elf/dl-new-hash.h | 66 +++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 61 insertions(+), 5 deletions(-) diff --git a/elf/dl-new-hash.h b/elf/dl-new-hash.h index 40d88c81f9..5269f6eb98 100644 --- a/elf/dl-new-hash.h +++ b/elf/dl-new-hash.h @@ -20,15 +20,71 @@ #define _DL_NEW_HASH_H 1 #include +/* For __glibc_unlikely. */ +#include + +/* The simplest implementation of _dl_new_hash is: + +_dl_new_hash (const char *s) +{ + uint32_t h = 5381; + for (unsigned char c = *s; c != '\0'; c = *++s) + h = h * 33 + c; + return h; +} + + We can get better performance, however, by slightly unrolling the + loop and explicitly specifying order of operations to prevent + reassosiation of instructions and ensure ideal scheduling. */ static inline uint32_t __attribute__ ((unused)) -_dl_new_hash (const char *s) +_dl_new_hash (const char *str) { - uint32_t h = 5381; - for (unsigned char c = *s; c != '\0'; c = *++s) - h = h * 33 + c; - return h; + const unsigned char *s = (const unsigned char *) str; + unsigned int h = 5381; + unsigned int c0, c1; + for (;;) + { + c0 = s[0]; + /* Unlikely length zero string so evens will be slightly less + common. */ + if (__glibc_unlikely (c0 == 0)) + return h; + + c1 = s[1]; + if (c1 == 0) + { + c0 += h; + /* Ideal instruction scheduling is: + c0 += h; + h *= 32; + h += c0; + + The asm statements are to prevent reassosiation that would result in + more instruction interdependencies and worse scheduling. */ + asm("" : "+r"(h) : "r"(c0)); + h = h * 32 + c0; + return h; + } + + /* Ideal instruction scheduling is: + c1 += c0; + h *= 33 * 33; + c0 *= 32; + c1 += c0; + h += c1; + + The asm statements are to prevent reassosiation that would result in + more instruction interdependencies and worse scheduling. */ + c1 += c0; + asm("" : "+r"(c1), "+r"(c0)); + h *= 33 * 33; + c1 += c0 * 32; + asm("" : "+r"(c1)); + h += c1; + s += 2; + } }