From patchwork Wed May 11 03:06:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Noah Goldstein <goldstein.w.n@gmail.com>
X-Patchwork-Id: 53773
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id DC324385803D
	for <patchwork@sourceware.org>; Wed, 11 May 2022 03:10:40 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DC324385803D
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1652238640;
	bh=xjFgK9x7aoibmKeBK1V0q16/tTBHTGwpNgrBDgvck8o=;
	h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=dssHQqv8UJHcnEAq6b9N5f9JbOPEUNguQB2i4wjkKMWAtXR90Tw0YiTo5T4g0kSgL
	 43Qhl5t9oL2hFkv/p9mZuAhyfmBUMppWy7xw+HfTkOuD+L+ndWUbU1hir9EHK8SWr7
	 FusgWqDagZb54XlhlEruJlRNYAITV8xafw3ujMKE=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mail-io1-xd2f.google.com (mail-io1-xd2f.google.com
 [IPv6:2607:f8b0:4864:20::d2f])
 by sourceware.org (Postfix) with ESMTPS id C3FB63857367
 for <libc-alpha@sourceware.org>; Wed, 11 May 2022 03:06:48 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C3FB63857367
Received: by mail-io1-xd2f.google.com with SMTP id i20so829793ion.0
 for <libc-alpha@sourceware.org>; Tue, 10 May 2022 20:06:48 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=xjFgK9x7aoibmKeBK1V0q16/tTBHTGwpNgrBDgvck8o=;
 b=l33PID19juA+230iUJrQhP/3GUtfuAZVg3FR/9PAJ4gu+5OHV8BVvSejhcwiTMKlXX
 nvgG9dNqP6eVHeLniU2lhiIVl5O5OpbkK8Or9HA7E6uWFJaIsrGSPi1MwQQbpwm+AoZo
 XoeA0uI0ndQ28OS2lvCNiqgTQNiW6g5FvzYnaoqPFkgXceVtX3b26XoEXHvqsP2qxW+J
 dKv91jX2+hyOEicz3OEsHTX/bhD85BJovjpnKkAUBTFownJP0mMoGlNmChNq/UL/B4t+
 60RX+1GmC4B6lXh7UaRHY+pGw+UfzRS2y3YLm4oGrfaa8VnvCAKXpVA30+EajrMqlFE8
 dFEA==
X-Gm-Message-State: AOAM533s0+bRQGwLmoZpFipcBKXrQ8jpdC21CtwgBVRE6lQvXTzfWjYR
 7zO7B3/BMhs0OwCQJoGYrYOR69H8f0Y=
X-Google-Smtp-Source: 
 ABdhPJzXBRXjG5Elem8XXXYCZiTMkeL43Kpjw9a28GAUc5SN8/u+fPKedMtLxc/0J+uU9wWm+0e6sw==
X-Received: by 2002:a05:6638:2116:b0:32b:7d73:67f1 with SMTP id
 n22-20020a056638211600b0032b7d7367f1mr11550662jaj.135.1652238408336;
 Tue, 10 May 2022 20:06:48 -0700 (PDT)
Received: from noah-tgl.. (node-17-161.flex.volo.net. [76.191.17.161])
 by smtp.gmail.com with ESMTPSA id
 h4-20020a92c264000000b002cf5aae6645sm324798ild.2.2022.05.10.20.06.47
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 10 May 2022 20:06:47 -0700 (PDT)
To: libc-alpha@sourceware.org
Subject: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
Date: Tue, 10 May 2022 22:06:35 -0500
Message-Id: <20220511030635.154689-6-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220511030635.154689-1-goldstein.w.n@gmail.com>
References: <20220414041231.926415-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-1-goldstein.w.n@gmail.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-Patchwork-Original-From: Noah Goldstein via Libc-alpha
 <libc-alpha@sourceware.org>
From: Noah Goldstein <goldstein.w.n@gmail.com>
Reply-To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: Alexander Monakov <amonakov@ispras.ru>
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

Unroll slightly and enforce good instruction scheduling. This improves
performance on out-of-order machines. Note the unrolling allows
for pipelined multiplies which helps a bit, but most of the gain
is from enforcing better instruction scheduling for more ILP.
Unrolling further started to induce slowdowns for sizes [0, 4]
but can help the loop so if larger sizes are the target further
unrolling can be beneficial.

Results for _dl_new_hash
Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

Time as Geometric Mean of N=30 runs
Geometric of all benchmark New / Old: 0.674
  type, length, New Time, Old Time, New Time / Old Time
 fixed,      0,    2.865,     2.72,               1.053
 fixed,      1,    3.567,    2.489,               1.433
 fixed,      2,    2.577,    3.649,               0.706
 fixed,      3,    3.644,    5.983,               0.609
 fixed,      4,    4.211,    6.833,               0.616
 fixed,      5,    4.741,    9.372,               0.506
 fixed,      6,    5.415,    9.561,               0.566
 fixed,      7,    6.649,   10.789,               0.616
 fixed,      8,    8.081,   11.808,               0.684
 fixed,      9,    8.427,   12.935,               0.651
 fixed,     10,    8.673,   14.134,               0.614
 fixed,     11,    10.69,   15.408,               0.694
 fixed,     12,   10.789,   16.982,               0.635
 fixed,     13,   12.169,   18.411,               0.661
 fixed,     14,   12.659,   19.914,               0.636
 fixed,     15,   13.526,   21.541,               0.628
 fixed,     16,   14.211,   23.088,               0.616
 fixed,     32,   29.412,   52.722,               0.558
 fixed,     64,    65.41,  142.351,               0.459
 fixed,    128,  138.505,  295.625,               0.469
 fixed,    256,  291.707,  601.983,               0.485
random,      2,   12.698,   12.849,               0.988
random,      4,   16.065,   15.857,               1.013
random,      8,   19.564,   21.105,               0.927
random,     16,   23.919,   26.823,               0.892
random,     32,   31.987,   39.591,               0.808
random,     64,   49.282,   71.487,               0.689
random,    128,    82.23,  145.364,               0.566
random,    256,  152.209,  298.434,                0.51

Co-authored-by: Alexander Monakov <amonakov@ispras.ru>
---
 elf/dl-new-hash.h | 66 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 61 insertions(+), 5 deletions(-)

diff --git a/elf/dl-new-hash.h b/elf/dl-new-hash.h
index 40d88c81f9..5269f6eb98 100644
--- a/elf/dl-new-hash.h
+++ b/elf/dl-new-hash.h
@@ -20,15 +20,71 @@
 #define _DL_NEW_HASH_H 1
 
 #include <stdint.h>
+/* For __glibc_unlikely.  */
+#include <sys/cdefs.h>
+
+/* The simplest implementation of _dl_new_hash is:
+
+_dl_new_hash (const char *s)
+{
+   uint32_t h = 5381;
+   for (unsigned char c = *s; c != '\0'; c = *++s)
+     h = h * 33 + c;
+   return h;
+}
+
+   We can get better performance, however, by slightly unrolling the
+   loop and explicitly specifying order of operations to prevent
+   reassosiation of instructions and ensure ideal scheduling.  */
 
 static inline uint32_t
 __attribute__ ((unused))
-_dl_new_hash (const char *s)
+_dl_new_hash (const char *str)
 {
-  uint32_t h = 5381;
-  for (unsigned char c = *s; c != '\0'; c = *++s)
-    h = h * 33 + c;
-  return h;
+  const unsigned char *s = (const unsigned char *) str;
+  unsigned int h = 5381;
+  unsigned int c0, c1;
+  for (;;)
+    {
+      c0 = s[0];
+      /* Unlikely length zero string so evens will be slightly less
+	 common.  */
+      if (__glibc_unlikely (c0 == 0))
+	return h;
+
+      c1 = s[1];
+      if (c1 == 0)
+	{
+	  c0 += h;
+	  /* Ideal instruction scheduling is:
+	 c0 += h;
+	 h *= 32;
+	 h += c0;
+
+	 The asm statements are to prevent reassosiation that would result in
+	 more instruction interdependencies and worse scheduling.  */
+	  asm("" : "+r"(h) : "r"(c0));
+	  h = h * 32 + c0;
+	  return h;
+	}
+
+      /* Ideal instruction scheduling is:
+	 c1 += c0;
+	 h *= 33 * 33;
+	 c0 *= 32;
+	 c1 += c0;
+	 h  += c1;
+
+	 The asm statements are to prevent reassosiation that would result in
+	 more instruction interdependencies and worse scheduling.  */
+      c1 += c0;
+      asm("" : "+r"(c1), "+r"(c0));
+      h *= 33 * 33;
+      c1 += c0 * 32;
+      asm("" : "+r"(c1));
+      h += c1;
+      s += 2;
+    }
 }