From patchwork Mon May 16 20:30:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 54054 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 27CB43850431 for ; Mon, 16 May 2022 20:34:17 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 27CB43850431 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652733257; bh=zT12/ETrclJbsZpH7e5xTHuWtgAQyj7RnM6K1rT4Xvs=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=HDk2VMz8eADRJ6yKjfaH7vgBh40CYLsmfk+nYk6PsK5h8qiQ7u+qFPDeLpfVsdXwk 6LGZhw3ziIdTfk6xrfFJoM4N+2r6zl2oLqiWxA/KmPNmUC/NwvlVG7Dg9bjAGrOFPC EMqpv6+Og14uFvy8Jprt/P7hSmuIkxh2+pbWquTA= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-io1-xd31.google.com (mail-io1-xd31.google.com [IPv6:2607:f8b0:4864:20::d31]) by sourceware.org (Postfix) with ESMTPS id C60313858002 for ; Mon, 16 May 2022 20:30:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C60313858002 Received: by mail-io1-xd31.google.com with SMTP id z26so17232564iot.8 for ; Mon, 16 May 2022 13:30:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zT12/ETrclJbsZpH7e5xTHuWtgAQyj7RnM6K1rT4Xvs=; b=CV4nqmcQPuWAjp9HYeDhDKF5I/c9t7P7/DyE3j7q1xxAhSPPkQ/BpVqNEwOgQ8mZdC jF3aSKa3t3iYk+/oQxINYZMB3eNq2lzgcYl/GOgyzJZmCIGwdNRSAqfObxHivbzF7KYh d1zgPPzlHyKT58X86jb9/leIgJdaVqw/aeP/RGeCdWLdaC0gWzKYI7FhHcR6zAemEBjC VuetVBKqH3OZPWUm3yy3hSuK6EhecJ60Eu3Iw8DLrstY/eX+MeCTGu9dLo6ycMebaikh eYnmZT1tl5AvzF3xfw4O8hPsyINe6idu/uHLWKBbAGbF9OtABN2ZTDReVRleYGgKF/9r S5Fg== X-Gm-Message-State: AOAM533wlYJXTEO56Og41WhKbeAwBpzp+IEaAGHz0lzlXe172d9Nj117 DAydsO7+ol3eQVU5UY91LXfQfqU9XsQ= X-Google-Smtp-Source: ABdhPJwnskWR3Q4OGkKPjaTXDUS78XYVH9n4fOylDkioYSiJdvcCQ7yh9rtDJd3MdN3S5zVPoVsj/Q== X-Received: by 2002:a05:6638:52e:b0:32a:e022:5a9e with SMTP id j14-20020a056638052e00b0032ae0225a9emr10605993jar.60.1652733037839; Mon, 16 May 2022 13:30:37 -0700 (PDT) Received: from noah-tgl.. (node-17-161.flex.volo.net. [76.191.17.161]) by smtp.gmail.com with ESMTPSA id c11-20020a92cf0b000000b002cde6e352f7sm94299ilo.65.2022.05.16.13.30.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 May 2022 13:30:37 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v9 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Date: Mon, 16 May 2022 15:30:04 -0500 Message-Id: <20220516203004.38687-6-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220516203004.38687-1-goldstein.w.n@gmail.com> References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220516203004.38687-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Cc: Alexander Monakov Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" Unroll slightly and enforce good instruction scheduling. This improves performance on out-of-order machines. The unrolling allows for pipelined multiplies. As well, as an optional sysdep, reorder the operations and prevent reassosiation for better scheduling and higher ILP. This commit only adds the barrier for x86, although it should be either no change or a win for any architecture. Unrolling further started to induce slowdowns for sizes [0, 4] but can help the loop so if larger sizes are the target further unrolling can be beneficial. Results for _dl_new_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=30 runs Geometric of all benchmark New / Old: 0.674 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 2.865, 2.72, 1.053 fixed, 1, 3.567, 2.489, 1.433 fixed, 2, 2.577, 3.649, 0.706 fixed, 3, 3.644, 5.983, 0.609 fixed, 4, 4.211, 6.833, 0.616 fixed, 5, 4.741, 9.372, 0.506 fixed, 6, 5.415, 9.561, 0.566 fixed, 7, 6.649, 10.789, 0.616 fixed, 8, 8.081, 11.808, 0.684 fixed, 9, 8.427, 12.935, 0.651 fixed, 10, 8.673, 14.134, 0.614 fixed, 11, 10.69, 15.408, 0.694 fixed, 12, 10.789, 16.982, 0.635 fixed, 13, 12.169, 18.411, 0.661 fixed, 14, 12.659, 19.914, 0.636 fixed, 15, 13.526, 21.541, 0.628 fixed, 16, 14.211, 23.088, 0.616 fixed, 32, 29.412, 52.722, 0.558 fixed, 64, 65.41, 142.351, 0.459 fixed, 128, 138.505, 295.625, 0.469 fixed, 256, 291.707, 601.983, 0.485 random, 2, 12.698, 12.849, 0.988 random, 4, 16.065, 15.857, 1.013 random, 8, 19.564, 21.105, 0.927 random, 16, 23.919, 26.823, 0.892 random, 32, 31.987, 39.591, 0.808 random, 64, 49.282, 71.487, 0.689 random, 128, 82.23, 145.364, 0.566 random, 256, 152.209, 298.434, 0.51 Co-authored-by: Alexander Monakov --- sysdeps/generic/dl-new-hash.h | 114 +++++++++++++++++++++++++++++ {elf => sysdeps/x86}/dl-new-hash.h | 16 +--- 2 files changed, 117 insertions(+), 13 deletions(-) create mode 100644 sysdeps/generic/dl-new-hash.h rename {elf => sysdeps/x86}/dl-new-hash.h (77%) diff --git a/sysdeps/generic/dl-new-hash.h b/sysdeps/generic/dl-new-hash.h new file mode 100644 index 0000000000..84aa7991a4 --- /dev/null +++ b/sysdeps/generic/dl-new-hash.h @@ -0,0 +1,114 @@ +/* _dl_new_hash for elf symbol lookup + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef _DL_NEW_HASH_H +#define _DL_NEW_HASH_H 1 + +#include +/* For __always_inline. */ +#include +/* For __glibc_unlikely. */ +#include + +/* The simplest implementation of _dl_new_hash is: + + _dl_new_hash (const char *s) + { + uint32_t h = 5381; + for (unsigned char c = *s; c != '\0'; c = *++s) + h = h * 33 + c; + return h; + } + + We can get better performance by slightly unrolling the + loop to pipeline the multiples. + + As well, as an architecture specific option we add asm statements + to explicitly specifying order of operations to prevent + reassosiation of instructions that lengthens the loop carried + dependency. This may have no affect as the compiler may have + ordered instructions the same way without it but in testing this + has not been the case for GCC. Improving GCC to reliably schedule + instructions ideally cannot be easily done. + + Architecture(s) that use the reassosiation barries are: + x86 + + Note it is very unlikely the reassosiation barriers would + de-optimize performance on any archictecture and with an imperfect + compiler it may help performance, especially on out-of-order cpus, + so it is suggested that the respective maintainers add them. */ + + +#ifndef __asm_reassociation_barrier +# define __asm_reassociation_barrier(...) +#endif + +static __always_inline uint32_t +__attribute__ ((unused)) +_dl_new_hash (const char *str) +{ + const unsigned char *s = (const unsigned char *) str; + unsigned int h = 5381; + unsigned int c0, c1; + for (;;) + { + c0 = s[0]; + /* Since hashed string is normally not empty, this is unlikely on the + first iteration of the loop. */ + if (__glibc_unlikely (c0 == 0)) + return h; + + c1 = s[1]; + if (c1 == 0) + { + /* Ideal instruction scheduling is: + c0 += h; + h *= 32; + h += c0; + + The __asm_reassociation_barrier() macro is a sysdep optional asm + statements to prevents reassosiation that would result in more + instruction interdependencies and worse scheduling. */ + c0 += h; + __asm_reassociation_barrier("" : "+r"(h) : "r"(c0)); + h = h * 32 + c0; + return h; + } + + /* Ideal instruction scheduling is: + c1 += c0; + h *= 33 * 33; + c0 *= 32; + c1 += c0; + h += c1; + + The __asm_reassociation_barrier() macro is a sysdep optional asm + statements to prevents reassosiation that would result in more + instruction interdependencies and worse scheduling. */ + c1 += c0; + __asm_reassociation_barrier("" : "+r"(c1), "+r"(c0)); + h *= 33 * 33; + c1 += c0 * 32; + __asm_reassociation_barrier("" : "+r"(c1)); + h += c1; + s += 2; + } +} + +#endif /* dl-new-hash.h */ diff --git a/elf/dl-new-hash.h b/sysdeps/x86/dl-new-hash.h similarity index 77% rename from elf/dl-new-hash.h rename to sysdeps/x86/dl-new-hash.h index b7a91ecc07..dd800265bf 100644 --- a/elf/dl-new-hash.h +++ b/sysdeps/x86/dl-new-hash.h @@ -19,19 +19,9 @@ #ifndef _DL_NEW_HASH_H #define _DL_NEW_HASH_H 1 -#include -/* For __always_inline. */ -#include - -static __always_inline uint32_t -__attribute__ ((unused)) -_dl_new_hash (const char *s) -{ - uint32_t h = 5381; - for (unsigned char c = *s; c != '\0'; c = *++s) - h = h * 33 + c; - return h; -} +#define __asm_reassociation_barrier __asm__ +#undef _DL_NEW_HASH_H +#include #endif /* dl-new-hash.h */