From patchwork Tue May 10 15:04:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 53748 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 12CF43954C4D for ; Tue, 10 May 2022 15:09:10 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 12CF43954C4D DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652195350; bh=39ZhwaRnO7izlBsJGP4yq+F8SMNFED7tN4qXak0uZfo=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=fvhNH8pdy89O0ndwdJjdInRMuBhsucNvG4Eswxh6EWGwaAhAaF049o2xq11HFaKra MM+YMn8OJ1nhylvK4OoSyViJgym58uDHZcNh64+KRy8XYrXQo61Mq3a100S/lo7oHw 0UniNtGQz6kI//lazreG5eGQz1v+7q8Pl36rq7t8= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-io1-xd2a.google.com (mail-io1-xd2a.google.com [IPv6:2607:f8b0:4864:20::d2a]) by sourceware.org (Postfix) with ESMTPS id 814003955630 for ; Tue, 10 May 2022 15:04:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 814003955630 Received: by mail-io1-xd2a.google.com with SMTP id f4so18807408iov.2 for ; Tue, 10 May 2022 08:04:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=39ZhwaRnO7izlBsJGP4yq+F8SMNFED7tN4qXak0uZfo=; b=MDDjV8P4mOsgITw52TG8aUhFzkppfp+cbm1RPAPcfePPe2lD9zwqjduVlOWBcSA+1D 1XXO6NvDppI9v2XvC5mg3UMPMX2cmhvgHZIxu2sfhHqtB1WLNonrH7I7e/79JOflh0e8 JpDdzP5kWxcI/5+pKmj3NbAGNlDzaJBbpCraK4AQ9LNqDNa01TWW3b5/RUUVkpdaiKRU 4IxJj0uRmjpD+9WsUYfNYjFd1ms+m75Z66r3203EsmTPi3QdIcaBCVc9hWw75+wO0hHv jzdkbrIdj6PQwDB7MA8y6UTHWSnDV5CuK813RvVln5g9HGli9u2wuI4A5Cee1pyPoJG+ qEgw== X-Gm-Message-State: AOAM531nYsTj00H7zytO5KCQk+aL6pe6MdSasfDv3oevqlxlW7vQi2pq lmRdfzf+Sh2W4tbwo36+X+j5GWkGBX8= X-Google-Smtp-Source: ABdhPJwZrD4uU+irbiJ/878OMGDFmA7wVbRGNl4tKnw0HPhRKzgKUpYnENtuVEDfSCgI9CSaBaN+/Q== X-Received: by 2002:a05:6638:19c4:b0:32b:6436:f1c9 with SMTP id bi4-20020a05663819c400b0032b6436f1c9mr10301690jab.303.1652195090374; Tue, 10 May 2022 08:04:50 -0700 (PDT) Received: from noah-tgl.. (node-17-161.flex.volo.net. [76.191.17.161]) by smtp.gmail.com with ESMTPSA id i11-20020a5e850b000000b0065a47e16f62sm4288988ioj.52.2022.05.10.08.04.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 May 2022 08:04:50 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v6 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Date: Tue, 10 May 2022 10:04:41 -0500 Message-Id: <20220510150441.20948-6-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220510150441.20948-1-goldstein.w.n@gmail.com> References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220510150441.20948-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Cc: Alexander Monakov Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" Unroll slightly and enforce good instruction scheduling. This improves performance on out-of-order machines. Note the unrolling allows for pipelined multiplies which helps a bit, but most of the gain is from enforcing better instruction scheduling for more ILP. Unrolling further started to induce slowdowns for sizes [0, 4] but can help the loop so if larger sizes are the target further unrolling can be beneficial. Results for _dl_new_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=25 runs Geometric of all benchmark New / Old: 0.791 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 0.641, 0.658, 0.974 fixed, 1, 1.888, 1.883, 1.003 fixed, 2, 2.712, 2.833, 0.957 fixed, 3, 3.314, 3.739, 0.886 fixed, 4, 4.316, 4.866, 0.887 fixed, 5, 5.16, 5.966, 0.865 fixed, 6, 5.986, 7.241, 0.827 fixed, 7, 7.264, 8.435, 0.861 fixed, 8, 8.052, 9.846, 0.818 fixed, 9, 9.369, 11.316, 0.828 fixed, 10, 10.256, 12.925, 0.794 fixed, 11, 12.191, 14.546, 0.838 fixed, 12, 12.667, 15.92, 0.796 fixed, 13, 14.442, 17.465, 0.827 fixed, 14, 14.808, 18.981, 0.78 fixed, 15, 16.244, 20.565, 0.79 fixed, 16, 17.166, 22.044, 0.779 fixed, 32, 35.447, 50.558, 0.701 fixed, 64, 86.479, 134.529, 0.643 fixed, 128, 155.453, 287.527, 0.541 fixed, 256, 302.57, 593.64, 0.51 random, 2, 11.168, 10.61, 1.053 random, 4, 13.308, 13.53, 0.984 random, 8, 16.579, 19.437, 0.853 random, 16, 21.292, 24.776, 0.859 random, 32, 30.56, 35.906, 0.851 random, 64, 49.249, 68.577, 0.718 random, 128, 81.845, 140.664, 0.582 random, 256, 152.517, 292.204, 0.522 Co-authored-by: Alexander Monakov --- elf/dl-new-hash.h | 50 ++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 45 insertions(+), 5 deletions(-) diff --git a/elf/dl-new-hash.h b/elf/dl-new-hash.h index 40d88c81f9..cacbeec289 100644 --- a/elf/dl-new-hash.h +++ b/elf/dl-new-hash.h @@ -20,15 +20,55 @@ #define _DL_NEW_HASH_H 1 #include +/* For __glibc_unlikely. */ +#include static inline uint32_t __attribute__ ((unused)) -_dl_new_hash (const char *s) +_dl_new_hash (const char *signed_s) { - uint32_t h = 5381; - for (unsigned char c = *s; c != '\0'; c = *++s) - h = h * 33 + c; - return h; + const unsigned char *s = (const unsigned char *) signed_s; + unsigned int h = 5381; + unsigned int c0, c1; + for (;;) + { + c0 = (unsigned int) *s; + /* Unlikely length zero string so evens will be slightly less + common. */ + if (__glibc_unlikely (c0 == 0)) + { + return h; + } + + c1 = (unsigned int) *(s + 1); + if (c1 == 0) + { + c0 += h; + /* Ideal instruction scheduling is: + c0 += h; + h *= 32; + h += c0; + The asm statement ensures the compiler can't mess that up. */ + asm("" : "+r"(h) : "r"(c0)); + h = h * 32 + c0; + return h; + } + + /* Ideal instruction scheduling is: + c1 += c0; + h *= 33 * 33; + c0 *= 32; + c1 += c0; + h += c1; + The asm statements ensures the compiler can't mess that up. */ + c1 += c0; + asm("" : "+r"(c1), "+r"(c0)); + h *= 33 * 33; + c1 += c0 * 32; + asm("" : "+r"(c1)); + h += c1; + s += 2; + } }