From patchwork Wed Apr 20 18:35:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 53077 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0B88D3857376 for ; Wed, 20 Apr 2022 18:35:34 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id B74003858D1E for ; Wed, 20 Apr 2022 18:35:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B74003858D1E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:To:From:Sender:Reply-To:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=FCCCYApLwr8feqNqvX9aWPeeRKYXOmU6zBsFm1X20zU=; b=LAGciridx7q272tDi0y8v28zi9 3lDUhtgvPXLDv1gnyEn5Mf9RqYSOFSCAvNZNBliLnnjkW2yqH5B7li+bMu2WqkfnmhtbPnRmVdOC+ YjAkcVh8AkUGi7TyOFWqdTu0BRum8/o0kzXxtMSoRSSVAWMRaAn/Y/uZWO1erTIEl+WL8OD+D7XZO nD/dYkIU8LvD1cbOLBhgoQoCVovgvoHkkuMVAZmwsBsUYiL7td6zKU64sJR881X60O+apR0zGDpgX q1pfydAG9VF9AGE4VcPvFKiKB/Xqo7aBfo5Kd0Q6dpcRqgWB2E/Nvc20fhN1lOeqmy/9IJGJ7KkbW 4zv8TgDg==; Received: from host109-151-117-89.range109-151.btcentralplus.com ([109.151.117.89]:50326 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nhFAX-0003M3-S6 for gcc-patches@gcc.gnu.org; Wed, 20 Apr 2022 14:35:14 -0400 From: "Roger Sayle" To: Subject: [PATCH] PR middle-end/98865: Optimize (a>>63)*b as -(a>>63)&b in match.pd. Date: Wed, 20 Apr 2022 19:35:11 +0100 Message-ID: <012f01d854e5$5fb82fe0$1f288fa0$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdhU5I/+B4sFndrtTJ+I9Npa9qg9XQ== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch implements the constant folding optimization(s) described in PR middle-end/98865, which should help address the serious performance regression of Botan AES-128/XTS mentioned in PR tree-optimization/98856. This combines aspects of both Jakub Jelinek's patch in comment #2 and Andrew Pinski's patch in comment #4, so both are listed as co-authors. Alas truth_valued_p is not quite what we want (and tweaking its definition has undesirable side-effects), so instead this patch introduces a new zero_one_valued predicate based on tree_nonzero_bits that extends truth_valued_p (which is for Boolean types with single bit precision). This is then used to simple if X*Y into X&Y when both X and Y are zero_one_valued_p, and simplify X*Y into (-X)&Y when X is zero_one_valued_p, in both cases replacing an integer multiplication with a cheaper bit-wise AND. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and with --target_board=unix{-m32}, with no new failures, except for a tweak required to tree-ssa/vrp116.c. The recently proposed cmove patch ensures the i386 backend continues to generate identical code for vrp116.c as before. Ok, either for mainline or when stage 1 reopens? 2022-04-20 Roger Sayle Andrew Pinski Jakub Jelinek gcc/ChangeLog PR middle-end/98865 * match.pd (match zero_one_valued_p): New predicate. (mult @0 @1): Use zero_one_valued_p for transforming into (and @0 @1). (mult zero_one_valued_p@0 @1): Convert integer multiplication into a negation and a bit-wise AND, if it can't be cheaply implemented by a single left shift. gcc/testsuite/ChangeLog PR middle-end/98865 * gcc.dg/pr98865.c: New test case. * gcc.dg/vrp116.c: Tweak test to confirm the integer multiplication has been eliminated, not for the actual replacement implementation. Thanks, Roger diff --git a/gcc/match.pd b/gcc/match.pd index 6d691d3..16a1203 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -285,14 +285,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || !COMPLEX_FLOAT_TYPE_P (type))) (negate @0))) -/* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 } */ -(simplify - (mult SSA_NAME@1 SSA_NAME@2) - (if (INTEGRAL_TYPE_P (type) - && get_nonzero_bits (@1) == 1 - && get_nonzero_bits (@2) == 1) - (bit_and @1 @2))) - /* Transform x * { 0 or 1, 0 or 1, ... } into x & { 0 or -1, 0 or -1, ...}, unless the target has native support for the former but not the latter. */ (simplify @@ -1789,6 +1781,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (bit_not (bit_not @0)) @0) +(match zero_one_valued_p + @0 + (if (INTEGRAL_TYPE_P (type) && tree_nonzero_bits (@0) == 1))) +(match zero_one_valued_p + truth_valued_p@0) + +/* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 } */ +(simplify + (mult zero_one_valued_p@0 zero_one_valued_p@1) + (if (INTEGRAL_TYPE_P (type)) + (bit_and @0 @1))) + +/* Transform x * { 0 or 1 } into x & { 0 or -1 }, i.e. an integer + multiplication into negate/bitwise and. Don't do this if the + multiplication is cheap, may be implemented by a single shift. */ +(simplify + (mult:c zero_one_valued_p@0 @1) + (if (INTEGRAL_TYPE_P (type) + && (TREE_CODE (@1) != INTEGER_CST + || wi::popcount (wi::to_wide (@1)) > 1)) + (bit_and (negate @0) @1))) + /* Convert ~ (-A) to A - 1. */ (simplify (bit_not (convert? (negate @0))) diff --git a/gcc/testsuite/gcc.dg/pr98865.c b/gcc/testsuite/gcc.dg/pr98865.c new file mode 100644 index 0000000..e7599d3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr98865.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#if __SIZEOF_INT__ == 4 +unsigned int foo(unsigned int a, unsigned int b) +{ + return (a >> 31) * b; +} + +int bar(int a, int b) +{ + return -(a >> 31) * b; +} + +int baz(int a, int b) +{ + int c = a >> 31; + int d = -c; + return d * b; +} +#endif + +#if __SIZEOF_LONG_LONG__ == 8 +unsigned long long fool (unsigned long long a, unsigned long long b) +{ + return (a >> 63) * b; +} + +long long barl (long long a, long long b) +{ + return -(a >> 63) * b; +} + +long long bazl (long long a, long long b) +{ + long long c = a >> 63; + long long d = -c; + return d * b; +} +#endif + +unsigned int pin (int a, unsigned int b) +{ + unsigned int t = a & 1; + return t * b; +} + +unsigned long pinl (long a, unsigned long b) +{ + unsigned long t = a & 1; + return t * b; +} + +unsigned long long pinll (long long a, unsigned long long b) +{ + unsigned long long t = a & 1; + return t * b; +} + +/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c index 9e68a77..469b232 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c @@ -9,4 +9,5 @@ f (int m1, int m2, int c) return e ? m1 : m2; } -/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "optimized" } } */ +/* The MULT_EXPR should become BIT_AND_EXPR or COND_EXPR. */ +/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */