From patchwork Fri Aug 12 21:45:00 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 56725
Date: Fri, 12 Aug 2022 17:45:03 -0400
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Richard Biener'"
Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>
Subject: [PATCH take #2] PR tree-optimization/71343: Optimize (X<<C)&(Y<<C) to (X&Y)<<C

Hi Richard,

Many thanks for the review and useful suggestions.  I (think I) agree that
handling non-canonical forms in value numbering makes more sense, so this
revised patch is just the first (non-controversial) part of the original
submission.  It incorporates your observation that the transformation doesn't
need to be limited to (valid) constant shifts, and can be generalized to any
shift, without introducing undefined behaviour that didn't exist before.
This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with no
new failures.  Ok for mainline?

2022-08-12  Roger Sayle
	    Richard Biener

gcc/ChangeLog
	PR tree-optimization/71343
	* match.pd (op (lshift @0 @1) (lshift @2 @1)): Optimize the
	expression (X<<C) op (Y<<C) to (X op Y)<<C, and likewise
	(X>>C)^(Y>>C) to (X^Y)>>C for the binary logical operators,
	AND, IOR and XOR.

gcc/testsuite/ChangeLog
	PR tree-optimization/71343
	* gcc.dg/pr71343-1.c: New test case.

Thanks,
Roger
--

> -----Original Message-----
> From: Richard Biener
> Sent: 08 August 2022 12:42
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [PATCH] PR tree-optimization/71343: Optimize (X<<C)&(Y<<C)
> to (X&Y)<<C
>
> On Mon, Aug 8, 2022 at 10:07 AM Roger Sayle wrote:
> >
> > This patch resolves PR tree-optimization/71343, a missed-optimization
> > enhancement request where GCC fails to see that (a<<2)+(b<<2) == a*4+b*4.
> > This requires two related (sets of) optimizations to be added to match.pd.
> >
> > The first is that (X<<C) op (Y<<C) can be replaced by (X op Y)<<C
> > for many binary operators, including AND, IOR, XOR, and (if overflow
> > isn't an issue) PLUS and MINUS.  Likewise, the right shifts (both
> > logical and arithmetic) and bit-wise logical operators can be
> > simplified in a similar fashion.  These all reduce the number of
> > GIMPLE binary operations from 3 to 2, by combining/eliminating a
> > shift operation.
> >
> > The second optimization reflects that the middle-end doesn't impose a
> > canonical form on multiplications by powers of two, vs. left shifts,
> > instead leaving these operations as specified by the programmer unless
> > there's a good reason to change them.  Hence, GIMPLE code may contain
> > the expressions "X * 8" and "X << 3" even though these represent the
> > same value/computation.  The tweak to match.pd is that comparison
> > operations whose operands are equivalent non-canonical expressions can
> > be taught their equivalence.
> > Hence "(X * 8) == (X << 3)" will always
> > evaluate to true, and "(X<<2) > 4*X" will always evaluate to false.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
>
> +/* Shifts by constants distribute over several binary operations,
> +   hence (X << C) + (Y << C) can be simplified to (X + Y) << C.  */
> +(for op (plus minus)
> + (simplify
> +  (op (lshift:s @0 INTEGER_CST@1) (lshift:s @2 INTEGER_CST@1))
> +  (if (INTEGRAL_TYPE_P (type)
> +       && TYPE_OVERFLOW_WRAPS (type)
> +       && !TYPE_SATURATING (type)
> +       && tree_fits_shwi_p (@1)
> +       && tree_to_shwi (@1) > 0
> +       && tree_to_shwi (@1) < TYPE_PRECISION (type))
>
> I do wonder why we need to restrict this to shifts by constants?
> Any out-of-bound shift was already there, no?
>
> +/* Some tree expressions are intentionally non-canonical.
> +   We handle the comparison of the equivalent forms here.  */
> +(for cmp (eq le ge)
> + (simplify
> +  (cmp:c (lshift @0 INTEGER_CST@1) (mult @0 integer_pow2p@2))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +       && tree_fits_shwi_p (@1)
> +       && tree_to_shwi (@1) > 0
> +       && tree_to_shwi (@1) < TYPE_PRECISION (TREE_TYPE (@0))
> +       && wi::to_wide (@1) == wi::exact_log2 (wi::to_wide (@2)))
> +   { constant_boolean_node (true, type); })))
> +
> +(for cmp (ne lt gt)
> + (simplify
> +  (cmp:c (lshift @0 INTEGER_CST@1) (mult @0 integer_pow2p@2))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +       && tree_fits_shwi_p (@1)
> +       && tree_to_shwi (@1) > 0
> +       && tree_to_shwi (@1) < TYPE_PRECISION (TREE_TYPE (@0))
> +       && wi::to_wide (@1) == wi::exact_log2 (wi::to_wide (@2)))
> +   { constant_boolean_node (false, type); })))
>
> hmm.  I wonder if it makes more sense to handle this in value-numbering.
> tree-ssa-sccvn.cc:visit_nary_op handles some cases that are not exactly
> canonicalization issues but the shift vs mult could be handled there by just
> performing the alternate lookup.  That would also enable CSE and by means of
> that of course the comparisons you do above.
>
> Thanks,
> Richard.
>
> > 2022-08-08  Roger Sayle
> >
> > gcc/ChangeLog
> >	PR tree-optimization/71343
> >	* match.pd (op (lshift @0 @1) (lshift @2 @1)): Optimize the
> >	expression (X<<C) op (Y<<C) to (X op Y)<<C.
> >	(op (rshift @0 @1) (rshift @2 @1)): Likewise, simplify (X>>C)^(Y>>C)
> >	to (X^Y)>>C for binary logical operators, AND, IOR and XOR.
> >	(cmp:c (lshift @0) (mult @1)): Optimize comparisons between
> >	shifts by integer constants and multiplications by powers of 2.
> >
> > gcc/testsuite/ChangeLog
> >	PR tree-optimization/71343
> >	* gcc.dg/pr71343-1.c: New test case.
> >	* gcc.dg/pr71343-2.c: Likewise.
> >
> > Thanks in advance,
> > Roger
> > --

diff --git a/gcc/match.pd b/gcc/match.pd
index f82f94a..7940f75 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -982,6 +982,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && tree_nop_conversion_p (type, TREE_TYPE (@1)))
    (lshift @0 @2)))
 
+/* Shifts by constants distribute over several binary operations,
+   hence (X << C) + (Y << C) can be simplified to (X + Y) << C.  */
+(for op (plus minus)
+ (simplify
+  (op (lshift:s @0 @1) (lshift:s @2 @1))
+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_OVERFLOW_WRAPS (type)
+       && !TYPE_SATURATING (type))
+   (lshift (op @0 @2) @1))))
+
+(for op (bit_and bit_ior bit_xor)
+ (simplify
+  (op (lshift:s @0 @1) (lshift:s @2 @1))
+  (if (INTEGRAL_TYPE_P (type))
+   (lshift (op @0 @2) @1)))
+ (simplify
+  (op (rshift:s @0 @1) (rshift:s @2 @1))
+  (if (INTEGRAL_TYPE_P (type))
+   (rshift (op @0 @2) @1))))
+
 /* Fold (1 << (C - x)) where C = precision(type) - 1
    into ((1 << C) >> x.
 */
 (simplify

diff --git a/gcc/testsuite/gcc.dg/pr71343-1.c b/gcc/testsuite/gcc.dg/pr71343-1.c
new file mode 100644
index 0000000..146f5fc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr71343-1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo_plus(unsigned int a, unsigned int b)
+{
+  return (a << 2) + (b << 2);
+}
+
+unsigned int foo_and(unsigned int a, unsigned int b)
+{
+  return (a << 2) & (b << 2);
+}
+
+unsigned int foo_ior(unsigned int a, unsigned int b)
+{
+  return (a << 2) | (b << 2);
+}
+
+unsigned int foo_xor(unsigned int a, unsigned int b)
+{
+  return (a << 2) ^ (b << 2);
+}
+
+unsigned int bar_and(unsigned int a, unsigned int b)
+{
+  return (a >> 2) & (b >> 2);
+}
+
+unsigned int bar_ior(unsigned int a, unsigned int b)
+{
+  return (a >> 2) | (b >> 2);
+}
+
+unsigned int bar_xor(unsigned int a, unsigned int b)
+{
+  return (a >> 2) ^ (b >> 2);
+}
+
+int baz_and(int a, int b)
+{
+  return (a >> 2) & (b >> 2);
+}
+
+int baz_ior(int a, int b)
+{
+  return (a >> 2) | (b >> 2);
+}
+
+int baz_xor(int a, int b)
+{
+  return (a >> 2) ^ (b >> 2);
+}
+
+/* { dg-final { scan-tree-dump-times " << " 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " >> " 6 "optimized" } } */