From patchwork Tue Jun  7 07:54:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt <hongtao.liu@intel.com>
X-Patchwork-Id: 54881
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 58853382CCAC
	for <patchwork@sourceware.org>; Tue,  7 Jun 2022 07:55:06 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 58853382CCAC
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1654588506;
	bh=RvO/Y6f6ZsEFRi+c2wwFdTmEAyP+Z/802KjwUkbKhFo=;
	h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:
	 From;
	b=Do4XArMMEdqgyhGW6NWMlbJ2cbr0BWyN+Y8epSL+n2oJHAVNA2XTDyjpcgkzM62Nl
	 HdrOTDQVA2wE+7Wjrlxp2aGvnyTvN/jQkK7CrTjP/xDNqb5L+ltytmtmGPWUyxa2Sd
	 01n+X++WOZDE7LJKdFW5gB9PMokfmQ0cZd8J7x8I=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by sourceware.org (Postfix) with ESMTPS id C91103857409
 for <gcc-patches@gcc.gnu.org>; Tue,  7 Jun 2022 07:54:34 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C91103857409
X-IronPort-AV: E=McAfee;i="6400,9594,10370"; a="340353591"
X-IronPort-AV: E=Sophos;i="5.91,283,1647327600"; d="scan'208";a="340353591"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 07 Jun 2022 00:54:33 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.91,283,1647327600"; d="scan'208";a="669865890"
Received: from scymds01.sc.intel.com ([10.148.94.138])
 by FMSMGA003.fm.intel.com with ESMTP; 07 Jun 2022 00:54:33 -0700
Received: from shliclel045.sh.intel.com (shliclel045.sh.intel.com
 [10.239.236.45]) by scymds01.sc.intel.com
 with ESMTP id 2577sVaH007206; Tue, 7 Jun 2022 00:54:32 -0700
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Simplify (B * v + C) * D -> BD* v + CD when B, C,
 D are all INTEGER_CST.
Date: Tue,  7 Jun 2022 15:54:31 +0800
Message-Id: <20220607075431.24764-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: 
 <CAFiYyc1bphpwz4dQ91pE44MEohC=n+Ju0ffG9Cvu2pmcrxj+ow@mail.gmail.com>
References: 
 <CAFiYyc1bphpwz4dQ91pE44MEohC=n+Ju0ffG9Cvu2pmcrxj+ow@mail.gmail.com>
X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 KAM_SHORT,
 SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org>
From: liuhongt <hongtao.liu@intel.com>
Reply-To: liuhongt <hongtao.liu@intel.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

>> + (mult:c (plus:c@4 (mult:c@5 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)

>since canonicalization puts INTEGER_CSTs last the :c should not be necessary.

Changed.

>> + (if (single_use (@4)
>> +      && single_use (@5))

>since the resulting expression is not simple using :s instead of
>single_use (..) should
>work as well.

Changed.

> when we go from (a + CST1) * CST2 to a * CST2 + CST1*CST2 we have
> to worry about CST1 == -a which would make (a+CST1) * INT_MAX
> not overflow but a * INT_MAX + CST1 * INT_MAX might.  Is the
> overflow check for CST1 * INT_MAX sufficient to rule out
> that a * CST2 does not overflow when (a + CST1) * CST2 does not
> overflow?  Consider a == 2, CST1 == -1, CST2 == INT_MAX,
> here 1 * INT_MAX does not overflow, nor does -1 * INT_MAX, but
> 2 * INT_MAX overflows and thus the resulting expression invokes
> undefined behavior.
>
> The same issue probably arises for the first pattern outer half
> which looks like (a' + CST2) * CST3 with a' = a * CST1?
>
> The appropriate solution might be to perform the arithmetic
> in an unsigned type with the implication that has on value-range
> analysis.

Yes, the patch patched based on value-range analysis.

Update the patch.

Similar for (v + B) * C + D -> C * v + BCD.
Don't simplify it when there's overflow and overflow is UB for type v.

gcc/ChangeLog:

	PR tree-optimization/53533
	* match.pd: Simplify (B * v + C) * D -> BD * v + CD and
	(v + B) * C + D -> C * v + BCD when B,C,D are all INTEGER_CST,
	and there's no overflow or !TYPE_OVERFLOW_UNDEFINED.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr53533-1.c: New test.
	* gcc.target/i386/pr53533-2.c: New test.
	* gcc.target/i386/pr53533-3.c: New test.
	* gcc.target/i386/pr53533-4.c: New test.
	* gcc.target/i386/pr53533-5.c: New test.
	* gcc.dg/vect/slp-11a.c: Adjust testcase.
---
 gcc/match.pd                              | 82 +++++++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/slp-11a.c       | 10 +--
 gcc/testsuite/gcc.target/i386/pr53533-1.c | 23 +++++++
 gcc/testsuite/gcc.target/i386/pr53533-2.c | 46 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr53533-3.c | 24 +++++++
 gcc/testsuite/gcc.target/i386/pr53533-4.c | 46 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr53533-5.c | 22 ++++++
 7 files changed, 248 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-5.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 44a385b912d..54f53a1f988 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -489,6 +489,88 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (!overflow || TYPE_OVERFLOW_WRAPS (type))
    (mult @0 { wide_int_to_tree (type, mul); }))))
 
+/* Similar to above, but there could be an extra add/sub between
+   successive multuiplications.  */
+(simplify
+ (mult (plus:s (mult:s@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
+ (with {
+   bool overflowed = true;
+   wi::overflow_type ovf1, ovf2;
+   wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@3),
+			   TYPE_SIGN (type), &ovf1);
+   wide_int add = wi::mul (wi::to_wide (@2), wi::to_wide (@3),
+			   TYPE_SIGN (type), &ovf2);
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+    {
+#if GIMPLE
+      value_range vr0;
+      if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE
+	  && get_global_range_query ()->range_of_expr (vr0, @4)
+	  && vr0.kind () == VR_RANGE)
+	{
+	  wide_int wmin0 = vr0.lower_bound ();
+	  wide_int wmax0 = vr0.upper_bound ();
+	  wmin0 = wi::mul (wmin0, wi::to_wide (@3), TYPE_SIGN (type), &ovf1);
+	  wmax0 = wi::mul (wmax0, wi::to_wide (@3), TYPE_SIGN (type), &ovf2);
+	  if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE)
+	    {
+	      wi::add (wmin0, add, TYPE_SIGN (type), &ovf1);
+	      wi::add (wmax0, add, TYPE_SIGN (type), &ovf2);
+	      if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE)
+		overflowed = false;
+	    }
+	}
+#endif
+    }
+  else
+   overflowed = false;
+ }
+  /* Skip folding on overflow.  */
+  (if (!overflowed)
+   (plus (mult @0 { wide_int_to_tree (type, mul); })
+	 { wide_int_to_tree (type, add); }))))
+
+/* Similar to above, but a multiplication between successive additions.  */
+(simplify
+ (plus (mult:s (plus:s @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
+ (with {
+   bool overflowed = true;
+   wi::overflow_type ovf1;
+   wi::overflow_type ovf2;
+   wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2),
+			   TYPE_SIGN (type), &ovf1);
+   wide_int add = wi::add (mul, wi::to_wide (@3),
+			   TYPE_SIGN (type), &ovf2);
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+    {
+#if GIMPLE
+      value_range vr0;
+      if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE
+	  && get_global_range_query ()->range_of_expr (vr0, @0)
+	  && vr0.kind () == VR_RANGE)
+	{
+	  wide_int wmin0 = vr0.lower_bound ();
+	  wide_int wmax0 = vr0.upper_bound ();
+	  wmin0 = wi::mul (wmin0, wi::to_wide (@2), TYPE_SIGN (type), &ovf1);
+	  wmax0 = wi::mul (wmax0, wi::to_wide (@2), TYPE_SIGN (type), &ovf2);
+	  if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE)
+	    {
+	      wi::add (wmin0, mul, TYPE_SIGN (type), &ovf1);
+	      wi::add (wmax0, mul, TYPE_SIGN (type), &ovf2);
+	      if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE)
+		overflowed = false;
+	    }
+	}
+#endif
+    }
+  else
+   overflowed = false;
+
+ }
+  /* Skip folding on overflow.  */
+  (if (!overflowed)
+   (plus (mult @0 @2) { wide_int_to_tree (type, add); }))))
+
 /* Optimize A / A to 1.0 if we don't care about
    NaNs or Infinities.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c b/gcc/testsuite/gcc.dg/vect/slp-11a.c
index bcd3c861ca4..e6632fa77be 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c
@@ -9,14 +9,14 @@ int
 main1 ()
 {
   int i;
-  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
-  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
 
   /* Different operations - not SLPable.  */
   for (i = 0; i < N; i++)
     {
       a0 = in[i*8] + 5;
-      a1 = in[i*8 + 1] * 6;
+      a1 = in[i*8 + 1] * 51072;
       a2 = in[i*8 + 2] + 7;
       a3 = in[i*8 + 3] + 8;
       a4 = in[i*8 + 4] + 9;
@@ -25,7 +25,7 @@ main1 ()
       a7 = in[i*8 + 7] + 12;
 
       b0 = a0 * 3;
-      b1 = a1 * 2;
+      b1 = a1 * 51072;
       b2 = a2 * 12;
       b3 = a3 * 5;
       b4 = a4 * 8;
@@ -47,7 +47,7 @@ main1 ()
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
-         || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3
+         || out[i*8 + 1] != (in[i*8 + 1] * 51072) * 51072 - 3
          || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
          || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
          || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
diff --git a/gcc/testsuite/gcc.target/i386/pr53533-1.c b/gcc/testsuite/gcc.target/i386/pr53533-1.c
new file mode 100644
index 00000000000..095de665366
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr53533-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-times "imull\[ \t\]" "1" } } */
+/* { dg-final { scan-assembler-times "(?:addl|subl)\[ \t\]" "1" { target { ! ia32 } } } } */
+
+void
+__attribute__((noipa))
+foo (unsigned a[256], unsigned b[256])
+{
+  int i;
+  for (i = 0; i < 256; ++i)
+    {
+      unsigned tmp = a[i] + 12345U;
+      tmp *= 914237U;
+      tmp += 12332U;
+      tmp *= 914237U;
+      tmp += 12332U;
+      tmp *= 914237U;
+      tmp -= 13U;
+      tmp *= 8000U;
+      b[i] = tmp;
+    }
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr53533-2.c b/gcc/testsuite/gcc.target/i386/pr53533-2.c
new file mode 100644
index 00000000000..c31b6ff4dec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr53533-2.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "pr53533-1.c"
+
+void
+__attribute__((optimize("-O0")))
+foo1 (unsigned a[256], unsigned b[256])
+{
+  int i;
+  for (i = 0; i < 256; ++i)
+    {
+      unsigned tmp = a[i] + 12345U;
+      tmp *= 914237U;
+      tmp += 12332U;
+      tmp *= 914237U;
+      tmp += 12332U;
+      tmp *= 914237U;
+      tmp -= 13U;
+      tmp *= 8000U;
+      b[i] = tmp;
+    }
+}
+
+int main()
+{
+  unsigned int a[256];
+  unsigned int b[256];
+  unsigned int c[256];
+  for (unsigned int i = 0; i != 256; i++)
+    {
+      b[i] = 0;
+      c[i] = 1;
+      a[i] = i * i - 10 * i + 33;
+    }
+  foo (a, b);
+  foo1 (a, c);
+
+  for (unsigned int i = 0; i != 256; i++)
+    {
+      if (b[i] != c[i])
+	__builtin_abort ();
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr53533-3.c b/gcc/testsuite/gcc.target/i386/pr53533-3.c
new file mode 100644
index 00000000000..3b260d134e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr53533-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fwrapv" } */
+/* { dg-final { scan-assembler-times "imull\[ \t\]" "1" } } */
+/* { dg-final { scan-assembler-times "(?:addl|subl)\[ \t\]" "1" { target { ! ia32 } } } } */
+
+void
+__attribute__((noipa))
+foo (int a[256], int b[256])
+{
+  int i;
+  for (i = 0; i < 256; ++i)
+    {
+      int tmp = a[i] + 12345;
+      tmp *= 914237;
+      tmp += 12332;
+      tmp *= 914237;
+      tmp += 12332;
+      tmp *= 914237;
+      tmp -= 13;
+      tmp *= 8000;
+      b[i] = tmp;
+    }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr53533-4.c b/gcc/testsuite/gcc.target/i386/pr53533-4.c
new file mode 100644
index 00000000000..c29f90a44dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr53533-4.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fwrapv" } */
+
+#include "pr53533-3.c"
+
+void
+__attribute__((optimize("-O0")))
+foo1 (int a[256], int b[256])
+{
+  int i;
+  for (i = 0; i < 256; ++i)
+    {
+      int tmp = a[i] + 12345;
+      tmp *= 914237;
+      tmp += 12332;
+      tmp *= 914237;
+      tmp += 12332;
+      tmp *= 914237;
+      tmp -= 13;
+      tmp *= 8000;
+      b[i] = tmp;
+    }
+}
+
+int main()
+{
+  int a[256];
+  int b[256];
+  int c[256];
+  for (int i = 0; i != 256; i++)
+    {
+      b[i] = 0;
+      c[i] = 1;
+      a[i] = i * i - 10 * i + 33;
+    }
+  foo (a, b);
+  foo1 (a, c);
+
+  for (unsigned int i = 0; i != 256; i++)
+    {
+      if (b[i] != c[i])
+	__builtin_abort ();
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr53533-5.c b/gcc/testsuite/gcc.target/i386/pr53533-5.c
new file mode 100644
index 00000000000..56fe4f44064
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr53533-5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times {(?n)\* 2147483647} 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times {(?n)\* 1073741823} 1 "optimized" } } */
+
+#define INT_MAX 2147483647
+int
+foo (int a)
+{
+  /* When a == -2, there's no overflow for (a + 1) * INT_MAX - 1.
+     but overflow for a * INT_MAX + (INT_MAX - 1).
+     Don't simpify it.  */
+  return (a + 1) * INT_MAX - 1;
+}
+
+int
+foo1 (int a)
+{
+  /* Be conservative here, don't simplify this as long as
+     a * 2147483646 may overflow.  */
+  return 1073741823 * (a * 2 + 1);
+}