From patchwork Thu Jan 13 01:48:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiufu Guo <guojiufu@linux.ibm.com>
X-Patchwork-Id: 49943
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 2/2] Add assumption combining iv
Date: Thu, 13 Jan 2022 09:48:21 +0800
Message-Id: <20220113014821.168869-2-guojiufu@linux.ibm.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220113014821.168869-1-guojiufu@linux.ibm.com>
References: <20220113014821.168869-1-guojiufu@linux.ibm.com>
X-Patchwork-Original-From: Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org>
From: Jiufu Guo <guojiufu@linux.ibm.com>
Reply-To: Jiufu Guo <guojiufu@linux.ibm.com>
Cc: rguenther@suse.de, segher@kernel.crashing.org, wschmidt@linux.ibm.com,
 jlaw@tachyum.com, dje.gcc@gmail.com
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This is the second patch for combining two IVs.
When one IV is chasing another, the combination is only safe if neither
IV wraps or overflows, so we should check for that.
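
For illustration only (this sketch is not part of the patch, and the helper
names naive_niter and simulate_lt are hypothetical), the plain C below
compares the count obtained by folding both IVs into one IV with step
s0 - s1 against the count from actually running the loop with wrapping
unsigned arithmetic.  It assumes 32-bit unsigned IVs and a `<` exit test.

```c
#include <assert.h>
#include <stdint.h>

/* Count predicted by treating {b0, s0} < {b1, s1} as the single IV
   {b0, s0 - s1} < b1 and ignoring wrap: ceil ((b1 - b0) / (s0 - s1)).
   Hypothetical helper, valid only for b0 < b1 and s0 > s1.  */
static uint32_t
naive_niter (uint32_t b0, uint32_t s0, uint32_t b1, uint32_t s1)
{
  assert (b0 < b1 && s0 > s1);
  uint32_t step = s0 - s1;
  return (b1 - b0 + step - 1) / step;
}

/* Count obtained by running the loop; unsigned arithmetic wraps.  */
static uint32_t
simulate_lt (uint32_t b0, uint32_t s0, uint32_t b1, uint32_t s1)
{
  uint32_t n = 0;
  for (uint32_t i0 = b0, i1 = b1; i0 < i1; i0 += s0, i1 += s1)
    n++;
  return n;
}
```

With the inputs of lt_5_1 from the new test (b0 = MAX - 124, s0 = 5,
b1 = MAX - 27, s1 = 1), naive_niter gives 25 while simulate_lt gives 28,
because i0 wraps before it catches up with i1.  With b0 = 0 and b1 = 100
and the same steps, neither IV wraps and both give 25.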

With the assumption computed by this patch, the number of iterations
can be calculated even if the no_overflow flag is not updated for unsigned
IVs, as in the test case of this patch.
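
As a rough model of that assumption (a sketch under stated assumptions, not
the patch itself), the plain C below evaluates the same comparisons on
concrete 32-bit unsigned values instead of building trees.  The names
step_count and chase_is_safe are hypothetical; the model assumes both steps
are small nonzero constants and that s0 - s1 is positive, which is the only
case the patched code goes on to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of get_step_count: (high - low) / abs (step), computed in the
   unsigned type, optionally counting the end point as well.  */
static uint32_t
step_count (uint32_t high, uint32_t low, int32_t step, bool end_inclusive)
{
  uint32_t delta = high - low;
  if (end_inclusive)
    delta += 1;
  uint32_t ustep = step < 0 ? 0u - (uint32_t) step : (uint32_t) step;
  return delta / ustep;
}

/* Model of iv_chase_assumption for unsigned int IVs {b0, s0} and
   {b1, s1} compared with < (le == false) or <= (le == true).  */
static bool
chase_is_safe (uint32_t b0, int32_t s0, uint32_t b1, int32_t s1, bool le)
{
  assert (s0 != 0 && s1 != 0 && s0 > s1);
  int32_t step = s0 - s1;

  /* Steps needed for iv0 to catch up with iv1.  */
  uint32_t niter = step_count (b1, b0, step, le);

  /* Steps each IV can take before passing the max (positive step)
     or the min (negative step) of the type.  */
  uint32_t valid0 = s0 > 0 ? step_count (UINT32_MAX, b0, s0, false)
                           : step_count (b0, 0, s0, false);
  uint32_t valid1 = s1 > 0 ? step_count (UINT32_MAX, b1, s1, false)
                           : step_count (b1, 0, s1, false);

  return niter < valid0 && niter < valid1;
}
```

For the inputs of lt_5_1 in the new test (b0 = MAX - 124, b1 = MAX - 27),
the model gives niter = 24 and valid0 = 24, so the assumption is false and
the combined count must not be used for those values.  For b0 = 0 and
b1 = 100 with the same steps, the assumption holds.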

This patch passes bootstrap and regtest on ppc64le and x86_64.
Is this OK for trunk, or would it be more suitable for stage1?

BR,
Jiufu

	PR tree-optimization/102131

gcc/ChangeLog:

	* tree-ssa-loop-niter.c (get_step_count): New function.
	(iv_chase_assumption): New function.
	(number_of_iterations_cond): Call iv_chase_assumption.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr102131.c: New test.
---
 gcc/tree-ssa-loop-niter.c            | 73 ++++++++++++++++++++++++----
 gcc/testsuite/gcc.dg/vect/pr102131.c | 47 ++++++++++++++++++
 2 files changed, 110 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr102131.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 439d595a79f..56261164f28 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1788,6 +1788,63 @@ dump_affine_iv (FILE *file, affine_iv *iv)
     }
 }
 
+/* Generate expr: (HIGH - LOW) / abs (STEP), computed in UTYPE.  */
+
+static tree
+get_step_count (tree high, tree low, tree step, tree utype,
+		bool end_inclusive = false)
+{
+  tree delta = fold_build2 (MINUS_EXPR, TREE_TYPE (low), high, low);
+  delta = fold_convert (utype, delta);
+  if (end_inclusive)
+    delta = fold_build2 (PLUS_EXPR, utype, delta, build_one_cst (utype));
+
+  if (tree_int_cst_sign_bit (step))
+    step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step);
+  step = fold_convert (utype, step);
+
+  return fold_build2 (FLOOR_DIV_EXPR, utype, delta, step);
+}
+
+/* Get the additional assumption needed when both steps are nonzero.
+   The assumption guarantees that neither IV0 nor IV1 overflows or wraps
+   while one is chasing the other.  */
+
+static tree
+iv_chase_assumption (affine_iv *iv0, affine_iv *iv1, tree step,
+		     enum tree_code code)
+{
+  /* No additional assumption is needed for NE.  */
+  if (code == NE_EXPR)
+    return boolean_true_node;
+
+  /* No additional assumption is needed for pointers.  */
+  tree type = TREE_TYPE (iv0->base);
+  if (POINTER_TYPE_P (type))
+    return boolean_true_node;
+
+  bool positive0 = !tree_int_cst_sign_bit (iv0->step);
+  bool positive1 = !tree_int_cst_sign_bit (iv1->step);
+  tree utype = unsigned_type_for (type);
+  bool add1 = code == LE_EXPR;
+  tree niter = get_step_count (iv1->base, iv0->base, step, utype, add1);
+
+  int prec = TYPE_PRECISION (type);
+  signop sgn = TYPE_SIGN (type);
+  tree max = wide_int_to_tree (type, wi::max_value (prec, sgn));
+  tree min = wide_int_to_tree (type, wi::min_value (prec, sgn));
+  tree valid_niter0, valid_niter1;
+
+  valid_niter0 = positive0 ? get_step_count (max, iv0->base, iv0->step, utype)
+			   : get_step_count (iv0->base, min, iv0->step, utype);
+  valid_niter1 = positive1 ? get_step_count (max, iv1->base, iv1->step, utype)
+			   : get_step_count (iv1->base, min, iv1->step, utype);
+
+  tree e0 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter0);
+  tree e1 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter1);
+  return fold_build2 (TRUTH_AND_EXPR, boolean_type_node, e0, e1);
+}
+
 /* Determine the number of iterations according to condition (for staying
    inside loop) which compares two induction variables using comparison
    operator CODE.  The induction variable on left side of the comparison
@@ -1879,11 +1936,10 @@ number_of_iterations_cond (class loop *loop,
        {iv0.base, iv0.step - iv1.step} cmp_code {iv1.base, 0}
 
      provided that either below condition is satisfied:
+     a. iv0.step and iv1.step are integers.
+     b. Additional condition: before iv0 catches up with iv1, neither iv0
+     nor iv1 steps past the min or max of the type.  */
 
-       a) the test is NE_EXPR;
-       b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
-
-     This rarely occurs in practice, but it is simple enough to manage.  */
   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
     {
       tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
@@ -1894,15 +1950,12 @@ number_of_iterations_cond (class loop *loop,
       if (tree_int_cst_sign_bit (step))
 	return false;
 
-      if (code != NE_EXPR
-	  && (TREE_CODE (step) != INTEGER_CST
-	      || !iv0->no_overflow || !iv1->no_overflow))
+      niter->assumptions = iv_chase_assumption (iv0, iv1, step, code);
+      if (integer_zerop (niter->assumptions))
 	return false;
 
       iv0->step = step;
-      if (!POINTER_TYPE_P (type))
-	iv0->no_overflow = false;
-
+      iv0->no_overflow = true;
       iv1->step = build_int_cst (step_type, 0);
       iv1->no_overflow = true;
     }
diff --git a/gcc/testsuite/gcc.dg/vect/pr102131.c b/gcc/testsuite/gcc.dg/vect/pr102131.c
new file mode 100644
index 00000000000..23975cfeadb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr102131.c
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+#define MAX ((unsigned int) 0xffffffff)
+#define MIN ((unsigned int) (0))
+
+int arr[512];
+
+#define FUNC(NAME, CODE, S0, S1)                                               \
+  unsigned __attribute__ ((noinline)) NAME (unsigned int b0, unsigned int b1)  \
+  {                                                                            \
+    unsigned int n = 0;                                                        \
+    unsigned int i0, i1;                                                       \
+    int *p = arr;                                                              \
+    for (i0 = b0, i1 = b1; i0 CODE i1; i0 += S0, i1 += S1)                     \
+      {                                                                        \
+	n++;                                                                   \
+	*p++ = i0 + i1;                                                        \
+      }                                                                        \
+    return n;                                                                  \
+  }
+
+FUNC (lt_5_1, <, 5, 1);
+FUNC (le_1_m5, <=, 1, -5);
+FUNC (lt_1_10, <, 1, 10);
+
+int
+main ()
+{
+  int fail = 0;
+  if (lt_5_1 (MAX - 124, MAX - 27) != 28)
+    fail++;
+
+  /* To save time, do not run this.  */
+  /*
+  if (le_1_m5 (MIN + 1, MIN + 9) != 715827885)
+    fail++;  */
+
+  if (lt_1_10 (MAX - 1000, MAX - 500) != 51)
+    fail++;
+
+  if (fail)
+    __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */