From patchwork Thu Nov 25 18:17:18 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Roger Sayle <roger@nextmovesoftware.com>
X-Patchwork-Id: 48159
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 13F483858034
	for <patchwork@sourceware.org>; Thu, 25 Nov 2021 18:17:37 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com
 [162.254.253.69])
 by sourceware.org (Postfix) with ESMTPS id 710553858D35
 for <gcc-patches@gcc.gnu.org>; Thu, 25 Nov 2021 18:17:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 710553858D35
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=nextmovesoftware.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=nextmovesoftware.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID:
 Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
 Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
 :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:
 List-Subscribe:List-Post:List-Owner:List-Archive;
 bh=Da7nODFaz2eCXlapOWuF3FJ8l70Djxh+GKyfqqTsOcM=; b=ZNvX+h4CPmibVEMAb4/btVs9ac
 1aQFC7kDRcMw4/PuHlJbXNkbEGt+oRkLHd28Ygmpopi2RYOZaid8l52qJozWcXDm9MRu9inkK3HTx
 1aTQXeSnKkfh+WRLlrSPT34D1bDSGkXEVWMPLpImKt29RxlOHcJdrGfXpb41oYpMEi3SV7nYYye5f
 4usUf7b6+hMc6QmWMTtwXnkOKSzQ9QYyxVaESTEXTwjX0LEMOrXtjpDUKReTSdbC5/nSpRMy6mc4m
 Yb5JJQxxRHb0fPqZgr4fb0wJox7QAOsG4ttIOt8vBkjUvtTIuU2pO9UnpZoxEz6+MkUU3ARJUwkEV
 ExAJV4pw==;
Received: from [185.62.158.67] (port=59120 helo=Dell)
 by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2)
 (envelope-from <roger@nextmovesoftware.com>)
 id 1mqJJ9-0002kz-Rp; Thu, 25 Nov 2021 13:17:20 -0500
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Richard Biener'" <richard.guenther@gmail.com>
Subject: [PATCH take 3] ivopts: Improve code generated for very simple loops.
Date: Thu, 25 Nov 2021 18:17:18 -0000
Message-ID: <052601d7e228$af71a9b0$0e54fd10$@nextmovesoftware.com>
MIME-Version: 1.0
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AdfiIK7rIisovF5PTTi47esl8G0HGA==
Content-Language: en-gb
X-AntiAbuse: This header was added to track abuse,
 please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id:
 roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com:
 roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE,
 SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

On Tue, Nov 23, 2021 at 12:46PM Richard Biener < richard.guenther@gmail.com> wrote:
> On Thu, Nov 18, 2021 at 4:18 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> > > The patch doesn't add any testcase.
> >
> > The three new attached tests check that the critical invariants have a
> > simpler form, and hopefully shouldn't be affected by whether the
> > optimizer and/or backend costs actually decide to perform this iv substitution
> or not.
> 
> The testcases might depend on lp64 though, did you test them with -m32?
> IMHO it's fine to require lp64 here.

Great catch.  You're right that when the loop index has the same precision as the
target's pointer, that fold is (already) able to simplify the ((EXPR)-1)+1, so that with
-m32 my new tests ivopts-[567].c fail.  I've added "require lp64" to those tests, but
I've also added two more tests, using char and unsigned char for the loop expression,
which are optimized on both ilp32 and lp64.

For example, with -O2 -m32, we see the following improvements in ivopts-8.c:
diff ivopts-8.old.s ivopts-8.new.s
14,16c14,15
<       subl    $1, %ecx
<       movzbl  %cl, %ecx
<       leal    4(%eax,%ecx,4), %ecx
---
>       movsbl  %cl, %ecx
>       leal    (%eax,%ecx,4), %ecx

This might also explain why GCC currently generates sub-optimal code.  Back when
ivopts was written, most folks were on i686, so the generated code was optimal.
But with the transition to x86_64, the code is correct, just slightly less efficient.

> I'm a bit unsure about adding this special-casing in cand_value_at in general - it
> does seem that we're doing sth wrong elsewhere - either by not simplifying even
> though enough knowledge is there or by throwing away knowledge earlier
> (during niter analysis?).

I agree this approach is a bit ugly.  Conceptually, an alternative might be to avoid
throwing away knowledge earlier, during niter analysis, by adding an extra tree field
to the tree_niter_desc structure, so that it returns both niter0 (the iteration count
at the top of the loop) and niter1 (the iteration count at the bottom of the loop),
so that later passes (cand_value_at) can use the tree that's relevant.  Alas, this too is
ugly, and inefficient as we're creating/folding trees that may never be used/useful.
A compromise might be to add an enum field describing how the niter was
calculated to tree_niter_desc, and this can be inspected/used by cand_value_at.
The current patch figures this out by examining the other fields already in
tree_niter_desc.


> Anyway, the patch does look quite safe - can you show some statistics in how
> many time there's extra simplification this way during say bootstrap?

Certainly.  During stage2 and stage3 of a bootstrap on x86_64-pc-linux-gnu,
cand_value_at is called 500657 times.  The majority of calls,
447607 (89.4%), request the value at the end of the loop (after_adjust),
while 53050 (10.6%) request the value at the start of the loop.
102437 calls (20.5%) are optimized by clause 1 [0..N loops]
27939 calls (5.6%) are optimized by clause 2 [beg..end loops]

Looking for opportunities to improve things further, I see that
319608 calls (63.8%) have a LT_EXPR exit test.
160965 calls (32.2%) have a NE_EXPR exit test.
20084 calls (4.0%) have a GT_EXPR exit test.
so handling descending loops wouldn’t be a big win.
I'll investigate whether (constant) step sizes other than 1 are
(i) sufficiently common and (ii) benefit from improved folding.


This revised patch has been test on x86_64-pc-linux-gnu with a
make bootstrap and make -k check, both with and without
--target-board=unix{-m32}, with no new failures.
Ok for mainline?

2021-11-25  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
	tree_niter_desc* argument instead of just a tree for NITER.
	If we require the iv candidate value at the end of the final
	loop iteration, try using the original loop bound as the
	NITER for sufficiently simple loops.
	(may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite
	* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
	* gcc.dg/tree-ssa/ivopts-5.c: New test case.
	* gcc.dg/tree-ssa/ivopts-6.c: New test case.
	* gcc.dg/tree-ssa/ivopts-7.c: New test case.
	* gcc.dg/tree-ssa/ivopts-8.c: New test case.
	* gcc.dg/tree-ssa/ivopts-9.c: New test case.


Roger
--

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 4769b65..067f823 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5030,28 +5030,57 @@ determine_group_iv_cost_address (struct ivopts_data *data,
   return !sum_cost.infinite_cost_p ();
 }
 
-/* Computes value of candidate CAND at position AT in iteration NITER, and
-   stores it to VAL.  */
+/* Computes value of candidate CAND at position AT in iteration DESC->NITER,
+   and stores it to VAL.  */
 
 static void
-cand_value_at (class loop *loop, struct iv_cand *cand, gimple *at, tree niter,
-	       aff_tree *val)
+cand_value_at (class loop *loop, struct iv_cand *cand, gimple *at,
+	       class tree_niter_desc *desc, aff_tree *val)
 {
   aff_tree step, delta, nit;
   struct iv *iv = cand->iv;
   tree type = TREE_TYPE (iv->base);
+  tree niter = desc->niter;
+  bool after_adjust = stmt_after_increment (loop, cand, at);
   tree steptype;
+
   if (POINTER_TYPE_P (type))
     steptype = sizetype;
   else
     steptype = unsigned_type_for (type);
 
+  /* If AFTER_ADJUST is required, the code below generates the equivalent
+     of BASE + NITER * STEP + STEP, when ideally we'd prefer the expression
+     BASE + (NITER + 1) * STEP, especially when NITER is often of the form
+     SSA_NAME - 1.  Unfortunately, guaranteeing that adding 1 to NITER
+     doesn't overflow is tricky, so we peek inside the TREE_NITER_DESC
+     class for common idioms that we know are safe.  */
+  if (after_adjust
+      && desc->control.no_overflow
+      && integer_onep (desc->control.step)
+      && (desc->cmp == LT_EXPR
+	  || desc->cmp == NE_EXPR)
+      && TREE_CODE (desc->bound) == SSA_NAME)
+    {
+      if (integer_onep (desc->control.base))
+	{
+	  niter = desc->bound;
+	  after_adjust = false;
+	}
+      else if (TREE_CODE (niter) == MINUS_EXPR
+	       && integer_onep (TREE_OPERAND (niter, 1)))
+	{
+	  niter = TREE_OPERAND (niter, 0);
+	  after_adjust = false;
+	}
+    }
+
   tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &step);
   aff_combination_convert (&step, steptype);
   tree_to_aff_combination (niter, TREE_TYPE (niter), &nit);
   aff_combination_convert (&nit, steptype);
   aff_combination_mult (&nit, &step, &delta);
-  if (stmt_after_increment (loop, cand, at))
+  if (after_adjust)
     aff_combination_add (&delta, &step);
 
   tree_to_aff_combination (iv->base, type, val);
@@ -5402,7 +5431,7 @@ may_eliminate_iv (struct ivopts_data *data,
       return true;
     }
 
-  cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
+  cand_value_at (loop, cand, use->stmt, desc, &bnd);
 
   *bound = fold_convert (TREE_TYPE (cand->iv->base),
 			 aff_combination_to_tree (&bnd));
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c
index a5d953b..706eed8 100644
--- a/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { { i?86-*-* x86_64-*-* s390*-*-* } && lp64 } } } */
 /* { dg-options "-O2 -fdump-tree-vrp2-details" } */
-/* { dg-final { scan-tree-dump-times "gimple_simplified to" 4 "vrp2" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 1 "vrp2" } } */
 
 void v1 (unsigned long *in, unsigned long *out, unsigned int n)
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
new file mode 100644
index 0000000..a6af497
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int*
+foo (int* mem, int sz, int val)
+{
+  int i;
+  for (i = 0; i < sz; i++)
+    if (mem[i] == val) 
+      return &mem[i];
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "inv_expr \[0-9\]: \\t\\(unsigned long\\) sz_\[0-9\]\\(D\\) \\* 4 \\+ \\(unsigned long\\) mem_\[0-9\]\\(D\\)" "ivopts" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
new file mode 100644
index 0000000..8383154
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int*
+foo (int* mem, int sz, int val)
+{
+  int i;
+  for (i = 0; i != sz; i++)
+    if (mem[i] == val) 
+      return &mem[i];
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "inv_expr \[0-9\]: \\t\\(unsigned long\\) sz_\[0-9\]\\(D\\) \\* 4 \\+ \\(unsigned long\\) mem_\[0-9\]\\(D\\)" "ivopts" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
new file mode 100644
index 0000000..44f5603
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int*
+foo (int* mem, int beg, int end, int val)
+{
+  int i;
+  for (i = beg; i < end; i++)
+    if (mem[i] == val) 
+      return &mem[i];
+  return 0;
+}
+/* { dg-final { scan-tree-dump "inv_expr \[0-9\]: \\t\\(unsigned long\\) \\(\\(unsigned int\\) end_\[0-9\]\\(D\\) - \\(unsigned int\\) beg_\[0-9\]\\(D\\)\\)" "ivopts" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
new file mode 100644
index 0000000..9e89a03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int*
+foo (int* mem, char sz, int val)
+{
+  char i;
+  for (i = 0; i < sz; i++)
+    if (mem[i] == val) 
+      return &mem[i];
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "inv_expr \[0-9\]: \\t\\(unsigned long\\) sz_\[0-9\]*\\(D\\) \\* 4 \\+ \\(unsigned long\\) mem_\[0-9\]*\\(D\\)" "ivopts" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
new file mode 100644
index 0000000..22b4a12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int*
+foo (int* mem, unsigned char sz, int val)
+{
+  unsigned char i;
+  for (i = 0; i < sz; i++)
+    if (mem[i] == val) 
+      return &mem[i];
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "inv_expr \[0-9\]: \\t\\(unsigned long\\) sz_\[0-9\]\\(D\\) \\* 4 \\+ \\(unsigned long\\) mem_\[0-9\]\\(D\\)" "ivopts" } } */