From patchwork Sun Dec 5 21:55:00 2021
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 48515
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] ranger: Optimise irange_union
Date: Sun, 05 Dec 2021 21:55:00 +0000
From: Richard Sandiford <richard.sandiford@arm.com>
When compiling an optabs.ii at -O2 with a release-checking build,
the hottest function in the profile was irange_union.  This patch
tries to optimise it a bit.  The specific changes are:

- Use quick_push rather than safe_push, since the final number
  of entries is known in advance.

- Avoid assigning wi::to_wide & co. to a temporary wide_int,
  such as in:

    wide_int val_j = wi::to_wide (res[j]);

  wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST
  storage.  Assigning the result to wide_int forces an unnecessary
  copy to temporary storage.

  This is one area where "auto" helps a lot.  In the end though,
  it seemed more readable to inline the wi::to_*s rather than
  use auto.

- Use wi::to_widest rather than wi::to_wide.  Both are functionally
  correct, but wi::to_widest is more efficient, for three reasons:

  - to_wide returns a wide-int representation in which the most
    significant element might not be canonically sign-extended.
    This is because we want to allow the storage of an INTEGER_CST
    like 0x1U << 31 to be accessed directly with both a wide_int
    view (where only 32 bits matter) and a widest_int view (where
    many more bits matter, and where the 32 bits are zero-extended
    to match the unsigned type).  However, operating on
    uncanonicalised wide_int forms is less efficient than operating
    on canonicalised forms.

  - to_widest has a constant rather than variable precision, and
    there are never any redundant upper bits to worry about.
  - Using widest_int avoids the need for an overflow check, since
    there is enough precision to add 1 to any IL constant without
    wrap-around.

This gives a ~2% compile-time speed up with the test above.
I also tried adding a path for two single-pair ranges, but it
wasn't a win.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
	* value-range.cc (irange::irange_union): Use quick_push rather
	than safe_push.  Use widest_int rather than wide_int.  Avoid
	assigning wi::to_* results to wide*_int temporaries.
---
 gcc/value-range.cc | 46 +++++++++++++---------------------------------
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 82509fa55a7..d38d0786072 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1550,70 +1550,50 @@ irange::irange_union (const irange &r)
   // the merge is performed.
   //
   // [Xi,Yi]..[Xn,Yn] U [Xj,Yj]..[Xm,Ym] --> [Xk,Yk]..[Xp,Yp]
-  tree ttype = r.type ();
-  signop sign = TYPE_SIGN (ttype);
-
-  auto_vec<tree> res;
-  wide_int u1 ;
-  wi::overflow_type ovf;
+  auto_vec<tree> res (m_num_ranges * 2 + r.m_num_ranges * 2);
   unsigned i = 0, j = 0, k = 0;

   while (i < m_num_ranges * 2 && j < r.m_num_ranges * 2)
     {
       // lower of Xi and Xj is the lowest point.
-      if (wi::le_p (wi::to_wide (m_base[i]), wi::to_wide (r.m_base[j]), sign))
+      if (wi::to_widest (m_base[i]) <= wi::to_widest (r.m_base[j]))
 	{
-	  res.safe_push (m_base[i]);
-	  res.safe_push (m_base[i + 1]);
+	  res.quick_push (m_base[i]);
+	  res.quick_push (m_base[i + 1]);
 	  k += 2;
 	  i += 2;
 	}
       else
 	{
-	  res.safe_push (r.m_base[j]);
-	  res.safe_push (r.m_base[j + 1]);
+	  res.quick_push (r.m_base[j]);
+	  res.quick_push (r.m_base[j + 1]);
 	  k += 2;
 	  j += 2;
 	}
     }
   for ( ; i < m_num_ranges * 2; i += 2)
     {
-      res.safe_push (m_base[i]);
-      res.safe_push (m_base[i + 1]);
+      res.quick_push (m_base[i]);
+      res.quick_push (m_base[i + 1]);
       k += 2;
     }
   for ( ; j < r.m_num_ranges * 2; j += 2)
     {
-      res.safe_push (r.m_base[j]);
-      res.safe_push (r.m_base[j + 1]);
+      res.quick_push (r.m_base[j]);
+      res.quick_push (r.m_base[j + 1]);
       k += 2;
     }

   // Now normalize the vector removing any overlaps.
   i = 2;
-  int prec = TYPE_PRECISION (ttype);
-  wide_int max_val = wi::max_value (prec, sign);
   for (j = 2; j < k ; j += 2)
     {
-      wide_int val_im1 = wi::to_wide (res[i - 1]);
-      if (val_im1 == max_val)
-	break;
-      u1 = wi::add (val_im1, 1, sign, &ovf);
-
-      // Overflow indicates we are at MAX already.
-      // A wide int bug requires the previous max_val check
-      // trigger: gcc.c-torture/compile/pr80443.c with -O3
-      if (ovf == wi::OVF_OVERFLOW)
-	break;
-
-      wide_int val_j = wi::to_wide (res[j]);
-      wide_int val_jp1 = wi::to_wide (res[j+1]);
       // Current upper+1 is >= lower bound next pair, then we merge ranges.
-      if (wi::ge_p (u1, val_j, sign))
+      if (wi::to_widest (res[i - 1]) + 1 >= wi::to_widest (res[j]))
 	{
 	  // New upper bounds is greater of current or the next one.
-	  if (wi::gt_p (val_jp1, val_im1, sign))
-	    res [i - 1] = res[j + 1];
+	  if (wi::to_widest (res[j + 1]) > wi::to_widest (res[i - 1]))
+	    res[i - 1] = res[j + 1];
 	}
       else
 	{
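
For readers without the GCC internals to hand, the two tricks in the patch
(reserving the worst-case result size up front so every push is an unchecked
append, and comparing bounds at a fixed wider precision so that "upper + 1"
cannot wrap even at the type's maximum value) can be sketched with a
self-contained analogue.  Everything below is hypothetical illustration, not
GCC code: ranges are flat vectors of uint32_t [lo, hi] pairs, and int64_t
plays the role of widest_int.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical analogue of irange::irange_union: merge two sorted vectors of
// inclusive [lo, hi] pairs, then normalize away overlapping/adjacent pairs.
static std::vector<uint32_t>
union_sorted_ranges (const std::vector<uint32_t> &a,
		     const std::vector<uint32_t> &b)
{
  std::vector<uint32_t> res;
  // Like the auto_vec change: the final size is bounded by the sum of the
  // inputs, so reserve it once and make each push an unchecked append
  // (the analogue of quick_push vs safe_push).
  res.reserve (a.size () + b.size ());

  std::size_t i = 0, j = 0;
  while (i < a.size () && j < b.size ())
    {
      // Whichever pair has the lower start point goes first.
      if (a[i] <= b[j])
	{
	  res.push_back (a[i]);
	  res.push_back (a[i + 1]);
	  i += 2;
	}
      else
	{
	  res.push_back (b[j]);
	  res.push_back (b[j + 1]);
	  j += 2;
	}
    }
  for (; i < a.size (); i += 2)
    {
      res.push_back (a[i]);
      res.push_back (a[i + 1]);
    }
  for (; j < b.size (); j += 2)
    {
      res.push_back (b[j]);
      res.push_back (b[j + 1]);
    }

  // Normalize: merge pairs that overlap or touch.  Widening to int64_t
  // before adding 1 means the comparison is exact even when the current
  // upper bound is UINT32_MAX, so no separate overflow/max-value check is
  // needed -- the same reason the patch can drop the wi::OVF_OVERFLOW path
  // by switching to widest_int.
  std::size_t k = 2;
  for (std::size_t m = 2; m < res.size (); m += 2)
    {
      if ((int64_t) res[k - 1] + 1 >= (int64_t) res[m])
	{
	  // Overlap: keep the greater of the two upper bounds.
	  if ((int64_t) res[m + 1] > (int64_t) res[k - 1])
	    res[k - 1] = res[m + 1];
	}
      else
	{
	  // Disjoint: copy the pair down to the write position.
	  res[k] = res[m];
	  res[k + 1] = res[m + 1];
	  k += 2;
	}
    }
  res.resize (k);
  return res;
}
```

The sketch keeps the patch's two-phase shape (merge by lower bound, then a
single normalization pass with separate read and write indices); the widening
cast is the cheap stand-in for widest_int's "always enough precision to add 1"
guarantee.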