[2/2] PR tree-optimization/103231 - Directly resolve range_of_stmt dependencies.

Message ID 155ced1e-67fb-7cfb-f94a-138e2b2ead35@redhat.com
State Committed
Headers
Series [1/2] Split return functionality of get_non_stale_global_range. |

Commit Message

Andrew MacLeod Nov. 23, 2021, 5:02 p.m. UTC
  This is the second patch in the series.

Ranger uses its own API to recursively satisfy dependencies. When 
range_of_stmt is called on _1482 = _1154 + _1177;  it picks up the 
ranges of _1154 and _1177 from it's cache. If those statements have not 
been seen yet, it recursively calls range_of_stmt on each one to resolve 
the answer.  Each main API call can trigger up to 5 other calls to get 
to the next API point:

    gimple_ranger::fold_range_internal (...)
    gimple_ranger::range_of_stmt (_1154,...)
    gimple_ranger::range_of_expr (_1154,....)
    fold_using_range::range_of_range_op (..)
    fold_using_range::fold_stmt (...)
    gimple_ranger::fold_range_internal (...)
    gimple_ranger::range_of_stmt (_1482,...)

For a normal forward walk, values tend to already be in the cache, but 
when we try to answer a range_on_edge question on a back edge, it can 
trigger a very long series of queries.  I spent some time analyzing 
these patterns, and found that regardless of which API entry point was 
used, ultimately range_of_stmt is invoked in a predictable order to 
initiate the cache values.

This patch implements a dependency resolver which when range_of_stmt 
uses when it is called on something which does not have a cache entry 
yet (thus the disambiguation of the temporal failure vs lack of cache 
entry in the previous patch)

This looks at each operand, and if that operand does not have a cache 
entry, pushes it on a stack.   Names are popped from the stack and 
fold_using_range() is invoked once all the operands have been 
resolved.   When we do get to call fold_using_range::fold_stmt(), we are 
sure the operands are cached and the value will simply be calculated.  
This is ultimately the exact series of events that would have happened 
had the main API been used... except we don't involve the call stack 
anymore for each one.

Well, mostly :-).  For this fix, we only do this with operands of stmts 
which have a range-ops handler.. meaning we do not use the API for 
anything range-ops understands.  We will still use the main API for 
resolving PHIS and other statements as they are encountered.    We could 
do this for PHIS as well, but for the most part it was the chains of 
stmts within a block that were causing the vast majority of the issue.  
If we later discover large chains of PHIs are causing issues as well, 
then I can easily add them to this as well.  I avoided them this time 
because there is extra overhead involved in traversing all the PHI 
arguments extra times.  Sticking with range-ops limits us to 2 operands 
to check, and the overhead is very minimal.

I have tested this with PHIs as well and we could just include them 
upfront. The overhead is more than doubled, but the increased compile 
time of a VRP pass is still under 1%.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

Andrew
  

Comments

Richard Biener Nov. 24, 2021, 9:17 a.m. UTC | #1
On Tue, Nov 23, 2021 at 6:04 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This is the second patch in the series.
>
> Ranger uses its own API to recursively satisfy dependencies. When
> range_of_stmt is called on _1482 = _1154 + _1177;  it picks up the
> ranges of _1154 and _1177 from it's cache. If those statements have not
> been seen yet, it recursively calls range_of_stmt on each one to resolve
> the answer.  Each main API call can trigger up to 5 other calls to get
> to the next API point:
>
>     gimple_ranger::fold_range_internal (...)
>     gimple_ranger::range_of_stmt (_1154,...)
>     gimple_ranger::range_of_expr (_1154,....)
>     fold_using_range::range_of_range_op (..)
>     fold_using_range::fold_stmt (...)
>     gimple_ranger::fold_range_internal (...)
>     gimple_ranger::range_of_stmt (_1482,...)
>
> For a normal forward walk, values tend to already be in the cache, but
> when we try to answer a range_on_edge question on a back edge, it can
> trigger a very long series of queries.  I spent some time analyzing
> these patterns, and found that regardless of which API entry point was
> used, ultimately range_of_stmt is invoked in a predictable order to
> initiate the cache values.
>
> This patch implements a dependency resolver which when range_of_stmt
> uses when it is called on something which does not have a cache entry
> yet (thus the disambiguation of the temporal failure vs lack of cache
> entry in the previous patch)
>
> This looks at each operand, and if that operand does not have a cache
> entry, pushes it on a stack.   Names are popped from the stack and
> fold_using_range() is invoked once all the operands have been
> resolved.   When we do get to call fold_using_range::fold_stmt(), we are
> sure the operands are cached and the value will simply be calculated.
> This is ultimately the exact series of events that would have happened
> had the main API been used... except we don't involve the call stack
> anymore for each one.
>
> Well, mostly :-).  For this fix, we only do this with operands of stmts
> which have a range-ops handler.. meaning we do not use the API for
> anything range-ops understands.  We will still use the main API for
> resolving PHIS and other statements as they are encountered.    We could
> do this for PHIS as well, but for the most part it was the chains of
> stmts within a block that were causing the vast majority of the issue.
> If we later discover large chains of PHIs are causing issues as well,
> then I can easily add them to this as well.  I avoided them this time
> because there is extra overhead involved in traversing all the PHI
> arguments extra times.  Sticking with range-ops limits us to 2 operands
> to check, and the overhead is very minimal.
>
> I have tested this with PHIs as well and we could just include them
> upfront. The overhead is more than doubled, but the increased compile
> time of a VRP pass is still under 1%.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

OK.

Richard.

> Andrew
>
>
>
  
Andrew MacLeod Nov. 24, 2021, 2:06 p.m. UTC | #2
On 11/24/21 04:17, Richard Biener wrote:
> On Tue, Nov 23, 2021 at 6:04 PM Andrew MacLeod via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> This is the second patch in the series.
>>
>> Ranger uses its own API to recursively satisfy dependencies. When
>> range_of_stmt is called on _1482 = _1154 + _1177;  it picks up the
>> ranges of _1154 and _1177 from it's cache. If those statements have not
>> been seen yet, it recursively calls range_of_stmt on each one to resolve
>> the answer.  Each main API call can trigger up to 5 other calls to get
>> to the next API point:
>>
>>      gimple_ranger::fold_range_internal (...)
>>      gimple_ranger::range_of_stmt (_1154,...)
>>      gimple_ranger::range_of_expr (_1154,....)
>>      fold_using_range::range_of_range_op (..)
>>      fold_using_range::fold_stmt (...)
>>      gimple_ranger::fold_range_internal (...)
>>      gimple_ranger::range_of_stmt (_1482,...)
>>
>> For a normal forward walk, values tend to already be in the cache, but
>> when we try to answer a range_on_edge question on a back edge, it can
>> trigger a very long series of queries.  I spent some time analyzing
>> these patterns, and found that regardless of which API entry point was
>> used, ultimately range_of_stmt is invoked in a predictable order to
>> initiate the cache values.
>>
>> This patch implements a dependency resolver which when range_of_stmt
>> uses when it is called on something which does not have a cache entry
>> yet (thus the disambiguation of the temporal failure vs lack of cache
>> entry in the previous patch)
>>
>> This looks at each operand, and if that operand does not have a cache
>> entry, pushes it on a stack.   Names are popped from the stack and
>> fold_using_range() is invoked once all the operands have been
>> resolved.   When we do get to call fold_using_range::fold_stmt(), we are
>> sure the operands are cached and the value will simply be calculated.
>> This is ultimately the exact series of events that would have happened
>> had the main API been used... except we don't involve the call stack
>> anymore for each one.
>>
>> Well, mostly :-).  For this fix, we only do this with operands of stmts
>> which have a range-ops handler.. meaning we do not use the API for
>> anything range-ops understands.  We will still use the main API for
>> resolving PHIS and other statements as they are encountered.    We could
>> do this for PHIS as well, but for the most part it was the chains of
>> stmts within a block that were causing the vast majority of the issue.
>> If we later discover large chains of PHIs are causing issues as well,
>> then I can easily add them to this as well.  I avoided them this time
>> because there is extra overhead involved in traversing all the PHI
>> arguments extra times.  Sticking with range-ops limits us to 2 operands
>> to check, and the overhead is very minimal.
>>
>> I have tested this with PHIs as well and we could just include them
>> upfront. The overhead is more than doubled, but the increased compile
>> time of a VRP pass is still under 1%.
>>
>> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?
> OK.
>
> Richard.
>
>> Andrew
>>
>>
>>
committed.
  

Patch

From 28d1fea6e6c0c0368dbc04e895aaa0a6b47c19da Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Mon, 22 Nov 2021 14:39:41 -0500
Subject: [PATCH 3/3] Directly resolve range_of_stmt dependencies.

All ranger API entries eventually call range_of_stmt to ensure there is an
initial global value to work with.  This can cause very deep call chains when
satisfied via the normal API.  Instead, push any dependencies onto a stack
and evaluate them in a depth first manner, mirroring what would have happened
via the normal API calls.

	PR tree-optimization/103231
	gcc/
	* gimple-range.cc (gimple_ranger::gimple_ranger): Create stmt stack.
	(gimple_ranger::gimple_ranger): Delete stmt stack.
	(gimple_ranger::range_of_stmt): Process depenedencies if they have no
	global cache entry.
	(gimple_ranger::prefill_name): New.
	(gimple_ranger::prefill_stmt_dependencies): New.
	* gimple-range.h (class gimple_ranger): Add prototypes.
---
 gcc/gimple-range.cc | 107 +++++++++++++++++++++++++++++++++++++++++++-
 gcc/gimple-range.h  |   4 ++
 2 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index e3ab3a8bb48..178a470a419 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -46,6 +46,9 @@  gimple_ranger::gimple_ranger () :
   m_oracle = m_cache.oracle ();
   if (dump_file && (param_ranger_debug & RANGER_DEBUG_TRACE))
     tracer.enable_trace ();
+  m_stmt_list.create (0);
+  m_stmt_list.safe_grow (num_ssa_names);
+  m_stmt_list.truncate (0);
 
   // Ensure the not_executable flag is clear everywhere.
   if (flag_checking)
@@ -61,6 +64,11 @@  gimple_ranger::gimple_ranger () :
     }
 }
 
+gimple_ranger::~gimple_ranger ()
+{
+  m_stmt_list.release ();
+}
+
 bool
 gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
 {
@@ -284,9 +292,10 @@  gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
   else
     {
       bool current;
-      // Check if the stmt has already been processed, and is not stale.
+      // Check if the stmt has already been processed.
       if (m_cache.get_global_range (r, name, current))
 	{
+	  // If it isn't stale, use this cached value.
 	  if (current)
 	    {
 	      if (idx)
@@ -294,7 +303,10 @@  gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
 	      return true;
 	    }
 	}
-      // Otherwise calculate a new value.
+      else
+	prefill_stmt_dependencies (name);
+
+      // Calculate a new value.
       int_range_max tmp;
       fold_range_internal (tmp, s, name);
 
@@ -311,6 +323,97 @@  gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
   return res;
 }
 
+
+// Check if NAME is a dependency that needs resolving, and push it on the
+// stack if so.  R is a scratch range.
+
+inline void
+gimple_ranger::prefill_name (irange &r, tree name)
+{
+  if (!gimple_range_ssa_p (name))
+    return;
+  gimple *stmt = SSA_NAME_DEF_STMT (name);
+  if (!gimple_range_handler (stmt))
+    return;
+
+  bool current;
+  // If this op has not been processed yet, then push it on the stack
+  if (!m_cache.get_global_range (r, name, current))
+    m_stmt_list.safe_push (name);
+}
+
+// This routine will seed the global cache with most of the depnedencies of
+// NAME.  This prevents excessive call depth through the normal API.
+
+void
+gimple_ranger::prefill_stmt_dependencies (tree ssa)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (ssa))
+    return;
+
+  int_range_max r;
+  unsigned idx;
+  gimple *stmt = SSA_NAME_DEF_STMT (ssa);
+  gcc_checking_assert (stmt && gimple_bb (stmt));
+
+  // Only pre-process range-ops.
+  if (!gimple_range_handler (stmt))
+    return;
+
+  // Mark where on the stack we are starting.
+  unsigned start = m_stmt_list.length ();
+  m_stmt_list.safe_push (ssa);
+
+  idx = tracer.header ("ROS dependence fill\n");
+
+  // Loop until back at the start point.
+  while (m_stmt_list.length () > start)
+    {
+      tree name = m_stmt_list.last ();
+      // NULL is a marker which indicates the next name in the stack has now
+      // been fully resolved, so we can fold it.
+      if (!name)
+	{
+	  // Pop the NULL, then pop the name.
+	  m_stmt_list.pop ();
+	  name = m_stmt_list.pop ();
+	  // Don't fold initial request, it will be calculated upon return.
+	  if (m_stmt_list.length () > start)
+	    {
+	      // Fold and save the value for NAME.
+	      stmt = SSA_NAME_DEF_STMT (name);
+	      fold_range_internal (r, stmt, name);
+	      m_cache.set_global_range (name, r);
+	    }
+	  continue;
+	}
+
+      // Add marker indicating previous NAME in list should be folded
+      // when we get to this NULL.
+      m_stmt_list.safe_push (NULL_TREE);
+      stmt = SSA_NAME_DEF_STMT (name);
+
+      if (idx)
+	{
+	  tracer.print (idx, "ROS dep fill (");
+	  print_generic_expr (dump_file, name, TDF_SLIM);
+	  fputs (") at stmt ", dump_file);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+
+      gcc_checking_assert (gimple_range_handler (stmt));
+      tree op = gimple_range_operand2 (stmt);
+      if (op)
+	prefill_name (r, op);
+      op = gimple_range_operand1 (stmt);
+      if (op)
+	prefill_name (r, op);
+    }
+  if (idx)
+    tracer.trailer (idx, "ROS ", false, ssa, r);
+}
+
+
 // This routine will invoke the gimple fold_stmt routine, providing context to
 // range_of_expr calls via an private interal API.
 
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 615496ec9b8..c2923c58b27 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -47,6 +47,7 @@  class gimple_ranger : public range_query
 {
 public:
   gimple_ranger ();
+  ~gimple_ranger ();
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL) OVERRIDE;
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) OVERRIDE;
   virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;
@@ -61,9 +62,12 @@  public:
   bool fold_stmt (gimple_stmt_iterator *gsi, tree (*) (tree));
 protected:
   bool fold_range_internal (irange &r, gimple *s, tree name);
+  void prefill_name (irange &r, tree name);
+  void prefill_stmt_dependencies (tree ssa);
   ranger_cache m_cache;
   range_tracer tracer;
   basic_block current_bb;
+  vec<tree> m_stmt_list;
 };
 
 /* Create a new ranger instance and associate it with a function.
-- 
2.17.2