[2/2] tree-optimization/104530 - Mark defs dependent on non-null stale.

Message ID 16cb11c5-c46e-b0ae-2813-52f141414a41@redhat.com
State New
Series tree-optimization/104530 - proposed re-evaluation.

Commit Message

Andrew MacLeod Feb. 22, 2022, 4:40 p.m. UTC
This patch simply leverages the existing computation machinery to
re-evaluate values dependent on a newly found non-null value.

Ranger associates a monotonically increasing temporal value with every 
def as it is defined.  When that value is used, we check if any of the 
values used in the definition have been updated, making the current 
cached global value stale.  This makes the evaluation lazy: if there are
no more uses, we will never re-evaluate.
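
To make the timestamp check concrete, here is a minimal standalone sketch.
It is not the code in gimple-range-cache.cc: ToyTemporalCache, NO_DEP and
the use of small integers in place of SSA names are all invented for
illustration, and it models the check as it was before this patch (strict
"older than" comparison).

```cpp
#include <cassert>
#include <vector>

// Hypothetical marker for "no second dependency".
constexpr int NO_DEP = -1;

// Toy model of a temporal cache.  Names are small integers standing in
// for SSA versions; this is an illustration, not GCC's real class.
class ToyTemporalCache
{
public:
  // Record that NAME was just (re)computed.
  void set_timestamp (unsigned name)
  {
    if (name >= m_timestamp.size ())
      m_timestamp.resize (name + 1, 0);
    m_timestamp[name] = ++m_current_time;
  }

  // Return true if NAME's cached value is not older than either of its
  // direct dependencies.  A timestamp of 0 means "never registered".
  bool current_p (unsigned name, int dep1, int dep2 = NO_DEP) const
  {
    unsigned ts = value (name);
    if (ts == 0)
      return true;
    if (dep1 != NO_DEP && ts < value (dep1))
      return false;
    if (dep2 != NO_DEP && ts < value (dep2))
      return false;
    return true;
  }

private:
  unsigned value (unsigned name) const
  {
    return name < m_timestamp.size () ? m_timestamp[name] : 0;
  }

  std::vector<unsigned> m_timestamp;
  unsigned m_current_time = 0;
};
```

In this sketch nothing is recomputed when a dependency gets a newer
timestamp; the staleness only shows up when a later use asks current_p,
which is the lazy behaviour described above.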

When an ssa-name is marked non-null it does not change the global value, 
and thus will not invalidate any global values.  This patch marks any 
definitions in the block which are dependent on the non-null value as 
stale.  This will cause them to be re-evaluated when they are next used.

Imports: b.0_1  d.3_7
Exports: b.0_1  _2  _3  d.3_7  _8
          _2 : b.0_1(I)
          _3 : b.0_1(I)  _2
          _8 : b.0_1(I)  _2  _3  d.3_7(I)

    b.0_1 = b;
     _2 = b.0_1 == 0B;
     _3 = (int) _2;
     c = _3;
     _5 = *b.0_1;        <<-- from this point b.0_1 is [+1, +INF]
     a = _5;
     d.3_7 = d;
     _8 = _3 % d.3_7;
     if (_8 != 0)

When _5 is defined and b.0_1 becomes non-null, we mark the dependent
names that are exports and defined in this block as stale: _2, _3
and _8.

When _8 is being calculated, _3 is stale, which causes it to be
recomputed.  _3 is dependent on _2, which is also stale, so _2 is also
recomputed, and we end up with

   _2 == [0, 0]
   _3 == [0, 0]
   _8 == [0, 0]

And then we can fold away the condition.
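
The recomputation above can be followed with a small standalone sketch.
ToyRange and the helper functions are hypothetical and only model the
three statements from the example; real range-ops are far more general.
The modulo helper assumes that 0 % d is 0 for any divisor worth
considering, since division by zero is undefined.

```cpp
#include <cassert>
#include <climits>

// Hypothetical closed integer interval [lo, hi].
struct ToyRange
{
  long lo, hi;
  bool zero_p () const { return lo == 0 && hi == 0; }
};

// _2 = b.0_1 == 0B;  a non-null pointer never compares equal to null.
ToyRange eq_null (bool ptr_nonnull)
{
  return ptr_nonnull ? ToyRange{0, 0} : ToyRange{0, 1};
}

// _3 = (int) _2;  the cast preserves a [0, 1] or [0, 0] range.
ToyRange cast_to_int (ToyRange r)
{
  return r;
}

// _8 = _3 % d.3_7;  only the "left operand is zero" case is modelled,
// everything else is treated as unknown.
ToyRange modulo (ToyRange lhs)
{
  if (lhs.zero_p ())
    return ToyRange{0, 0};
  return ToyRange{INT_MIN, INT_MAX};
}

// if (_8 != 0) can be folded away only when _8 is known to be [0, 0].
bool condition_folds_p (bool b_nonnull)
{
  ToyRange r2 = eq_null (b_nonnull);
  ToyRange r3 = cast_to_int (r2);
  ToyRange r8 = modulo (r3);
  return r8.zero_p ();
}
```

Calling condition_folds_p (false) models the cached values computed
before the dereference, and condition_folds_p (true) models the
re-evaluation once _2 and _3 have been marked stale.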

The side effect is that _2 and _3 are globally changed to be [0, 0], but 
this is OK because it is the definition block, so it dominates all other 
uses of these names, and they should be [0,0] upon exit anyway.  The
previous patch ensures that the global values written to
SSA_NAME_RANGE_INFO are the correct [0,1] for both _2 and _3.

The patch would have been even smaller if I already had a mark_stale 
method.   I thought there was one, but I guess it never made it in from 
lack of need at the time.   The only other tweak was to treat a value as
stale when its timestamp is identical to that of a dependency, rather
than only when it is older.
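
A rough illustration of why that tweak matters, as a standalone sketch
rather than the real temporal_cache code.  The two function names and
the constants are made up; the values 0 ("always current") and 1
("stale") mirror what the patch uses.

```cpp
#include <cassert>

constexpr unsigned ALWAYS_CURRENT = 0;  // timestamp never considered stale
constexpr unsigned STALE = 1;           // oldest possible real timestamp

// Comparison before the patch: stale only if the dependency is newer.
bool current_strict_p (unsigned ts, unsigned dep_ts)
{
  if (ts == ALWAYS_CURRENT)
    return true;
  return !(ts < dep_ts);
}

// Comparison after the patch: an identical timestamp is not current.
bool current_inclusive_p (unsigned ts, unsigned dep_ts)
{
  if (ts == ALWAYS_CURRENT)
    return true;
  return !(ts <= dep_ts);
}
```

When both a name and its dependency have been marked stale they share
the timestamp 1.  The strict comparison would still treat the name as
current, while the inclusive one forces a recomputation.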

This bootstraps on x86_64-pc-linux-gnu with no regressions.  I am
re-running to make sure.

OK for trunk? or defer to stage 1?
Andrew
  

Comments

Richard Biener Feb. 23, 2022, 7:33 a.m. UTC | #1
On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

@@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block bb, tree name)
        {
          r.set_nonzero (type);
          m_on_entry.set_bb_range (name, bb, r);
+         // Mark consumers of name stale so they can be recomputed.
+         if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
+           {
+             tree x;
+             FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
+               if (m_gori.in_chain_p (name, x)
+                   && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
+                 m_temporal->set_stale (x);
+           }
        }

so if we have a BB that exports N names and each of those is updated to nonnull
this is going to be quadratic?  It also looks like the gimple_bb check is cheaper
than the bitmap test done in in_chain_p.  What comes to my mind is why we need
to mark "consumers"?  Can't consumers check their uses' defs when they look
at their timestamp?  This whole set_stale thing doesn't seem to be transitive
anyway, consider:

   _1 = ...

<bb>
   _2 = _1 + ..;

<bb>
  _3 = _2 + ...;

so when _1 is updated to non-null we mark _2 as stale but _3 should
also be stale, no?  When we visit _3 before eventually getting to _2 (to see
whether it updates, and thus to know more precisely whether it makes _3
stale) we won't re-evaluate it?

That said, the change looks somewhat ad-hoc to get to 1-level deep second-level
opportunities?

Richard.

>
> OK for trunk? or defer to stage 1?
> Andrew
  
Andrew MacLeod Feb. 23, 2022, 3:54 p.m. UTC | #2
On 2/23/22 02:33, Richard Biener wrote:
> @@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block bb, tree name)
>          {
>            r.set_nonzero (type);
>            m_on_entry.set_bb_range (name, bb, r);
> +         // Mark consumers of name stale so they can be recomputed.
> +         if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
> +           {
> +             tree x;
> +             FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
> +               if (m_gori.in_chain_p (name, x)
> +                   && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
> +                 m_temporal->set_stale (x);
> +           }
>          }
>
> so if we have a BB that exports N names and each of those is updated to nonnull
> this is going to be quadratic?  It also looks like the gimple_bb check is cheaper
> than the bitmap test done in in_chain_p.  What comes to my mind is why we need
> to mark "consumers"?  Can't consumers check their uses' defs when they look
> at their timestamp?  This whole set_stale thing doesn't seem to be

They do.  The timestamps only look at direct uses. Any use of _2 should 
look at the def and notice it is stale relative to b.0_1 automatically. 
We miss the opportunity in the example which uses _3 to compute _8.  _3 
is directly dependent on _2 whose def is not stale relative to _3, so we 
miss the transitive staleness via b.0_1.   This marks all the consumers 
whose calculation is derived from the now non-null value as stale.   
Within the block, it is fully transitive and anything potentially 
derived from NAME will be recalculated if it is used.  In old EVRP 
terms, it would be like updating the current value vector for any 
ssa-names derived from NAME when it becomes non-null, except it is done 
lazily.
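
As a rough standalone sketch of the difference (not the Ranger code):
ToyBlock and its members are invented, names are small integers, and the
single toy block stands in for "exports defined in this block".

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model of one basic block's defs and their timestamps.
struct ToyBlock
{
  std::vector<unsigned> timestamp;        // per name; 0 = always current
  std::vector<std::set<unsigned>> chain;  // names each def is derived from
  unsigned now = 0;

  explicit ToyBlock (unsigned n) : timestamp (n, 0), chain (n) {}

  // Define NAME from the given dependency chain and stamp it.
  void define (unsigned name, std::set<unsigned> derived_from)
  {
    chain[name] = std::move (derived_from);
    timestamp[name] = ++now;
  }

  // Direct-use check only: compare NAME against one dependency.
  bool current_p (unsigned name, unsigned dep) const
  {
    unsigned ts = timestamp[name];
    if (ts == 0)
      return true;
    return !(ts <= timestamp[dep]);
  }

  // Mark every def whose chain contains NAME as stale (timestamp 1).
  void mark_dependents_stale (unsigned name)
  {
    for (unsigned x = 0; x < chain.size (); ++x)
      if (chain[x].count (name) && timestamp[x] != 0)
        timestamp[x] = 1;
  }
};
```

With b.0_1 as name 0, _2 as name 1 and _3 as name 2, _3 looks current
relative to its direct use _2 until mark_dependents_stale (0) is called;
afterwards both _2 and _3 fail the check and would be recomputed on
their next use.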


> transitive anyway,
> consider:
>
>     _1 = ...
>
> <bb>
>     _2 = _1 + ..;
>
> <bb>
>    _3 = _2 + ...;
>
> so when _1 is updated to non-null we mark _2 as stale but _3 should
> also be stale, no?  When we visit _3 before eventually getting to _2 (to see
> whether it updates, and thus to know more precisely whether it makes _3
> stale) we won't re-evaluate it?

> That said, the change looks somewhat ad-hoc to get to 1-level deep second-level
> opportunities?

The patch applies only to dom-walks, and primarily targets definitions 
in the current block that we have already seen that we now know are 
stale. It is one approach to applying non-null later in the same block 
without resorting to much of an algorithmic change.  It's not really 
intended to affect anything cross block as that is handled differently 
via the GORI engine.  It would provide better on-exit ranges in the 
definition block for some of the names involved.

That said, I'm not crazy about putting anything else into this release
anyway, so if the regression isn't serious enough, then I'd simply wait
for the revamp of side-effects in the next release to deal with it.

Andrew
  
Jeff Law June 26, 2022, 6:23 p.m. UTC | #3
On 2/22/2022 9:40 AM, Andrew MacLeod via Gcc-patches wrote:
> OK for trunk? or defer to stage 1?
Seems reasonable now that we're in stage1.  Obviously given the time 
between original posting and now you should probably bootstrap and 
regression test it again.

Jeff
  

Patch

From a7e4e5f04899817cacc3ebe5cc3ff2d489489309 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Tue, 22 Feb 2022 09:58:00 -0500
Subject: [PATCH 2/2] Mark defs dependent on non-null stale.

When a name is marked as non-null, find all exports from the block, and mark their timestamp as stale. Any following use of the name will trigger a recomputation using the new non-null range.

	PR tree-optimization/104530
	gcc/
	* gimple-range-cache.cc (temporal_cache::set_stale): New.
	(temporal_cache::current_p): Identical timestamp is not current.
	(ranger_cache::update_to_nonnull): Mark any export defined in this
	block stale if it is dependent on this name.

	gcc/testsuite/
	* gcc.dg/pr104530.c: New.
---
 gcc/gimple-range-cache.cc       | 26 ++++++++++++++++++++++++--
 gcc/testsuite/gcc.dg/pr104530.c | 17 +++++++++++++++++
 2 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr104530.c

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 613135266a4..debc93767a9 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -696,6 +696,7 @@  public:
   bool current_p (tree name, tree dep1, tree dep2) const;
   void set_timestamp (tree name);
   void set_always_current (tree name);
+  void set_stale (tree name);
 private:
   unsigned temporal_value (unsigned ssa) const;
 
@@ -740,9 +741,9 @@  temporal_cache::current_p (tree name, tree dep1, tree dep2) const
   // Any non-registered dependencies will have a value of 0 and thus be older.
   // Return true if time is newer than either dependent.
 
-  if (dep1 && ts < temporal_value (SSA_NAME_VERSION (dep1)))
+  if (dep1 && ts <= temporal_value (SSA_NAME_VERSION (dep1)))
     return false;
-  if (dep2 && ts < temporal_value (SSA_NAME_VERSION (dep2)))
+  if (dep2 && ts <= temporal_value (SSA_NAME_VERSION (dep2)))
     return false;
 
   return true;
@@ -759,6 +760,18 @@  temporal_cache::set_timestamp (tree name)
   m_timestamp[v] = ++m_current_time;
 }
 
+// Mark a NAME as stale by marking the timestamp as oldest, unless it is
+// already "always current".
+
+inline void
+temporal_cache::set_stale (tree name)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  if (v >= m_timestamp.length () || m_timestamp[v] == 0)
+    return;
+  m_timestamp[v] = 1;
+}
+
 // Set the timestamp to 0, marking it as "always up to date".
 
 inline void
@@ -1475,6 +1488,15 @@  ranger_cache::update_to_nonnull (basic_block bb, tree name)
 	{
 	  r.set_nonzero (type);
 	  m_on_entry.set_bb_range (name, bb, r);
+	  // Mark consumers of name stale so they can be recomputed.
+	  if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
+	    {
+	      tree x;
+	      FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
+		if (m_gori.in_chain_p (name, x)
+		    && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
+		  m_temporal->set_stale (x);
+	    }
 	}
     }
 }
diff --git a/gcc/testsuite/gcc.dg/pr104530.c b/gcc/testsuite/gcc.dg/pr104530.c
new file mode 100644
index 00000000000..9adedc5e5f9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104530.c
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+void foo(void);
+
+static int a, *b = &a, c, d = 1;
+
+int main() {
+    c = 0 == b;
+    a = *b;
+    if (c % d)
+        for (; d; --d)
+            foo();
+    b = 0;
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "evrp" } } */
-- 
2.17.2