[pushed] tree-optimization/106514 - revisit m_import compute in backward threading

Message ID 20220811130109.B6B3D3858429@sourceware.org
State Committed
Commit 16b013c9d9b4d950f89821476e791bf18c1295df
Series [pushed] tree-optimization/106514 - revisit m_import compute in backward threading

Commit Message

Richard Biener Aug. 11, 2022, 1 p.m. UTC
  FYI - this is what I pushed after re-testing.  What changed from the
previous patch is mostly the way imports are computed; see the
separate patch fixing up path_range_query::compute_imports.

The testcase XFAILed by the previous change is un-XFAILed with
this, and I added a new testcase covering the difference with
respect to the GORI import computation.

Statistics on this patch look good.

Richard.

--

This revisits how we compute the imports later used for the ranger
path query during backwards threading.  The compute_imports function
of the path solver ends up pulling the SSA def chains of regular
stmts without limit, and since it starts with just the GORI imports
of the path exit it misses some interesting names to translate
during path discovery.  In fact, with a still-empty path,
compute_imports does not look like the correct tool.
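
As a hypothetical illustration (not taken from the PR): given an exit
conditional like the one below, starting from the GORI imports of the
exit block and following SSA def chains without limit pulls the whole
chain t3 -> t2 -> t1 plus the parameters into the import set, whether
or not any path could resolve them.

  int f (int a, int b)
  {
    int t1 = a * 2;
    int t2 = t1 + b;
    int t3 = t2 & 7;
    if (t3)  /* exit conditional; the unbounded walk pulls t2, t1, a, b */
      return 1;
    return 0;
  }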

The following instead implements this computation during path
discovery itself, and since we add the exit block first we seed the
initial imports and interesting names from just the exit conditional.
When we then process interesting names (that is, imports whose
definition we have not yet seen) we prune local defs but add their
uses, much as compute_imports would have done.
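
A minimal standalone sketch of that worklist scheme (standard
containers stand in for GCC's bitmap/vec machinery; PHI handling and
the range_op_handler/COND_EXPR filtering of defs are omitted, and the
names here are illustrative, not GCC API):

  #include <map>
  #include <set>
  #include <vector>

  typedef int Name;                  /* stand-in for an SSA version */
  struct Def { int bb; std::vector<Name> uses; };
  std::map<Name, Def> defs;          /* toy SSA_NAME_DEF_STMT lookup */

  /* Prune names in INTERESTING defined in BB, chasing the uses of
     their defining stmts instead; newly seen uses become imports.
     Returns what was added to IMPORTS so the caller can unwind it
     when backtracking out of BB.  */
  std::vector<Name>
  prune_block_defs (int bb, std::set<Name> &interesting,
                    std::set<Name> &imports)
  {
    std::vector<Name> new_imports;
    std::vector<Name> worklist (interesting.begin (), interesting.end ());
    interesting.clear ();
    while (!worklist.empty ())
      {
        Name n = worklist.back ();
        worklist.pop_back ();
        /* Defined outside BB: an import, remains interesting.  */
        if (defs[n].bb != bb)
          {
            interesting.insert (n);
            continue;
          }
        /* A BB-local def: consider its uses instead.  */
        for (Name u : defs[n].uses)
          if (imports.insert (u).second)
            {
              new_imports.push_back (u);
              worklist.push_back (u);
            }
      }
    return new_imports;
  }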

compute_imports is also lacking in its walk of the def chain
compared to range_def_chain::get_def_chain, which for example
handles &_1->x specially through range_op_handler and
gimple_range_operand1, so the code copies this behavior.  A fix for
compute_imports will be done separately, also fixing the unbounded
walk there.
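
For example, for the new testcase added below the exit test is
against an address computation; in GIMPLE this is roughly (SSA
numbering illustrative)

  q_2 = &p_1(D)->base;
  if (q_2 != 0B)

and handling the ADDR_EXPR through range_op_handler makes p_1 an
import, which is what allows resolving the q_2 != 0B test on the
path coming from the p_1 != 0B check.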

The patch also properly unwinds m_imports when backtracking during
path discovery, and in a debugging session I have verified that the
two sets now evolve as expected, where they previously behaved
slightly erratically.
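
The discipline is the usual grow-recurse-shrink pattern; roughly, and
reusing the toy prune_block_defs sketch from above (find_paths_to
here is illustrative, not the GCC function):

  void
  find_paths_to (int bb, std::set<Name> &interesting,
                 std::set<Name> &imports)
  {
    /* Prune BB-local defs, recording what that adds to imports.  */
    std::vector<Name> added = prune_block_defs (bb, interesting, imports);
    /* ... for each predecessor edge: add interesting PHI edge defs
       to both sets, recurse into e->src, then erase exactly what
       that edge added ...  */
    /* Undo this block's import additions when backtracking.  */
    for (Name u : added)
      imports.erase (u);
  }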

Fortunately the m_imports set is now also shrunk significantly for
the PR69592 testcase (aka PR106514), so there is an overall speedup:
increasing --param max-jump-thread-duplication-stmts through
15 -> 30 -> 60 -> 120 takes 1s -> 2s -> 13s -> 27s before the patch
and 1s -> 2s -> 4s -> 8s with it.

This runs into a latent issue in path_range_query::ssa_range_in_phi,
which doesn't seem to expect any PHI nodes with a constant argument
on an edge inside the path.  But we now have those as interesting,
for example for the ICEing g++.dg/torture/pr100925.C, which just has
something like

  if (i)
    x = 1;
  else
    x = 5;
  if (x == 1)
    ...

where we now have the path from if (i) to if (x) and the PHI for x
in the set of imports to consider for resolving x == 1, which IMHO
is exactly what we want.  path_range_query::ssa_range_in_phi now
papers over the issue and drops the range to varying instead of
crashing.  I didn't want to mess with this any further in this patch
(but I couldn't resist replacing the loop over PHI args with
PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).
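
In GIMPLE the merge block of that snippet boils down to a PHI with
only constant arguments, roughly (block numbers and SSA versions
illustrative)

  # x_1 = PHI <1(3), 5(4)>
  if (x_1 == 1)

so once the edge into the merge block is on the path,
ssa_range_in_phi is handed the constant argument selected by that
edge.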

	PR tree-optimization/106514
	* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
	Compute and unwind both m_imports and interesting on the fly during
	path discovery.
	(back_threader::find_paths): Compute the original m_imports
	from just the SSA uses of the exit conditional.  Drop
	handling single_succ_to_potentially_threadable_block.
	* gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
	constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.

	* gcc.dg/tree-ssa/ssa-thread-19.c: Un-XFAIL.
	* gcc.dg/tree-ssa/ssa-thread-20.c: New testcase.
---
 gcc/gimple-range-path.cc                      |  52 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-20.c |  15 +++
 gcc/tree-ssa-threadbackward.cc                | 115 ++++++++++++++----
 4 files changed, 131 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-20.c
  

Patch

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 389faec260c..5ae374df3a2 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -276,8 +276,6 @@  void
 path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
 {
   tree name = gimple_phi_result (phi);
-  basic_block bb = gimple_bb (phi);
-  unsigned nargs = gimple_phi_num_args (phi);
 
   if (at_entry ())
     {
@@ -287,6 +285,7 @@  path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       // Try to fold the phi exclusively with global or cached values.
       // This will get things like PHI <5(99), 6(88)>.  We do this by
       // calling range_of_expr with no context.
+      unsigned nargs = gimple_phi_num_args (phi);
       Value_Range arg_range (TREE_TYPE (name));
       r.set_undefined ();
       for (size_t i = 0; i < nargs; ++i)
@@ -303,36 +302,31 @@  path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       return;
     }
 
+  basic_block bb = gimple_bb (phi);
   basic_block prev = prev_bb ();
   edge e_in = find_edge (prev, bb);
-
-  for (size_t i = 0; i < nargs; ++i)
-    if (e_in == gimple_phi_arg_edge (phi, i))
-      {
-	tree arg = gimple_phi_arg_def (phi, i);
-	// Avoid using the cache for ARGs defined in this block, as
-	// that could create an ordering problem.
-	if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
-	  {
-	    if (m_resolve)
-	      {
-		Value_Range tmp (TREE_TYPE (name));
-		// Using both the range on entry to the path, and the
-		// range on this edge yields significantly better
-		// results.
-		if (defined_outside_path (arg))
-		  range_on_path_entry (r, arg);
-		else
-		  r.set_varying (TREE_TYPE (name));
-		m_ranger->range_on_edge (tmp, e_in, arg);
-		r.intersect (tmp);
-		return;
-	      }
+  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e_in);
+  // Avoid using the cache for ARGs defined in this block, as
+  // that could create an ordering problem.
+  if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
+    {
+      if (m_resolve)
+	{
+	  Value_Range tmp (TREE_TYPE (name));
+	  // Using both the range on entry to the path, and the
+	  // range on this edge yields significantly better
+	  // results.
+	  if (TREE_CODE (arg) == SSA_NAME
+	      && defined_outside_path (arg))
+	    range_on_path_entry (r, arg);
+	  else
 	    r.set_varying (TREE_TYPE (name));
-	  }
-	return;
-      }
-  gcc_unreachable ();
+	  m_ranger->range_on_edge (tmp, e_in, arg);
+	  r.intersect (tmp);
+	  return;
+	}
+      r.set_varying (TREE_TYPE (name));
+    }
 }
 
 // If NAME is defined in BB, set R to the range of NAME, and return
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c
index 58a872b8d25..62912f372de 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c
@@ -30,4 +30,4 @@  int foo (struct S *chain, _Bool is_ctor, _Bool is_dtor)
 /* We want to thread both paths from A with NULL chain to C, the one through
    B and one around it.
    ???  Ideally we'd thread one "path" containing the half-diamond with B.  */
-/* { dg-final { scan-tree-dump "Jumps threaded: 2" "threadfull1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 2" "threadfull1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-20.c
new file mode 100644
index 00000000000..c6529659021
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-20.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-ethread-stats" } */
+
+struct S { int base; };
+void foo (struct S *p)
+{
+  if (p)
+    {
+      int *q = &p->base;
+      if (q)
+        __builtin_puts ("x");
+    }
+}
+
+/* { dg-final { scan-tree-dump "Jumps threaded: 1" "ethread" } } */
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 6585a30551d..546a776bd91 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -362,32 +362,85 @@  back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
     {
       // For further greedy searching we want to remove interesting
       // names defined in BB but add ones on the PHI edges for the
-      // respective edges.  We do this by starting with all names
+      // respective edges and adding imports from those stmts.
+      // We do this by starting with all names
       // not defined in BB as interesting, collecting a list of
       // interesting PHIs in BB on the fly.  Then we iterate over
       // predecessor edges, adding interesting PHI edge defs to
       // the set of interesting names to consider when processing it.
       auto_bitmap new_interesting;
+      auto_vec<int, 16> new_imports;
       auto_vec<gphi *, 4> interesting_phis;
       bitmap_iterator bi;
       unsigned i;
+      auto_vec<tree, 16> worklist;
       EXECUTE_IF_SET_IN_BITMAP (interesting, 0, i, bi)
 	{
 	  tree name = ssa_name (i);
 	  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+	  /* Imports remain interesting.  */
 	  if (gimple_bb (def_stmt) != bb)
-	    bitmap_set_bit (new_interesting, i);
-	  else if (gphi *phi = dyn_cast<gphi *> (def_stmt))
 	    {
-	      tree res = gimple_phi_result (phi);
-	      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
-		interesting_phis.safe_push (phi);
+	      bitmap_set_bit (new_interesting, i);
+	      continue;
+	    }
+	  worklist.quick_push (name);
+	  while (!worklist.is_empty ())
+	    {
+	      tree name = worklist.pop ();
+	      gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+	      /* Newly discovered imports are interesting.  */
+	      if (gimple_bb (def_stmt) != bb)
+		{
+		  bitmap_set_bit (new_interesting, SSA_NAME_VERSION (name));
+		  continue;
+		}
+	      /* Local PHIs participate in renaming below.  */
+	      if (gphi *phi = dyn_cast<gphi *> (def_stmt))
+		{
+		  tree res = gimple_phi_result (phi);
+		  if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
+		    interesting_phis.safe_push (phi);
+		}
+	      /* For other local defs process their uses, amending
+		 imports on the way.  */
+	      else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
+		{
+		  tree ssa[3];
+		  if (range_op_handler (ass))
+		    {
+		      ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
+		      ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
+		      ssa[2] = NULL_TREE;
+		    }
+		  else if (gimple_assign_rhs_code (ass) == COND_EXPR)
+		    {
+		      ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
+		      ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
+		      ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
+		    }
+		  else
+		    continue;
+		  for (unsigned j = 0; j < 3; ++j)
+		    {
+		      tree rhs = ssa[j];
+		      if (rhs
+			  && TREE_CODE (rhs) == SSA_NAME
+			  && bitmap_set_bit (m_imports,
+					     SSA_NAME_VERSION (rhs)))
+			{
+			  new_imports.safe_push (SSA_NAME_VERSION (rhs));
+			  worklist.safe_push (rhs);
+			}
+		    }
+		}
 	    }
 	}
       if (!bitmap_empty_p (new_interesting)
 	  || !interesting_phis.is_empty ())
 	{
-	  auto_vec<tree, 4> unwind (interesting_phis.length ());
+	  auto_vec<int, 4> unwind (interesting_phis.length ());
+	  auto_vec<int, 4> imports_unwind (interesting_phis.length ());
 	  edge_iterator iter;
 	  edge e;
 	  FOR_EACH_EDGE (e, iter, bb->preds)
@@ -405,22 +458,31 @@  back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
 		{
 		  tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
 		  if (TREE_CODE (def) == SSA_NAME)
-		    if (bitmap_set_bit (new_interesting,
-					SSA_NAME_VERSION (def)))
-		      {
-			bitmap_set_bit (m_imports, SSA_NAME_VERSION (def));
-			unwind.quick_push (def);
-		      }
+		    {
+		      int ver = SSA_NAME_VERSION (def);
+		      if (bitmap_set_bit (new_interesting, ver))
+			{
+			  if (bitmap_set_bit (m_imports, ver))
+			    imports_unwind.quick_push (ver);
+			  unwind.quick_push (ver);
+			}
+		    }
 		}
 	      find_paths_to_names (e->src, new_interesting, overall_paths);
-	      // Restore new_interesting.  We leave m_imports alone since
-	      // we do not prune defs in BB from it and separately keeping
-	      // track of which bits to unwind isn't worth the trouble.
-	      for (tree def : unwind)
-		bitmap_clear_bit (new_interesting, SSA_NAME_VERSION (def));
+	      // Restore new_interesting.
+	      for (int def : unwind)
+		bitmap_clear_bit (new_interesting, def);
 	      unwind.truncate (0);
+	      // Restore m_imports.
+	      for (int def : imports_unwind)
+		bitmap_clear_bit (m_imports, def);
+	      imports_unwind.truncate (0);
 	    }
 	}
+      /* m_imports tracks all interesting names on the path, so when
+	 backtracking we have to restore it.  */
+      for (int j : new_imports)
+	bitmap_clear_bit (m_imports, j);
     }
   else if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "  FAIL: Search space limit %d reached.\n",
@@ -444,17 +506,24 @@  back_threader::find_paths (basic_block bb, tree name)
 	  && gimple_code (stmt) != GIMPLE_SWITCH))
     return;
 
-  if (EDGE_COUNT (bb->succs) > 1
-      || single_succ_to_potentially_threadable_block (bb))
+  if (EDGE_COUNT (bb->succs) > 1)
     {
       m_last_stmt = stmt;
       m_visited_bbs.empty ();
       m_path.truncate (0);
       m_name = name;
-      m_path.safe_push (bb);
-      m_solver->compute_imports (m_imports, m_path);
-      m_path.pop ();
 
+      // We compute imports of the path during discovery starting
+      // just with names used in the conditional.
+      bitmap_clear (m_imports);
+      ssa_op_iter iter;
+      FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
+	bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
+
+      // Interesting is the set of imports we have not yet seen
+      // the definition of.  So while imports only grow, the
+      // set of interesting defs dwindles and once empty we can
+      // stop searching.
       auto_bitmap interesting;
       bitmap_copy (interesting, m_imports);
       find_paths_to_names (bb, interesting, 1);