Adjust backwards threader costing of PHIs

Message ID 20220805135803.8DAE2133B5@imap2.suse-dmz.suse.de
State New
Headers
Series Adjust backwards threader costing of PHIs |

Commit Message

Richard Biener Aug. 5, 2022, 1:58 p.m. UTC
  The following adjusts the costing of PHIs to match how I understand
the comment and maybe the original intent.  The will be no
non-degenerate PHI nodes remaining on the threaded path but when there
are alternate path exits PHI nodes at their destinations will likely
require extra copies on those edges and that's what we want to account
for.

Jeff added this code long time ago and at least the special-casing
of (just) m_name does not make sense anymore.

Unfortunately this tweaks heuristics enough to make the threader
thread an unfortunate path in libgomp/team.c so that
-Walloca-larger-than diagnoses an allocation of -1 elements.  I'm
not 100% sure this condition is impossible so I've added a guard
not allocating zero or "less" stack.  There's also an uninit
diagnostic in opts.cc about best_math::m_best_candidate_len that
looks like a false positive but all but this member are initialized
so the patch initializes this one as well to avoid this false
positive.

I have yet to analyze some fallout as well:

FAIL: gcc.dg/uninit-pred-9_b.c bogus warning (test for bogus messages, line 20)
FAIL: gcc.dg/tree-ssa/phi_on_compare-4.c scan-tree-dump-times dom2 "Removing bas
ic block" 1
FAIL: gcc.dg/tree-ssa/pr59597.c scan-tree-dump-times threadfull1 "Registering jump thread" 4
FAIL: gcc.dg/tree-ssa/pr61839_2.c scan-tree-dump-times evrp "%" 0
FAIL: gcc.dg/tree-ssa/pr61839_2.c scan-tree-dump-times evrp "972195717 % " 0
FAIL: gcc.dg/tree-ssa/pr61839_2.c scan-tree-dump-times evrp "972195717 / " 0
FAIL: gcc.dg/tree-ssa/pr61839_2.c scan-tree-dump-times evrp "__builtin_abort" 1

the early threading fails are because we now account for the mid-exit
PHI copies and early threading has a limit of a single copied insn.
But this testcase was never about threading but VRP which seemingly
regressed meanwhile...

Bootstrapped and tested (with the above FAILs) on 
x86_64-unknown-linux-gnu.

Any comments besides the FAILout?

Thanks,
Richard.

gcc/
	* tree-ssa-threadbackward.cc
	(back_threader_profitability::profitable_path_p): Rewrite
	the costing of PHIs to instead reflect the cost of alternate
	path exits.  Remove m_name argument.
	(back_threader::find_paths_to_names): Adjust.
	(back_threader::maybe_register_path): Likewise.
	* spellcheck.h (best_math::m_best_candidate_len): Initialize.

libgomp/
	* team.c (gomp_team_start): Guard alloca.
---
 gcc/spellcheck.h               |  3 +-
 gcc/tree-ssa-threadbackward.cc | 65 ++++++++++++----------------------
 libgomp/team.c                 |  5 +--
 3 files changed, 28 insertions(+), 45 deletions(-)
  

Comments

Jeff Law Aug. 5, 2022, 3:46 p.m. UTC | #1
On 8/5/2022 7:58 AM, Richard Biener wrote:
> The following adjusts the costing of PHIs to match how I understand
> the comment and maybe the original intent.  The will be no
> non-degenerate PHI nodes remaining on the threaded path but when there
> are alternate path exits PHI nodes at their destinations will likely
> require extra copies on those edges and that's what we want to account
> for.
>
> Jeff added this code long time ago and at least the special-casing
> of (just) m_name does not make sense anymore.
>
> Unfortunately this tweaks heuristics enough to make the threader
> thread an unfortunate path in libgomp/team.c so that
> -Walloca-larger-than diagnoses an allocation of -1 elements.  I'm
> not 100% sure this condition is impossible so I've added a guard
> not allocating zero or "less" stack.  There's also an uninit
> diagnostic in opts.cc about best_math::m_best_candidate_len that
> looks like a false positive but all but this member are initialized
> so the patch initializes this one as well to avoid this false
> positive.
>
> I have yet to analyze some fallout as well:
>
> FAIL: gcc.dg/uninit-pred-9_b.c bogus warning (test for bogus messages, line 20)
> FAIL: gcc.dg/tree-ssa/phi_on_compare-4.c scan-tree-dump-times dom2 "Removing bas
> ic block" 1
> FAIL: gcc.dg/tree-ssa/pr59597.c scan-tree-dump-times threadfull1 "Registering jump thread" 4
> FAIL: gcc.dg/tree-ssa/pr61839_2.c scan-tree-dump-times evrp "%" 0
> FAIL: gcc.dg/tree-ssa/pr61839_2.c scan-tree-dump-times evrp "972195717 % " 0
> FAIL: gcc.dg/tree-ssa/pr61839_2.c scan-tree-dump-times evrp "972195717 / " 0
> FAIL: gcc.dg/tree-ssa/pr61839_2.c scan-tree-dump-times evrp "__builtin_abort" 1
>
> the early threading fails are because we now account for the mid-exit
> PHI copies and early threading has a limit of a single copied insn.
> But this testcase was never about threading but VRP which seemingly
> regressed meanwhile...
>
> Bootstrapped and tested (with the above FAILs) on
> x86_64-unknown-linux-gnu.
>
> Any comments besides the FAILout?
I don't recall adding that code, but I did find it in the archives.

https://gcc.gnu.org/pipermail/gcc-patches/2016-March/443452.html

But even with that and reviewing the PR, I still don't remember much 
about this particular chunk of code.  I do recall that I was never 
actually happy with pr69196 state and that's why we've kept it open all 
these years.

It doesn't look like tree-ssa/pr69196 has regressed, so if you're happy 
with the patch, I've got no objections.

jeff
  

Patch

diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
index 3706de38a9d..1b76a54b678 100644
--- a/gcc/spellcheck.h
+++ b/gcc/spellcheck.h
@@ -95,7 +95,8 @@  class best_match
   : m_goal (goal_traits::get_string (goal)),
     m_goal_len (goal_traits::get_length (goal)),
     m_best_candidate (NULL),
-    m_best_distance (best_distance_so_far)
+    m_best_distance (best_distance_so_far),
+    m_best_candidate_len (0)
   {}
 
   /* Compare the edit distance between CANDIDATE and m_goal,
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 332a1d2a1dd..3377e648445 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -64,7 +64,7 @@  public:
   back_threader_profitability (bool speed_p)
     : m_speed_p (speed_p)
   { }
-  bool profitable_path_p (const vec<basic_block> &, tree name, edge taken,
+  bool profitable_path_p (const vec<basic_block> &, edge taken,
 			  bool *irreducible_loop = NULL);
 private:
   const bool m_speed_p;
@@ -241,7 +241,7 @@  back_threader::maybe_register_path ()
       else
 	{
 	  bool irreducible = false;
-	  if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
+	  if (m_profit.profitable_path_p (m_path, taken_edge,
 					  &irreducible)
 	      && debug_counter ()
 	      && m_registry.register_path (m_path, taken_edge))
@@ -348,7 +348,7 @@  back_threader::find_paths_to_names (basic_block bb, bitmap interesting)
 
   // Try to resolve the path without looking back.
   if (m_path.length () > 1
-      && (!m_profit.profitable_path_p (m_path, m_name, NULL)
+      && (!m_profit.profitable_path_p (m_path, NULL)
 	  || maybe_register_path ()))
     ;
 
@@ -529,9 +529,6 @@  back_threader::debug ()
 /* Examine jump threading path PATH and return TRUE if it is profitable to
    thread it, otherwise return FALSE.
 
-   NAME is the SSA_NAME of the variable we found to have a constant
-   value on PATH.  If unknown, SSA_NAME is NULL.
-
    If the taken edge out of the path is known ahead of time it is passed in
    TAKEN_EDGE, otherwise it is NULL.
 
@@ -543,7 +540,6 @@  back_threader::debug ()
 
 bool
 back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
-						tree name,
 						edge taken_edge,
 						bool *creates_irreducible_loop)
 {
@@ -600,42 +596,27 @@  back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
       if (j < m_path.length () - 1)
 	{
 	  int orig_n_insns = n_insns;
-	  /* PHIs in the path will create degenerate PHIS in the
-	     copied path which will then get propagated away, so
-	     looking at just the duplicate path the PHIs would
-	     seem unimportant.
-
-	     But those PHIs, because they're assignments to objects
-	     typically with lives that exist outside the thread path,
-	     will tend to generate PHIs (or at least new PHI arguments)
-	     at points where we leave the thread path and rejoin
-	     the original blocks.  So we do want to account for them.
-
-	     We ignore virtual PHIs.  We also ignore cases where BB
-	     has a single incoming edge.  That's the most common
-	     degenerate PHI we'll see here.  Finally we ignore PHIs
-	     that are associated with the value we're tracking as
-	     that object likely dies.  */
-	  if (EDGE_COUNT (bb->succs) > 1 && EDGE_COUNT (bb->preds) > 1)
+	  /* If the path has exits other than the known taken_edge from
+	     block j == 0 then those will create new edges into the exit
+	     destination, increasing the number of PHI arguments there.
+	     Account for an extra copy there.
+	     PHI nodes on the path itself all become degenerate and
+	     propagate out on GIMPLE.  */
+	  if (j != 0 && EDGE_COUNT (bb->succs) > 1)
 	    {
-	      for (gphi_iterator gsip = gsi_start_phis (bb);
-		   !gsi_end_p (gsip);
-		   gsi_next (&gsip))
-		{
-		  gphi *phi = gsip.phi ();
-		  tree dst = gimple_phi_result (phi);
-
-		  /* Note that if both NAME and DST are anonymous
-		     SSA_NAMEs, then we do not have enough information
-		     to consider them associated.  */
-		  if (dst != name
-		      && name
-		      && TREE_CODE (name) == SSA_NAME
-		      && (SSA_NAME_VAR (dst) != SSA_NAME_VAR (name)
-			  || !SSA_NAME_VAR (dst))
-		      && !virtual_operand_p (dst))
-		    ++n_insns;
-		}
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->succs)
+		if (e->dest != m_path[j - 1])
+		  for (gphi_iterator gsip = gsi_start_phis (e->dest);
+		       !gsi_end_p (gsip);
+		       gsi_next (&gsip))
+		    {
+		      gphi *phi = gsip.phi ();
+		      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+		      if (!virtual_operand_p (arg))
+			++n_insns;
+		    }
 	    }
 
 	  if (!contains_hot_bb && m_speed_p)
diff --git a/libgomp/team.c b/libgomp/team.c
index cb6875d70fa..89b263ef148 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -756,8 +756,9 @@  gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads,
       attr = &thread_attr;
     }
 
-  start_data = gomp_alloca (sizeof (struct gomp_thread_start_data)
-			    * (nthreads - i));
+  if (i < nthreads)
+    start_data = gomp_alloca (sizeof (struct gomp_thread_start_data)
+			      * (nthreads - i));
 
   /* Launch new threads.  */
   for (; i < nthreads; ++i)