[v2,2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

Message ID 20210921225429.326017-3-iii@linux.ibm.com
State New
Headers
Series reassoc: Propagate PHI_LOOP_BIAS along single uses |

Commit Message

Ilya Leoshkevich Sept. 21, 2021, 10:54 p.m. UTC
  PR tree-optimization/49749 introduced code that shortens dependency
chains containing loop accumulators by placing them last on operand
lists of associative operations.

456.hmmer benchmark on s390 could benefit from this, however, the code
that needs it modifies loop accumulator before using it, and since only
so-called loop-carried phis are are treated as loop accumulators, the
code in the present form doesn't really help.   According to Bill
Schmidt - the original author - such a conservative approach was chosen
so as to avoid unnecessarily swapping operands, which might cause
unpredictable effects.  However, giving special treatment to forms of
loop accumulators is acceptable.

The definition of loop-carried phi is: it's a single-use phi, which is
used in the same innermost loop it's defined in, at least one argument
of which is defined in the same innermost loop as the phi itself.
Given this, it seems natural to treat single uses of such phis as phis
themselves.

gcc/ChangeLog:

	* tree-ssa-reassoc.c (biased_names): New global.
	(propagate_bias_p): New function.
	(loop_carried_phi): Remove.
	(propagate_rank): Propagate bias along single uses.
	(get_rank): Update biased_names when needed.
---
 gcc/tree-ssa-reassoc.c | 97 ++++++++++++++++++++++++++++--------------
 1 file changed, 64 insertions(+), 33 deletions(-)
  

Comments

Richard Biener Sept. 23, 2021, 11:55 a.m. UTC | #1
On Wed, 22 Sep 2021, Ilya Leoshkevich wrote:

> PR tree-optimization/49749 introduced code that shortens dependency
> chains containing loop accumulators by placing them last on operand
> lists of associative operations.
> 
> 456.hmmer benchmark on s390 could benefit from this, however, the code
> that needs it modifies loop accumulator before using it, and since only
> so-called loop-carried phis are are treated as loop accumulators, the
> code in the present form doesn't really help.   According to Bill
> Schmidt - the original author - such a conservative approach was chosen
> so as to avoid unnecessarily swapping operands, which might cause
> unpredictable effects.  However, giving special treatment to forms of
> loop accumulators is acceptable.
> 
> The definition of loop-carried phi is: it's a single-use phi, which is
> used in the same innermost loop it's defined in, at least one argument
> of which is defined in the same innermost loop as the phi itself.
> Given this, it seems natural to treat single uses of such phis as phis
> themselves.
> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-reassoc.c (biased_names): New global.
> 	(propagate_bias_p): New function.
> 	(loop_carried_phi): Remove.
> 	(propagate_rank): Propagate bias along single uses.
> 	(get_rank): Update biased_names when needed.
> ---
>  gcc/tree-ssa-reassoc.c | 97 ++++++++++++++++++++++++++++--------------
>  1 file changed, 64 insertions(+), 33 deletions(-)
> 
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index 420c14e8cf5..2f7a8882aac 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -211,6 +211,10 @@ static int64_t *bb_rank;
>  /* Operand->rank hashtable.  */
>  static hash_map<tree, int64_t> *operand_rank;
>  
> +/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
> +   biased.  */
> +static auto_bitmap biased_names;
> +
>  /* Vector of SSA_NAMEs on which after reassociate_bb is done with
>     all basic blocks the CFG should be adjusted - basic blocks
>     split right after that SSA_NAME's definition statement and before
> @@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
>     the rank difference between two blocks.  */
>  #define PHI_LOOP_BIAS (1 << 15)
>  
> +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's
> +   operands to the STMT's left-hand side.  The goal is to preserve bias in code
> +   like this:
> +
> +     x_1 = phi(x_0, x_2)
> +     a = x_1 | 1
> +     b = a ^ 2
> +     .MEM = b
> +     c = b + d
> +     x_2 = c + e
> +
> +   That is, we need to preserve bias along single-use chains originating from
> +   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be
> +   uses, because only they participate in rank propagation.  */
> +static bool
> +propagate_bias_p (gimple *stmt)
> +{
> +  use_operand_p use;
> +  imm_use_iterator use_iter;
> +  gimple *single_use_stmt = NULL;
> +
> +  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
> +    {
> +      gimple *current_use_stmt = USE_STMT (use);
> +
> +      if (is_gimple_assign (current_use_stmt)
> +	  && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME)
> +	{
> +	  if (single_use_stmt != NULL)

what if single_use_stmt == current_use_stmt?  We might have two
uses on a stmt after all - should that still be biased?  I guess not
and thus the check is correct?

> +	    return false;
> +	  single_use_stmt = current_use_stmt;
> +	}
> +    }
> +
> +  if (single_use_stmt == NULL)
> +    return false;
> +
> +  if (gimple_bb (stmt)->loop_father
> +      != gimple_bb (single_use_stmt)->loop_father)
> +    return false;
> +
> +  return true;
> +}
> +
>  /* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
>     an innermost loop, and the phi has only a single use which is inside
>     the loop, then the rank is the block rank of the loop latch plus an
> @@ -313,46 +361,23 @@ phi_rank (gimple *stmt)
>    return bb_rank[bb->index];
>  }
>  
> -/* If EXP is an SSA_NAME defined by a PHI statement that represents a
> -   loop-carried dependence of an innermost loop, return TRUE; else
> -   return FALSE.  */
> -static bool
> -loop_carried_phi (tree exp)
> -{
> -  gimple *phi_stmt;
> -  int64_t block_rank;
> -
> -  if (TREE_CODE (exp) != SSA_NAME
> -      || SSA_NAME_IS_DEFAULT_DEF (exp))
> -    return false;
> -
> -  phi_stmt = SSA_NAME_DEF_STMT (exp);
> -
> -  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
> -    return false;
> -
> -  /* Non-loop-carried phis have block rank.  Loop-carried phis have
> -     an additional bias added in.  If this phi doesn't have block rank,
> -     it's biased and should not be propagated.  */
> -  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
> -
> -  if (phi_rank (phi_stmt) != block_rank)
> -    return true;
> -
> -  return false;
> -}
> -
>  /* Return the maximum of RANK and the rank that should be propagated
>     from expression OP.  For most operands, this is just the rank of OP.
>     For loop-carried phis, the value is zero to avoid undoing the bias
>     in favor of the phi.  */
>  static int64_t
> -propagate_rank (int64_t rank, tree op)
> +propagate_rank (int64_t rank, tree op, gimple *stmt, bool *bias_p)
>  {
>    int64_t op_rank;
>  
> -  if (loop_carried_phi (op))
> -    return rank;
> +  if (TREE_CODE (op) == SSA_NAME
> +      && bitmap_bit_p (biased_names, SSA_NAME_VERSION (op)))
> +    {
> +      if (propagate_bias_p (stmt))
> +	*bias_p = true;
> +      else
> +	return rank;
> +    }
>  
>    op_rank = get_rank (op);
>  
> @@ -440,6 +465,8 @@ get_rank (tree e)
>  
>        else
>  	{
> +	  bool bias_p = false;
> +
>  	  /* Otherwise, find the maximum rank for the operands.  As an
>  	     exception, remove the bias from loop-carried phis when propagating
>  	     the rank so that dependent operations are not also biased.  */
> @@ -448,9 +475,12 @@ get_rank (tree e)
>  	     thus have rank 0.  */
>  	  rank = 0;
>  	  FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
> -	    rank = propagate_rank (rank, op);
> +	    rank = propagate_rank (rank, op, stmt, &bias_p);

It looks like if (propagate_bias_p (stmt)) is loop invariant here
and so when we inline the head this should simplify, avoiding
the new parameters to propagate_rank?

Otherwise looks good to me.

Richard.
  
Ilya Leoshkevich Sept. 24, 2021, 10:45 a.m. UTC | #2
On Thu, 2021-09-23 at 13:55 +0200, Richard Biener wrote:
> On Wed, 22 Sep 2021, Ilya Leoshkevich wrote:
> 
> > PR tree-optimization/49749 introduced code that shortens dependency
> > chains containing loop accumulators by placing them last on operand
> > lists of associative operations.
> > 
> > 456.hmmer benchmark on s390 could benefit from this, however, the
> > code
> > that needs it modifies loop accumulator before using it, and since
> > only
> > so-called loop-carried phis are are treated as loop accumulators,
> > the
> > code in the present form doesn't really help.   According to Bill
> > Schmidt - the original author - such a conservative approach was
> > chosen
> > so as to avoid unnecessarily swapping operands, which might cause
> > unpredictable effects.  However, giving special treatment to forms
> > of
> > loop accumulators is acceptable.
> > 
> > The definition of loop-carried phi is: it's a single-use phi, which
> > is
> > used in the same innermost loop it's defined in, at least one
> > argument
> > of which is defined in the same innermost loop as the phi itself.
> > Given this, it seems natural to treat single uses of such phis as
> > phis
> > themselves.
> > 
> > gcc/ChangeLog:
> > 
> >         * tree-ssa-reassoc.c (biased_names): New global.
> >         (propagate_bias_p): New function.
> >         (loop_carried_phi): Remove.
> >         (propagate_rank): Propagate bias along single uses.
> >         (get_rank): Update biased_names when needed.
> > ---
> >  gcc/tree-ssa-reassoc.c | 97 ++++++++++++++++++++++++++++----------
> > ----
> >  1 file changed, 64 insertions(+), 33 deletions(-)
> > 
> > diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> > index 420c14e8cf5..2f7a8882aac 100644
> > --- a/gcc/tree-ssa-reassoc.c
> > +++ b/gcc/tree-ssa-reassoc.c
> > @@ -211,6 +211,10 @@ static int64_t *bb_rank;
> >  /* Operand->rank hashtable.  */
> >  static hash_map<tree, int64_t> *operand_rank;
> >  
> > +/* SSA_NAMEs that are forms of loop accumulators and whose ranks
> > need to be
> > +   biased.  */
> > +static auto_bitmap biased_names;
> > +
> >  /* Vector of SSA_NAMEs on which after reassociate_bb is done with
> >     all basic blocks the CFG should be adjusted - basic blocks
> >     split right after that SSA_NAME's definition statement and
> > before
> > @@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator
> > *gsi)
> >     the rank difference between two blocks.  */
> >  #define PHI_LOOP_BIAS (1 << 15)
> >  
> > +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of
> > the STMT's
> > +   operands to the STMT's left-hand side.  The goal is to preserve
> > bias in code
> > +   like this:
> > +
> > +     x_1 = phi(x_0, x_2)
> > +     a = x_1 | 1
> > +     b = a ^ 2
> > +     .MEM = b
> > +     c = b + d
> > +     x_2 = c + e
> > +
> > +   That is, we need to preserve bias along single-use chains
> > originating from
> > +   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are
> > considered to be
> > +   uses, because only they participate in rank propagation.  */
> > +static bool
> > +propagate_bias_p (gimple *stmt)
> > +{
> > +  use_operand_p use;
> > +  imm_use_iterator use_iter;
> > +  gimple *single_use_stmt = NULL;
> > +
> > +  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
> > +    {
> > +      gimple *current_use_stmt = USE_STMT (use);
> > +
> > +      if (is_gimple_assign (current_use_stmt)
> > +         && TREE_CODE (gimple_assign_lhs (current_use_stmt)) ==
> > SSA_NAME)
> > +       {
> > +         if (single_use_stmt != NULL)
> 
> what if single_use_stmt == current_use_stmt?  We might have two
> uses on a stmt after all - should that still be biased?  I guess not
> and thus the check is correct?

Come to think of it, it should be ok to bias it.  Things like
x = x + x are fine (this particular case can be transformed into
something else earlier, but I think the overall point still holds).
> 
> > +           return false;
> > +         single_use_stmt = current_use_stmt;
> > +       }
> > +    }
> > +
> > +  if (single_use_stmt == NULL)
> > +    return false;
> > +
> > +  if (gimple_bb (stmt)->loop_father
> > +      != gimple_bb (single_use_stmt)->loop_father)
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> >  /* Rank assigned to a phi statement.  If STMT is a loop-carried
> > phi of
> >     an innermost loop, and the phi has only a single use which is
> > inside
> >     the loop, then the rank is the block rank of the loop latch
> > plus an
> > @@ -313,46 +361,23 @@ phi_rank (gimple *stmt)
> >    return bb_rank[bb->index];
> >  }
> >  
> > -/* If EXP is an SSA_NAME defined by a PHI statement that
> > represents a
> > -   loop-carried dependence of an innermost loop, return TRUE; else
> > -   return FALSE.  */
> > -static bool
> > -loop_carried_phi (tree exp)
> > -{
> > -  gimple *phi_stmt;
> > -  int64_t block_rank;
> > -
> > -  if (TREE_CODE (exp) != SSA_NAME
> > -      || SSA_NAME_IS_DEFAULT_DEF (exp))
> > -    return false;
> > -
> > -  phi_stmt = SSA_NAME_DEF_STMT (exp);
> > -
> > -  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
> > -    return false;
> > -
> > -  /* Non-loop-carried phis have block rank.  Loop-carried phis
> > have
> > -     an additional bias added in.  If this phi doesn't have block
> > rank,
> > -     it's biased and should not be propagated.  */
> > -  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
> > -
> > -  if (phi_rank (phi_stmt) != block_rank)
> > -    return true;
> > -
> > -  return false;
> > -}
> > -
> >  /* Return the maximum of RANK and the rank that should be
> > propagated
> >     from expression OP.  For most operands, this is just the rank
> > of OP.
> >     For loop-carried phis, the value is zero to avoid undoing the
> > bias
> >     in favor of the phi.  */
> >  static int64_t
> > -propagate_rank (int64_t rank, tree op)
> > +propagate_rank (int64_t rank, tree op, gimple *stmt, bool *bias_p)
> >  {
> >    int64_t op_rank;
> >  
> > -  if (loop_carried_phi (op))
> > -    return rank;
> > +  if (TREE_CODE (op) == SSA_NAME
> > +      && bitmap_bit_p (biased_names, SSA_NAME_VERSION (op)))
> > +    {
> > +      if (propagate_bias_p (stmt))
> > +       *bias_p = true;
> > +      else
> > +       return rank;
> > +    }
> >  
> >    op_rank = get_rank (op);
> >  
> > @@ -440,6 +465,8 @@ get_rank (tree e)
> >  
> >        else
> >         {
> > +         bool bias_p = false;
> > +
> >           /* Otherwise, find the maximum rank for the operands.  As
> > an
> >              exception, remove the bias from loop-carried phis when
> > propagating
> >              the rank so that dependent operations are not also
> > biased.  */
> > @@ -448,9 +475,12 @@ get_rank (tree e)
> >              thus have rank 0.  */
> >           rank = 0;
> >           FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
> > -           rank = propagate_rank (rank, op);
> > +           rank = propagate_rank (rank, op, stmt, &bias_p);
> 
> It looks like if (propagate_bias_p (stmt)) is loop invariant here
> and so when we inline the head this should simplify, avoiding
> the new parameters to propagate_rank?

I managed to move propagate_bias_p (stmt) out of the loop, but couldn't
find any worthy simplification opportunities.  For example, if we move
the biasing logic out of propagate_rank () into the loop,  nothing gets
cancelled out and the resulting code looks more cluttered.

I'll post the v3 after regtest finishes.

> 
> Otherwise looks good to me.
> 
> Richard.
  

Patch

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 420c14e8cf5..2f7a8882aac 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -211,6 +211,10 @@  static int64_t *bb_rank;
 /* Operand->rank hashtable.  */
 static hash_map<tree, int64_t> *operand_rank;
 
+/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
+   biased.  */
+static auto_bitmap biased_names;
+
 /* Vector of SSA_NAMEs on which after reassociate_bb is done with
    all basic blocks the CFG should be adjusted - basic blocks
    split right after that SSA_NAME's definition statement and before
@@ -256,6 +260,50 @@  reassoc_remove_stmt (gimple_stmt_iterator *gsi)
    the rank difference between two blocks.  */
 #define PHI_LOOP_BIAS (1 << 15)
 
+/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's
+   operands to the STMT's left-hand side.  The goal is to preserve bias in code
+   like this:
+
+     x_1 = phi(x_0, x_2)
+     a = x_1 | 1
+     b = a ^ 2
+     .MEM = b
+     c = b + d
+     x_2 = c + e
+
+   That is, we need to preserve bias along single-use chains originating from
+   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be
+   uses, because only they participate in rank propagation.  */
+static bool
+propagate_bias_p (gimple *stmt)
+{
+  use_operand_p use;
+  imm_use_iterator use_iter;
+  gimple *single_use_stmt = NULL;
+
+  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
+    {
+      gimple *current_use_stmt = USE_STMT (use);
+
+      if (is_gimple_assign (current_use_stmt)
+	  && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME)
+	{
+	  if (single_use_stmt != NULL)
+	    return false;
+	  single_use_stmt = current_use_stmt;
+	}
+    }
+
+  if (single_use_stmt == NULL)
+    return false;
+
+  if (gimple_bb (stmt)->loop_father
+      != gimple_bb (single_use_stmt)->loop_father)
+    return false;
+
+  return true;
+}
+
 /* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
    an innermost loop, and the phi has only a single use which is inside
    the loop, then the rank is the block rank of the loop latch plus an
@@ -313,46 +361,23 @@  phi_rank (gimple *stmt)
   return bb_rank[bb->index];
 }
 
-/* If EXP is an SSA_NAME defined by a PHI statement that represents a
-   loop-carried dependence of an innermost loop, return TRUE; else
-   return FALSE.  */
-static bool
-loop_carried_phi (tree exp)
-{
-  gimple *phi_stmt;
-  int64_t block_rank;
-
-  if (TREE_CODE (exp) != SSA_NAME
-      || SSA_NAME_IS_DEFAULT_DEF (exp))
-    return false;
-
-  phi_stmt = SSA_NAME_DEF_STMT (exp);
-
-  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
-    return false;
-
-  /* Non-loop-carried phis have block rank.  Loop-carried phis have
-     an additional bias added in.  If this phi doesn't have block rank,
-     it's biased and should not be propagated.  */
-  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
-
-  if (phi_rank (phi_stmt) != block_rank)
-    return true;
-
-  return false;
-}
-
 /* Return the maximum of RANK and the rank that should be propagated
    from expression OP.  For most operands, this is just the rank of OP.
    For loop-carried phis, the value is zero to avoid undoing the bias
    in favor of the phi.  */
 static int64_t
-propagate_rank (int64_t rank, tree op)
+propagate_rank (int64_t rank, tree op, gimple *stmt, bool *bias_p)
 {
   int64_t op_rank;
 
-  if (loop_carried_phi (op))
-    return rank;
+  if (TREE_CODE (op) == SSA_NAME
+      && bitmap_bit_p (biased_names, SSA_NAME_VERSION (op)))
+    {
+      if (propagate_bias_p (stmt))
+	*bias_p = true;
+      else
+	return rank;
+    }
 
   op_rank = get_rank (op);
 
@@ -440,6 +465,8 @@  get_rank (tree e)
 
       else
 	{
+	  bool bias_p = false;
+
 	  /* Otherwise, find the maximum rank for the operands.  As an
 	     exception, remove the bias from loop-carried phis when propagating
 	     the rank so that dependent operations are not also biased.  */
@@ -448,9 +475,12 @@  get_rank (tree e)
 	     thus have rank 0.  */
 	  rank = 0;
 	  FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
-	    rank = propagate_rank (rank, op);
+	    rank = propagate_rank (rank, op, stmt, &bias_p);
 
 	  rank += 1;
+	  if (bias_p)
+	    bitmap_set_bit (biased_names,
+			    SSA_NAME_VERSION (gimple_assign_lhs (stmt)));
 	}
 
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -6933,6 +6963,7 @@  fini_reassoc (void)
 			    reassociate_stats.pows_created);
 
   delete operand_rank;
+  bitmap_clear (biased_names);
   operand_entry_pool.release ();
   free (bb_rank);
   plus_negates.release ();