[4/5] if-conv: Apply VN to hoisted conversions

Message ID mptk0hdmesi.fsf@arm.com
State Committed
Headers
Series [1/5] vect: Use code_helper when building SLP nodes |

Commit Message

Richard Sandiford Nov. 12, 2021, 6:04 p.m. UTC
  This patch is a prerequisite for a later one.  At the moment,
if-conversion converts predicated POINTER_PLUS_EXPRs into
non-wrapping forms, which for:

    … = base + offset

becomes:

    tmp = (unsigned long) base
    … = tmp + offset

It then hoists these conversions out of the loop where possible.

However, because “base” is a valid gimple operand, there can be
multiple POINTER_PLUS_EXPRs with the same base, which can in turn
lead to multiple instances of the same conversion.  The later VN pass
is (and I think needs to be) restricted to the new if-converted code,
whereas here we're deliberately inserting the conversions before the
.LOOP_VECTORIZED condition:

	/* If we versioned loop then make sure to insert invariant
	   stmts before the .LOOP_VECTORIZED check since the vectorizer
	   will re-use that for things like runtime alias versioning
	   whose condition can end up using those invariants.  */

We can therefore enter the vectoriser with redundant conversions.

The easiest fix seemed to be to defer the hoisting until after VN.
This catches other hoisting opportunities too.

Hoisting the code from the (artificial) loop in pr99102.c means
that it's no longer worth vectorising.  The patch forces vectorisation
instead of relying on the cost model.

The patch also reverts pr87007-4.c and pr87007-5.c back to their
original forms, undoing changes in 783dc66f9ccb0019c3dad.
The code at the time the tests were added was:

        testl   %edi, %edi
        je      .L10
        vxorps  %xmm1, %xmm1, %xmm1
        vsqrtsd d3(%rip), %xmm1, %xmm0
        vsqrtsd d2(%rip), %xmm1, %xmm1
	...
.L10:
        ret

with the operations being hoisted, and the vxorps was specifically
wanted (compared to the previous code).  This patch restores the code
to that form, with the hoisted operations and the vxorps.

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
	* tree-if-conv.c: Include tree-eh.h.
	(predicate_statements): Remove pe argument.  Don't hoist
	statements here.
	(combine_blocks): Remove pe argument.
	(ifcvt_can_hoist, ifcvt_can_hoist_further): New functions.
	(ifcvt_hoist_invariants): Likewise.
	(tree_if_conversion): Update call to combine_blocks.  Call
	ifcvt_hoist_invariants after VN.

gcc/testsuite/
	* gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.

	Revert:

	2020-09-09  Richard Biener  <rguenther@suse.de>

	* gcc.target/i386/pr87007-4.c: Adjust.
	* gcc.target/i386/pr87007-5.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr99102.c       |   2 +-
 gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
 gcc/tree-if-conv.c                        | 122 ++++++++++++++++++++--
 4 files changed, 114 insertions(+), 14 deletions(-)
  

Comments

Richard Biener Nov. 15, 2021, 12:47 p.m. UTC | #1
On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch is a prerequisite for a later one.  At the moment,
> if-conversion converts predicated POINTER_PLUS_EXPRs into
> non-wrapping forms, which for:
>
>     … = base + offset
>
> becomes:
>
>     tmp = (unsigned long) base
>     … = tmp + offset
>
> It then hoists these conversions out of the loop where possible.
>
> However, because “base” is a valid gimple operand, there can be
> multiple POINTER_PLUS_EXPRs with the same base, which can in turn
> lead to multiple instances of the same conversion.  The later VN pass
> is (and I think needs to be) restricted to the new if-converted code,
> whereas here we're deliberately inserting the conversions before the
> .LOOP_VECTORIZED condition:
>
>         /* If we versioned loop then make sure to insert invariant
>            stmts before the .LOOP_VECTORIZED check since the vectorizer
>            will re-use that for things like runtime alias versioning
>            whose condition can end up using those invariants.  */
>
> We can therefore enter the vectoriser with redundant conversions.
>
> The easiest fix seemed to be to defer the hoisting until after VN.
> This catches other hoisting opportunities too.
>
> Hoisting the code from the (artificial) loop in pr99102.c means
> that it's no longer worth vectorising.  The patch forces vectorisation
> instead of relying on the cost model.
>
> The patch also reverts pr87007-4.c and pr87007-5.c back to their
> original forms, undoing changes in 783dc66f9ccb0019c3dad.
> The code at the time the tests were added was:
>
>         testl   %edi, %edi
>         je      .L10
>         vxorps  %xmm1, %xmm1, %xmm1
>         vsqrtsd d3(%rip), %xmm1, %xmm0
>         vsqrtsd d2(%rip), %xmm1, %xmm1
>         ...
> .L10:
>         ret
>
> with the operations being hoisted, and the vxorps was specifically
> wanted (compared to the previous code).  This patch restores the code
> to that form, with the hoisted operations and the vxorps.
>
> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
>         * tree-if-conv.c: Include tree-eh.h.
>         (predicate_statements): Remove pe argument.  Don't hoist
>         statements here.
>         (combine_blocks): Remove pe argument.
>         (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions.
>         (ifcvt_hoist_invariants): Likewise.
>         (tree_if_conversion): Update call to combine_blocks.  Call
>         ifcvt_hoist_invariants after VN.
>
> gcc/testsuite/
>         * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.
>
>         Revert:
>
>         2020-09-09  Richard Biener  <rguenther@suse.de>
>
>         * gcc.target/i386/pr87007-4.c: Adjust.
>         * gcc.target/i386/pr87007-5.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/pr99102.c       |   2 +-
>  gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
>  gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
>  gcc/tree-if-conv.c                        | 122 ++++++++++++++++++++--
>  4 files changed, 114 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
> index 6c1a13f0783..0d030d15c86 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
> @@ -1,4 +1,4 @@
> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
>  /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
>  long a[44];
>  short d, e = -7;
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
> index 9c4b8005af3..e91bdcbac44 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
> @@ -15,4 +15,4 @@ foo (int n, int k)
>        d1 = ceil (d3);
>  }
>
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> index e4d956a5d7f..20d13cf650b 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> @@ -15,4 +15,4 @@ foo (int n, int k)
>        d1 = sqrt (d3);
>  }
>
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index e88ddc9f788..0ad557a2f4d 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-cfgcleanup.h"
>  #include "tree-ssa-dse.h"
>  #include "tree-vectorizer.h"
> +#include "tree-eh.h"
>
>  /* Only handle PHIs with no more arguments unless we are asked to by
>     simd pragma.  */
> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
>  */
>
>  static void
> -predicate_statements (loop_p loop, edge pe)
> +predicate_statements (loop_p loop)
>  {
>    unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>    auto_vec<int, 1> vect_sizes;
> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe)
>                 {
>                   gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
>                   gsi_remove (&gsi2, false);
> -                 /* Make sure to move invariant conversions out of the
> -                    loop.  */
> -                 if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
> -                     && expr_invariant_in_loop_p (loop,
> -                                                  gimple_assign_rhs1 (stmt2)))
> -                   gsi_insert_on_edge_immediate (pe, stmt2);
> -                 else if (first)
> +                 if (first)
>                     {
>                       gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
>                       first = false;
> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop)
>     blocks.  Replace PHI nodes with conditional modify expressions.  */
>
>  static void
> -combine_blocks (class loop *loop, edge pe)
> +combine_blocks (class loop *loop)
>  {
>    basic_block bb, exit_bb, merge_target_bb;
>    unsigned int orig_loop_num_nodes = loop->num_nodes;
> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe)
>    predicate_all_scalar_phis (loop);
>
>    if (need_to_predicate || need_to_rewrite_undefined)
> -    predicate_statements (loop, pe);
> +    predicate_statements (loop);
>
>    /* Merge basic blocks.  */
>    exit_bb = NULL;
> @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop)
>      }
>  }
>
> +/* Return true if STMT can be hoisted from if-converted loop LOOP.  */
> +
> +static bool
> +ifcvt_can_hoist (class loop *loop, gimple *stmt)
> +{
> +  if (auto *call = dyn_cast<gcall *> (stmt))
> +    {
> +      if (gimple_call_internal_p (call)
> +         && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
> +       return false;
> +    }
> +  else if (auto *assign = dyn_cast<gassign *> (stmt))
> +    {
> +      if (gimple_assign_rhs_code (assign) == COND_EXPR)
> +       return false;
> +    }
> +  else
> +    return false;
> +
> +  if (gimple_has_side_effects (stmt)
> +      || gimple_could_trap_p (stmt)
> +      || stmt_could_throw_p (cfun, stmt)
> +      || gimple_vdef (stmt)
> +      || gimple_vuse (stmt))
> +    return false;
> +
> +  int num_args = gimple_num_args (stmt);
> +  for (int i = 0; i < num_args; ++i)
> +    if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
> +      return false;
> +
> +  return true;
> +}
> +
> +/* PE is the preferred hoisting edge selected by tree_if_conversion, which
> +   s known to be different from (and to dominate) the preheader edge of the
> +   if-converted loop.  We already know that STMT can be inserted on the loop
> +   preheader edge.  Return true if we prefer to insert it on PE instead.  */
> +
> +static bool
> +ifcvt_can_hoist_further (edge pe, gimple *stmt)
> +{
> +  /* As explained in tree_if_conversion, we want to hoist invariant
> +     conversions further so that they can be reused by alias analysis.  */
> +  auto *assign = dyn_cast<gassign *> (stmt);
> +  if (assign
> +      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign)))
> +    {
> +      tree rhs = gimple_assign_rhs1 (assign);
> +      if (is_gimple_min_invariant (rhs))
> +       return true;
> +
> +      if (TREE_CODE (rhs) == SSA_NAME)
> +       {
> +         basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
> +         if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
> +           return true;
> +       }
> +    }
> +  return false;
> +}
> +
> +/* Hoist invariant statements from LOOP.  PE is the preferred edge for
> +   hoisting conversions, as selected by tree_if_conversion; see there
> +   for details.  */
> +
> +static void
> +ifcvt_hoist_invariants (class loop *loop, edge pe)
> +{
> +  gimple_stmt_iterator hoist_gsi = {};
> +  gimple_stmt_iterator hoist_gsi_pe = {};
> +  unsigned int num_blocks = loop->num_nodes;
> +  basic_block *body = get_loop_body (loop);
> +  for (unsigned int i = 0; i < num_blocks; ++i)
> +    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
> +      {
> +       gimple *stmt = gsi_stmt (gsi);
> +       if (ifcvt_can_hoist (loop, stmt))
> +         {
> +           /* Once we've hoisted one statement, insert other statements
> +              after it.  */
> +           edge e = loop_preheader_edge (loop);
> +           gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi;
> +           if (e != pe && ifcvt_can_hoist_further (pe, stmt))

One issue with hoisting to loop_preheader_edge instead of 'pe'
is that eventually the loop versioning code will pick up the defs
through the versioning condition and that will break because the
versioning condition will be inserted in place of the
IFN_VECTORIZED_LOOP, which is _before_ the preheader.
The code basically expects the preheader to be empty.

I don't see how ifcvt_can_hoist_further avoids this for
non-CONVERT_EXPRs.

So I don't see how this is actually safe?  It's probably moderately
easy to trigger issues when you disable both PRE and
loop invariant motion before if-conversion/vectorization.

Maybe doing something similar as cse_and_gimplify_to_preheader
done in the vectorizer helps for the original problem.  Or alternatively
do the hoisting during loop versioning ...

Richard.

> +             {
> +               e = pe;
> +               hoist_gsi_ptr = &hoist_gsi_pe;
> +             }
> +           gsi_remove (&gsi, false);
> +           if (hoist_gsi_ptr->ptr)
> +             gsi_insert_after (hoist_gsi_ptr, stmt, GSI_NEW_STMT);
> +           else
> +             {
> +               gsi_insert_on_edge_immediate (e, stmt);
> +               *hoist_gsi_ptr = gsi_for_stmt (stmt);
> +             }
> +           continue;
> +         }
> +       gsi_next (&gsi);
> +      }
> +  free (body);
> +}
> +
>  /* If-convert LOOP when it is legal.  For the moment this pass has no
>     profitability analysis.  Returns non-zero todo flags when something
>     changed.  */
> @@ -3275,7 +3373,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>    /* Now all statements are if-convertible.  Combine all the basic
>       blocks into one huge basic block doing the if-conversion
>       on-the-fly.  */
> -  combine_blocks (loop, pe);
> +  combine_blocks (loop);
>
>    /* Perform local CSE, this esp. helps the vectorizer analysis if loads
>       and stores are involved.  CSE only the loop body, not the entry
> @@ -3297,6 +3395,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>    ifcvt_local_dce (loop);
>    BITMAP_FREE (exit_bbs);
>
> +  ifcvt_hoist_invariants (loop, pe);
> +
>    todo |= TODO_cleanup_cfg;
>
>   cleanup:
> --
> 2.25.1
>
  
Richard Sandiford Nov. 15, 2021, 1:59 p.m. UTC | #2
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> This patch is a prerequisite for a later one.  At the moment,
>> if-conversion converts predicated POINTER_PLUS_EXPRs into
>> non-wrapping forms, which for:
>>
>>     … = base + offset
>>
>> becomes:
>>
>>     tmp = (unsigned long) base
>>     … = tmp + offset
>>
>> It then hoists these conversions out of the loop where possible.
>>
>> However, because “base” is a valid gimple operand, there can be
>> multiple POINTER_PLUS_EXPRs with the same base, which can in turn
>> lead to multiple instances of the same conversion.  The later VN pass
>> is (and I think needs to be) restricted to the new if-converted code,
>> whereas here we're deliberately inserting the conversions before the
>> .LOOP_VECTORIZED condition:
>>
>>         /* If we versioned loop then make sure to insert invariant
>>            stmts before the .LOOP_VECTORIZED check since the vectorizer
>>            will re-use that for things like runtime alias versioning
>>            whose condition can end up using those invariants.  */
>>
>> We can therefore enter the vectoriser with redundant conversions.
>>
>> The easiest fix seemed to be to defer the hoisting until after VN.
>> This catches other hoisting opportunities too.
>>
>> Hoisting the code from the (artificial) loop in pr99102.c means
>> that it's no longer worth vectorising.  The patch forces vectorisation
>> instead of relying on the cost model.
>>
>> The patch also reverts pr87007-4.c and pr87007-5.c back to their
>> original forms, undoing changes in 783dc66f9ccb0019c3dad.
>> The code at the time the tests were added was:
>>
>>         testl   %edi, %edi
>>         je      .L10
>>         vxorps  %xmm1, %xmm1, %xmm1
>>         vsqrtsd d3(%rip), %xmm1, %xmm0
>>         vsqrtsd d2(%rip), %xmm1, %xmm1
>>         ...
>> .L10:
>>         ret
>>
>> with the operations being hoisted, and the vxorps was specifically
>> wanted (compared to the previous code).  This patch restores the code
>> to that form, with the hoisted operations and the vxorps.
>>
>> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> gcc/
>>         * tree-if-conv.c: Include tree-eh.h.
>>         (predicate_statements): Remove pe argument.  Don't hoist
>>         statements here.
>>         (combine_blocks): Remove pe argument.
>>         (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions.
>>         (ifcvt_hoist_invariants): Likewise.
>>         (tree_if_conversion): Update call to combine_blocks.  Call
>>         ifcvt_hoist_invariants after VN.
>>
>> gcc/testsuite/
>>         * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.
>>
>>         Revert:
>>
>>         2020-09-09  Richard Biener  <rguenther@suse.de>
>>
>>         * gcc.target/i386/pr87007-4.c: Adjust.
>>         * gcc.target/i386/pr87007-5.c: Likewise.
>> ---
>>  gcc/testsuite/gcc.dg/vect/pr99102.c       |   2 +-
>>  gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
>>  gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
>>  gcc/tree-if-conv.c                        | 122 ++++++++++++++++++++--
>>  4 files changed, 114 insertions(+), 14 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> index 6c1a13f0783..0d030d15c86 100644
>> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c
>> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
>> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
>>  /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
>>  long a[44];
>>  short d, e = -7;
>> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> index 9c4b8005af3..e91bdcbac44 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> @@ -15,4 +15,4 @@ foo (int n, int k)
>>        d1 = ceil (d3);
>>  }
>>
>> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
>> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> index e4d956a5d7f..20d13cf650b 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> @@ -15,4 +15,4 @@ foo (int n, int k)
>>        d1 = sqrt (d3);
>>  }
>>
>> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
>> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
>> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
>> index e88ddc9f788..0ad557a2f4d 100644
>> --- a/gcc/tree-if-conv.c
>> +++ b/gcc/tree-if-conv.c
>> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-cfgcleanup.h"
>>  #include "tree-ssa-dse.h"
>>  #include "tree-vectorizer.h"
>> +#include "tree-eh.h"
>>
>>  /* Only handle PHIs with no more arguments unless we are asked to by
>>     simd pragma.  */
>> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
>>  */
>>
>>  static void
>> -predicate_statements (loop_p loop, edge pe)
>> +predicate_statements (loop_p loop)
>>  {
>>    unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>>    auto_vec<int, 1> vect_sizes;
>> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe)
>>                 {
>>                   gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
>>                   gsi_remove (&gsi2, false);
>> -                 /* Make sure to move invariant conversions out of the
>> -                    loop.  */
>> -                 if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
>> -                     && expr_invariant_in_loop_p (loop,
>> -                                                  gimple_assign_rhs1 (stmt2)))
>> -                   gsi_insert_on_edge_immediate (pe, stmt2);
>> -                 else if (first)
>> +                 if (first)
>>                     {
>>                       gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
>>                       first = false;
>> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop)
>>     blocks.  Replace PHI nodes with conditional modify expressions.  */
>>
>>  static void
>> -combine_blocks (class loop *loop, edge pe)
>> +combine_blocks (class loop *loop)
>>  {
>>    basic_block bb, exit_bb, merge_target_bb;
>>    unsigned int orig_loop_num_nodes = loop->num_nodes;
>> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe)
>>    predicate_all_scalar_phis (loop);
>>
>>    if (need_to_predicate || need_to_rewrite_undefined)
>> -    predicate_statements (loop, pe);
>> +    predicate_statements (loop);
>>
>>    /* Merge basic blocks.  */
>>    exit_bb = NULL;
>> @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop)
>>      }
>>  }
>>
>> +/* Return true if STMT can be hoisted from if-converted loop LOOP.  */
>> +
>> +static bool
>> +ifcvt_can_hoist (class loop *loop, gimple *stmt)
>> +{
>> +  if (auto *call = dyn_cast<gcall *> (stmt))
>> +    {
>> +      if (gimple_call_internal_p (call)
>> +         && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
>> +       return false;
>> +    }
>> +  else if (auto *assign = dyn_cast<gassign *> (stmt))
>> +    {
>> +      if (gimple_assign_rhs_code (assign) == COND_EXPR)
>> +       return false;
>> +    }
>> +  else
>> +    return false;
>> +
>> +  if (gimple_has_side_effects (stmt)
>> +      || gimple_could_trap_p (stmt)
>> +      || stmt_could_throw_p (cfun, stmt)
>> +      || gimple_vdef (stmt)
>> +      || gimple_vuse (stmt))
>> +    return false;
>> +
>> +  int num_args = gimple_num_args (stmt);
>> +  for (int i = 0; i < num_args; ++i)
>> +    if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
>> +      return false;
>> +
>> +  return true;
>> +}
>> +
>> +/* PE is the preferred hoisting edge selected by tree_if_conversion, which
>> +   s known to be different from (and to dominate) the preheader edge of the
>> +   if-converted loop.  We already know that STMT can be inserted on the loop
>> +   preheader edge.  Return true if we prefer to insert it on PE instead.  */
>> +
>> +static bool
>> +ifcvt_can_hoist_further (edge pe, gimple *stmt)
>> +{
>> +  /* As explained in tree_if_conversion, we want to hoist invariant
>> +     conversions further so that they can be reused by alias analysis.  */
>> +  auto *assign = dyn_cast<gassign *> (stmt);
>> +  if (assign
>> +      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign)))
>> +    {
>> +      tree rhs = gimple_assign_rhs1 (assign);
>> +      if (is_gimple_min_invariant (rhs))
>> +       return true;
>> +
>> +      if (TREE_CODE (rhs) == SSA_NAME)
>> +       {
>> +         basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
>> +         if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
>> +           return true;
>> +       }
>> +    }
>> +  return false;
>> +}
>> +
>> +/* Hoist invariant statements from LOOP.  PE is the preferred edge for
>> +   hoisting conversions, as selected by tree_if_conversion; see there
>> +   for details.  */
>> +
>> +static void
>> +ifcvt_hoist_invariants (class loop *loop, edge pe)
>> +{
>> +  gimple_stmt_iterator hoist_gsi = {};
>> +  gimple_stmt_iterator hoist_gsi_pe = {};
>> +  unsigned int num_blocks = loop->num_nodes;
>> +  basic_block *body = get_loop_body (loop);
>> +  for (unsigned int i = 0; i < num_blocks; ++i)
>> +    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
>> +      {
>> +       gimple *stmt = gsi_stmt (gsi);
>> +       if (ifcvt_can_hoist (loop, stmt))
>> +         {
>> +           /* Once we've hoisted one statement, insert other statements
>> +              after it.  */
>> +           edge e = loop_preheader_edge (loop);
>> +           gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi;
>> +           if (e != pe && ifcvt_can_hoist_further (pe, stmt))
>
> One issue with hoisting to loop_preheader_edge instead of 'pe'
> is that eventually the loop versioning code will pick up the defs
> through the versioning condition and that will break because the
> versioning condition will be inserted in place of the
> IFN_VECTORIZED_LOOP, which is _before_ the preheader.
> The code basically expects the preheader to be empty.

Do you mean that the versioning-for-aliases code assumes that all
loop-invariant definitions are available in the versioning condition?
I hadn't realised that's what the comment was saying.  It sounded more
like an optimisation.

Thanks,
Richard

> I don't see how ifcvt_can_hoist_further avoids this for
> non-CONVERT_EXPRs.
>
> So I don't see how this is actually safe?  It's probably moderately
> easy to trigger issues when you disable both PRE and
> loop invariant motion before if-conversion/vectorization.
>
> Maybe doing something similar as cse_and_gimplify_to_preheader
> done in the vectorizer helps for the original problem.  Or alternatively
> do the hoisting during loop versioning ...
>
> Richard.
>
>> +             {
>> +               e = pe;
>> +               hoist_gsi_ptr = &hoist_gsi_pe;
>> +             }
>> +           gsi_remove (&gsi, false);
>> +           if (hoist_gsi_ptr->ptr)
>> +             gsi_insert_after (hoist_gsi_ptr, stmt, GSI_NEW_STMT);
>> +           else
>> +             {
>> +               gsi_insert_on_edge_immediate (e, stmt);
>> +               *hoist_gsi_ptr = gsi_for_stmt (stmt);
>> +             }
>> +           continue;
>> +         }
>> +       gsi_next (&gsi);
>> +      }
>> +  free (body);
>> +}
>> +
>>  /* If-convert LOOP when it is legal.  For the moment this pass has no
>>     profitability analysis.  Returns non-zero todo flags when something
>>     changed.  */
>> @@ -3275,7 +3373,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>>    /* Now all statements are if-convertible.  Combine all the basic
>>       blocks into one huge basic block doing the if-conversion
>>       on-the-fly.  */
>> -  combine_blocks (loop, pe);
>> +  combine_blocks (loop);
>>
>>    /* Perform local CSE, this esp. helps the vectorizer analysis if loads
>>       and stores are involved.  CSE only the loop body, not the entry
>> @@ -3297,6 +3395,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>>    ifcvt_local_dce (loop);
>>    BITMAP_FREE (exit_bbs);
>>
>> +  ifcvt_hoist_invariants (loop, pe);
>> +
>>    todo |= TODO_cleanup_cfg;
>>
>>   cleanup:
>> --
>> 2.25.1
>>
  
Richard Biener Nov. 15, 2021, 2:11 p.m. UTC | #3
On Mon, Nov 15, 2021 at 3:00 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> This patch is a prerequisite for a later one.  At the moment,
> >> if-conversion converts predicated POINTER_PLUS_EXPRs into
> >> non-wrapping forms, which for:
> >>
> >>     … = base + offset
> >>
> >> becomes:
> >>
> >>     tmp = (unsigned long) base
> >>     … = tmp + offset
> >>
> >> It then hoists these conversions out of the loop where possible.
> >>
> >> However, because “base” is a valid gimple operand, there can be
> >> multiple POINTER_PLUS_EXPRs with the same base, which can in turn
> >> lead to multiple instances of the same conversion.  The later VN pass
> >> is (and I think needs to be) restricted to the new if-converted code,
> >> whereas here we're deliberately inserting the conversions before the
> >> .LOOP_VECTORIZED condition:
> >>
> >>         /* If we versioned loop then make sure to insert invariant
> >>            stmts before the .LOOP_VECTORIZED check since the vectorizer
> >>            will re-use that for things like runtime alias versioning
> >>            whose condition can end up using those invariants.  */
> >>
> >> We can therefore enter the vectoriser with redundant conversions.
> >>
> >> The easiest fix seemed to be to defer the hoisting until after VN.
> >> This catches other hoisting opportunities too.
> >>
> >> Hoisting the code from the (artificial) loop in pr99102.c means
> >> that it's no longer worth vectorising.  The patch forces vectorisation
> >> instead of relying on the cost model.
> >>
> >> The patch also reverts pr87007-4.c and pr87007-5.c back to their
> >> original forms, undoing changes in 783dc66f9ccb0019c3dad.
> >> The code at the time the tests were added was:
> >>
> >>         testl   %edi, %edi
> >>         je      .L10
> >>         vxorps  %xmm1, %xmm1, %xmm1
> >>         vsqrtsd d3(%rip), %xmm1, %xmm0
> >>         vsqrtsd d2(%rip), %xmm1, %xmm1
> >>         ...
> >> .L10:
> >>         ret
> >>
> >> with the operations being hoisted, and the vxorps was specifically
> >> wanted (compared to the previous code).  This patch restores the code
> >> to that form, with the hoisted operations and the vxorps.
> >>
> >> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>
> >> Richard
> >>
> >>
> >> gcc/
> >>         * tree-if-conv.c: Include tree-eh.h.
> >>         (predicate_statements): Remove pe argument.  Don't hoist
> >>         statements here.
> >>         (combine_blocks): Remove pe argument.
> >>         (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions.
> >>         (ifcvt_hoist_invariants): Likewise.
> >>         (tree_if_conversion): Update call to combine_blocks.  Call
> >>         ifcvt_hoist_invariants after VN.
> >>
> >> gcc/testsuite/
> >>         * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.
> >>
> >>         Revert:
> >>
> >>         2020-09-09  Richard Biener  <rguenther@suse.de>
> >>
> >>         * gcc.target/i386/pr87007-4.c: Adjust.
> >>         * gcc.target/i386/pr87007-5.c: Likewise.
> >> ---
> >>  gcc/testsuite/gcc.dg/vect/pr99102.c       |   2 +-
> >>  gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
> >>  gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
> >>  gcc/tree-if-conv.c                        | 122 ++++++++++++++++++++--
> >>  4 files changed, 114 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
> >> index 6c1a13f0783..0d030d15c86 100644
> >> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c
> >> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
> >> @@ -1,4 +1,4 @@
> >> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> >> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
> >>  /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
> >>  long a[44];
> >>  short d, e = -7;
> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
> >> index 9c4b8005af3..e91bdcbac44 100644
> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
> >> @@ -15,4 +15,4 @@ foo (int n, int k)
> >>        d1 = ceil (d3);
> >>  }
> >>
> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> >> index e4d956a5d7f..20d13cf650b 100644
> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> >> @@ -15,4 +15,4 @@ foo (int n, int k)
> >>        d1 = sqrt (d3);
> >>  }
> >>
> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> >> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> >> index e88ddc9f788..0ad557a2f4d 100644
> >> --- a/gcc/tree-if-conv.c
> >> +++ b/gcc/tree-if-conv.c
> >> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3.  If not see
> >>  #include "tree-cfgcleanup.h"
> >>  #include "tree-ssa-dse.h"
> >>  #include "tree-vectorizer.h"
> >> +#include "tree-eh.h"
> >>
> >>  /* Only handle PHIs with no more arguments unless we are asked to by
> >>     simd pragma.  */
> >> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
> >>  */
> >>
> >>  static void
> >> -predicate_statements (loop_p loop, edge pe)
> >> +predicate_statements (loop_p loop)
> >>  {
> >>    unsigned int i, orig_loop_num_nodes = loop->num_nodes;
> >>    auto_vec<int, 1> vect_sizes;
> >> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe)
> >>                 {
> >>                   gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
> >>                   gsi_remove (&gsi2, false);
> >> -                 /* Make sure to move invariant conversions out of the
> >> -                    loop.  */
> >> -                 if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
> >> -                     && expr_invariant_in_loop_p (loop,
> >> -                                                  gimple_assign_rhs1 (stmt2)))
> >> -                   gsi_insert_on_edge_immediate (pe, stmt2);
> >> -                 else if (first)
> >> +                 if (first)
> >>                     {
> >>                       gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
> >>                       first = false;
> >> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop)
> >>     blocks.  Replace PHI nodes with conditional modify expressions.  */
> >>
> >>  static void
> >> -combine_blocks (class loop *loop, edge pe)
> >> +combine_blocks (class loop *loop)
> >>  {
> >>    basic_block bb, exit_bb, merge_target_bb;
> >>    unsigned int orig_loop_num_nodes = loop->num_nodes;
> >> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe)
> >>    predicate_all_scalar_phis (loop);
> >>
> >>    if (need_to_predicate || need_to_rewrite_undefined)
> >> -    predicate_statements (loop, pe);
> >> +    predicate_statements (loop);
> >>
> >>    /* Merge basic blocks.  */
> >>    exit_bb = NULL;
> >> @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop)
> >>      }
> >>  }
> >>
> >> +/* Return true if STMT can be hoisted from if-converted loop LOOP.  */
> >> +
> >> +static bool
> >> +ifcvt_can_hoist (class loop *loop, gimple *stmt)
> >> +{
> >> +  if (auto *call = dyn_cast<gcall *> (stmt))
> >> +    {
> >> +      if (gimple_call_internal_p (call)
> >> +         && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
> >> +       return false;
> >> +    }
> >> +  else if (auto *assign = dyn_cast<gassign *> (stmt))
> >> +    {
> >> +      if (gimple_assign_rhs_code (assign) == COND_EXPR)
> >> +       return false;
> >> +    }
> >> +  else
> >> +    return false;
> >> +
> >> +  if (gimple_has_side_effects (stmt)
> >> +      || gimple_could_trap_p (stmt)
> >> +      || stmt_could_throw_p (cfun, stmt)
> >> +      || gimple_vdef (stmt)
> >> +      || gimple_vuse (stmt))
> >> +    return false;
> >> +
> >> +  int num_args = gimple_num_args (stmt);
> >> +  for (int i = 0; i < num_args; ++i)
> >> +    if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
> >> +      return false;
> >> +
> >> +  return true;
> >> +}
> >> +
> >> +/* PE is the preferred hoisting edge selected by tree_if_conversion, which
> >> +   s known to be different from (and to dominate) the preheader edge of the
> >> +   if-converted loop.  We already know that STMT can be inserted on the loop
> >> +   preheader edge.  Return true if we prefer to insert it on PE instead.  */
> >> +
> >> +static bool
> >> +ifcvt_can_hoist_further (edge pe, gimple *stmt)
> >> +{
> >> +  /* As explained in tree_if_conversion, we want to hoist invariant
> >> +     conversions further so that they can be reused by alias analysis.  */
> >> +  auto *assign = dyn_cast<gassign *> (stmt);
> >> +  if (assign
> >> +      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign)))
> >> +    {
> >> +      tree rhs = gimple_assign_rhs1 (assign);
> >> +      if (is_gimple_min_invariant (rhs))
> >> +       return true;
> >> +
> >> +      if (TREE_CODE (rhs) == SSA_NAME)
> >> +       {
> >> +         basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
> >> +         if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
> >> +           return true;
> >> +       }
> >> +    }
> >> +  return false;
> >> +}
> >> +
> >> +/* Hoist invariant statements from LOOP.  PE is the preferred edge for
> >> +   hoisting conversions, as selected by tree_if_conversion; see there
> >> +   for details.  */
> >> +
> >> +static void
> >> +ifcvt_hoist_invariants (class loop *loop, edge pe)
> >> +{
> >> +  gimple_stmt_iterator hoist_gsi = {};
> >> +  gimple_stmt_iterator hoist_gsi_pe = {};
> >> +  unsigned int num_blocks = loop->num_nodes;
> >> +  basic_block *body = get_loop_body (loop);
> >> +  for (unsigned int i = 0; i < num_blocks; ++i)
> >> +    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
> >> +      {
> >> +       gimple *stmt = gsi_stmt (gsi);
> >> +       if (ifcvt_can_hoist (loop, stmt))
> >> +         {
> >> +           /* Once we've hoisted one statement, insert other statements
> >> +              after it.  */
> >> +           edge e = loop_preheader_edge (loop);
> >> +           gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi;
> >> +           if (e != pe && ifcvt_can_hoist_further (pe, stmt))
> >
> > One issue with hoisting to loop_preheader_edge instead of 'pe'
> > is that eventually the loop versioning code will pick up the defs
> > through the versioning condition and that will break because the
> > versioning condition will be inserted in place of the
> > IFN_VECTORIZED_LOOP, which is _before_ the preheader.
> > The code basically expects the preheader to be empty.
>
> Do you mean that the versioning-for-aliases code assumes that all
> loop-invariant definitions are available in the versioning condition?

Yes.  Or well, it assumes it can use the .LOOP_VECTORIZED condition
as versioning condition and there simply use insert the condition without
further checks.  And the checks are generated utlimatively from SCEV
which just expands things up to the loops preheader.

> I hadn't realised that's what the comment was saying.  It sounded more
> like an optimisation.

The hoisting itself is, but that we don't just hoist to the loops preheader
is for correctness, not an optimization ... it's indeed a bit messy though.

Richard.

>
> Thanks,
> Richard
>
> > I don't see how ifcvt_can_hoist_further avoids this for
> > non-CONVERT_EXPRs.
> >
> > So I don't see how this is actually safe?  It's probably moderately
> > easy to trigger issues when you disable both PRE and
> > loop invariant motion before if-conversion/vectorization.
> >
> > Maybe doing something similar as cse_and_gimplify_to_preheader
> > done in the vectorizer helps for the original problem.  Or alternatively
> > do the hoisting during loop versioning ...
> >
> > Richard.
> >
> >> +             {
> >> +               e = pe;
> >> +               hoist_gsi_ptr = &hoist_gsi_pe;
> >> +             }
> >> +           gsi_remove (&gsi, false);
> >> +           if (hoist_gsi_ptr->ptr)
> >> +             gsi_insert_after (hoist_gsi_ptr, stmt, GSI_NEW_STMT);
> >> +           else
> >> +             {
> >> +               gsi_insert_on_edge_immediate (e, stmt);
> >> +               *hoist_gsi_ptr = gsi_for_stmt (stmt);
> >> +             }
> >> +           continue;
> >> +         }
> >> +       gsi_next (&gsi);
> >> +      }
> >> +  free (body);
> >> +}
> >> +
> >>  /* If-convert LOOP when it is legal.  For the moment this pass has no
> >>     profitability analysis.  Returns non-zero todo flags when something
> >>     changed.  */
> >> @@ -3275,7 +3373,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
> >>    /* Now all statements are if-convertible.  Combine all the basic
> >>       blocks into one huge basic block doing the if-conversion
> >>       on-the-fly.  */
> >> -  combine_blocks (loop, pe);
> >> +  combine_blocks (loop);
> >>
> >>    /* Perform local CSE, this esp. helps the vectorizer analysis if loads
> >>       and stores are involved.  CSE only the loop body, not the entry
> >> @@ -3297,6 +3395,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
> >>    ifcvt_local_dce (loop);
> >>    BITMAP_FREE (exit_bbs);
> >>
> >> +  ifcvt_hoist_invariants (loop, pe);
> >> +
> >>    todo |= TODO_cleanup_cfg;
> >>
> >>   cleanup:
> >> --
> >> 2.25.1
> >>
  
Richard Sandiford Nov. 16, 2021, 6:05 p.m. UTC | #4
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Mon, Nov 15, 2021 at 3:00 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches
>> > <gcc-patches@gcc.gnu.org> wrote:
>> >>
>> >> This patch is a prerequisite for a later one.  At the moment,
>> >> if-conversion converts predicated POINTER_PLUS_EXPRs into
>> >> non-wrapping forms, which for:
>> >>
>> >>     … = base + offset
>> >>
>> >> becomes:
>> >>
>> >>     tmp = (unsigned long) base
>> >>     … = tmp + offset
>> >>
>> >> It then hoists these conversions out of the loop where possible.
>> >>
>> >> However, because “base” is a valid gimple operand, there can be
>> >> multiple POINTER_PLUS_EXPRs with the same base, which can in turn
>> >> lead to multiple instances of the same conversion.  The later VN pass
>> >> is (and I think needs to be) restricted to the new if-converted code,
>> >> whereas here we're deliberately inserting the conversions before the
>> >> .LOOP_VECTORIZED condition:
>> >>
>> >>         /* If we versioned loop then make sure to insert invariant
>> >>            stmts before the .LOOP_VECTORIZED check since the vectorizer
>> >>            will re-use that for things like runtime alias versioning
>> >>            whose condition can end up using those invariants.  */
>> >>
>> >> We can therefore enter the vectoriser with redundant conversions.
>> >>
>> >> The easiest fix seemed to be to defer the hoisting until after VN.
>> >> This catches other hoisting opportunities too.
>> >>
>> >> Hoisting the code from the (artificial) loop in pr99102.c means
>> >> that it's no longer worth vectorising.  The patch forces vectorisation
>> >> instead of relying on the cost model.
>> >>
>> >> The patch also reverts pr87007-4.c and pr87007-5.c back to their
>> >> original forms, undoing changes in 783dc66f9ccb0019c3dad.
>> >> The code at the time the tests were added was:
>> >>
>> >>         testl   %edi, %edi
>> >>         je      .L10
>> >>         vxorps  %xmm1, %xmm1, %xmm1
>> >>         vsqrtsd d3(%rip), %xmm1, %xmm0
>> >>         vsqrtsd d2(%rip), %xmm1, %xmm1
>> >>         ...
>> >> .L10:
>> >>         ret
>> >>
>> >> with the operations being hoisted, and the vxorps was specifically
>> >> wanted (compared to the previous code).  This patch restores the code
>> >> to that form, with the hoisted operations and the vxorps.
>> >>
>> >> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>> >>
>> >> Richard
>> >>
>> >>
>> >> gcc/
>> >>         * tree-if-conv.c: Include tree-eh.h.
>> >>         (predicate_statements): Remove pe argument.  Don't hoist
>> >>         statements here.
>> >>         (combine_blocks): Remove pe argument.
>> >>         (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions.
>> >>         (ifcvt_hoist_invariants): Likewise.
>> >>         (tree_if_conversion): Update call to combine_blocks.  Call
>> >>         ifcvt_hoist_invariants after VN.
>> >>
>> >> gcc/testsuite/
>> >>         * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.
>> >>
>> >>         Revert:
>> >>
>> >>         2020-09-09  Richard Biener  <rguenther@suse.de>
>> >>
>> >>         * gcc.target/i386/pr87007-4.c: Adjust.
>> >>         * gcc.target/i386/pr87007-5.c: Likewise.
>> >> ---
>> >>  gcc/testsuite/gcc.dg/vect/pr99102.c       |   2 +-
>> >>  gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
>> >>  gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
>> >>  gcc/tree-if-conv.c                        | 122 ++++++++++++++++++++--
>> >>  4 files changed, 114 insertions(+), 14 deletions(-)
>> >>
>> >> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> >> index 6c1a13f0783..0d030d15c86 100644
>> >> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c
>> >> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> >> @@ -1,4 +1,4 @@
>> >> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
>> >> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
>> >>  /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
>> >>  long a[44];
>> >>  short d, e = -7;
>> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> >> index 9c4b8005af3..e91bdcbac44 100644
>> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> >> @@ -15,4 +15,4 @@ foo (int n, int k)
>> >>        d1 = ceil (d3);
>> >>  }
>> >>
>> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
>> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
>> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> >> index e4d956a5d7f..20d13cf650b 100644
>> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> >> @@ -15,4 +15,4 @@ foo (int n, int k)
>> >>        d1 = sqrt (d3);
>> >>  }
>> >>
>> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
>> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
>> >> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
>> >> index e88ddc9f788..0ad557a2f4d 100644
>> >> --- a/gcc/tree-if-conv.c
>> >> +++ b/gcc/tree-if-conv.c
>> >> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3.  If not see
>> >>  #include "tree-cfgcleanup.h"
>> >>  #include "tree-ssa-dse.h"
>> >>  #include "tree-vectorizer.h"
>> >> +#include "tree-eh.h"
>> >>
>> >>  /* Only handle PHIs with no more arguments unless we are asked to by
>> >>     simd pragma.  */
>> >> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
>> >>  */
>> >>
>> >>  static void
>> >> -predicate_statements (loop_p loop, edge pe)
>> >> +predicate_statements (loop_p loop)
>> >>  {
>> >>    unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>> >>    auto_vec<int, 1> vect_sizes;
>> >> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe)
>> >>                 {
>> >>                   gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
>> >>                   gsi_remove (&gsi2, false);
>> >> -                 /* Make sure to move invariant conversions out of the
>> >> -                    loop.  */
>> >> -                 if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
>> >> -                     && expr_invariant_in_loop_p (loop,
>> >> -                                                  gimple_assign_rhs1 (stmt2)))
>> >> -                   gsi_insert_on_edge_immediate (pe, stmt2);
>> >> -                 else if (first)
>> >> +                 if (first)
>> >>                     {
>> >>                       gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
>> >>                       first = false;
>> >> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop)
>> >>     blocks.  Replace PHI nodes with conditional modify expressions.  */
>> >>
>> >>  static void
>> >> -combine_blocks (class loop *loop, edge pe)
>> >> +combine_blocks (class loop *loop)
>> >>  {
>> >>    basic_block bb, exit_bb, merge_target_bb;
>> >>    unsigned int orig_loop_num_nodes = loop->num_nodes;
>> >> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe)
>> >>    predicate_all_scalar_phis (loop);
>> >>
>> >>    if (need_to_predicate || need_to_rewrite_undefined)
>> >> -    predicate_statements (loop, pe);
>> >> +    predicate_statements (loop);
>> >>
>> >>    /* Merge basic blocks.  */
>> >>    exit_bb = NULL;
>> >> @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop)
>> >>      }
>> >>  }
>> >>
>> >> +/* Return true if STMT can be hoisted from if-converted loop LOOP.  */
>> >> +
>> >> +static bool
>> >> +ifcvt_can_hoist (class loop *loop, gimple *stmt)
>> >> +{
>> >> +  if (auto *call = dyn_cast<gcall *> (stmt))
>> >> +    {
>> >> +      if (gimple_call_internal_p (call)
>> >> +         && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
>> >> +       return false;
>> >> +    }
>> >> +  else if (auto *assign = dyn_cast<gassign *> (stmt))
>> >> +    {
>> >> +      if (gimple_assign_rhs_code (assign) == COND_EXPR)
>> >> +       return false;
>> >> +    }
>> >> +  else
>> >> +    return false;
>> >> +
>> >> +  if (gimple_has_side_effects (stmt)
>> >> +      || gimple_could_trap_p (stmt)
>> >> +      || stmt_could_throw_p (cfun, stmt)
>> >> +      || gimple_vdef (stmt)
>> >> +      || gimple_vuse (stmt))
>> >> +    return false;
>> >> +
>> >> +  int num_args = gimple_num_args (stmt);
>> >> +  for (int i = 0; i < num_args; ++i)
>> >> +    if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
>> >> +      return false;
>> >> +
>> >> +  return true;
>> >> +}
>> >> +
>> >> +/* PE is the preferred hoisting edge selected by tree_if_conversion, which
>> >> +   s known to be different from (and to dominate) the preheader edge of the
>> >> +   if-converted loop.  We already know that STMT can be inserted on the loop
>> >> +   preheader edge.  Return true if we prefer to insert it on PE instead.  */
>> >> +
>> >> +static bool
>> >> +ifcvt_can_hoist_further (edge pe, gimple *stmt)
>> >> +{
>> >> +  /* As explained in tree_if_conversion, we want to hoist invariant
>> >> +     conversions further so that they can be reused by alias analysis.  */
>> >> +  auto *assign = dyn_cast<gassign *> (stmt);
>> >> +  if (assign
>> >> +      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign)))
>> >> +    {
>> >> +      tree rhs = gimple_assign_rhs1 (assign);
>> >> +      if (is_gimple_min_invariant (rhs))
>> >> +       return true;
>> >> +
>> >> +      if (TREE_CODE (rhs) == SSA_NAME)
>> >> +       {
>> >> +         basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
>> >> +         if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
>> >> +           return true;
>> >> +       }
>> >> +    }
>> >> +  return false;
>> >> +}
>> >> +
>> >> +/* Hoist invariant statements from LOOP.  PE is the preferred edge for
>> >> +   hoisting conversions, as selected by tree_if_conversion; see there
>> >> +   for details.  */
>> >> +
>> >> +static void
>> >> +ifcvt_hoist_invariants (class loop *loop, edge pe)
>> >> +{
>> >> +  gimple_stmt_iterator hoist_gsi = {};
>> >> +  gimple_stmt_iterator hoist_gsi_pe = {};
>> >> +  unsigned int num_blocks = loop->num_nodes;
>> >> +  basic_block *body = get_loop_body (loop);
>> >> +  for (unsigned int i = 0; i < num_blocks; ++i)
>> >> +    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
>> >> +      {
>> >> +       gimple *stmt = gsi_stmt (gsi);
>> >> +       if (ifcvt_can_hoist (loop, stmt))
>> >> +         {
>> >> +           /* Once we've hoisted one statement, insert other statements
>> >> +              after it.  */
>> >> +           edge e = loop_preheader_edge (loop);
>> >> +           gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi;
>> >> +           if (e != pe && ifcvt_can_hoist_further (pe, stmt))
>> >
>> > One issue with hoisting to loop_preheader_edge instead of 'pe'
>> > is that eventually the loop versioning code will pick up the defs
>> > through the versioning condition and that will break because the
>> > versioning condition will be inserted in place of the
>> > IFN_VECTORIZED_LOOP, which is _before_ the preheader.
>> > The code basically expects the preheader to be empty.
>>
>> Do you mean that the versioning-for-aliases code assumes that all
>> loop-invariant definitions are available in the versioning condition?
>
> Yes.  Or well, it assumes it can use the .LOOP_VECTORIZED condition
> as versioning condition and there simply use insert the condition without
> further checks.  And the checks are generated utlimatively from SCEV
> which just expands things up to the loops preheader.
>
>> I hadn't realised that's what the comment was saying.  It sounded more
>> like an optimisation.
>
> The hoisting itself is, but that we don't just hoist to the loops preheader
> is for correctness, not an optimization ... it's indeed a bit messy though.

Ah, OK.

How does this version look?  I was in two minds about whether to keep
the expr_invariant_in_loop_p path, but it seemed like the more natural
test for the case that it handles.  I'd be happy to drop it and use
ifcvt_available_on_edge_p instead if you prefer though.

Tested as before.

Thanks,
Richard


gcc/
	* tree-if-conv.c: Include tree-eh.h.
	(predicate_statements): Remove pe argument.  Don't hoist
	statements here.
	(combine_blocks): Remove pe argument.
	(ifcvt_available_on_edge_p, ifcvt_can_hoist): New functions.
	(ifcvt_hoist_invariants): Likewise.
	(tree_if_conversion): Update call to combine_blocks.  Call
	ifcvt_hoist_invariants after VN.

gcc/testsuite/
	* gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.

	Revert:

	2020-09-09  Richard Biener  <rguenther@suse.de>

	* gcc.target/i386/pr87007-4.c: Adjust.
	* gcc.target/i386/pr87007-5.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr99102.c       |   2 +-
 gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
 gcc/tree-if-conv.c                        | 112 +++++++++++++++++++---
 4 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
index 6c1a13f0783..0d030d15c86 100644
--- a/gcc/testsuite/gcc.dg/vect/pr99102.c
+++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
 /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
 long a[44];
 short d, e = -7;
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
index 9c4b8005af3..e91bdcbac44 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -15,4 +15,4 @@ foo (int n, int k)
       d1 = ceil (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index e4d956a5d7f..20d13cf650b 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -15,4 +15,4 @@ foo (int n, int k)
       d1 = sqrt (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index e88ddc9f788..d0ca04600ce 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -121,6 +121,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-ssa-dse.h"
 #include "tree-vectorizer.h"
+#include "tree-eh.h"
 
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
@@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
 */
 
 static void
-predicate_statements (loop_p loop, edge pe)
+predicate_statements (loop_p loop)
 {
   unsigned int i, orig_loop_num_nodes = loop->num_nodes;
   auto_vec<int, 1> vect_sizes;
@@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe)
 		{
 		  gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
 		  gsi_remove (&gsi2, false);
-		  /* Make sure to move invariant conversions out of the
-		     loop.  */
-		  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
-		      && expr_invariant_in_loop_p (loop,
-						   gimple_assign_rhs1 (stmt2)))
-		    gsi_insert_on_edge_immediate (pe, stmt2);
-		  else if (first)
+		  if (first)
 		    {
 		      gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
 		      first = false;
@@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop)
    blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (class loop *loop, edge pe)
+combine_blocks (class loop *loop)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe)
   predicate_all_scalar_phis (loop);
 
   if (need_to_predicate || need_to_rewrite_undefined)
-    predicate_statements (loop, pe);
+    predicate_statements (loop);
 
   /* Merge basic blocks.  */
   exit_bb = NULL;
@@ -3181,6 +3176,99 @@ ifcvt_local_dce (class loop *loop)
     }
 }
 
+/* Return true if VALUE is already available on edge PE.  */
+
+static bool
+ifcvt_available_on_edge_p (edge pe, tree value)
+{
+  if (is_gimple_min_invariant (value))
+    return true;
+
+  if (TREE_CODE (value) == SSA_NAME)
+    {
+      basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (value));
+      if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
+	return true;
+    }
+
+  return false;
+}
+
+/* Return true if STMT can be hoisted from if-converted loop LOOP to
+   edge PE.  */
+
+static bool
+ifcvt_can_hoist (class loop *loop, edge pe, gimple *stmt)
+{
+  if (auto *call = dyn_cast<gcall *> (stmt))
+    {
+      if (gimple_call_internal_p (call)
+	  && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
+	return false;
+    }
+  else if (auto *assign = dyn_cast<gassign *> (stmt))
+    {
+      if (gimple_assign_rhs_code (assign) == COND_EXPR)
+	return false;
+    }
+  else
+    return false;
+
+  if (gimple_has_side_effects (stmt)
+      || gimple_could_trap_p (stmt)
+      || stmt_could_throw_p (cfun, stmt)
+      || gimple_vdef (stmt)
+      || gimple_vuse (stmt))
+    return false;
+
+  int num_args = gimple_num_args (stmt);
+  if (pe != loop_preheader_edge (loop))
+    {
+      for (int i = 0; i < num_args; ++i)
+	if (!ifcvt_available_on_edge_p (pe, gimple_arg (stmt, i)))
+	  return false;
+    }
+  else
+    {
+      for (int i = 0; i < num_args; ++i)
+	if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
+	  return false;
+    }
+
+  return true;
+}
+
+/* Hoist invariant statements from LOOP to edge PE.  */
+
+static void
+ifcvt_hoist_invariants (class loop *loop, edge pe)
+{
+  gimple_stmt_iterator hoist_gsi = {};
+  unsigned int num_blocks = loop->num_nodes;
+  basic_block *body = get_loop_body (loop);
+  for (unsigned int i = 0; i < num_blocks; ++i)
+    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
+      {
+	gimple *stmt = gsi_stmt (gsi);
+	if (ifcvt_can_hoist (loop, pe, stmt))
+	  {
+	    /* Once we've hoisted one statement, insert other statements
+	       after it.  */
+	    gsi_remove (&gsi, false);
+	    if (hoist_gsi.ptr)
+	      gsi_insert_after (&hoist_gsi, stmt, GSI_NEW_STMT);
+	    else
+	      {
+		gsi_insert_on_edge_immediate (pe, stmt);
+		hoist_gsi = gsi_for_stmt (stmt);
+	      }
+	    continue;
+	  }
+	gsi_next (&gsi);
+      }
+  free (body);
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3275,7 +3363,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   /* Now all statements are if-convertible.  Combine all the basic
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
-  combine_blocks (loop, pe);
+  combine_blocks (loop);
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3297,6 +3385,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   ifcvt_local_dce (loop);
   BITMAP_FREE (exit_bbs);
 
+  ifcvt_hoist_invariants (loop, pe);
+
   todo |= TODO_cleanup_cfg;
 
  cleanup:
  
Richard Biener Nov. 19, 2021, 10:09 a.m. UTC | #5
On Tue, Nov 16, 2021 at 7:05 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Mon, Nov 15, 2021 at 3:00 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches
> >> > <gcc-patches@gcc.gnu.org> wrote:
> >> >>
> >> >> This patch is a prerequisite for a later one.  At the moment,
> >> >> if-conversion converts predicated POINTER_PLUS_EXPRs into
> >> >> non-wrapping forms, which for:
> >> >>
> >> >>     … = base + offset
> >> >>
> >> >> becomes:
> >> >>
> >> >>     tmp = (unsigned long) base
> >> >>     … = tmp + offset
> >> >>
> >> >> It then hoists these conversions out of the loop where possible.
> >> >>
> >> >> However, because “base” is a valid gimple operand, there can be
> >> >> multiple POINTER_PLUS_EXPRs with the same base, which can in turn
> >> >> lead to multiple instances of the same conversion.  The later VN pass
> >> >> is (and I think needs to be) restricted to the new if-converted code,
> >> >> whereas here we're deliberately inserting the conversions before the
> >> >> .LOOP_VECTORIZED condition:
> >> >>
> >> >>         /* If we versioned loop then make sure to insert invariant
> >> >>            stmts before the .LOOP_VECTORIZED check since the vectorizer
> >> >>            will re-use that for things like runtime alias versioning
> >> >>            whose condition can end up using those invariants.  */
> >> >>
> >> >> We can therefore enter the vectoriser with redundant conversions.
> >> >>
> >> >> The easiest fix seemed to be to defer the hoisting until after VN.
> >> >> This catches other hoisting opportunities too.
> >> >>
> >> >> Hoisting the code from the (artificial) loop in pr99102.c means
> >> >> that it's no longer worth vectorising.  The patch forces vectorisation
> >> >> instead of relying on the cost model.
> >> >>
> >> >> The patch also reverts pr87007-4.c and pr87007-5.c back to their
> >> >> original forms, undoing changes in 783dc66f9ccb0019c3dad.
> >> >> The code at the time the tests were added was:
> >> >>
> >> >>         testl   %edi, %edi
> >> >>         je      .L10
> >> >>         vxorps  %xmm1, %xmm1, %xmm1
> >> >>         vsqrtsd d3(%rip), %xmm1, %xmm0
> >> >>         vsqrtsd d2(%rip), %xmm1, %xmm1
> >> >>         ...
> >> >> .L10:
> >> >>         ret
> >> >>
> >> >> with the operations being hoisted, and the vxorps was specifically
> >> >> wanted (compared to the previous code).  This patch restores the code
> >> >> to that form, with the hoisted operations and the vxorps.
> >> >>
> >> >> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >> >>
> >> >> Richard
> >> >>
> >> >>
> >> >> gcc/
> >> >>         * tree-if-conv.c: Include tree-eh.h.
> >> >>         (predicate_statements): Remove pe argument.  Don't hoist
> >> >>         statements here.
> >> >>         (combine_blocks): Remove pe argument.
> >> >>         (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions.
> >> >>         (ifcvt_hoist_invariants): Likewise.
> >> >>         (tree_if_conversion): Update call to combine_blocks.  Call
> >> >>         ifcvt_hoist_invariants after VN.
> >> >>
> >> >> gcc/testsuite/
> >> >>         * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.
> >> >>
> >> >>         Revert:
> >> >>
> >> >>         2020-09-09  Richard Biener  <rguenther@suse.de>
> >> >>
> >> >>         * gcc.target/i386/pr87007-4.c: Adjust.
> >> >>         * gcc.target/i386/pr87007-5.c: Likewise.
> >> >> ---
> >> >>  gcc/testsuite/gcc.dg/vect/pr99102.c       |   2 +-
> >> >>  gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
> >> >>  gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
> >> >>  gcc/tree-if-conv.c                        | 122 ++++++++++++++++++++--
> >> >>  4 files changed, 114 insertions(+), 14 deletions(-)
> >> >>
> >> >> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
> >> >> index 6c1a13f0783..0d030d15c86 100644
> >> >> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c
> >> >> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
> >> >> @@ -1,4 +1,4 @@
> >> >> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> >> >> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
> >> >>  /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
> >> >>  long a[44];
> >> >>  short d, e = -7;
> >> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
> >> >> index 9c4b8005af3..e91bdcbac44 100644
> >> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
> >> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
> >> >> @@ -15,4 +15,4 @@ foo (int n, int k)
> >> >>        d1 = ceil (d3);
> >> >>  }
> >> >>
> >> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> >> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> >> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> >> >> index e4d956a5d7f..20d13cf650b 100644
> >> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> >> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> >> >> @@ -15,4 +15,4 @@ foo (int n, int k)
> >> >>        d1 = sqrt (d3);
> >> >>  }
> >> >>
> >> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> >> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> >> >> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> >> >> index e88ddc9f788..0ad557a2f4d 100644
> >> >> --- a/gcc/tree-if-conv.c
> >> >> +++ b/gcc/tree-if-conv.c
> >> >> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3.  If not see
> >> >>  #include "tree-cfgcleanup.h"
> >> >>  #include "tree-ssa-dse.h"
> >> >>  #include "tree-vectorizer.h"
> >> >> +#include "tree-eh.h"
> >> >>
> >> >>  /* Only handle PHIs with no more arguments unless we are asked to by
> >> >>     simd pragma.  */
> >> >> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
> >> >>  */
> >> >>
> >> >>  static void
> >> >> -predicate_statements (loop_p loop, edge pe)
> >> >> +predicate_statements (loop_p loop)
> >> >>  {
> >> >>    unsigned int i, orig_loop_num_nodes = loop->num_nodes;
> >> >>    auto_vec<int, 1> vect_sizes;
> >> >> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe)
> >> >>                 {
> >> >>                   gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
> >> >>                   gsi_remove (&gsi2, false);
> >> >> -                 /* Make sure to move invariant conversions out of the
> >> >> -                    loop.  */
> >> >> -                 if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
> >> >> -                     && expr_invariant_in_loop_p (loop,
> >> >> -                                                  gimple_assign_rhs1 (stmt2)))
> >> >> -                   gsi_insert_on_edge_immediate (pe, stmt2);
> >> >> -                 else if (first)
> >> >> +                 if (first)
> >> >>                     {
> >> >>                       gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
> >> >>                       first = false;
> >> >> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop)
> >> >>     blocks.  Replace PHI nodes with conditional modify expressions.  */
> >> >>
> >> >>  static void
> >> >> -combine_blocks (class loop *loop, edge pe)
> >> >> +combine_blocks (class loop *loop)
> >> >>  {
> >> >>    basic_block bb, exit_bb, merge_target_bb;
> >> >>    unsigned int orig_loop_num_nodes = loop->num_nodes;
> >> >> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe)
> >> >>    predicate_all_scalar_phis (loop);
> >> >>
> >> >>    if (need_to_predicate || need_to_rewrite_undefined)
> >> >> -    predicate_statements (loop, pe);
> >> >> +    predicate_statements (loop);
> >> >>
> >> >>    /* Merge basic blocks.  */
> >> >>    exit_bb = NULL;
> >> >> @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop)
> >> >>      }
> >> >>  }
> >> >>
> >> >> +/* Return true if STMT can be hoisted from if-converted loop LOOP.  */
> >> >> +
> >> >> +static bool
> >> >> +ifcvt_can_hoist (class loop *loop, gimple *stmt)
> >> >> +{
> >> >> +  if (auto *call = dyn_cast<gcall *> (stmt))
> >> >> +    {
> >> >> +      if (gimple_call_internal_p (call)
> >> >> +         && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
> >> >> +       return false;
> >> >> +    }
> >> >> +  else if (auto *assign = dyn_cast<gassign *> (stmt))
> >> >> +    {
> >> >> +      if (gimple_assign_rhs_code (assign) == COND_EXPR)
> >> >> +       return false;
> >> >> +    }
> >> >> +  else
> >> >> +    return false;
> >> >> +
> >> >> +  if (gimple_has_side_effects (stmt)
> >> >> +      || gimple_could_trap_p (stmt)
> >> >> +      || stmt_could_throw_p (cfun, stmt)
> >> >> +      || gimple_vdef (stmt)
> >> >> +      || gimple_vuse (stmt))
> >> >> +    return false;
> >> >> +
> >> >> +  int num_args = gimple_num_args (stmt);
> >> >> +  for (int i = 0; i < num_args; ++i)
> >> >> +    if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
> >> >> +      return false;
> >> >> +
> >> >> +  return true;
> >> >> +}
> >> >> +
> >> >> +/* PE is the preferred hoisting edge selected by tree_if_conversion, which
> >> >> +   s known to be different from (and to dominate) the preheader edge of the
> >> >> +   if-converted loop.  We already know that STMT can be inserted on the loop
> >> >> +   preheader edge.  Return true if we prefer to insert it on PE instead.  */
> >> >> +
> >> >> +static bool
> >> >> +ifcvt_can_hoist_further (edge pe, gimple *stmt)
> >> >> +{
> >> >> +  /* As explained in tree_if_conversion, we want to hoist invariant
> >> >> +     conversions further so that they can be reused by alias analysis.  */
> >> >> +  auto *assign = dyn_cast<gassign *> (stmt);
> >> >> +  if (assign
> >> >> +      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign)))
> >> >> +    {
> >> >> +      tree rhs = gimple_assign_rhs1 (assign);
> >> >> +      if (is_gimple_min_invariant (rhs))
> >> >> +       return true;
> >> >> +
> >> >> +      if (TREE_CODE (rhs) == SSA_NAME)
> >> >> +       {
> >> >> +         basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
> >> >> +         if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
> >> >> +           return true;
> >> >> +       }
> >> >> +    }
> >> >> +  return false;
> >> >> +}
> >> >> +
> >> >> +/* Hoist invariant statements from LOOP.  PE is the preferred edge for
> >> >> +   hoisting conversions, as selected by tree_if_conversion; see there
> >> >> +   for details.  */
> >> >> +
> >> >> +static void
> >> >> +ifcvt_hoist_invariants (class loop *loop, edge pe)
> >> >> +{
> >> >> +  gimple_stmt_iterator hoist_gsi = {};
> >> >> +  gimple_stmt_iterator hoist_gsi_pe = {};
> >> >> +  unsigned int num_blocks = loop->num_nodes;
> >> >> +  basic_block *body = get_loop_body (loop);
> >> >> +  for (unsigned int i = 0; i < num_blocks; ++i)
> >> >> +    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
> >> >> +      {
> >> >> +       gimple *stmt = gsi_stmt (gsi);
> >> >> +       if (ifcvt_can_hoist (loop, stmt))
> >> >> +         {
> >> >> +           /* Once we've hoisted one statement, insert other statements
> >> >> +              after it.  */
> >> >> +           edge e = loop_preheader_edge (loop);
> >> >> +           gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi;
> >> >> +           if (e != pe && ifcvt_can_hoist_further (pe, stmt))
> >> >
> >> > One issue with hoisting to loop_preheader_edge instead of 'pe'
> >> > is that eventually the loop versioning code will pick up the defs
> >> > through the versioning condition and that will break because the
> >> > versioning condition will be inserted in place of the
> >> > IFN_VECTORIZED_LOOP, which is _before_ the preheader.
> >> > The code basically expects the preheader to be empty.
> >>
> >> Do you mean that the versioning-for-aliases code assumes that all
> >> loop-invariant definitions are available in the versioning condition?
> >
> > Yes.  Or well, it assumes it can use the .LOOP_VECTORIZED condition
> > as versioning condition and there simply use insert the condition without
> > further checks.  And the checks are generated utlimatively from SCEV
> > which just expands things up to the loops preheader.
> >
> >> I hadn't realised that's what the comment was saying.  It sounded more
> >> like an optimisation.
> >
> > The hoisting itself is, but that we don't just hoist to the loops preheader
> > is for correctness, not an optimization ... it's indeed a bit messy though.
>
> Ah, OK.
>
> How does this version look?  I was in two minds about whether to keep
> the expr_invariant_in_loop_p path, but it seemed like the more natural
> test for the case that it handles.  I'd be happy to drop it and use
> ifcvt_available_on_edge_p instead if you prefer though.
>
> Tested as before.

OK.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
>         * tree-if-conv.c: Include tree-eh.h.
>         (predicate_statements): Remove pe argument.  Don't hoist
>         statements here.
>         (combine_blocks): Remove pe argument.
>         (ifcvt_available_on_edge_p, ifcvt_can_hoist): New functions.
>         (ifcvt_hoist_invariants): Likewise.
>         (tree_if_conversion): Update call to combine_blocks.  Call
>         ifcvt_hoist_invariants after VN.
>
> gcc/testsuite/
>         * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.
>
>         Revert:
>
>         2020-09-09  Richard Biener  <rguenther@suse.de>
>
>         * gcc.target/i386/pr87007-4.c: Adjust.
>         * gcc.target/i386/pr87007-5.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/pr99102.c       |   2 +-
>  gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
>  gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
>  gcc/tree-if-conv.c                        | 112 +++++++++++++++++++---
>  4 files changed, 104 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
> index 6c1a13f0783..0d030d15c86 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
> @@ -1,4 +1,4 @@
> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
>  /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
>  long a[44];
>  short d, e = -7;
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
> index 9c4b8005af3..e91bdcbac44 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
> @@ -15,4 +15,4 @@ foo (int n, int k)
>        d1 = ceil (d3);
>  }
>
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> index e4d956a5d7f..20d13cf650b 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> @@ -15,4 +15,4 @@ foo (int n, int k)
>        d1 = sqrt (d3);
>  }
>
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index e88ddc9f788..d0ca04600ce 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-cfgcleanup.h"
>  #include "tree-ssa-dse.h"
>  #include "tree-vectorizer.h"
> +#include "tree-eh.h"
>
>  /* Only handle PHIs with no more arguments unless we are asked to by
>     simd pragma.  */
> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
>  */
>
>  static void
> -predicate_statements (loop_p loop, edge pe)
> +predicate_statements (loop_p loop)
>  {
>    unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>    auto_vec<int, 1> vect_sizes;
> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe)
>                 {
>                   gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
>                   gsi_remove (&gsi2, false);
> -                 /* Make sure to move invariant conversions out of the
> -                    loop.  */
> -                 if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
> -                     && expr_invariant_in_loop_p (loop,
> -                                                  gimple_assign_rhs1 (stmt2)))
> -                   gsi_insert_on_edge_immediate (pe, stmt2);
> -                 else if (first)
> +                 if (first)
>                     {
>                       gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
>                       first = false;
> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop)
>     blocks.  Replace PHI nodes with conditional modify expressions.  */
>
>  static void
> -combine_blocks (class loop *loop, edge pe)
> +combine_blocks (class loop *loop)
>  {
>    basic_block bb, exit_bb, merge_target_bb;
>    unsigned int orig_loop_num_nodes = loop->num_nodes;
> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe)
>    predicate_all_scalar_phis (loop);
>
>    if (need_to_predicate || need_to_rewrite_undefined)
> -    predicate_statements (loop, pe);
> +    predicate_statements (loop);
>
>    /* Merge basic blocks.  */
>    exit_bb = NULL;
> @@ -3181,6 +3176,99 @@ ifcvt_local_dce (class loop *loop)
>      }
>  }
>
> +/* Return true if VALUE is already available on edge PE.  */
> +
> +static bool
> +ifcvt_available_on_edge_p (edge pe, tree value)
> +{
> +  if (is_gimple_min_invariant (value))
> +    return true;
> +
> +  if (TREE_CODE (value) == SSA_NAME)
> +    {
> +      basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (value));
> +      if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
> +       return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Return true if STMT can be hoisted from if-converted loop LOOP to
> +   edge PE.  */
> +
> +static bool
> +ifcvt_can_hoist (class loop *loop, edge pe, gimple *stmt)
> +{
> +  if (auto *call = dyn_cast<gcall *> (stmt))
> +    {
> +      if (gimple_call_internal_p (call)
> +         && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
> +       return false;
> +    }
> +  else if (auto *assign = dyn_cast<gassign *> (stmt))
> +    {
> +      if (gimple_assign_rhs_code (assign) == COND_EXPR)
> +       return false;
> +    }
> +  else
> +    return false;
> +
> +  if (gimple_has_side_effects (stmt)
> +      || gimple_could_trap_p (stmt)
> +      || stmt_could_throw_p (cfun, stmt)
> +      || gimple_vdef (stmt)
> +      || gimple_vuse (stmt))
> +    return false;
> +
> +  int num_args = gimple_num_args (stmt);
> +  if (pe != loop_preheader_edge (loop))
> +    {
> +      for (int i = 0; i < num_args; ++i)
> +       if (!ifcvt_available_on_edge_p (pe, gimple_arg (stmt, i)))
> +         return false;
> +    }
> +  else
> +    {
> +      for (int i = 0; i < num_args; ++i)
> +       if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
> +         return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Hoist invariant statements from LOOP to edge PE.  */
> +
> +static void
> +ifcvt_hoist_invariants (class loop *loop, edge pe)
> +{
> +  gimple_stmt_iterator hoist_gsi = {};
> +  unsigned int num_blocks = loop->num_nodes;
> +  basic_block *body = get_loop_body (loop);
> +  for (unsigned int i = 0; i < num_blocks; ++i)
> +    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
> +      {
> +       gimple *stmt = gsi_stmt (gsi);
> +       if (ifcvt_can_hoist (loop, pe, stmt))
> +         {
> +           /* Once we've hoisted one statement, insert other statements
> +              after it.  */
> +           gsi_remove (&gsi, false);
> +           if (hoist_gsi.ptr)
> +             gsi_insert_after (&hoist_gsi, stmt, GSI_NEW_STMT);
> +           else
> +             {
> +               gsi_insert_on_edge_immediate (pe, stmt);
> +               hoist_gsi = gsi_for_stmt (stmt);
> +             }
> +           continue;
> +         }
> +       gsi_next (&gsi);
> +      }
> +  free (body);
> +}
> +
>  /* If-convert LOOP when it is legal.  For the moment this pass has no
>     profitability analysis.  Returns non-zero todo flags when something
>     changed.  */
> @@ -3275,7 +3363,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>    /* Now all statements are if-convertible.  Combine all the basic
>       blocks into one huge basic block doing the if-conversion
>       on-the-fly.  */
> -  combine_blocks (loop, pe);
> +  combine_blocks (loop);
>
>    /* Perform local CSE, this esp. helps the vectorizer analysis if loads
>       and stores are involved.  CSE only the loop body, not the entry
> @@ -3297,6 +3385,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>    ifcvt_local_dce (loop);
>    BITMAP_FREE (exit_bbs);
>
> +  ifcvt_hoist_invariants (loop, pe);
> +
>    todo |= TODO_cleanup_cfg;
>
>   cleanup:
> --
> 2.25.1
>
  

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
index 6c1a13f0783..0d030d15c86 100644
--- a/gcc/testsuite/gcc.dg/vect/pr99102.c
+++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
@@ -1,4 +1,4 @@ 
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
 /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
 long a[44];
 short d, e = -7;
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
index 9c4b8005af3..e91bdcbac44 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -15,4 +15,4 @@  foo (int n, int k)
       d1 = ceil (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index e4d956a5d7f..20d13cf650b 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -15,4 +15,4 @@  foo (int n, int k)
       d1 = sqrt (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index e88ddc9f788..0ad557a2f4d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -121,6 +121,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-ssa-dse.h"
 #include "tree-vectorizer.h"
+#include "tree-eh.h"
 
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
@@ -2496,7 +2497,7 @@  predicate_rhs_code (gassign *stmt, tree mask, tree cond,
 */
 
 static void
-predicate_statements (loop_p loop, edge pe)
+predicate_statements (loop_p loop)
 {
   unsigned int i, orig_loop_num_nodes = loop->num_nodes;
   auto_vec<int, 1> vect_sizes;
@@ -2597,13 +2598,7 @@  predicate_statements (loop_p loop, edge pe)
 		{
 		  gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
 		  gsi_remove (&gsi2, false);
-		  /* Make sure to move invariant conversions out of the
-		     loop.  */
-		  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
-		      && expr_invariant_in_loop_p (loop,
-						   gimple_assign_rhs1 (stmt2)))
-		    gsi_insert_on_edge_immediate (pe, stmt2);
-		  else if (first)
+		  if (first)
 		    {
 		      gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
 		      first = false;
@@ -2684,7 +2679,7 @@  remove_conditions_and_labels (loop_p loop)
    blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (class loop *loop, edge pe)
+combine_blocks (class loop *loop)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -2697,7 +2692,7 @@  combine_blocks (class loop *loop, edge pe)
   predicate_all_scalar_phis (loop);
 
   if (need_to_predicate || need_to_rewrite_undefined)
-    predicate_statements (loop, pe);
+    predicate_statements (loop);
 
   /* Merge basic blocks.  */
   exit_bb = NULL;
@@ -3181,6 +3176,109 @@  ifcvt_local_dce (class loop *loop)
     }
 }
 
+/* Return true if STMT can be hoisted from if-converted loop LOOP.  */
+
+static bool
+ifcvt_can_hoist (class loop *loop, gimple *stmt)
+{
+  if (auto *call = dyn_cast<gcall *> (stmt))
+    {
+      if (gimple_call_internal_p (call)
+	  && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
+	return false;
+    }
+  else if (auto *assign = dyn_cast<gassign *> (stmt))
+    {
+      if (gimple_assign_rhs_code (assign) == COND_EXPR)
+	return false;
+    }
+  else
+    return false;
+
+  if (gimple_has_side_effects (stmt)
+      || gimple_could_trap_p (stmt)
+      || stmt_could_throw_p (cfun, stmt)
+      || gimple_vdef (stmt)
+      || gimple_vuse (stmt))
+    return false;
+
+  int num_args = gimple_num_args (stmt);
+  for (int i = 0; i < num_args; ++i)
+    if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
+      return false;
+
+  return true;
+}
+
+/* PE is the preferred hoisting edge selected by tree_if_conversion, which
+   s known to be different from (and to dominate) the preheader edge of the
+   if-converted loop.  We already know that STMT can be inserted on the loop
+   preheader edge.  Return true if we prefer to insert it on PE instead.  */
+
+static bool
+ifcvt_can_hoist_further (edge pe, gimple *stmt)
+{
+  /* As explained in tree_if_conversion, we want to hoist invariant
+     conversions further so that they can be reused by alias analysis.  */
+  auto *assign = dyn_cast<gassign *> (stmt);
+  if (assign
+      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign)))
+    {
+      tree rhs = gimple_assign_rhs1 (assign);
+      if (is_gimple_min_invariant (rhs))
+	return true;
+
+      if (TREE_CODE (rhs) == SSA_NAME)
+	{
+	  basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
+	  if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
+	    return true;
+	}
+    }
+  return false;
+}
+
+/* Hoist invariant statements from LOOP.  PE is the preferred edge for
+   hoisting conversions, as selected by tree_if_conversion; see there
+   for details.  */
+
+static void
+ifcvt_hoist_invariants (class loop *loop, edge pe)
+{
+  gimple_stmt_iterator hoist_gsi = {};
+  gimple_stmt_iterator hoist_gsi_pe = {};
+  unsigned int num_blocks = loop->num_nodes;
+  basic_block *body = get_loop_body (loop);
+  for (unsigned int i = 0; i < num_blocks; ++i)
+    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
+      {
+	gimple *stmt = gsi_stmt (gsi);
+	if (ifcvt_can_hoist (loop, stmt))
+	  {
+	    /* Once we've hoisted one statement, insert other statements
+	       after it.  */
+	    edge e = loop_preheader_edge (loop);
+	    gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi;
+	    if (e != pe && ifcvt_can_hoist_further (pe, stmt))
+	      {
+		e = pe;
+		hoist_gsi_ptr = &hoist_gsi_pe;
+	      }
+	    gsi_remove (&gsi, false);
+	    if (hoist_gsi_ptr->ptr)
+	      gsi_insert_after (hoist_gsi_ptr, stmt, GSI_NEW_STMT);
+	    else
+	      {
+		gsi_insert_on_edge_immediate (e, stmt);
+		*hoist_gsi_ptr = gsi_for_stmt (stmt);
+	      }
+	    continue;
+	  }
+	gsi_next (&gsi);
+      }
+  free (body);
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3275,7 +3373,7 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   /* Now all statements are if-convertible.  Combine all the basic
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
-  combine_blocks (loop, pe);
+  combine_blocks (loop);
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3297,6 +3395,8 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   ifcvt_local_dce (loop);
   BITMAP_FREE (exit_bbs);
 
+  ifcvt_hoist_invariants (loop, pe);
+
   todo |= TODO_cleanup_cfg;
 
  cleanup: