tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

Message ID SN6PR0102MB348749DE7F642DFC46298CEBE186A@SN6PR0102MB3487.prod.exchangelabs.com
State New
Headers
Series tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 fail Testing failed

Commit Message

Hao Liu OS Dec. 4, 2023, 10:05 a.m. UTC
  Loop vecotorization can not optimize following case due to SCEV is not affine
failure (i+offset may overflow):

    int A[1024 * 2];
    
    int foo (unsigned offset, unsigned N) 
    {
      int sum = 0;
      for (unsigned i = 0; i < N; i++)
        sum += A[i + offset];
      return sum;
    }

Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
can not overflow) by inferring from undefined behaviors like array access (see
record_nonwrapping_iv). But this information is not shared to SCEV yet. This
patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it 
to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop 
vectorization could succeed.

The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
"{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two
SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to 
set and check the flag respectively.

However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
reset_scev_htab), which may not be synchronized with the calling of
free_numbers_of_iterations_estimates, which set the loop->estimate_state to
EST_NOT_COMPUTED and make sure the above inferring from array access is called.
In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
cache if the loop->estimate_state is not reset.
E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the 
estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
loop vectorization.

This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
vect_analyze_loop, which will make sure the flag is always set propriately in
loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
only), if there is more reasonable method, I will revert the simple fix and try
that.

This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions. 
OK for trunk?

---
The patch is as following:

[PATCH] SCEV: extend the chrec tree with a nonwrapping flag
 [PR112774]

The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
"{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
added to set and check the flag respectively.

As resetting the SCEV cache (i.e., the chrec trees) may not reset the
loop->estimate_state, free_numbers_of_iterations_estimates is called
explicitly in loop vectorization to make sure the flag can be
calculated propriately by niter.

gcc/ChangeLog:

	PR tree-optimization/112774
	* tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
	printed with additional <nw> info.
	* tree-scalar-evolution.cc: add record_nonwrapping_chrec and
	nonwrapping_chrec_p to set and check the new flag respectively.
	* tree-scalar-evolution.h: Likewise.
	* tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
	infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
	scev_probably_wraps_p): call record_nonwrapping_chrec before
	record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
	return false from scev_probably_wraps_p.
	* tree-vect-loop.cc (vect_analyze_loop): call
	free_numbers_of_iterations_estimates explicitly.
	* gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
	to represent the nonwrapping info.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/scev-16.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++
 gcc/tree-pretty-print.cc                |  2 +-
 gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
 gcc/tree-scalar-evolution.h             |  2 ++
 gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
 gcc/tree-vect-loop.cc                   |  4 ++++
 gcc/tree.h                              |  8 +++++---
 7 files changed, 69 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
  

Comments

Hao Liu OS Dec. 6, 2023, 9:46 a.m. UTC | #1
Hi,

Update the patch to fix problems in the test case:
 - add "-details" option to the dump command
 - add dg-require and target filters to avoid potential failures on platforms that don't support vectorization.

Thanks,
-Hao

gcc/ChangeLog:

        PR tree-optimization/112774
        * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
        printed with additional <nw> info.
        * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
        nonwrapping_chrec_p to set and check the new flag respectively.
        * tree-scalar-evolution.h: Likewise.
        * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
        infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
        scev_probably_wraps_p): call record_nonwrapping_chrec before
        record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
        return false from scev_probably_wraps_p.
        * tree-vect-loop.cc (vect_analyze_loop): call
        free_numbers_of_iterations_estimates explicitly.
        * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
        to represent the nonwrapping info.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/scev-16.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++++++++++++++++++
 gcc/tree-pretty-print.cc                |  2 +-
 gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
 gcc/tree-scalar-evolution.h             |  2 ++
 gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
 gcc/tree-vect-loop.cc                   |  4 ++++
 gcc/tree.h                              |  8 +++++---
 7 files changed, 70 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
new file mode 100644
index 00000000000..120f40c0b6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+
+int A[1024 * 2];
+
+int foo (unsigned offset, unsigned N)
+{
+  int sum = 0;
+
+  for (unsigned i = 0; i < N; i++)
+    sum += A[i + offset];
+
+  return sum;
+}
+
+/* Loop can be vectorized by referring "i + offset" is nonwrapping from array.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { avr-*-* msp430-*-* pru-*-* } } } } } */
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 1fadd752d05..0dabb6d1580 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
       dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
       pp_string (pp, ", +, ");
       dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
-      pp_string (pp, "}_");
+      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
       pp_scalar (pp, "%u", CHREC_VARIABLE (node));
       is_stmt = false;
       break;
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index f61277c32df..81630603c12 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
   return res;
 }

+/* If CHREC doesn't overflow, set the nonwrapping flag.  */
+
+void record_nonwrapping_chrec (tree chrec)
+{
+  CHREC_NOWRAP(chrec) = 1;
+
+  if (dump_file && (dump_flags & TDF_SCEV))
+    {
+      fprintf (dump_file, "(record_nonwrapping_chrec: ");
+      print_generic_expr (dump_file, chrec);
+      fprintf (dump_file, ")\n");
+    }
+}
+
+/* Return true if CHREC's nonwrapping flag is set.  */
+
+bool nonwrapping_chrec_p (tree chrec)
+{
+  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
+    return false;
+
+  return CHREC_NOWRAP(chrec);
+}
+
 /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */

 static tree
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index a64ed78fe63..f57fde12ee2 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
                       bool);
 extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
+extern void record_nonwrapping_chrec (tree);
+extern bool nonwrapping_chrec_p (tree);

 /* Returns the basic block preceding LOOP, or the CFG entry block when
    the loop is function's body.  */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 2098bef9a97..d465e0ed7e1 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)

   /* If access is not executed on every iteration, we must ensure that overlow
      may not make the access valid later.  */
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
-      && scev_probably_wraps_p (NULL_TREE,
-                               initial_condition_in_loop_num (ev, loop->num),
-                               step, data->stmt, loop, true))
-    upper = false;
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
+    {
+      if (scev_probably_wraps_p (NULL_TREE,
+                                initial_condition_in_loop_num (ev, loop->num),
+                                step, data->stmt, loop, true))
+       upper = false;
+    }
+  else
+    record_nonwrapping_chrec (ev);

   record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
   return true;
@@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
   if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
     low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));

+  record_nonwrapping_chrec (scev);
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }

@@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
       high = wide_int_to_tree (type, r.upper_bound ());
     }

+  record_nonwrapping_chrec (scev);
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }

@@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
   if (loop_exits_before_overflow (base, step, at_stmt, loop))
     return false;

+  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
+     above loop exits before overflow).  */
+  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
+    return false;
+
   /* At this point we still don't have a proof that the iv does not
      overflow: give up.  */
   return true;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3df020d2228..7f278e2d8f4 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
         analysis are done under the assumptions.  */
       loop_constraint_set (loop, LOOP_C_FINITE);
     }
+  else
+    /* Clear the existing niter information to make sure the nonwrapping flag
+       will be calculated and set propriately.  */
+    free_numbers_of_iterations_estimates (loop);

   auto_vector_modes vector_modes;
   /* Autodetect first vector size we try.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index 086b55f0375..59af8920f02 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
 #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))

 /* Accessors for the chains of recurrences.  */
-#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
-#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
-#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
+#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
+#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
+#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag

 /* LABEL_EXPR accessor. This gives access to the label associated with
    the given label expression.  */
--
2.34.1
  
Richard Biener Dec. 6, 2023, 11:49 a.m. UTC | #2
On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS <hliu@os.amperecomputing.com> wrote:
>
> Hi,
>
> Update the patch to fix problems in the test case:
>  - add "-details" option to the dump command
>  - add dg-require and target filters to avoid potential failures on platforms that don't support vectorization.

Interesting simple trick - the downside is that this makes the
recursive dependence
of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
set the flag on CHRECs that are not necessarily cached, so I'm not sure how
effective this will be ...

Can you try to do some statistics on say SPEC CPU?  I'm usually
building (with -j1) with -fopt-info-vec and diff build logs, you can then see
how many more loops (and which) we vectorize additionally?

Thanks,
Richard.

> Thanks,
> -Hao
>
> gcc/ChangeLog:
>
>         PR tree-optimization/112774
>         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
>         printed with additional <nw> info.
>         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
>         nonwrapping_chrec_p to set and check the new flag respectively.
>         * tree-scalar-evolution.h: Likewise.
>         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
>         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
>         scev_probably_wraps_p): call record_nonwrapping_chrec before
>         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
>         return false from scev_probably_wraps_p.
>         * tree-vect-loop.cc (vect_analyze_loop): call
>         free_numbers_of_iterations_estimates explicitly.
>         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
>         to represent the nonwrapping info.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/scev-16.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++++++++++++++++++
>  gcc/tree-pretty-print.cc                |  2 +-
>  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
>  gcc/tree-scalar-evolution.h             |  2 ++
>  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
>  gcc/tree-vect-loop.cc                   |  4 ++++
>  gcc/tree.h                              |  8 +++++---
>  7 files changed, 70 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> new file mode 100644
> index 00000000000..120f40c0b6c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +int A[1024 * 2];
> +
> +int foo (unsigned offset, unsigned N)
> +{
> +  int sum = 0;
> +
> +  for (unsigned i = 0; i < N; i++)
> +    sum += A[i + offset];
> +
> +  return sum;
> +}
> +
> +/* Loop can be vectorized by referring "i + offset" is nonwrapping from array.  */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { avr-*-* msp430-*-* pru-*-* } } } } } */
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 1fadd752d05..0dabb6d1580 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
>        pp_string (pp, ", +, ");
>        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> -      pp_string (pp, "}_");
> +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
>        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
>        is_stmt = false;
>        break;
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index f61277c32df..81630603c12 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
>    return res;
>  }
>
> +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> +
> +void record_nonwrapping_chrec (tree chrec)
> +{
> +  CHREC_NOWRAP(chrec) = 1;
> +
> +  if (dump_file && (dump_flags & TDF_SCEV))
> +    {
> +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> +      print_generic_expr (dump_file, chrec);
> +      fprintf (dump_file, ")\n");
> +    }
> +}
> +
> +/* Return true if CHREC's nonwrapping flag is set.  */
> +
> +bool nonwrapping_chrec_p (tree chrec)
> +{
> +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> +    return false;
> +
> +  return CHREC_NOWRAP(chrec);
> +}
> +
>  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
>
>  static tree
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index a64ed78fe63..f57fde12ee2 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
>                        bool);
>  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> +extern void record_nonwrapping_chrec (tree);
> +extern bool nonwrapping_chrec_p (tree);
>
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
>     the loop is function's body.  */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 2098bef9a97..d465e0ed7e1 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
>
>    /* If access is not executed on every iteration, we must ensure that overlow
>       may not make the access valid later.  */
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> -      && scev_probably_wraps_p (NULL_TREE,
> -                               initial_condition_in_loop_num (ev, loop->num),
> -                               step, data->stmt, loop, true))
> -    upper = false;
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> +    {
> +      if (scev_probably_wraps_p (NULL_TREE,
> +                                initial_condition_in_loop_num (ev, loop->num),
> +                                step, data->stmt, loop, true))
> +       upper = false;
> +    }
> +  else
> +    record_nonwrapping_chrec (ev);
>
>    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
> @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
>    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
>      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
>        high = wide_int_to_tree (type, r.upper_bound ());
>      }
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
>    if (loop_exits_before_overflow (base, step, at_stmt, loop))
>      return false;
>
> +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> +     above loop exits before overflow).  */
> +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> +    return false;
> +
>    /* At this point we still don't have a proof that the iv does not
>       overflow: give up.  */
>    return true;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3df020d2228..7f278e2d8f4 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>          analysis are done under the assumptions.  */
>        loop_constraint_set (loop, LOOP_C_FINITE);
>      }
> +  else
> +    /* Clear the existing niter information to make sure the nonwrapping flag
> +       will be calculated and set propriately.  */
> +    free_numbers_of_iterations_estimates (loop);
>
>    auto_vector_modes vector_modes;
>    /* Autodetect first vector size we try.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 086b55f0375..59af8920f02 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
>  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
>
>  /* Accessors for the chains of recurrences.  */
> -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
>
>  /* LABEL_EXPR accessor. This gives access to the label associated with
>     the given label expression.  */
> --
> 2.34.1
>
>
> ________________________________________
> From: Hao Liu OS <hliu@os.amperecomputing.com>
> Sent: Monday, December 4, 2023 18:05
> To: GCC-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
>
> Loop vecotorization can not optimize following case due to SCEV is not affine
> failure (i+offset may overflow):
>
>     int A[1024 * 2];
>
>     int foo (unsigned offset, unsigned N)
>     {
>       int sum = 0;
>       for (unsigned i = 0; i < N; i++)
>         sum += A[i + offset];
>       return sum;
>     }
>
> Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
> can not overflow) by inferring from undefined behaviors like array access (see
> record_nonwrapping_iv). But this information is not shared to SCEV yet. This
> patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it
> to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop
> vectorization could succeed.
>
> The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
> "{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two
> SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to
> set and check the flag respectively.
>
> However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
> reset_scev_htab), which may not be synchronized with the calling of
> free_numbers_of_iterations_estimates, which set the loop->estimate_state to
> EST_NOT_COMPUTED and make sure the above inferring from array access is called.
> In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
> cache if the loop->estimate_state is not reset.
> E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
> which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the
> estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
> loop vectorization.
>
> This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
> vect_analyze_loop, which will make sure the flag is always set propriately in
> loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
> only), if there is more reasonable method, I will revert the simple fix and try
> that.
>
> This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions.
> OK for trunk?
>
> ---
> The patch is as following:
>
> [PATCH] SCEV: extend the chrec tree with a nonwrapping flag
>  [PR112774]
>
> The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
> "{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
> Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
> added to set and check the flag respectively.
>
> As resetting the SCEV cache (i.e., the chrec trees) may not reset the
> loop->estimate_state, free_numbers_of_iterations_estimates is called
> explicitly in loop vectorization to make sure the flag can be
> calculated propriately by niter.
>
> gcc/ChangeLog:
>
>         PR tree-optimization/112774
>         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
>         printed with additional <nw> info.
>         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
>         nonwrapping_chrec_p to set and check the new flag respectively.
>         * tree-scalar-evolution.h: Likewise.
>         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
>         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
>         scev_probably_wraps_p): call record_nonwrapping_chrec before
>         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
>         return false from scev_probably_wraps_p.
>         * tree-vect-loop.cc (vect_analyze_loop): call
>         free_numbers_of_iterations_estimates explicitly.
>         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
>         to represent the nonwrapping info.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/scev-16.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++
>  gcc/tree-pretty-print.cc                |  2 +-
>  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
>  gcc/tree-scalar-evolution.h             |  2 ++
>  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
>  gcc/tree-vect-loop.cc                   |  4 ++++
>  gcc/tree.h                              |  8 +++++---
>  7 files changed, 69 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> new file mode 100644
> index 00000000000..96ea36e4c65
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-vect-scev" } */
> +
> +int A[1024 * 2];
> +
> +int foo (unsigned offset, unsigned N)
> +{
> +  int sum = 0;
> +
> +  for (unsigned i = 0; i < N; i++)
> +    sum += A[i + offset];
> +
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "missed:  failed: evolution of offset is not affine" "vect" } } */
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 1fadd752d05..0dabb6d1580 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
>        pp_string (pp, ", +, ");
>        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> -      pp_string (pp, "}_");
> +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
>        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
>        is_stmt = false;
>        break;
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index 065bcd0743d..a9112572e0c 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
>    return res;
>  }
>
> +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> +
> +void record_nonwrapping_chrec (tree chrec)
> +{
> +  CHREC_NOWRAP(chrec) = 1;
> +
> +  if (dump_file && (dump_flags & TDF_SCEV))
> +    {
> +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> +      print_generic_expr (dump_file, chrec);
> +      fprintf (dump_file, ")\n");
> +    }
> +}
> +
> +/* Return true if CHREC's nonwrapping flag is set.  */
> +
> +bool nonwrapping_chrec_p (tree chrec)
> +{
> +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> +    return false;
> +
> +  return CHREC_NOWRAP(chrec);
> +}
> +
>  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
>
>  static tree
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index a64ed78fe63..f57fde12ee2 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
>                        bool);
>  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> +extern void record_nonwrapping_chrec (tree);
> +extern bool nonwrapping_chrec_p (tree);
>
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
>     the loop is function's body.  */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 2098bef9a97..d465e0ed7e1 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
>
>    /* If access is not executed on every iteration, we must ensure that overlow
>       may not make the access valid later.  */
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> -      && scev_probably_wraps_p (NULL_TREE,
> -                               initial_condition_in_loop_num (ev, loop->num),
> -                               step, data->stmt, loop, true))
> -    upper = false;
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> +    {
> +      if (scev_probably_wraps_p (NULL_TREE,
> +                                initial_condition_in_loop_num (ev, loop->num),
> +                                step, data->stmt, loop, true))
> +       upper = false;
> +    }
> +  else
> +    record_nonwrapping_chrec (ev);
>
>    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
> @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
>    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
>      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
>        high = wide_int_to_tree (type, r.upper_bound ());
>      }
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
>    if (loop_exits_before_overflow (base, step, at_stmt, loop))
>      return false;
>
> +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> +     above loop exits before overflow).  */
> +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> +    return false;
> +
>    /* At this point we still don't have a proof that the iv does not
>       overflow: give up.  */
>    return true;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index dd584ab4a42..6261cd1be1d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>          analysis are done under the assumptions.  */
>        loop_constraint_set (loop, LOOP_C_FINITE);
>      }
> +  else
> +    /* Clear the existing niter information to make sure the nonwrapping flag
> +       will be calculated and set propriately.  */
> +    free_numbers_of_iterations_estimates (loop);
>
>    auto_vector_modes vector_modes;
>    /* Autodetect first vector size we try.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 086b55f0375..59af8920f02 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
>  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
>
>  /* Accessors for the chains of recurrences.  */
> -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
>
>  /* LABEL_EXPR accessor. This gives access to the label associated with
>     the given label expression.  */
> --
> 2.34.1
>
  
Hao Liu OS Dec. 7, 2023, 8:54 a.m. UTC | #3
> Can you try to do some statistics on say SPEC CPU?  I'm usually
> building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> how many more loops (and which) we vectorize additionally?

I tried this option with SPEC2017 intrate+fprate and count the "optimized: "
lines. Five more loops are vectorized (aarch64-linux-gnu: O3/Ofast for int/fp):

    | O3/Ofast        | before | after | additionally vectorized     |
    | --------------- | ------ | ----- | --------------------------- |
    | 502.gcc_r       | 1075   | 1076  | reload1.c:1934              |
    | 510.parest_r    | 9818   | 9819  | fe_dgp_monomial.cc:104      |
    | 523.xalancbmk_r | 4791   | 4824  | XMLReader.cpp:1650          |
    | 526.blender_r   | 4520   | 4522  | infback.c:441,inflate.c:983 |

All of them access arrays with unsigned offsets, which are previously thought
can be overflow. E.g., the case in 502.gcc:

    unsigned int i;
    unsigned int this_nregs = ...;

    for (j = 1; j < this_nregs; j++)
      {
        this_cost += spill_add_cost[regno + j];
        if ((TEST_HARD_REG_BIT (not_usable, regno + j))
          || TEST_HARD_REG_BIT (used_by_other_reload, regno + j))
          ok = 0;
      }

However, as they are not hot, the performance is not affected. I measured the build time, which is also not affected. With "-flto", more benchmarks (12 in total) will be affected (details are not analyzed):

    | O3/Ofast + -flto | before | after | diff |
    | ---------------- | ------ | ----- | ---- |
    | 502.gcc_r        | 979    | 980   | +1   |
    | 507.cactuBSSN_r  | 3454   | 3458  | +4   |
    | 508.namd_r       | 858    | 857   | -1   |
    | 510.parest_r     | 1575   | 1577  | +2   |
    | 511.povray_r     | 810    | 812   | +2   |
    | 521.wrf_r        | 8769   | 8763  | -6   |
    | 523.xalancbmk_r  | 3959   | 3979  | +20  |
    | 526.blender_r    | 4580   | 4575  | -5   |
    | 527.cam4_r       | 2371   | 2370  | -1   |
    | 538.imagick_r    | 462    | 461   | -1   |
    | 549.fotonik3d_r  | 436    | 437   | +1   |
    | 554.roms_r       | 852    | 851   | -1   |

I think using unsigned index to access array should be rare. Programmers
tend to use "for (int i; ...)" instead of unsigned values. But there may be
special requirements. This opportunity is found in a real application, which
has hot loops with such unsigned access pattern, and it can get huge
improvements.

Thanks,
Hao

________________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Wednesday, December 6, 2023 19:49
To: Hao Liu OS
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS <hliu@os.amperecomputing.com> wrote:
>
> Hi,
>
> Update the patch to fix problems in the test case:
>  - add "-details" option to the dump command
>  - add dg-require and target filters to avoid potential failures on platforms that don't support vectorization.

Interesting simple trick - the downside is that this makes the
recursive dependence
of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
set the flag on CHRECs that are not necessarily cached, so I'm not sure how
effective this will be ...

Can you try to do some statistics on say SPEC CPU?  I'm usually
building (with -j1) with -fopt-info-vec and diff build logs, you can then see
how many more loops (and which) we vectorize additionally?

Thanks,
Richard.

> Thanks,
> -Hao
>
> gcc/ChangeLog:
>
>         PR tree-optimization/112774
>         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
>         printed with additional <nw> info.
>         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
>         nonwrapping_chrec_p to set and check the new flag respectively.
>         * tree-scalar-evolution.h: Likewise.
>         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
>         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
>         scev_probably_wraps_p): call record_nonwrapping_chrec before
>         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
>         return false from scev_probably_wraps_p.
>         * tree-vect-loop.cc (vect_analyze_loop): call
>         free_numbers_of_iterations_estimates explicitly.
>         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
>         to represent the nonwrapping info.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/scev-16.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++++++++++++++++++
>  gcc/tree-pretty-print.cc                |  2 +-
>  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
>  gcc/tree-scalar-evolution.h             |  2 ++
>  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
>  gcc/tree-vect-loop.cc                   |  4 ++++
>  gcc/tree.h                              |  8 +++++---
>  7 files changed, 70 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> new file mode 100644
> index 00000000000..120f40c0b6c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +int A[1024 * 2];
> +
> +int foo (unsigned offset, unsigned N)
> +{
> +  int sum = 0;
> +
> +  for (unsigned i = 0; i < N; i++)
> +    sum += A[i + offset];
> +
> +  return sum;
> +}
> +
> +/* Loop can be vectorized by referring "i + offset" is nonwrapping from array.  */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { avr-*-* msp430-*-* pru-*-* } } } } } */
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 1fadd752d05..0dabb6d1580 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
>        pp_string (pp, ", +, ");
>        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> -      pp_string (pp, "}_");
> +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
>        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
>        is_stmt = false;
>        break;
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index f61277c32df..81630603c12 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
>    return res;
>  }
>
> +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> +
> +void record_nonwrapping_chrec (tree chrec)
> +{
> +  CHREC_NOWRAP(chrec) = 1;
> +
> +  if (dump_file && (dump_flags & TDF_SCEV))
> +    {
> +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> +      print_generic_expr (dump_file, chrec);
> +      fprintf (dump_file, ")\n");
> +    }
> +}
> +
> +/* Return true if CHREC's nonwrapping flag is set.  */
> +
> +bool nonwrapping_chrec_p (tree chrec)
> +{
> +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> +    return false;
> +
> +  return CHREC_NOWRAP(chrec);
> +}
> +
>  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
>
>  static tree
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index a64ed78fe63..f57fde12ee2 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
>                        bool);
>  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> +extern void record_nonwrapping_chrec (tree);
> +extern bool nonwrapping_chrec_p (tree);
>
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
>     the loop is function's body.  */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 2098bef9a97..d465e0ed7e1 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
>
>    /* If access is not executed on every iteration, we must ensure that overlow
>       may not make the access valid later.  */
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> -      && scev_probably_wraps_p (NULL_TREE,
> -                               initial_condition_in_loop_num (ev, loop->num),
> -                               step, data->stmt, loop, true))
> -    upper = false;
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> +    {
> +      if (scev_probably_wraps_p (NULL_TREE,
> +                                initial_condition_in_loop_num (ev, loop->num),
> +                                step, data->stmt, loop, true))
> +       upper = false;
> +    }
> +  else
> +    record_nonwrapping_chrec (ev);
>
>    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
> @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
>    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
>      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
>        high = wide_int_to_tree (type, r.upper_bound ());
>      }
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
>    if (loop_exits_before_overflow (base, step, at_stmt, loop))
>      return false;
>
> +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> +     above loop exits before overflow).  */
> +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> +    return false;
> +
>    /* At this point we still don't have a proof that the iv does not
>       overflow: give up.  */
>    return true;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3df020d2228..7f278e2d8f4 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>          analysis are done under the assumptions.  */
>        loop_constraint_set (loop, LOOP_C_FINITE);
>      }
> +  else
> +    /* Clear the existing niter information to make sure the nonwrapping flag
> +       will be calculated and set propriately.  */
> +    free_numbers_of_iterations_estimates (loop);
>
>    auto_vector_modes vector_modes;
>    /* Autodetect first vector size we try.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 086b55f0375..59af8920f02 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
>  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
>
>  /* Accessors for the chains of recurrences.  */
> -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
>
>  /* LABEL_EXPR accessor. This gives access to the label associated with
>     the given label expression.  */
> --
> 2.34.1
>
>
> ________________________________________
> From: Hao Liu OS <hliu@os.amperecomputing.com>
> Sent: Monday, December 4, 2023 18:05
> To: GCC-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
>
> Loop vecotorization can not optimize following case due to SCEV is not affine
> failure (i+offset may overflow):
>
>     int A[1024 * 2];
>
>     int foo (unsigned offset, unsigned N)
>     {
>       int sum = 0;
>       for (unsigned i = 0; i < N; i++)
>         sum += A[i + offset];
>       return sum;
>     }
>
> Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
> can not overflow) by inferring from undefined behaviors like array access (see
> record_nonwrapping_iv). But this information is not shared to SCEV yet. This
> patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it
> to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop
> vectorization could succeed.
>
> The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
> "{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two
> SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to
> set and check the flag respectively.
>
> However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
> reset_scev_htab), which may not be synchronized with the calling of
> free_numbers_of_iterations_estimates, which set the loop->estimate_state to
> EST_NOT_COMPUTED and make sure the above inferring from array access is called.
> In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
> cache if the loop->estimate_state is not reset.
> E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
> which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the
> estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
> loop vectorization.
>
> This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
> vect_analyze_loop, which will make sure the flag is always set propriately in
> loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
> only), if there is more reasonable method, I will revert the simple fix and try
> that.
>
> This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions.
> OK for trunk?
>
> ---
> The patch is as following:
>
> [PATCH] SCEV: extend the chrec tree with a nonwrapping flag
>  [PR112774]
>
> The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
> "{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
> Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
> added to set and check the flag respectively.
>
> As resetting the SCEV cache (i.e., the chrec trees) may not reset the
> loop->estimate_state, free_numbers_of_iterations_estimates is called
> explicitly in loop vectorization to make sure the flag can be
> calculated propriately by niter.
>
> gcc/ChangeLog:
>
>         PR tree-optimization/112774
>         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
>         printed with additional <nw> info.
>         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
>         nonwrapping_chrec_p to set and check the new flag respectively.
>         * tree-scalar-evolution.h: Likewise.
>         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
>         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
>         scev_probably_wraps_p): call record_nonwrapping_chrec before
>         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
>         return false from scev_probably_wraps_p.
>         * tree-vect-loop.cc (vect_analyze_loop): call
>         free_numbers_of_iterations_estimates explicitly.
>         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
>         to represent the nonwrapping info.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/scev-16.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++
>  gcc/tree-pretty-print.cc                |  2 +-
>  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
>  gcc/tree-scalar-evolution.h             |  2 ++
>  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
>  gcc/tree-vect-loop.cc                   |  4 ++++
>  gcc/tree.h                              |  8 +++++---
>  7 files changed, 69 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> new file mode 100644
> index 00000000000..96ea36e4c65
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-vect-scev" } */
> +
> +int A[1024 * 2];
> +
> +int foo (unsigned offset, unsigned N)
> +{
> +  int sum = 0;
> +
> +  for (unsigned i = 0; i < N; i++)
> +    sum += A[i + offset];
> +
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "missed:  failed: evolution of offset is not affine" "vect" } } */
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 1fadd752d05..0dabb6d1580 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
>        pp_string (pp, ", +, ");
>        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> -      pp_string (pp, "}_");
> +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
>        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
>        is_stmt = false;
>        break;
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index 065bcd0743d..a9112572e0c 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
>    return res;
>  }
>
> +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> +
> +void record_nonwrapping_chrec (tree chrec)
> +{
> +  CHREC_NOWRAP(chrec) = 1;
> +
> +  if (dump_file && (dump_flags & TDF_SCEV))
> +    {
> +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> +      print_generic_expr (dump_file, chrec);
> +      fprintf (dump_file, ")\n");
> +    }
> +}
> +
> +/* Return true if CHREC's nonwrapping flag is set.  */
> +
> +bool nonwrapping_chrec_p (tree chrec)
> +{
> +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> +    return false;
> +
> +  return CHREC_NOWRAP(chrec);
> +}
> +
>  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
>
>  static tree
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index a64ed78fe63..f57fde12ee2 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
>                        bool);
>  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> +extern void record_nonwrapping_chrec (tree);
> +extern bool nonwrapping_chrec_p (tree);
>
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
>     the loop is function's body.  */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 2098bef9a97..d465e0ed7e1 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
>
>    /* If access is not executed on every iteration, we must ensure that overlow
>       may not make the access valid later.  */
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> -      && scev_probably_wraps_p (NULL_TREE,
> -                               initial_condition_in_loop_num (ev, loop->num),
> -                               step, data->stmt, loop, true))
> -    upper = false;
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> +    {
> +      if (scev_probably_wraps_p (NULL_TREE,
> +                                initial_condition_in_loop_num (ev, loop->num),
> +                                step, data->stmt, loop, true))
> +       upper = false;
> +    }
> +  else
> +    record_nonwrapping_chrec (ev);
>
>    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
> @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
>    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
>      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
>        high = wide_int_to_tree (type, r.upper_bound ());
>      }
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
>    if (loop_exits_before_overflow (base, step, at_stmt, loop))
>      return false;
>
> +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> +     above loop exits before overflow).  */
> +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> +    return false;
> +
>    /* At this point we still don't have a proof that the iv does not
>       overflow: give up.  */
>    return true;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index dd584ab4a42..6261cd1be1d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>          analysis are done under the assumptions.  */
>        loop_constraint_set (loop, LOOP_C_FINITE);
>      }
> +  else
> +    /* Clear the existing niter information to make sure the nonwrapping flag
> +       will be calculated and set propriately.  */
> +    free_numbers_of_iterations_estimates (loop);
>
>    auto_vector_modes vector_modes;
>    /* Autodetect first vector size we try.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 086b55f0375..59af8920f02 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
>  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
>
>  /* Accessors for the chains of recurrences.  */
> -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
>
>  /* LABEL_EXPR accessor. This gives access to the label associated with
>     the given label expression.  */
> --
> 2.34.1
>
  
Richard Biener Dec. 7, 2023, 2:12 p.m. UTC | #4
On Thu, Dec 7, 2023 at 9:59 AM Hao Liu OS <hliu@os.amperecomputing.com> wrote:
>
> > Can you try to do some statistics on say SPEC CPU?  I'm usually
> > building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> > how many more loops (and which) we vectorize additionally?
>
> I tried this option with SPEC2017 intrate+fprate and count the "optimized: "
> lines. Five more loops are vectorized (aarch64-linux-gnu: O3/Ofast for int/fp):
>
>     | O3/Ofast        | before | after | additionally vectorized     |
>     | --------------- | ------ | ----- | --------------------------- |
>     | 502.gcc_r       | 1075   | 1076  | reload1.c:1934              |
>     | 510.parest_r    | 9818   | 9819  | fe_dgp_monomial.cc:104      |
>     | 523.xalancbmk_r | 4791   | 4824  | XMLReader.cpp:1650          |
>     | 526.blender_r   | 4520   | 4522  | infback.c:441,inflate.c:983 |
>
> All of them access arrays with unsigned offsets, which are previously thought
> can be overflow. E.g., the case in 502.gcc:
>
>     unsigned int i;
>     unsigned int this_nregs = ...;
>
>     for (j = 1; j < this_nregs; j++)
>       {
>         this_cost += spill_add_cost[regno + j];
>         if ((TEST_HARD_REG_BIT (not_usable, regno + j))
>           || TEST_HARD_REG_BIT (used_by_other_reload, regno + j))
>           ok = 0;
>       }
>
> However, as they are not hot, the performance is not affected. I measured the build time, which is also not affected. With "-flto", more benchmarks (12 in total) will be affected (details are not analyzed):
>
>     | O3/Ofast + -flto | before | after | diff |
>     | ---------------- | ------ | ----- | ---- |
>     | 502.gcc_r        | 979    | 980   | +1   |
>     | 507.cactuBSSN_r  | 3454   | 3458  | +4   |
>     | 508.namd_r       | 858    | 857   | -1   |
>     | 510.parest_r     | 1575   | 1577  | +2   |
>     | 511.povray_r     | 810    | 812   | +2   |
>     | 521.wrf_r        | 8769   | 8763  | -6   |
>     | 523.xalancbmk_r  | 3959   | 3979  | +20  |
>     | 526.blender_r    | 4580   | 4575  | -5   |
>     | 527.cam4_r       | 2371   | 2370  | -1   |
>     | 538.imagick_r    | 462    | 461   | -1   |
>     | 549.fotonik3d_r  | 436    | 437   | +1   |
>     | 554.roms_r       | 852    | 851   | -1   |
>
> I think using unsigned index to access array should be rare. Programmers
> tend to use "for (int i; ...)" instead of unsigned values. But there may be
> special requirements. This opportunity is found in a real application, which
> has hot loops with such unsigned access pattern, and it can get huge
> improvements.

Yes, I can see that.  I think the patch is OK with a minor nit - can you
please document the nothrow flag usage in TREE_CHREC in
tree-core.h?  There's a big comment doing flags documentation:


/* The following table lists the uses of each of the above flags and
   for which types of nodes they are defined.
...

OK with that change.   Let's see how it goes ...

Thanks,
Richard.

> Thanks,
> Hao
>
> ________________________________________
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, December 6, 2023 19:49
> To: Hao Liu OS
> Cc: GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
>
> On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS <hliu@os.amperecomputing.com> wrote:
> >
> > Hi,
> >
> > Update the patch to fix problems in the test case:
> >  - add "-details" option to the dump command
> >  - add dg-require and target filters to avoid potential failures on platforms that don't support vectorization.
>
> Interesting simple trick - the downside is that this makes the
> recursive dependence
> of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
> set the flag on CHRECs that are not necessarily cached, so I'm not sure how
> effective this will be ...
>
> Can you try to do some statistics on say SPEC CPU?  I'm usually
> building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> how many more loops (and which) we vectorize additionally?
>
> Thanks,
> Richard.
>
> > Thanks,
> > -Hao
> >
> > gcc/ChangeLog:
> >
> >         PR tree-optimization/112774
> >         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
> >         printed with additional <nw> info.
> >         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
> >         nonwrapping_chrec_p to set and check the new flag respectively.
> >         * tree-scalar-evolution.h: Likewise.
> >         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
> >         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
> >         scev_probably_wraps_p): call record_nonwrapping_chrec before
> >         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
> >         return false from scev_probably_wraps_p.
> >         * tree-vect-loop.cc (vect_analyze_loop): call
> >         free_numbers_of_iterations_estimates explicitly.
> >         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
> >         to represent the nonwrapping info.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/tree-ssa/scev-16.c: New test.
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++++++++++++++++++
> >  gcc/tree-pretty-print.cc                |  2 +-
> >  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
> >  gcc/tree-scalar-evolution.h             |  2 ++
> >  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
> >  gcc/tree-vect-loop.cc                   |  4 ++++
> >  gcc/tree.h                              |  8 +++++---
> >  7 files changed, 70 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > new file mode 100644
> > index 00000000000..120f40c0b6c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> > +
> > +int A[1024 * 2];
> > +
> > +int foo (unsigned offset, unsigned N)
> > +{
> > +  int sum = 0;
> > +
> > +  for (unsigned i = 0; i < N; i++)
> > +    sum += A[i + offset];
> > +
> > +  return sum;
> > +}
> > +
> > +/* Loop can be vectorized by referring "i + offset" is nonwrapping from array.  */
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { avr-*-* msp430-*-* pru-*-* } } } } } */
> > diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> > index 1fadd752d05..0dabb6d1580 100644
> > --- a/gcc/tree-pretty-print.cc
> > +++ b/gcc/tree-pretty-print.cc
> > @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> >        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
> >        pp_string (pp, ", +, ");
> >        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> > -      pp_string (pp, "}_");
> > +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
> >        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
> >        is_stmt = false;
> >        break;
> > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > index f61277c32df..81630603c12 100644
> > --- a/gcc/tree-scalar-evolution.cc
> > +++ b/gcc/tree-scalar-evolution.cc
> > @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
> >    return res;
> >  }
> >
> > +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> > +
> > +void record_nonwrapping_chrec (tree chrec)
> > +{
> > +  CHREC_NOWRAP(chrec) = 1;
> > +
> > +  if (dump_file && (dump_flags & TDF_SCEV))
> > +    {
> > +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> > +      print_generic_expr (dump_file, chrec);
> > +      fprintf (dump_file, ")\n");
> > +    }
> > +}
> > +
> > +/* Return true if CHREC's nonwrapping flag is set.  */
> > +
> > +bool nonwrapping_chrec_p (tree chrec)
> > +{
> > +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> > +    return false;
> > +
> > +  return CHREC_NOWRAP(chrec);
> > +}
> > +
> >  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
> >
> >  static tree
> > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> > index a64ed78fe63..f57fde12ee2 100644
> > --- a/gcc/tree-scalar-evolution.h
> > +++ b/gcc/tree-scalar-evolution.h
> > @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
> >                        bool);
> >  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
> >  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> > +extern void record_nonwrapping_chrec (tree);
> > +extern bool nonwrapping_chrec_p (tree);
> >
> >  /* Returns the basic block preceding LOOP, or the CFG entry block when
> >     the loop is function's body.  */
> > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > index 2098bef9a97..d465e0ed7e1 100644
> > --- a/gcc/tree-ssa-loop-niter.cc
> > +++ b/gcc/tree-ssa-loop-niter.cc
> > @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
> >
> >    /* If access is not executed on every iteration, we must ensure that overlow
> >       may not make the access valid later.  */
> > -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> > -      && scev_probably_wraps_p (NULL_TREE,
> > -                               initial_condition_in_loop_num (ev, loop->num),
> > -                               step, data->stmt, loop, true))
> > -    upper = false;
> > +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> > +    {
> > +      if (scev_probably_wraps_p (NULL_TREE,
> > +                                initial_condition_in_loop_num (ev, loop->num),
> > +                                step, data->stmt, loop, true))
> > +       upper = false;
> > +    }
> > +  else
> > +    record_nonwrapping_chrec (ev);
> >
> >    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
> >    return true;
> > @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
> >    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
> >      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
> >
> > +  record_nonwrapping_chrec (scev);
> >    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
> >  }
> >
> > @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
> >        high = wide_int_to_tree (type, r.upper_bound ());
> >      }
> >
> > +  record_nonwrapping_chrec (scev);
> >    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
> >  }
> >
> > @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
> >    if (loop_exits_before_overflow (base, step, at_stmt, loop))
> >      return false;
> >
> > +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> > +     above loop exits before overflow).  */
> > +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> > +    return false;
> > +
> >    /* At this point we still don't have a proof that the iv does not
> >       overflow: give up.  */
> >    return true;
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 3df020d2228..7f278e2d8f4 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> >          analysis are done under the assumptions.  */
> >        loop_constraint_set (loop, LOOP_C_FINITE);
> >      }
> > +  else
> > +    /* Clear the existing niter information to make sure the nonwrapping flag
> > +       will be calculated and set propriately.  */
> > +    free_numbers_of_iterations_estimates (loop);
> >
> >    auto_vector_modes vector_modes;
> >    /* Autodetect first vector size we try.  */
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index 086b55f0375..59af8920f02 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
> >  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
> >
> >  /* Accessors for the chains of recurrences.  */
> > -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> > -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> > -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> > +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> > +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> > +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> > +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> > +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
> >
> >  /* LABEL_EXPR accessor. This gives access to the label associated with
> >     the given label expression.  */
> > --
> > 2.34.1
> >
> >
> > ________________________________________
> > From: Hao Liu OS <hliu@os.amperecomputing.com>
> > Sent: Monday, December 4, 2023 18:05
> > To: GCC-patches@gcc.gnu.org
> > Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
> >
> > Loop vecotorization can not optimize following case due to SCEV is not affine
> > failure (i+offset may overflow):
> >
> >     int A[1024 * 2];
> >
> >     int foo (unsigned offset, unsigned N)
> >     {
> >       int sum = 0;
> >       for (unsigned i = 0; i < N; i++)
> >         sum += A[i + offset];
> >       return sum;
> >     }
> >
> > Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
> > can not overflow) by inferring from undefined behaviors like array access (see
> > record_nonwrapping_iv). But this information is not shared to SCEV yet. This
> > patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it
> > to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop
> > vectorization could succeed.
> >
> > The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
> > "{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two
> > SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to
> > set and check the flag respectively.
> >
> > However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
> > reset_scev_htab), which may not be synchronized with the calling of
> > free_numbers_of_iterations_estimates, which set the loop->estimate_state to
> > EST_NOT_COMPUTED and make sure the above inferring from array access is called.
> > In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
> > cache if the loop->estimate_state is not reset.
> > E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
> > which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the
> > estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
> > loop vectorization.
> >
> > This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
> > vect_analyze_loop, which will make sure the flag is always set propriately in
> > loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
> > only), if there is more reasonable method, I will revert the simple fix and try
> > that.
> >
> > This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions.
> > OK for trunk?
> >
> > ---
> > The patch is as following:
> >
> > [PATCH] SCEV: extend the chrec tree with a nonwrapping flag
> >  [PR112774]
> >
> > The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
> > "{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
> > Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
> > added to set and check the flag respectively.
> >
> > As resetting the SCEV cache (i.e., the chrec trees) may not reset the
> > loop->estimate_state, free_numbers_of_iterations_estimates is called
> > explicitly in loop vectorization to make sure the flag can be
> > calculated propriately by niter.
> >
> > gcc/ChangeLog:
> >
> >         PR tree-optimization/112774
> >         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
> >         printed with additional <nw> info.
> >         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
> >         nonwrapping_chrec_p to set and check the new flag respectively.
> >         * tree-scalar-evolution.h: Likewise.
> >         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
> >         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
> >         scev_probably_wraps_p): call record_nonwrapping_chrec before
> >         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
> >         return false from scev_probably_wraps_p.
> >         * tree-vect-loop.cc (vect_analyze_loop): call
> >         free_numbers_of_iterations_estimates explicitly.
> >         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
> >         to represent the nonwrapping info.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/tree-ssa/scev-16.c: New test.
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++
> >  gcc/tree-pretty-print.cc                |  2 +-
> >  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
> >  gcc/tree-scalar-evolution.h             |  2 ++
> >  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
> >  gcc/tree-vect-loop.cc                   |  4 ++++
> >  gcc/tree.h                              |  8 +++++---
> >  7 files changed, 69 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > new file mode 100644
> > index 00000000000..96ea36e4c65
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-vect-scev" } */
> > +
> > +int A[1024 * 2];
> > +
> > +int foo (unsigned offset, unsigned N)
> > +{
> > +  int sum = 0;
> > +
> > +  for (unsigned i = 0; i < N; i++)
> > +    sum += A[i + offset];
> > +
> > +  return sum;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "missed:  failed: evolution of offset is not affine" "vect" } } */
> > diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> > index 1fadd752d05..0dabb6d1580 100644
> > --- a/gcc/tree-pretty-print.cc
> > +++ b/gcc/tree-pretty-print.cc
> > @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> >        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
> >        pp_string (pp, ", +, ");
> >        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> > -      pp_string (pp, "}_");
> > +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
> >        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
> >        is_stmt = false;
> >        break;
> > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > index 065bcd0743d..a9112572e0c 100644
> > --- a/gcc/tree-scalar-evolution.cc
> > +++ b/gcc/tree-scalar-evolution.cc
> > @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
> >    return res;
> >  }
> >
> > +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> > +
> > +void record_nonwrapping_chrec (tree chrec)
> > +{
> > +  CHREC_NOWRAP(chrec) = 1;
> > +
> > +  if (dump_file && (dump_flags & TDF_SCEV))
> > +    {
> > +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> > +      print_generic_expr (dump_file, chrec);
> > +      fprintf (dump_file, ")\n");
> > +    }
> > +}
> > +
> > +/* Return true if CHREC's nonwrapping flag is set.  */
> > +
> > +bool nonwrapping_chrec_p (tree chrec)
> > +{
> > +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> > +    return false;
> > +
> > +  return CHREC_NOWRAP(chrec);
> > +}
> > +
> >  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
> >
> >  static tree
> > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> > index a64ed78fe63..f57fde12ee2 100644
> > --- a/gcc/tree-scalar-evolution.h
> > +++ b/gcc/tree-scalar-evolution.h
> > @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
> >                        bool);
> >  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
> >  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> > +extern void record_nonwrapping_chrec (tree);
> > +extern bool nonwrapping_chrec_p (tree);
> >
> >  /* Returns the basic block preceding LOOP, or the CFG entry block when
> >     the loop is function's body.  */
> > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > index 2098bef9a97..d465e0ed7e1 100644
> > --- a/gcc/tree-ssa-loop-niter.cc
> > +++ b/gcc/tree-ssa-loop-niter.cc
> > @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
> >
> >    /* If access is not executed on every iteration, we must ensure that overlow
> >       may not make the access valid later.  */
> > -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> > -      && scev_probably_wraps_p (NULL_TREE,
> > -                               initial_condition_in_loop_num (ev, loop->num),
> > -                               step, data->stmt, loop, true))
> > -    upper = false;
> > +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> > +    {
> > +      if (scev_probably_wraps_p (NULL_TREE,
> > +                                initial_condition_in_loop_num (ev, loop->num),
> > +                                step, data->stmt, loop, true))
> > +       upper = false;
> > +    }
> > +  else
> > +    record_nonwrapping_chrec (ev);
> >
> >    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
> >    return true;
> > @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
> >    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
> >      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
> >
> > +  record_nonwrapping_chrec (scev);
> >    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
> >  }
> >
> > @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
> >        high = wide_int_to_tree (type, r.upper_bound ());
> >      }
> >
> > +  record_nonwrapping_chrec (scev);
> >    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
> >  }
> >
> > @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
> >    if (loop_exits_before_overflow (base, step, at_stmt, loop))
> >      return false;
> >
> > +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> > +     above loop exits before overflow).  */
> > +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> > +    return false;
> > +
> >    /* At this point we still don't have a proof that the iv does not
> >       overflow: give up.  */
> >    return true;
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index dd584ab4a42..6261cd1be1d 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> >          analysis are done under the assumptions.  */
> >        loop_constraint_set (loop, LOOP_C_FINITE);
> >      }
> > +  else
> > +    /* Clear the existing niter information to make sure the nonwrapping flag
> > +       will be calculated and set propriately.  */
> > +    free_numbers_of_iterations_estimates (loop);
> >
> >    auto_vector_modes vector_modes;
> >    /* Autodetect first vector size we try.  */
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index 086b55f0375..59af8920f02 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
> >  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
> >
> >  /* Accessors for the chains of recurrences.  */
> > -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> > -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> > -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> > +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> > +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> > +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> > +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> > +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
> >
> >  /* LABEL_EXPR accessor. This gives access to the label associated with
> >     the given label expression.  */
> > --
> > 2.34.1
> >
  
Hao Liu OS Dec. 8, 2023, 3:34 a.m. UTC | #5
> Yes, I can see that.  I think the patch is OK with a minor nit - can you
> please document the nothrow flag usage in TREE_CHREC in
> tree-core.h?  There's a big comment doing flags documentation:

Thanks, committed with the new documentation:
https://gcc.gnu.org/g:2efe3a7de0107618397264017fb045f237764cc7

Thanks,
Hao.

________________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Thursday, December 7, 2023 22:12
To: Hao Liu OS
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

On Thu, Dec 7, 2023 at 9:59 AM Hao Liu OS <hliu@os.amperecomputing.com> wrote:
>
> > Can you try to do some statistics on say SPEC CPU?  I'm usually
> > building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> > how many more loops (and which) we vectorize additionally?
>
> I tried this option with SPEC2017 intrate+fprate and count the "optimized: "
> lines. Five more loops are vectorized (aarch64-linux-gnu: O3/Ofast for int/fp):
>
>     | O3/Ofast        | before | after | additionally vectorized     |
>     | --------------- | ------ | ----- | --------------------------- |
>     | 502.gcc_r       | 1075   | 1076  | reload1.c:1934              |
>     | 510.parest_r    | 9818   | 9819  | fe_dgp_monomial.cc:104      |
>     | 523.xalancbmk_r | 4791   | 4824  | XMLReader.cpp:1650          |
>     | 526.blender_r   | 4520   | 4522  | infback.c:441,inflate.c:983 |
>
> All of them access arrays with unsigned offsets, which are previously thought
> can be overflow. E.g., the case in 502.gcc:
>
>     unsigned int i;
>     unsigned int this_nregs = ...;
>
>     for (j = 1; j < this_nregs; j++)
>       {
>         this_cost += spill_add_cost[regno + j];
>         if ((TEST_HARD_REG_BIT (not_usable, regno + j))
>           || TEST_HARD_REG_BIT (used_by_other_reload, regno + j))
>           ok = 0;
>       }
>
> However, as they are not hot, the performance is not affected. I measured the build time, which is also not affected. With "-flto", more benchmarks (12 in total) will be affected (details are not analyzed):
>
>     | O3/Ofast + -flto | before | after | diff |
>     | ---------------- | ------ | ----- | ---- |
>     | 502.gcc_r        | 979    | 980   | +1   |
>     | 507.cactuBSSN_r  | 3454   | 3458  | +4   |
>     | 508.namd_r       | 858    | 857   | -1   |
>     | 510.parest_r     | 1575   | 1577  | +2   |
>     | 511.povray_r     | 810    | 812   | +2   |
>     | 521.wrf_r        | 8769   | 8763  | -6   |
>     | 523.xalancbmk_r  | 3959   | 3979  | +20  |
>     | 526.blender_r    | 4580   | 4575  | -5   |
>     | 527.cam4_r       | 2371   | 2370  | -1   |
>     | 538.imagick_r    | 462    | 461   | -1   |
>     | 549.fotonik3d_r  | 436    | 437   | +1   |
>     | 554.roms_r       | 852    | 851   | -1   |
>
> I think using unsigned index to access array should be rare. Programmers
> tend to use "for (int i; ...)" instead of unsigned values. But there may be
> special requirements. This opportunity is found in a real application, which
> has hot loops with such unsigned access pattern, and it can get huge
> improvements.

Yes, I can see that.  I think the patch is OK with a minor nit - can you
please document the nothrow flag usage in TREE_CHREC in
tree-core.h?  There's a big comment doing flags documentation:


/* The following table lists the uses of each of the above flags and
   for which types of nodes they are defined.
...

OK with that change.   Let's see how it goes ...

Thanks,
Richard.

> Thanks,
> Hao
>
> ________________________________________
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, December 6, 2023 19:49
> To: Hao Liu OS
> Cc: GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
>
> On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS <hliu@os.amperecomputing.com> wrote:
> >
> > Hi,
> >
> > Update the patch to fix problems in the test case:
> >  - add "-details" option to the dump command
> >  - add dg-require and target filters to avoid potential failures on platforms that don't support vectorization.
>
> Interesting simple trick - the downside is that this makes the
> recursive dependence
> of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
> set the flag on CHRECs that are not necessarily cached, so I'm not sure how
> effective this will be ...
>
> Can you try to do some statistics on say SPEC CPU?  I'm usually
> building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> how many more loops (and which) we vectorize additionally?
>
> Thanks,
> Richard.
>
> > Thanks,
> > -Hao
> >
> > gcc/ChangeLog:
> >
> >         PR tree-optimization/112774
> >         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
> >         printed with additional <nw> info.
> >         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
> >         nonwrapping_chrec_p to set and check the new flag respectively.
> >         * tree-scalar-evolution.h: Likewise.
> >         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
> >         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
> >         scev_probably_wraps_p): call record_nonwrapping_chrec before
> >         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
> >         return false from scev_probably_wraps_p.
> >         * tree-vect-loop.cc (vect_analyze_loop): call
> >         free_numbers_of_iterations_estimates explicitly.
> >         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
> >         to represent the nonwrapping info.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/tree-ssa/scev-16.c: New test.
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++++++++++++++++++
> >  gcc/tree-pretty-print.cc                |  2 +-
> >  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
> >  gcc/tree-scalar-evolution.h             |  2 ++
> >  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
> >  gcc/tree-vect-loop.cc                   |  4 ++++
> >  gcc/tree.h                              |  8 +++++---
> >  7 files changed, 70 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > new file mode 100644
> > index 00000000000..120f40c0b6c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> > +
> > +int A[1024 * 2];
> > +
> > +int foo (unsigned offset, unsigned N)
> > +{
> > +  int sum = 0;
> > +
> > +  for (unsigned i = 0; i < N; i++)
> > +    sum += A[i + offset];
> > +
> > +  return sum;
> > +}
> > +
> > +/* Loop can be vectorized by referring "i + offset" is nonwrapping from array.  */
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { avr-*-* msp430-*-* pru-*-* } } } } } */
> > diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> > index 1fadd752d05..0dabb6d1580 100644
> > --- a/gcc/tree-pretty-print.cc
> > +++ b/gcc/tree-pretty-print.cc
> > @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> >        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
> >        pp_string (pp, ", +, ");
> >        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> > -      pp_string (pp, "}_");
> > +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
> >        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
> >        is_stmt = false;
> >        break;
> > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > index f61277c32df..81630603c12 100644
> > --- a/gcc/tree-scalar-evolution.cc
> > +++ b/gcc/tree-scalar-evolution.cc
> > @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
> >    return res;
> >  }
> >
> > +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> > +
> > +void record_nonwrapping_chrec (tree chrec)
> > +{
> > +  CHREC_NOWRAP(chrec) = 1;
> > +
> > +  if (dump_file && (dump_flags & TDF_SCEV))
> > +    {
> > +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> > +      print_generic_expr (dump_file, chrec);
> > +      fprintf (dump_file, ")\n");
> > +    }
> > +}
> > +
> > +/* Return true if CHREC's nonwrapping flag is set.  */
> > +
> > +bool nonwrapping_chrec_p (tree chrec)
> > +{
> > +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> > +    return false;
> > +
> > +  return CHREC_NOWRAP(chrec);
> > +}
> > +
> >  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
> >
> >  static tree
> > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> > index a64ed78fe63..f57fde12ee2 100644
> > --- a/gcc/tree-scalar-evolution.h
> > +++ b/gcc/tree-scalar-evolution.h
> > @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
> >                        bool);
> >  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
> >  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> > +extern void record_nonwrapping_chrec (tree);
> > +extern bool nonwrapping_chrec_p (tree);
> >
> >  /* Returns the basic block preceding LOOP, or the CFG entry block when
> >     the loop is function's body.  */
> > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > index 2098bef9a97..d465e0ed7e1 100644
> > --- a/gcc/tree-ssa-loop-niter.cc
> > +++ b/gcc/tree-ssa-loop-niter.cc
> > @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
> >
> >    /* If access is not executed on every iteration, we must ensure that overlow
> >       may not make the access valid later.  */
> > -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> > -      && scev_probably_wraps_p (NULL_TREE,
> > -                               initial_condition_in_loop_num (ev, loop->num),
> > -                               step, data->stmt, loop, true))
> > -    upper = false;
> > +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> > +    {
> > +      if (scev_probably_wraps_p (NULL_TREE,
> > +                                initial_condition_in_loop_num (ev, loop->num),
> > +                                step, data->stmt, loop, true))
> > +       upper = false;
> > +    }
> > +  else
> > +    record_nonwrapping_chrec (ev);
> >
> >    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
> >    return true;
> > @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
> >    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
> >      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
> >
> > +  record_nonwrapping_chrec (scev);
> >    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
> >  }
> >
> > @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
> >        high = wide_int_to_tree (type, r.upper_bound ());
> >      }
> >
> > +  record_nonwrapping_chrec (scev);
> >    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
> >  }
> >
> > @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
> >    if (loop_exits_before_overflow (base, step, at_stmt, loop))
> >      return false;
> >
> > +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> > +     above loop exits before overflow).  */
> > +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> > +    return false;
> > +
> >    /* At this point we still don't have a proof that the iv does not
> >       overflow: give up.  */
> >    return true;
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 3df020d2228..7f278e2d8f4 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> >          analysis are done under the assumptions.  */
> >        loop_constraint_set (loop, LOOP_C_FINITE);
> >      }
> > +  else
> > +    /* Clear the existing niter information to make sure the nonwrapping flag
> > +       will be calculated and set propriately.  */
> > +    free_numbers_of_iterations_estimates (loop);
> >
> >    auto_vector_modes vector_modes;
> >    /* Autodetect first vector size we try.  */
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index 086b55f0375..59af8920f02 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
> >  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
> >
> >  /* Accessors for the chains of recurrences.  */
> > -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> > -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> > -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> > +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> > +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> > +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> > +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> > +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
> >
> >  /* LABEL_EXPR accessor. This gives access to the label associated with
> >     the given label expression.  */
> > --
> > 2.34.1
> >
> >
> > ________________________________________
> > From: Hao Liu OS <hliu@os.amperecomputing.com>
> > Sent: Monday, December 4, 2023 18:05
> > To: GCC-patches@gcc.gnu.org
> > Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
> >
> > Loop vecotorization can not optimize following case due to SCEV is not affine
> > failure (i+offset may overflow):
> >
> >     int A[1024 * 2];
> >
> >     int foo (unsigned offset, unsigned N)
> >     {
> >       int sum = 0;
> >       for (unsigned i = 0; i < N; i++)
> >         sum += A[i + offset];
> >       return sum;
> >     }
> >
> > Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
> > can not overflow) by inferring from undefined behaviors like array access (see
> > record_nonwrapping_iv). But this information is not shared to SCEV yet. This
> > patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it
> > to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop
> > vectorization could succeed.
> >
> > The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
> > "{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two
> > SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to
> > set and check the flag respectively.
> >
> > However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
> > reset_scev_htab), which may not be synchronized with the calling of
> > free_numbers_of_iterations_estimates, which set the loop->estimate_state to
> > EST_NOT_COMPUTED and make sure the above inferring from array access is called.
> > In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
> > cache if the loop->estimate_state is not reset.
> > E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
> > which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the
> > estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
> > loop vectorization.
> >
> > This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
> > vect_analyze_loop, which will make sure the flag is always set propriately in
> > loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
> > only), if there is more reasonable method, I will revert the simple fix and try
> > that.
> >
> > This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions.
> > OK for trunk?
> >
> > ---
> > The patch is as following:
> >
> > [PATCH] SCEV: extend the chrec tree with a nonwrapping flag
> >  [PR112774]
> >
> > The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
> > "{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
> > Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
> > added to set and check the flag respectively.
> >
> > As resetting the SCEV cache (i.e., the chrec trees) may not reset the
> > loop->estimate_state, free_numbers_of_iterations_estimates is called
> > explicitly in loop vectorization to make sure the flag can be
> > calculated propriately by niter.
> >
> > gcc/ChangeLog:
> >
> >         PR tree-optimization/112774
> >         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
> >         printed with additional <nw> info.
> >         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
> >         nonwrapping_chrec_p to set and check the new flag respectively.
> >         * tree-scalar-evolution.h: Likewise.
> >         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
> >         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
> >         scev_probably_wraps_p): call record_nonwrapping_chrec before
> >         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
> >         return false from scev_probably_wraps_p.
> >         * tree-vect-loop.cc (vect_analyze_loop): call
> >         free_numbers_of_iterations_estimates explicitly.
> >         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
> >         to represent the nonwrapping info.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/tree-ssa/scev-16.c: New test.
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++
> >  gcc/tree-pretty-print.cc                |  2 +-
> >  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
> >  gcc/tree-scalar-evolution.h             |  2 ++
> >  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
> >  gcc/tree-vect-loop.cc                   |  4 ++++
> >  gcc/tree.h                              |  8 +++++---
> >  7 files changed, 69 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > new file mode 100644
> > index 00000000000..96ea36e4c65
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-vect-scev" } */
> > +
> > +int A[1024 * 2];
> > +
> > +int foo (unsigned offset, unsigned N)
> > +{
> > +  int sum = 0;
> > +
> > +  for (unsigned i = 0; i < N; i++)
> > +    sum += A[i + offset];
> > +
> > +  return sum;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "missed:  failed: evolution of offset is not affine" "vect" } } */
> > diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> > index 1fadd752d05..0dabb6d1580 100644
> > --- a/gcc/tree-pretty-print.cc
> > +++ b/gcc/tree-pretty-print.cc
> > @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> >        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
> >        pp_string (pp, ", +, ");
> >        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> > -      pp_string (pp, "}_");
> > +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
> >        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
> >        is_stmt = false;
> >        break;
> > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > index 065bcd0743d..a9112572e0c 100644
> > --- a/gcc/tree-scalar-evolution.cc
> > +++ b/gcc/tree-scalar-evolution.cc
> > @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
> >    return res;
> >  }
> >
> > +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> > +
> > +void record_nonwrapping_chrec (tree chrec)
> > +{
> > +  CHREC_NOWRAP(chrec) = 1;
> > +
> > +  if (dump_file && (dump_flags & TDF_SCEV))
> > +    {
> > +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> > +      print_generic_expr (dump_file, chrec);
> > +      fprintf (dump_file, ")\n");
> > +    }
> > +}
> > +
> > +/* Return true if CHREC's nonwrapping flag is set.  */
> > +
> > +bool nonwrapping_chrec_p (tree chrec)
> > +{
> > +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> > +    return false;
> > +
> > +  return CHREC_NOWRAP(chrec);
> > +}
> > +
> >  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
> >
> >  static tree
> > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> > index a64ed78fe63..f57fde12ee2 100644
> > --- a/gcc/tree-scalar-evolution.h
> > +++ b/gcc/tree-scalar-evolution.h
> > @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
> >                        bool);
> >  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
> >  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> > +extern void record_nonwrapping_chrec (tree);
> > +extern bool nonwrapping_chrec_p (tree);
> >
> >  /* Returns the basic block preceding LOOP, or the CFG entry block when
> >     the loop is function's body.  */
> > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > index 2098bef9a97..d465e0ed7e1 100644
> > --- a/gcc/tree-ssa-loop-niter.cc
> > +++ b/gcc/tree-ssa-loop-niter.cc
> > @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
> >
> >    /* If access is not executed on every iteration, we must ensure that overlow
> >       may not make the access valid later.  */
> > -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> > -      && scev_probably_wraps_p (NULL_TREE,
> > -                               initial_condition_in_loop_num (ev, loop->num),
> > -                               step, data->stmt, loop, true))
> > -    upper = false;
> > +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> > +    {
> > +      if (scev_probably_wraps_p (NULL_TREE,
> > +                                initial_condition_in_loop_num (ev, loop->num),
> > +                                step, data->stmt, loop, true))
> > +       upper = false;
> > +    }
> > +  else
> > +    record_nonwrapping_chrec (ev);
> >
> >    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
> >    return true;
> > @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
> >    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
> >      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
> >
> > +  record_nonwrapping_chrec (scev);
> >    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
> >  }
> >
> > @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
> >        high = wide_int_to_tree (type, r.upper_bound ());
> >      }
> >
> > +  record_nonwrapping_chrec (scev);
> >    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
> >  }
> >
> > @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
> >    if (loop_exits_before_overflow (base, step, at_stmt, loop))
> >      return false;
> >
> > +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> > +     above loop exits before overflow).  */
> > +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> > +    return false;
> > +
> >    /* At this point we still don't have a proof that the iv does not
> >       overflow: give up.  */
> >    return true;
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index dd584ab4a42..6261cd1be1d 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> >          analysis are done under the assumptions.  */
> >        loop_constraint_set (loop, LOOP_C_FINITE);
> >      }
> > +  else
> > +    /* Clear the existing niter information to make sure the nonwrapping flag
> > +       will be calculated and set propriately.  */
> > +    free_numbers_of_iterations_estimates (loop);
> >
> >    auto_vector_modes vector_modes;
> >    /* Autodetect first vector size we try.  */
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index 086b55f0375..59af8920f02 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
> >  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
> >
> >  /* Accessors for the chains of recurrences.  */
> > -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> > -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> > -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> > +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> > +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> > +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> > +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> > +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
> >
> >  /* LABEL_EXPR accessor. This gives access to the label associated with
> >     the given label expression.  */
> > --
> > 2.34.1
> >
  

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
new file mode 100644
index 00000000000..96ea36e4c65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-vect-scev" } */
+
+int A[1024 * 2];
+
+int foo (unsigned offset, unsigned N)
+{
+  int sum = 0;
+
+  for (unsigned i = 0; i < N; i++)
+    sum += A[i + offset];
+
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
+/* { dg-final { scan-tree-dump-not "missed:  failed: evolution of offset is not affine" "vect" } } */
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 1fadd752d05..0dabb6d1580 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -3488,7 +3488,7 @@  dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
       dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
       pp_string (pp, ", +, ");
       dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
-      pp_string (pp, "}_");
+      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
       pp_scalar (pp, "%u", CHREC_VARIABLE (node));
       is_stmt = false;
       break;
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 065bcd0743d..a9112572e0c 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -2050,6 +2050,30 @@  analyze_scalar_evolution (class loop *loop, tree var)
   return res;
 }
 
+/* If CHREC doesn't overflow, set the nonwrapping flag.  */
+
+void record_nonwrapping_chrec (tree chrec)
+{
+  CHREC_NOWRAP(chrec) = 1;
+
+  if (dump_file && (dump_flags & TDF_SCEV))
+    {
+      fprintf (dump_file, "(record_nonwrapping_chrec: ");
+      print_generic_expr (dump_file, chrec);
+      fprintf (dump_file, ")\n");
+    }
+}
+
+/* Return true if CHREC's nonwrapping flag is set.  */
+
+bool nonwrapping_chrec_p (tree chrec)
+{
+  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
+    return false;
+
+  return CHREC_NOWRAP(chrec);
+}
+
 /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
 
 static tree
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index a64ed78fe63..f57fde12ee2 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -43,6 +43,8 @@  extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
 		       bool);
 extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
+extern void record_nonwrapping_chrec (tree);
+extern bool nonwrapping_chrec_p (tree);
 
 /* Returns the basic block preceding LOOP, or the CFG entry block when
    the loop is function's body.  */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 2098bef9a97..d465e0ed7e1 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4206,11 +4206,15 @@  idx_infer_loop_bounds (tree base, tree *idx, void *dta)
 
   /* If access is not executed on every iteration, we must ensure that overlow
      may not make the access valid later.  */
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
-      && scev_probably_wraps_p (NULL_TREE,
-				initial_condition_in_loop_num (ev, loop->num),
-				step, data->stmt, loop, true))
-    upper = false;
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
+    {
+      if (scev_probably_wraps_p (NULL_TREE,
+				 initial_condition_in_loop_num (ev, loop->num),
+				 step, data->stmt, loop, true))
+	upper = false;
+    }
+  else
+    record_nonwrapping_chrec (ev);
 
   record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
   return true;
@@ -4324,6 +4328,7 @@  infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
   if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
     low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
 
+  record_nonwrapping_chrec (scev);
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }
 
@@ -4371,6 +4376,7 @@  infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
       high = wide_int_to_tree (type, r.upper_bound ());
     }
 
+  record_nonwrapping_chrec (scev);
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }
 
@@ -5505,6 +5511,11 @@  scev_probably_wraps_p (tree var, tree base, tree step,
   if (loop_exits_before_overflow (base, step, at_stmt, loop))
     return false;
 
+  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
+     above loop exits before overflow).  */
+  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
+    return false;
+
   /* At this point we still don't have a proof that the iv does not
      overflow: give up.  */
   return true;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index dd584ab4a42..6261cd1be1d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3570,6 +3570,10 @@  vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 	 analysis are done under the assumptions.  */
       loop_constraint_set (loop, LOOP_C_FINITE);
     }
+  else
+    /* Clear the existing niter information to make sure the nonwrapping flag
+       will be calculated and set propriately.  */
+    free_numbers_of_iterations_estimates (loop);
 
   auto_vector_modes vector_modes;
   /* Autodetect first vector size we try.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index 086b55f0375..59af8920f02 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1438,9 +1438,11 @@  class auto_suppress_location_wrappers
 #define COND_EXPR_ELSE(NODE)	(TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
 
 /* Accessors for the chains of recurrences.  */
-#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
-#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
-#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
+#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
+#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
+#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
 
 /* LABEL_EXPR accessor. This gives access to the label associated with
    the given label expression.  */