tree-optimization/65206 - dependence analysis on mixed pointer/array

Message ID q7r48q9p-134p-15oo-04-28ponssqo0s7@fhfr.qr
State New
Headers
Series tree-optimization/65206 - dependence analysis on mixed pointer/array |

Commit Message

Richard Biener Sept. 16, 2021, 12:02 p.m. UTC
  This adds the capability to analyze the dependence of mixed
pointer/array accesses.  The example is from where using a masked
load/store creates the pointer-based access when an otherwise
unconditional access is array based.  Other examples would include
accesses to an array mixed with accesses from inlined helpers
that work on pointers.

The idea is quite simple and old - analyze the data-ref indices
as if the reference was pointer-based.  The following change does
this by changing dr_analyze_indices to work on the indices
sub-structure and storing an alternate indices substructure in
each data reference.  That alternate set of indices is analyzed
lazily by initialize_data_dependence_relation when it fails to
match-up the main set of indices of two data references.
initialize_data_dependence_relation is refactored into a head
and a tail worker and changed to work on one of the indices
structures and thus away from using DR_* access macros which
continue to reference the main indices substructure.

There are quite some vectorization and loop distribution opportunities
unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
544.nab_r see amendments in what they report with -fopt-info-loop while
the rest of the specrate set sees no changes there.  Measuring runtime
for the set where changes were reported reveals nothing off-noise
besides 511.povray_r which seems to regress slightly for me
(on a Zen2 machine with -Ofast -march=native).

Changes from the [RFC] version are properly handling bitfields
that we cannot take the address of and optimization of refs
that already are MEM_REFs and thus won't see any change.  I've
also elided changing the set of vect_masked_stores targets in
favor of explicitely listing avx (but I did not verify if the
testcase passes on aarch64-sve or amdgcn).

The improves cases like the following from Povray:

   for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
     {
        VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
        Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
     }

where there is a plain array access mixed with abstraction
using T[] or T* arguments.  That should be a not too uncommon
situation in the wild.  The loop above is now vectorized and was not
without the change.

Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
built and run SPEC CPU 2017 successfully.

OK?

Thanks,
Richard.

2021-09-08  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/65206
	* tree-data-ref.h (struct data_reference): Add alt_indices,
	order it last.
	* tree-data-ref.c (dr_analyze_indices): Work on
	struct indices and get DR_REF as tree.
	(create_data_ref): Adjust.
	(initialize_data_dependence_relation): Split into head
	and tail.  When the base objects fail to match up try
	again with pointer-based analysis of indices.
	* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
	not compare the lazily computed alternate set of indices.

	* gcc.dg/torture/20210916.c: New testcase.
	* gcc.dg/vect/pr65206.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
 gcc/testsuite/gcc.dg/vect/pr65206.c     |  22 +++
 gcc/tree-data-ref.c                     | 173 ++++++++++++++++--------
 gcc/tree-data-ref.h                     |   9 +-
 gcc/tree-vectorizer.c                   |   3 +-
 5 files changed, 167 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c
  

Comments

Richard Sandiford Sept. 17, 2021, 9:40 a.m. UTC | #1
Richard Biener <rguenther@suse.de> writes:
> This adds the capability to analyze the dependence of mixed
> pointer/array accesses.  The example is from where using a masked
> load/store creates the pointer-based access when an otherwise
> unconditional access is array based.  Other examples would include
> accesses to an array mixed with accesses from inlined helpers
> that work on pointers.
>
> The idea is quite simple and old - analyze the data-ref indices
> as if the reference was pointer-based.  The following change does
> this by changing dr_analyze_indices to work on the indices
> sub-structure and storing an alternate indices substructure in
> each data reference.  That alternate set of indices is analyzed
> lazily by initialize_data_dependence_relation when it fails to
> match-up the main set of indices of two data references.
> initialize_data_dependence_relation is refactored into a head
> and a tail worker and changed to work on one of the indices
> structures and thus away from using DR_* access macros which
> continue to reference the main indices substructure.
>
> There are quite some vectorization and loop distribution opportunities
> unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
> 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
> 544.nab_r see amendments in what they report with -fopt-info-loop while
> the rest of the specrate set sees no changes there.  Measuring runtime
> for the set where changes were reported reveals nothing off-noise
> besides 511.povray_r which seems to regress slightly for me
> (on a Zen2 machine with -Ofast -march=native).
>
> Changes from the [RFC] version are properly handling bitfields
> that we cannot take the address of and optimization of refs
> that already are MEM_REFs and thus won't see any change.  I've
> also elided changing the set of vect_masked_stores targets in
> favor of explicitely listing avx (but I did not verify if the
> testcase passes on aarch64-sve or amdgcn).
>
> The improves cases like the following from Povray:
>
>    for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
>      {
>         VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
>         Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
>      }
>
> where there is a plain array access mixed with abstraction
> using T[] or T* arguments.  That should be a not too uncommon
> situation in the wild.  The loop above is now vectorized and was not
> without the change.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
> built and run SPEC CPU 2017 successfully.
>
> OK?

Took a while to page this stuff back in :-/

I guess if we're adding alt_indices to the main data_reference,
we'll need to free the access_fns in free_data_ref.  It looked like
the patch as posted might have a memory leak.

Perhaps instead we could use local indices structures for the
alt_indices and pass them in to initialize_data_dependence_relation?
Not that that's very elegant…

> Thanks,
> Richard.
>
> 2021-09-08  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/65206
> 	* tree-data-ref.h (struct data_reference): Add alt_indices,
> 	order it last.
> 	* tree-data-ref.c (dr_analyze_indices): Work on
> 	struct indices and get DR_REF as tree.
> 	(create_data_ref): Adjust.
> 	(initialize_data_dependence_relation): Split into head
> 	and tail.  When the base objects fail to match up try
> 	again with pointer-based analysis of indices.
> 	* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
> 	not compare the lazily computed alternate set of indices.
>
> 	* gcc.dg/torture/20210916.c: New testcase.
> 	* gcc.dg/vect/pr65206.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
>  gcc/testsuite/gcc.dg/vect/pr65206.c     |  22 +++
>  gcc/tree-data-ref.c                     | 173 ++++++++++++++++--------
>  gcc/tree-data-ref.h                     |   9 +-
>  gcc/tree-vectorizer.c                   |   3 +-
>  5 files changed, 167 insertions(+), 60 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
> new file mode 100644
> index 00000000000..0ea6d45e463
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/20210916.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +
> +typedef union tree_node *tree;
> +struct tree_base {
> +  unsigned : 1;
> +  unsigned lang_flag_2 : 1;
> +};
> +struct tree_type {
> +  tree main_variant;
> +};
> +union tree_node {
> +  struct tree_base base;
> +  struct tree_type type;
> +};
> +tree finish_struct_t, finish_struct_x;
> +void finish_struct()
> +{
> +  for (; finish_struct_t->type.main_variant;)
> +    finish_struct_x->base.lang_flag_2 = 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
> new file mode 100644
> index 00000000000..3b6262622c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
> +/* { dg-additional-options "-mavx" { target avx } } */
> +
> +#define N 1024
> +
> +double a[N], b[N];
> +
> +void foo ()
> +{
> +  for (int i = 0; i < N; ++i)
> +    if (b[i] < 3.)
> +      a[i] += b[i];
> +}
> +
> +/* We get a .MASK_STORE because while the load of a[i] does not trap
> +   the store would introduce store data races.  Make sure we still
> +   can handle the data dependence with zero distance.  */
> +
> +/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> index e061baa7c20..3656b7ba80e 100644
> --- a/gcc/tree-data-ref.c
> +++ b/gcc/tree-data-ref.c
> @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "internal-fn.h"
>  #include "vr-values.h"
>  #include "range-op.h"
> +#include "tree-ssa-loop-ivopts.h"
>  
>  static struct datadep_stats
>  {
> @@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
>     DR, analyzed in LOOP and instantiated before NEST.  */
>  
>  static void
> -dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> +dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
>  {
> -  vec<tree> access_fns = vNULL;
> -  tree ref, op;
> -  tree base, off, access_fn;
> -
>    /* If analyzing a basic-block there are no indices to analyze
>       and thus no access functions.  */
>    if (!nest)
>      {
> -      DR_BASE_OBJECT (dr) = DR_REF (dr);
> -      DR_ACCESS_FNS (dr).create (0);
> +      dri->base_object = ref;
> +      dri->access_fns.create (0);
>        return;
>      }
>  
> -  ref = DR_REF (dr);
> +  vec<tree> access_fns = vNULL;
>  
>    /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
>       into a two element array with a constant index.  The base is
> @@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
>      {
>        if (TREE_CODE (ref) == ARRAY_REF)
>  	{
> -	  op = TREE_OPERAND (ref, 1);
> -	  access_fn = analyze_scalar_evolution (loop, op);
> +	  tree op = TREE_OPERAND (ref, 1);
> +	  tree access_fn = analyze_scalar_evolution (loop, op);
>  	  access_fn = instantiate_scev (nest, loop, access_fn);
>  	  access_fns.safe_push (access_fn);
>  	}
> @@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
>       analyzed nest, add it as an additional independent access-function.  */
>    if (TREE_CODE (ref) == MEM_REF)
>      {
> -      op = TREE_OPERAND (ref, 0);
> -      access_fn = analyze_scalar_evolution (loop, op);
> +      tree op = TREE_OPERAND (ref, 0);
> +      tree access_fn = analyze_scalar_evolution (loop, op);
>        access_fn = instantiate_scev (nest, loop, access_fn);
>        if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
>  	{
> -	  tree orig_type;
>  	  tree memoff = TREE_OPERAND (ref, 1);
> -	  base = initial_condition (access_fn);
> -	  orig_type = TREE_TYPE (base);
> +	  tree base = initial_condition (access_fn);
> +	  tree orig_type = TREE_TYPE (base);
>  	  STRIP_USELESS_TYPE_CONVERSION (base);
> +	  tree off;
>  	  split_constant_offset (base, &base, &off);
>  	  STRIP_USELESS_TYPE_CONVERSION (base);
>  	  /* Fold the MEM_REF offset into the evolutions initial
> @@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
>  				 base, memoff);
>  	  MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
>  	  MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
> -	  DR_UNCONSTRAINED_BASE (dr) = true;
> +	  dri->unconstrained_base = true;
>  	  access_fns.safe_push (access_fn);
>  	}
>      }
> @@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
>  		    build_int_cst (reference_alias_ptr_type (ref), 0));
>      }
>  
> -  DR_BASE_OBJECT (dr) = ref;
> -  DR_ACCESS_FNS (dr) = access_fns;
> +  dri->base_object = ref;
> +  dri->access_fns = access_fns;
>  }
>  
>  /* Extracts the alias analysis information from the memory reference DR.  */
> @@ -1497,7 +1494,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
>  
>    dr_analyze_innermost (&DR_INNERMOST (dr), memref,
>  			nest != NULL ? loop : NULL, stmt);
> -  dr_analyze_indices (dr, nest, loop);
> +  dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
>    dr_analyze_alias (dr);
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -3066,41 +3063,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b)
>  			     TREE_TYPE (TREE_OPERAND (ref_b, 0)));
>  }
>  
> -/* Initialize a data dependence relation between data accesses A and
> -   B.  NB_LOOPS is the number of loops surrounding the references: the
> -   size of the classic distance/direction vectors.  */
> +/* Initialize a data dependence relation RES in LOOP_NEST.  USE_ALT_INDICES
> +   is true when the main indices of A and B were not comparable so we try again
> +   with alternate indices computed on an indirect reference.  */
>  
>  struct data_dependence_relation *
> -initialize_data_dependence_relation (struct data_reference *a,
> -				     struct data_reference *b,
> - 				     vec<loop_p> loop_nest)
> +initialize_data_dependence_relation (struct data_dependence_relation *res,
> +				     vec<loop_p> loop_nest,
> +				     bool use_alt_indices)
>  {
> -  struct data_dependence_relation *res;
> +  struct data_reference *a = DDR_A (res);
> +  struct data_reference *b = DDR_B (res);
>    unsigned int i;
>  
> -  res = XCNEW (struct data_dependence_relation);
> -  DDR_A (res) = a;
> -  DDR_B (res) = b;
> -  DDR_LOOP_NEST (res).create (0);
> -  DDR_SUBSCRIPTS (res).create (0);
> -  DDR_DIR_VECTS (res).create (0);
> -  DDR_DIST_VECTS (res).create (0);
> -
> -  if (a == NULL || b == NULL)
> +  struct indices *indices_a = &a->indices;
> +  struct indices *indices_b = &b->indices;
> +  if (use_alt_indices)
>      {
> -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> -      return res;
> +      if (TREE_CODE (DR_REF (a)) != MEM_REF)
> +	indices_a = &a->alt_indices;
> +      if (TREE_CODE (DR_REF (b)) != MEM_REF)
> +	indices_b = &b->alt_indices;
>      }

BTW, I didn't audit this to check that every DR_* macro access had
been updated :-)

> -
> -  /* If the data references do not alias, then they are independent.  */
> -  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
> -    {
> -      DDR_ARE_DEPENDENT (res) = chrec_known;
> -      return res;
> -    }
> -
> -  unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
> -  unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
> +  unsigned int num_dimensions_a = indices_a->access_fns.length ();
> +  unsigned int num_dimensions_b = indices_b->access_fns.length ();
>    if (num_dimensions_a == 0 || num_dimensions_b == 0)
>      {
>        DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> @@ -3125,9 +3111,9 @@ initialize_data_dependence_relation (struct data_reference *a,
>  
>       the a and b accesses have a single ARRAY_REF component reference [0]
>       but have two subscripts.  */
> -  if (DR_UNCONSTRAINED_BASE (a))
> +  if (indices_a->unconstrained_base)
>      num_dimensions_a -= 1;
> -  if (DR_UNCONSTRAINED_BASE (b))
> +  if (indices_b->unconstrained_base)
>      num_dimensions_b -= 1;
>  
>    /* These structures describe sequences of component references in
> @@ -3210,6 +3196,10 @@ initialize_data_dependence_relation (struct data_reference *a,
>          B: [3, 4]  (i.e. s.e)  */
>    while (index_a < num_dimensions_a && index_b < num_dimensions_b)
>      {
> +      /* The alternate indices form always has a single dimension
> +	 with unconstrained base.  */
> +      gcc_assert (!use_alt_indices);
> +
>        /* REF_A and REF_B must be one of the component access types
>  	 allowed by dr_analyze_indices.  */
>        gcc_checking_assert (access_fn_component_p (ref_a));
> @@ -3280,11 +3270,12 @@ initialize_data_dependence_relation (struct data_reference *a,
>    /* See whether FULL_SEQ ends at the base and whether the two bases
>       are equal.  We do not care about TBAA or alignment info so we can
>       use OEP_ADDRESS_OF to avoid false negatives.  */
> -  tree base_a = DR_BASE_OBJECT (a);
> -  tree base_b = DR_BASE_OBJECT (b);
> +  tree base_a = indices_a->base_object;
> +  tree base_b = indices_b->base_object;
>    bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
>  		      && full_seq.start_b + full_seq.length == num_dimensions_b
> -		      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
> +		      && (indices_a->unconstrained_base
> +			  == indices_b->unconstrained_base)
>  		      && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>  		      && (types_compatible_p (TREE_TYPE (base_a),
>  					      TREE_TYPE (base_b))
> @@ -3323,7 +3314,7 @@ initialize_data_dependence_relation (struct data_reference *a,
>       both lvalues are distinct from the object's declared type.  */
>    if (same_base_p)
>      {
> -      if (DR_UNCONSTRAINED_BASE (a))
> +      if (indices_a->unconstrained_base)
>  	full_seq.length += 1;
>      }
>    else
> @@ -3332,8 +3323,42 @@ initialize_data_dependence_relation (struct data_reference *a,
>    /* Punt if we didn't find a suitable sequence.  */
>    if (full_seq.length == 0)
>      {
> -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> -      return res;
> +      if (use_alt_indices
> +	  || !param_data_dep_alt_indices

Didn't look like the patch has a definition of this.  Did you decide
to add a --param, or to ditch an earlier one?

> +	  || (TREE_CODE (DR_REF (a)) == MEM_REF
> +	      && TREE_CODE (DR_REF (b)) == MEM_REF)

Might be a daft question, but do we gain anything by doing this
when neither reference is a MEM_REF?  I.e. could the && be a !=?

Looks OK to me otherwise.

Thanks,
Richard

> +	  || may_be_nonaddressable_p (DR_REF (a))
> +	  || may_be_nonaddressable_p (DR_REF (b)))
> +	{
> +	  /* Fully exhausted possibilities.  */
> +	  DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> +	  return res;
> +	}
> +
> +      /* Try evaluating both DRs as dereferences of pointers.  */
> +      if (!a->alt_indices.base_object
> +	  && TREE_CODE (DR_REF (a)) != MEM_REF)
> +	{
> +	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
> +				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
> +				 build_int_cst
> +				   (reference_alias_ptr_type (DR_REF (a)), 0));
> +	  dr_analyze_indices (&a->alt_indices, alt_ref,
> +			      loop_preheader_edge (loop_nest[0]),
> +			      loop_containing_stmt (DR_STMT (a)));
> +	}
> +      if (!b->alt_indices.base_object
> +	  && TREE_CODE (DR_REF (b)) != MEM_REF)
> +	{
> +	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
> +				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
> +				 build_int_cst
> +				   (reference_alias_ptr_type (DR_REF (b)), 0));
> +	  dr_analyze_indices (&b->alt_indices, alt_ref,
> +			      loop_preheader_edge (loop_nest[0]),
> +			      loop_containing_stmt (DR_STMT (b)));
> +	}
> +      return initialize_data_dependence_relation (res, loop_nest, true);
>      }
>  
>    if (!same_base_p)
> @@ -3381,8 +3406,8 @@ initialize_data_dependence_relation (struct data_reference *a,
>        struct subscript *subscript;
>  
>        subscript = XNEW (struct subscript);
> -      SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
> -      SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
> +      SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
> +      SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
>        SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
>        SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
>        SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
> @@ -3393,6 +3418,40 @@ initialize_data_dependence_relation (struct data_reference *a,
>    return res;
>  }
>  
> +/* Initialize a data dependence relation between data accesses A and
> +   B.  NB_LOOPS is the number of loops surrounding the references: the
> +   size of the classic distance/direction vectors.  */
> +
> +struct data_dependence_relation *
> +initialize_data_dependence_relation (struct data_reference *a,
> +				     struct data_reference *b,
> +				     vec<loop_p> loop_nest)
> +{
> +  data_dependence_relation *res = XCNEW (struct data_dependence_relation);
> +  DDR_A (res) = a;
> +  DDR_B (res) = b;
> +  DDR_LOOP_NEST (res).create (0);
> +  DDR_SUBSCRIPTS (res).create (0);
> +  DDR_DIR_VECTS (res).create (0);
> +  DDR_DIST_VECTS (res).create (0);
> +
> +  if (a == NULL || b == NULL)
> +    {
> +      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> +      return res;
> +    }
> +
> +  /* If the data references do not alias, then they are independent.  */
> +  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
> +    {
> +      DDR_ARE_DEPENDENT (res) = chrec_known;
> +      return res;
> +    }
> +
> +  return initialize_data_dependence_relation (res, loop_nest, false);
> +}
> +
> +
>  /* Frees memory used by the conflict function F.  */
>  
>  static void
> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index 685f33d85ae..74f579c9f3f 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -166,14 +166,19 @@ struct data_reference
>       and runs to completion.  */
>    bool is_conditional_in_stmt;
>  
> +  /* Alias information for the data reference.  */
> +  struct dr_alias alias;
> +
>    /* Behavior of the memory reference in the innermost loop.  */
>    struct innermost_loop_behavior innermost;
>  
>    /* Subscripts of this data reference.  */
>    struct indices indices;
>  
> -  /* Alias information for the data reference.  */
> -  struct dr_alias alias;
> +  /* Alternate subscripts initialized lazily and used by data-dependence
> +     analysis only when the main indices of two DRs are not comparable.
> +     Keep last to keep vec_info_shared::check_datarefs happy.  */
> +  struct indices alt_indices;
>  };
>  
>  #define DR_STMT(DR)                (DR)->stmt
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index 3aa3e2a6783..20daa31187d 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -507,7 +507,8 @@ vec_info_shared::check_datarefs ()
>      return;
>    gcc_assert (datarefs.length () == datarefs_copy.length ());
>    for (unsigned i = 0; i < datarefs.length (); ++i)
> -    if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
> +    if (memcmp (&datarefs_copy[i], datarefs[i],
> +		offsetof (data_reference, alt_indices)) != 0)
>        gcc_unreachable ();
>  }
  
Richard Biener Sept. 17, 2021, 12:19 p.m. UTC | #2
On Fri, 17 Sep 2021, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > This adds the capability to analyze the dependence of mixed
> > pointer/array accesses.  The example is from where using a masked
> > load/store creates the pointer-based access when an otherwise
> > unconditional access is array based.  Other examples would include
> > accesses to an array mixed with accesses from inlined helpers
> > that work on pointers.
> >
> > The idea is quite simple and old - analyze the data-ref indices
> > as if the reference was pointer-based.  The following change does
> > this by changing dr_analyze_indices to work on the indices
> > sub-structure and storing an alternate indices substructure in
> > each data reference.  That alternate set of indices is analyzed
> > lazily by initialize_data_dependence_relation when it fails to
> > match-up the main set of indices of two data references.
> > initialize_data_dependence_relation is refactored into a head
> > and a tail worker and changed to work on one of the indices
> > structures and thus away from using DR_* access macros which
> > continue to reference the main indices substructure.
> >
> > There are quite some vectorization and loop distribution opportunities
> > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
> > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
> > 544.nab_r see amendments in what they report with -fopt-info-loop while
> > the rest of the specrate set sees no changes there.  Measuring runtime
> > for the set where changes were reported reveals nothing off-noise
> > besides 511.povray_r which seems to regress slightly for me
> > (on a Zen2 machine with -Ofast -march=native).
> >
> > Changes from the [RFC] version are properly handling bitfields
> > that we cannot take the address of and optimization of refs
> > that already are MEM_REFs and thus won't see any change.  I've
> > also elided changing the set of vect_masked_stores targets in
> > favor of explicitely listing avx (but I did not verify if the
> > testcase passes on aarch64-sve or amdgcn).
> >
> > The improves cases like the following from Povray:
> >
> >    for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
> >      {
> >         VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
> >         Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
> >      }
> >
> > where there is a plain array access mixed with abstraction
> > using T[] or T* arguments.  That should be a not too uncommon
> > situation in the wild.  The loop above is now vectorized and was not
> > without the change.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
> > built and run SPEC CPU 2017 successfully.
> >
> > OK?
> 
> Took a while to page this stuff back in :-/
> 
> I guess if we're adding alt_indices to the main data_reference,
> we'll need to free the access_fns in free_data_ref.  It looked like
> the patch as posted might have a memory leak.

Doh, yes - thanks for noticing.

> Perhaps instead we could use local indices structures for the
> alt_indices and pass them in to initialize_data_dependence_relation?
> Not that that's very elegant…

Yeah, I had that but then since for N data references we possibly
call initialize_data_dependence_relation N^2 times we'd do this
alternate analysis N^2 times at worst instead of N so it looked worth
caching it in the data reference.  Since we have no idea why the
first try fails we're going to try this fallback in the majority
of cases that we cannot figure out otherwise so I didn't manage
to hand-wave the quadraticness away ;)  OTOH one might argue
it's just a constant factor ontop of the number of
initialize_data_dependence_relation invocations.

So I can probably be convinced either way - guess I'll gather
some statistics.

Richard.

> 
> > Thanks,
> > Richard.
> >
> > 2021-09-08  Richard Biener  <rguenther@suse.de>
> >
> > 	PR tree-optimization/65206
> > 	* tree-data-ref.h (struct data_reference): Add alt_indices,
> > 	order it last.
> > 	* tree-data-ref.c (dr_analyze_indices): Work on
> > 	struct indices and get DR_REF as tree.
> > 	(create_data_ref): Adjust.
> > 	(initialize_data_dependence_relation): Split into head
> > 	and tail.  When the base objects fail to match up try
> > 	again with pointer-based analysis of indices.
> > 	* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
> > 	not compare the lazily computed alternate set of indices.
> >
> > 	* gcc.dg/torture/20210916.c: New testcase.
> > 	* gcc.dg/vect/pr65206.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
> >  gcc/testsuite/gcc.dg/vect/pr65206.c     |  22 +++
> >  gcc/tree-data-ref.c                     | 173 ++++++++++++++++--------
> >  gcc/tree-data-ref.h                     |   9 +-
> >  gcc/tree-vectorizer.c                   |   3 +-
> >  5 files changed, 167 insertions(+), 60 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
> > new file mode 100644
> > index 00000000000..0ea6d45e463
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/torture/20210916.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +
> > +typedef union tree_node *tree;
> > +struct tree_base {
> > +  unsigned : 1;
> > +  unsigned lang_flag_2 : 1;
> > +};
> > +struct tree_type {
> > +  tree main_variant;
> > +};
> > +union tree_node {
> > +  struct tree_base base;
> > +  struct tree_type type;
> > +};
> > +tree finish_struct_t, finish_struct_x;
> > +void finish_struct()
> > +{
> > +  for (; finish_struct_t->type.main_variant;)
> > +    finish_struct_x->base.lang_flag_2 = 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
> > new file mode 100644
> > index 00000000000..3b6262622c0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
> > @@ -0,0 +1,22 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_double } */
> > +/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
> > +/* { dg-additional-options "-mavx" { target avx } } */
> > +
> > +#define N 1024
> > +
> > +double a[N], b[N];
> > +
> > +void foo ()
> > +{
> > +  for (int i = 0; i < N; ++i)
> > +    if (b[i] < 3.)
> > +      a[i] += b[i];
> > +}
> > +
> > +/* We get a .MASK_STORE because while the load of a[i] does not trap
> > +   the store would introduce store data races.  Make sure we still
> > +   can handle the data dependence with zero distance.  */
> > +
> > +/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
> > diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> > index e061baa7c20..3656b7ba80e 100644
> > --- a/gcc/tree-data-ref.c
> > +++ b/gcc/tree-data-ref.c
> > @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "internal-fn.h"
> >  #include "vr-values.h"
> >  #include "range-op.h"
> > +#include "tree-ssa-loop-ivopts.h"
> >  
> >  static struct datadep_stats
> >  {
> > @@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
> >     DR, analyzed in LOOP and instantiated before NEST.  */
> >  
> >  static void
> > -dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> > +dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
> >  {
> > -  vec<tree> access_fns = vNULL;
> > -  tree ref, op;
> > -  tree base, off, access_fn;
> > -
> >    /* If analyzing a basic-block there are no indices to analyze
> >       and thus no access functions.  */
> >    if (!nest)
> >      {
> > -      DR_BASE_OBJECT (dr) = DR_REF (dr);
> > -      DR_ACCESS_FNS (dr).create (0);
> > +      dri->base_object = ref;
> > +      dri->access_fns.create (0);
> >        return;
> >      }
> >  
> > -  ref = DR_REF (dr);
> > +  vec<tree> access_fns = vNULL;
> >  
> >    /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
> >       into a two element array with a constant index.  The base is
> > @@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> >      {
> >        if (TREE_CODE (ref) == ARRAY_REF)
> >  	{
> > -	  op = TREE_OPERAND (ref, 1);
> > -	  access_fn = analyze_scalar_evolution (loop, op);
> > +	  tree op = TREE_OPERAND (ref, 1);
> > +	  tree access_fn = analyze_scalar_evolution (loop, op);
> >  	  access_fn = instantiate_scev (nest, loop, access_fn);
> >  	  access_fns.safe_push (access_fn);
> >  	}
> > @@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> >       analyzed nest, add it as an additional independent access-function.  */
> >    if (TREE_CODE (ref) == MEM_REF)
> >      {
> > -      op = TREE_OPERAND (ref, 0);
> > -      access_fn = analyze_scalar_evolution (loop, op);
> > +      tree op = TREE_OPERAND (ref, 0);
> > +      tree access_fn = analyze_scalar_evolution (loop, op);
> >        access_fn = instantiate_scev (nest, loop, access_fn);
> >        if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
> >  	{
> > -	  tree orig_type;
> >  	  tree memoff = TREE_OPERAND (ref, 1);
> > -	  base = initial_condition (access_fn);
> > -	  orig_type = TREE_TYPE (base);
> > +	  tree base = initial_condition (access_fn);
> > +	  tree orig_type = TREE_TYPE (base);
> >  	  STRIP_USELESS_TYPE_CONVERSION (base);
> > +	  tree off;
> >  	  split_constant_offset (base, &base, &off);
> >  	  STRIP_USELESS_TYPE_CONVERSION (base);
> >  	  /* Fold the MEM_REF offset into the evolutions initial
> > @@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> >  				 base, memoff);
> >  	  MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
> >  	  MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
> > -	  DR_UNCONSTRAINED_BASE (dr) = true;
> > +	  dri->unconstrained_base = true;
> >  	  access_fns.safe_push (access_fn);
> >  	}
> >      }
> > @@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> >  		    build_int_cst (reference_alias_ptr_type (ref), 0));
> >      }
> >  
> > -  DR_BASE_OBJECT (dr) = ref;
> > -  DR_ACCESS_FNS (dr) = access_fns;
> > +  dri->base_object = ref;
> > +  dri->access_fns = access_fns;
> >  }
> >  
> >  /* Extracts the alias analysis information from the memory reference DR.  */
> > @@ -1497,7 +1494,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
> >  
> >    dr_analyze_innermost (&DR_INNERMOST (dr), memref,
> >  			nest != NULL ? loop : NULL, stmt);
> > -  dr_analyze_indices (dr, nest, loop);
> > +  dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
> >    dr_analyze_alias (dr);
> >  
> >    if (dump_file && (dump_flags & TDF_DETAILS))
> > @@ -3066,41 +3063,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b)
> >  			     TREE_TYPE (TREE_OPERAND (ref_b, 0)));
> >  }
> >  
> > -/* Initialize a data dependence relation between data accesses A and
> > -   B.  NB_LOOPS is the number of loops surrounding the references: the
> > -   size of the classic distance/direction vectors.  */
> > +/* Initialize a data dependence relation RES in LOOP_NEST.  USE_ALT_INDICES
> > +   is true when the main indices of A and B were not comparable so we try again
> > +   with alternate indices computed on an indirect reference.  */
> >  
> >  struct data_dependence_relation *
> > -initialize_data_dependence_relation (struct data_reference *a,
> > -				     struct data_reference *b,
> > - 				     vec<loop_p> loop_nest)
> > +initialize_data_dependence_relation (struct data_dependence_relation *res,
> > +				     vec<loop_p> loop_nest,
> > +				     bool use_alt_indices)
> >  {
> > -  struct data_dependence_relation *res;
> > +  struct data_reference *a = DDR_A (res);
> > +  struct data_reference *b = DDR_B (res);
> >    unsigned int i;
> >  
> > -  res = XCNEW (struct data_dependence_relation);
> > -  DDR_A (res) = a;
> > -  DDR_B (res) = b;
> > -  DDR_LOOP_NEST (res).create (0);
> > -  DDR_SUBSCRIPTS (res).create (0);
> > -  DDR_DIR_VECTS (res).create (0);
> > -  DDR_DIST_VECTS (res).create (0);
> > -
> > -  if (a == NULL || b == NULL)
> > +  struct indices *indices_a = &a->indices;
> > +  struct indices *indices_b = &b->indices;
> > +  if (use_alt_indices)
> >      {
> > -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > -      return res;
> > +      if (TREE_CODE (DR_REF (a)) != MEM_REF)
> > +	indices_a = &a->alt_indices;
> > +      if (TREE_CODE (DR_REF (b)) != MEM_REF)
> > +	indices_b = &b->alt_indices;
> >      }
> 
> BTW, I didn't audit this to check that every DR_* macro access had
> been updated :-)
> 
> > -
> > -  /* If the data references do not alias, then they are independent.  */
> > -  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
> > -    {
> > -      DDR_ARE_DEPENDENT (res) = chrec_known;
> > -      return res;
> > -    }
> > -
> > -  unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
> > -  unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
> > +  unsigned int num_dimensions_a = indices_a->access_fns.length ();
> > +  unsigned int num_dimensions_b = indices_b->access_fns.length ();
> >    if (num_dimensions_a == 0 || num_dimensions_b == 0)
> >      {
> >        DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > @@ -3125,9 +3111,9 @@ initialize_data_dependence_relation (struct data_reference *a,
> >  
> >       the a and b accesses have a single ARRAY_REF component reference [0]
> >       but have two subscripts.  */
> > -  if (DR_UNCONSTRAINED_BASE (a))
> > +  if (indices_a->unconstrained_base)
> >      num_dimensions_a -= 1;
> > -  if (DR_UNCONSTRAINED_BASE (b))
> > +  if (indices_b->unconstrained_base)
> >      num_dimensions_b -= 1;
> >  
> >    /* These structures describe sequences of component references in
> > @@ -3210,6 +3196,10 @@ initialize_data_dependence_relation (struct data_reference *a,
> >          B: [3, 4]  (i.e. s.e)  */
> >    while (index_a < num_dimensions_a && index_b < num_dimensions_b)
> >      {
> > +      /* The alternate indices form always has a single dimension
> > +	 with unconstrained base.  */
> > +      gcc_assert (!use_alt_indices);
> > +
> >        /* REF_A and REF_B must be one of the component access types
> >  	 allowed by dr_analyze_indices.  */
> >        gcc_checking_assert (access_fn_component_p (ref_a));
> > @@ -3280,11 +3270,12 @@ initialize_data_dependence_relation (struct data_reference *a,
> >    /* See whether FULL_SEQ ends at the base and whether the two bases
> >       are equal.  We do not care about TBAA or alignment info so we can
> >       use OEP_ADDRESS_OF to avoid false negatives.  */
> > -  tree base_a = DR_BASE_OBJECT (a);
> > -  tree base_b = DR_BASE_OBJECT (b);
> > +  tree base_a = indices_a->base_object;
> > +  tree base_b = indices_b->base_object;
> >    bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
> >  		      && full_seq.start_b + full_seq.length == num_dimensions_b
> > -		      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
> > +		      && (indices_a->unconstrained_base
> > +			  == indices_b->unconstrained_base)
> >  		      && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
> >  		      && (types_compatible_p (TREE_TYPE (base_a),
> >  					      TREE_TYPE (base_b))
> > @@ -3323,7 +3314,7 @@ initialize_data_dependence_relation (struct data_reference *a,
> >       both lvalues are distinct from the object's declared type.  */
> >    if (same_base_p)
> >      {
> > -      if (DR_UNCONSTRAINED_BASE (a))
> > +      if (indices_a->unconstrained_base)
> >  	full_seq.length += 1;
> >      }
> >    else
> > @@ -3332,8 +3323,42 @@ initialize_data_dependence_relation (struct data_reference *a,
> >    /* Punt if we didn't find a suitable sequence.  */
> >    if (full_seq.length == 0)
> >      {
> > -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > -      return res;
> > +      if (use_alt_indices
> > +	  || !param_data_dep_alt_indices
> 
> Didn't look like the patch has a definition of this.  Did you decide
> to add a --param, or to ditch an earlier one?
> 
> > +	  || (TREE_CODE (DR_REF (a)) == MEM_REF
> > +	      && TREE_CODE (DR_REF (b)) == MEM_REF)
> 
> Might be a daft question, but do we gain anything by doing this
> when neither reference is a MEM_REF?  I.e. could the && be a !=?
> 
> Looks OK to me otherwise.
> 
> Thanks,
> Richard
> 
> > +	  || may_be_nonaddressable_p (DR_REF (a))
> > +	  || may_be_nonaddressable_p (DR_REF (b)))
> > +	{
> > +	  /* Fully exhausted possibilities.  */
> > +	  DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > +	  return res;
> > +	}
> > +
> > +      /* Try evaluating both DRs as dereferences of pointers.  */
> > +      if (!a->alt_indices.base_object
> > +	  && TREE_CODE (DR_REF (a)) != MEM_REF)
> > +	{
> > +	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
> > +				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
> > +				 build_int_cst
> > +				   (reference_alias_ptr_type (DR_REF (a)), 0));
> > +	  dr_analyze_indices (&a->alt_indices, alt_ref,
> > +			      loop_preheader_edge (loop_nest[0]),
> > +			      loop_containing_stmt (DR_STMT (a)));
> > +	}
> > +      if (!b->alt_indices.base_object
> > +	  && TREE_CODE (DR_REF (b)) != MEM_REF)
> > +	{
> > +	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
> > +				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
> > +				 build_int_cst
> > +				   (reference_alias_ptr_type (DR_REF (b)), 0));
> > +	  dr_analyze_indices (&b->alt_indices, alt_ref,
> > +			      loop_preheader_edge (loop_nest[0]),
> > +			      loop_containing_stmt (DR_STMT (b)));
> > +	}
> > +      return initialize_data_dependence_relation (res, loop_nest, true);
> >      }
> >  
> >    if (!same_base_p)
> > @@ -3381,8 +3406,8 @@ initialize_data_dependence_relation (struct data_reference *a,
> >        struct subscript *subscript;
> >  
> >        subscript = XNEW (struct subscript);
> > -      SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
> > -      SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
> > +      SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
> > +      SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
> >        SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
> >        SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
> >        SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
> > @@ -3393,6 +3418,40 @@ initialize_data_dependence_relation (struct data_reference *a,
> >    return res;
> >  }
> >  
> > +/* Initialize a data dependence relation between data accesses A and
> > +   B.  NB_LOOPS is the number of loops surrounding the references: the
> > +   size of the classic distance/direction vectors.  */
> > +
> > +struct data_dependence_relation *
> > +initialize_data_dependence_relation (struct data_reference *a,
> > +				     struct data_reference *b,
> > +				     vec<loop_p> loop_nest)
> > +{
> > +  data_dependence_relation *res = XCNEW (struct data_dependence_relation);
> > +  DDR_A (res) = a;
> > +  DDR_B (res) = b;
> > +  DDR_LOOP_NEST (res).create (0);
> > +  DDR_SUBSCRIPTS (res).create (0);
> > +  DDR_DIR_VECTS (res).create (0);
> > +  DDR_DIST_VECTS (res).create (0);
> > +
> > +  if (a == NULL || b == NULL)
> > +    {
> > +      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > +      return res;
> > +    }
> > +
> > +  /* If the data references do not alias, then they are independent.  */
> > +  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
> > +    {
> > +      DDR_ARE_DEPENDENT (res) = chrec_known;
> > +      return res;
> > +    }
> > +
> > +  return initialize_data_dependence_relation (res, loop_nest, false);
> > +}
> > +
> > +
> >  /* Frees memory used by the conflict function F.  */
> >  
> >  static void
> > diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> > index 685f33d85ae..74f579c9f3f 100644
> > --- a/gcc/tree-data-ref.h
> > +++ b/gcc/tree-data-ref.h
> > @@ -166,14 +166,19 @@ struct data_reference
> >       and runs to completion.  */
> >    bool is_conditional_in_stmt;
> >  
> > +  /* Alias information for the data reference.  */
> > +  struct dr_alias alias;
> > +
> >    /* Behavior of the memory reference in the innermost loop.  */
> >    struct innermost_loop_behavior innermost;
> >  
> >    /* Subscripts of this data reference.  */
> >    struct indices indices;
> >  
> > -  /* Alias information for the data reference.  */
> > -  struct dr_alias alias;
> > +  /* Alternate subscripts initialized lazily and used by data-dependence
> > +     analysis only when the main indices of two DRs are not comparable.
> > +     Keep last to keep vec_info_shared::check_datarefs happy.  */
> > +  struct indices alt_indices;
> >  };
> >  
> >  #define DR_STMT(DR)                (DR)->stmt
> > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> > index 3aa3e2a6783..20daa31187d 100644
> > --- a/gcc/tree-vectorizer.c
> > +++ b/gcc/tree-vectorizer.c
> > @@ -507,7 +507,8 @@ vec_info_shared::check_datarefs ()
> >      return;
> >    gcc_assert (datarefs.length () == datarefs_copy.length ());
> >    for (unsigned i = 0; i < datarefs.length (); ++i)
> > -    if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
> > +    if (memcmp (&datarefs_copy[i], datarefs[i],
> > +		offsetof (data_reference, alt_indices)) != 0)
> >        gcc_unreachable ();
> >  }
>
  
Richard Biener Sept. 17, 2021, 1:44 p.m. UTC | #3
On Fri, 17 Sep 2021, Richard Biener wrote:

> On Fri, 17 Sep 2021, Richard Sandiford wrote:
> 
> > Richard Biener <rguenther@suse.de> writes:
> > > This adds the capability to analyze the dependence of mixed
> > > pointer/array accesses.  The example is from where using a masked
> > > load/store creates the pointer-based access when an otherwise
> > > unconditional access is array based.  Other examples would include
> > > accesses to an array mixed with accesses from inlined helpers
> > > that work on pointers.
> > >
> > > The idea is quite simple and old - analyze the data-ref indices
> > > as if the reference was pointer-based.  The following change does
> > > this by changing dr_analyze_indices to work on the indices
> > > sub-structure and storing an alternate indices substructure in
> > > each data reference.  That alternate set of indices is analyzed
> > > lazily by initialize_data_dependence_relation when it fails to
> > > match-up the main set of indices of two data references.
> > > initialize_data_dependence_relation is refactored into a head
> > > and a tail worker and changed to work on one of the indices
> > > structures and thus away from using DR_* access macros which
> > > continue to reference the main indices substructure.
> > >
> > > There are quite some vectorization and loop distribution opportunities
> > > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
> > > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
> > > 544.nab_r see amendments in what they report with -fopt-info-loop while
> > > the rest of the specrate set sees no changes there.  Measuring runtime
> > > for the set where changes were reported reveals nothing off-noise
> > > besides 511.povray_r which seems to regress slightly for me
> > > (on a Zen2 machine with -Ofast -march=native).
> > >
> > > Changes from the [RFC] version are properly handling bitfields
> > > that we cannot take the address of and optimization of refs
> > > that already are MEM_REFs and thus won't see any change.  I've
> > > also elided changing the set of vect_masked_stores targets in
> > > favor of explicitely listing avx (but I did not verify if the
> > > testcase passes on aarch64-sve or amdgcn).
> > >
> > > The improves cases like the following from Povray:
> > >
> > >    for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
> > >      {
> > >         VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
> > >         Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
> > >      }
> > >
> > > where there is a plain array access mixed with abstraction
> > > using T[] or T* arguments.  That should be a not too uncommon
> > > situation in the wild.  The loop above is now vectorized and was not
> > > without the change.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
> > > built and run SPEC CPU 2017 successfully.
> > >
> > > OK?
> > 
> > Took a while to page this stuff back in :-/
> > 
> > I guess if we're adding alt_indices to the main data_reference,
> > we'll need to free the access_fns in free_data_ref.  It looked like
> > the patch as posted might have a memory leak.
> 
> Doh, yes - thanks for noticing.
> 
> > Perhaps instead we could use local indices structures for the
> > alt_indices and pass them in to initialize_data_dependence_relation?
> > Not that that's very elegant…
> 
> Yeah, I had that but then since for N data references we possibly
> call initialize_data_dependence_relation N^2 times we'd do this
> alternate analysis N^2 times at worst instead of N so it looked worth
> caching it in the data reference.  Since we have no idea why the
> first try fails we're going to try this fallback in the majority
> of cases that we cannot figure out otherwise so I didn't manage
> to hand-wave the quadraticness away ;)  OTOH one might argue
> it's just a constant factor ontop of the number of
> initialize_data_dependence_relation invocations.
> 
> So I can probably be convinced either way - guess I'll gather
> some statistics.

I built SPEC 2017 CPU rate with -Ofast -march=znver2, overall there
are

 4433976     calls to the first stage initialize_data_dependence_relation
             (skipping the cases dr_may_alias returned false)
 360248 (8%) ended up recursing with a set of alt_indices
 83512       times we computed alt_indices of a DR (that's with the cache)
 14905 (0.3%) times the recursive invocation ended with != chrec_dont_know

thus when not doing the caching we'd compute alt_indices about 10 times
more often.  I didn't collect the number of distinct DRs (that's difficult
at this point), but I'd estimate from the above that we have 3 times
more "unused" alt_indices than used.

OK, so that didn't really help me avoid flipping a coin ;)

Richard.

> Richard.
> 
> > 
> > > Thanks,
> > > Richard.
> > >
> > > 2021-09-08  Richard Biener  <rguenther@suse.de>
> > >
> > > 	PR tree-optimization/65206
> > > 	* tree-data-ref.h (struct data_reference): Add alt_indices,
> > > 	order it last.
> > > 	* tree-data-ref.c (dr_analyze_indices): Work on
> > > 	struct indices and get DR_REF as tree.
> > > 	(create_data_ref): Adjust.
> > > 	(initialize_data_dependence_relation): Split into head
> > > 	and tail.  When the base objects fail to match up try
> > > 	again with pointer-based analysis of indices.
> > > 	* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
> > > 	not compare the lazily computed alternate set of indices.
> > >
> > > 	* gcc.dg/torture/20210916.c: New testcase.
> > > 	* gcc.dg/vect/pr65206.c: Likewise.
> > > ---
> > >  gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
> > >  gcc/testsuite/gcc.dg/vect/pr65206.c     |  22 +++
> > >  gcc/tree-data-ref.c                     | 173 ++++++++++++++++--------
> > >  gcc/tree-data-ref.h                     |   9 +-
> > >  gcc/tree-vectorizer.c                   |   3 +-
> > >  5 files changed, 167 insertions(+), 60 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
> > > new file mode 100644
> > > index 00000000000..0ea6d45e463
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/torture/20210916.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +
> > > +typedef union tree_node *tree;
> > > +struct tree_base {
> > > +  unsigned : 1;
> > > +  unsigned lang_flag_2 : 1;
> > > +};
> > > +struct tree_type {
> > > +  tree main_variant;
> > > +};
> > > +union tree_node {
> > > +  struct tree_base base;
> > > +  struct tree_type type;
> > > +};
> > > +tree finish_struct_t, finish_struct_x;
> > > +void finish_struct()
> > > +{
> > > +  for (; finish_struct_t->type.main_variant;)
> > > +    finish_struct_x->base.lang_flag_2 = 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
> > > new file mode 100644
> > > index 00000000000..3b6262622c0
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
> > > @@ -0,0 +1,22 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_double } */
> > > +/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
> > > +/* { dg-additional-options "-mavx" { target avx } } */
> > > +
> > > +#define N 1024
> > > +
> > > +double a[N], b[N];
> > > +
> > > +void foo ()
> > > +{
> > > +  for (int i = 0; i < N; ++i)
> > > +    if (b[i] < 3.)
> > > +      a[i] += b[i];
> > > +}
> > > +
> > > +/* We get a .MASK_STORE because while the load of a[i] does not trap
> > > +   the store would introduce store data races.  Make sure we still
> > > +   can handle the data dependence with zero distance.  */
> > > +
> > > +/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
> > > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
> > > diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> > > index e061baa7c20..3656b7ba80e 100644
> > > --- a/gcc/tree-data-ref.c
> > > +++ b/gcc/tree-data-ref.c
> > > @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "internal-fn.h"
> > >  #include "vr-values.h"
> > >  #include "range-op.h"
> > > +#include "tree-ssa-loop-ivopts.h"
> > >  
> > >  static struct datadep_stats
> > >  {
> > > @@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
> > >     DR, analyzed in LOOP and instantiated before NEST.  */
> > >  
> > >  static void
> > > -dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> > > +dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
> > >  {
> > > -  vec<tree> access_fns = vNULL;
> > > -  tree ref, op;
> > > -  tree base, off, access_fn;
> > > -
> > >    /* If analyzing a basic-block there are no indices to analyze
> > >       and thus no access functions.  */
> > >    if (!nest)
> > >      {
> > > -      DR_BASE_OBJECT (dr) = DR_REF (dr);
> > > -      DR_ACCESS_FNS (dr).create (0);
> > > +      dri->base_object = ref;
> > > +      dri->access_fns.create (0);
> > >        return;
> > >      }
> > >  
> > > -  ref = DR_REF (dr);
> > > +  vec<tree> access_fns = vNULL;
> > >  
> > >    /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
> > >       into a two element array with a constant index.  The base is
> > > @@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> > >      {
> > >        if (TREE_CODE (ref) == ARRAY_REF)
> > >  	{
> > > -	  op = TREE_OPERAND (ref, 1);
> > > -	  access_fn = analyze_scalar_evolution (loop, op);
> > > +	  tree op = TREE_OPERAND (ref, 1);
> > > +	  tree access_fn = analyze_scalar_evolution (loop, op);
> > >  	  access_fn = instantiate_scev (nest, loop, access_fn);
> > >  	  access_fns.safe_push (access_fn);
> > >  	}
> > > @@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> > >       analyzed nest, add it as an additional independent access-function.  */
> > >    if (TREE_CODE (ref) == MEM_REF)
> > >      {
> > > -      op = TREE_OPERAND (ref, 0);
> > > -      access_fn = analyze_scalar_evolution (loop, op);
> > > +      tree op = TREE_OPERAND (ref, 0);
> > > +      tree access_fn = analyze_scalar_evolution (loop, op);
> > >        access_fn = instantiate_scev (nest, loop, access_fn);
> > >        if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
> > >  	{
> > > -	  tree orig_type;
> > >  	  tree memoff = TREE_OPERAND (ref, 1);
> > > -	  base = initial_condition (access_fn);
> > > -	  orig_type = TREE_TYPE (base);
> > > +	  tree base = initial_condition (access_fn);
> > > +	  tree orig_type = TREE_TYPE (base);
> > >  	  STRIP_USELESS_TYPE_CONVERSION (base);
> > > +	  tree off;
> > >  	  split_constant_offset (base, &base, &off);
> > >  	  STRIP_USELESS_TYPE_CONVERSION (base);
> > >  	  /* Fold the MEM_REF offset into the evolutions initial
> > > @@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> > >  				 base, memoff);
> > >  	  MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
> > >  	  MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
> > > -	  DR_UNCONSTRAINED_BASE (dr) = true;
> > > +	  dri->unconstrained_base = true;
> > >  	  access_fns.safe_push (access_fn);
> > >  	}
> > >      }
> > > @@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> > >  		    build_int_cst (reference_alias_ptr_type (ref), 0));
> > >      }
> > >  
> > > -  DR_BASE_OBJECT (dr) = ref;
> > > -  DR_ACCESS_FNS (dr) = access_fns;
> > > +  dri->base_object = ref;
> > > +  dri->access_fns = access_fns;
> > >  }
> > >  
> > >  /* Extracts the alias analysis information from the memory reference DR.  */
> > > @@ -1497,7 +1494,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
> > >  
> > >    dr_analyze_innermost (&DR_INNERMOST (dr), memref,
> > >  			nest != NULL ? loop : NULL, stmt);
> > > -  dr_analyze_indices (dr, nest, loop);
> > > +  dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
> > >    dr_analyze_alias (dr);
> > >  
> > >    if (dump_file && (dump_flags & TDF_DETAILS))
> > > @@ -3066,41 +3063,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b)
> > >  			     TREE_TYPE (TREE_OPERAND (ref_b, 0)));
> > >  }
> > >  
> > > -/* Initialize a data dependence relation between data accesses A and
> > > -   B.  NB_LOOPS is the number of loops surrounding the references: the
> > > -   size of the classic distance/direction vectors.  */
> > > +/* Initialize a data dependence relation RES in LOOP_NEST.  USE_ALT_INDICES
> > > +   is true when the main indices of A and B were not comparable so we try again
> > > +   with alternate indices computed on an indirect reference.  */
> > >  
> > >  struct data_dependence_relation *
> > > -initialize_data_dependence_relation (struct data_reference *a,
> > > -				     struct data_reference *b,
> > > - 				     vec<loop_p> loop_nest)
> > > +initialize_data_dependence_relation (struct data_dependence_relation *res,
> > > +				     vec<loop_p> loop_nest,
> > > +				     bool use_alt_indices)
> > >  {
> > > -  struct data_dependence_relation *res;
> > > +  struct data_reference *a = DDR_A (res);
> > > +  struct data_reference *b = DDR_B (res);
> > >    unsigned int i;
> > >  
> > > -  res = XCNEW (struct data_dependence_relation);
> > > -  DDR_A (res) = a;
> > > -  DDR_B (res) = b;
> > > -  DDR_LOOP_NEST (res).create (0);
> > > -  DDR_SUBSCRIPTS (res).create (0);
> > > -  DDR_DIR_VECTS (res).create (0);
> > > -  DDR_DIST_VECTS (res).create (0);
> > > -
> > > -  if (a == NULL || b == NULL)
> > > +  struct indices *indices_a = &a->indices;
> > > +  struct indices *indices_b = &b->indices;
> > > +  if (use_alt_indices)
> > >      {
> > > -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > > -      return res;
> > > +      if (TREE_CODE (DR_REF (a)) != MEM_REF)
> > > +	indices_a = &a->alt_indices;
> > > +      if (TREE_CODE (DR_REF (b)) != MEM_REF)
> > > +	indices_b = &b->alt_indices;
> > >      }
> > 
> > BTW, I didn't audit this to check that every DR_* macro access had
> > been updated :-)
> > 
> > > -
> > > -  /* If the data references do not alias, then they are independent.  */
> > > -  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
> > > -    {
> > > -      DDR_ARE_DEPENDENT (res) = chrec_known;
> > > -      return res;
> > > -    }
> > > -
> > > -  unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
> > > -  unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
> > > +  unsigned int num_dimensions_a = indices_a->access_fns.length ();
> > > +  unsigned int num_dimensions_b = indices_b->access_fns.length ();
> > >    if (num_dimensions_a == 0 || num_dimensions_b == 0)
> > >      {
> > >        DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > > @@ -3125,9 +3111,9 @@ initialize_data_dependence_relation (struct data_reference *a,
> > >  
> > >       the a and b accesses have a single ARRAY_REF component reference [0]
> > >       but have two subscripts.  */
> > > -  if (DR_UNCONSTRAINED_BASE (a))
> > > +  if (indices_a->unconstrained_base)
> > >      num_dimensions_a -= 1;
> > > -  if (DR_UNCONSTRAINED_BASE (b))
> > > +  if (indices_b->unconstrained_base)
> > >      num_dimensions_b -= 1;
> > >  
> > >    /* These structures describe sequences of component references in
> > > @@ -3210,6 +3196,10 @@ initialize_data_dependence_relation (struct data_reference *a,
> > >          B: [3, 4]  (i.e. s.e)  */
> > >    while (index_a < num_dimensions_a && index_b < num_dimensions_b)
> > >      {
> > > +      /* The alternate indices form always has a single dimension
> > > +	 with unconstrained base.  */
> > > +      gcc_assert (!use_alt_indices);
> > > +
> > >        /* REF_A and REF_B must be one of the component access types
> > >  	 allowed by dr_analyze_indices.  */
> > >        gcc_checking_assert (access_fn_component_p (ref_a));
> > > @@ -3280,11 +3270,12 @@ initialize_data_dependence_relation (struct data_reference *a,
> > >    /* See whether FULL_SEQ ends at the base and whether the two bases
> > >       are equal.  We do not care about TBAA or alignment info so we can
> > >       use OEP_ADDRESS_OF to avoid false negatives.  */
> > > -  tree base_a = DR_BASE_OBJECT (a);
> > > -  tree base_b = DR_BASE_OBJECT (b);
> > > +  tree base_a = indices_a->base_object;
> > > +  tree base_b = indices_b->base_object;
> > >    bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
> > >  		      && full_seq.start_b + full_seq.length == num_dimensions_b
> > > -		      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
> > > +		      && (indices_a->unconstrained_base
> > > +			  == indices_b->unconstrained_base)
> > >  		      && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
> > >  		      && (types_compatible_p (TREE_TYPE (base_a),
> > >  					      TREE_TYPE (base_b))
> > > @@ -3323,7 +3314,7 @@ initialize_data_dependence_relation (struct data_reference *a,
> > >       both lvalues are distinct from the object's declared type.  */
> > >    if (same_base_p)
> > >      {
> > > -      if (DR_UNCONSTRAINED_BASE (a))
> > > +      if (indices_a->unconstrained_base)
> > >  	full_seq.length += 1;
> > >      }
> > >    else
> > > @@ -3332,8 +3323,42 @@ initialize_data_dependence_relation (struct data_reference *a,
> > >    /* Punt if we didn't find a suitable sequence.  */
> > >    if (full_seq.length == 0)
> > >      {
> > > -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > > -      return res;
> > > +      if (use_alt_indices
> > > +	  || !param_data_dep_alt_indices
> > 
> > Didn't look like the patch has a definition of this.  Did you decide
> > to add a --param, or to ditch an earlier one?
> > 
> > > +	  || (TREE_CODE (DR_REF (a)) == MEM_REF
> > > +	      && TREE_CODE (DR_REF (b)) == MEM_REF)
> > 
> > Might be a daft question, but do we gain anything by doing this
> > when neither reference is a MEM_REF?  I.e. could the && be a !=?
> > 
> > Looks OK to me otherwise.
> > 
> > Thanks,
> > Richard
> > 
> > > +	  || may_be_nonaddressable_p (DR_REF (a))
> > > +	  || may_be_nonaddressable_p (DR_REF (b)))
> > > +	{
> > > +	  /* Fully exhausted possibilities.  */
> > > +	  DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > > +	  return res;
> > > +	}
> > > +
> > > +      /* Try evaluating both DRs as dereferences of pointers.  */
> > > +      if (!a->alt_indices.base_object
> > > +	  && TREE_CODE (DR_REF (a)) != MEM_REF)
> > > +	{
> > > +	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
> > > +				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
> > > +				 build_int_cst
> > > +				   (reference_alias_ptr_type (DR_REF (a)), 0));
> > > +	  dr_analyze_indices (&a->alt_indices, alt_ref,
> > > +			      loop_preheader_edge (loop_nest[0]),
> > > +			      loop_containing_stmt (DR_STMT (a)));
> > > +	}
> > > +      if (!b->alt_indices.base_object
> > > +	  && TREE_CODE (DR_REF (b)) != MEM_REF)
> > > +	{
> > > +	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
> > > +				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
> > > +				 build_int_cst
> > > +				   (reference_alias_ptr_type (DR_REF (b)), 0));
> > > +	  dr_analyze_indices (&b->alt_indices, alt_ref,
> > > +			      loop_preheader_edge (loop_nest[0]),
> > > +			      loop_containing_stmt (DR_STMT (b)));
> > > +	}
> > > +      return initialize_data_dependence_relation (res, loop_nest, true);
> > >      }
> > >  
> > >    if (!same_base_p)
> > > @@ -3381,8 +3406,8 @@ initialize_data_dependence_relation (struct data_reference *a,
> > >        struct subscript *subscript;
> > >  
> > >        subscript = XNEW (struct subscript);
> > > -      SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
> > > -      SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
> > > +      SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
> > > +      SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
> > >        SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
> > >        SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
> > >        SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
> > > @@ -3393,6 +3418,40 @@ initialize_data_dependence_relation (struct data_reference *a,
> > >    return res;
> > >  }
> > >  
> > > +/* Initialize a data dependence relation between data accesses A and
> > > +   B.  NB_LOOPS is the number of loops surrounding the references: the
> > > +   size of the classic distance/direction vectors.  */
> > > +
> > > +struct data_dependence_relation *
> > > +initialize_data_dependence_relation (struct data_reference *a,
> > > +				     struct data_reference *b,
> > > +				     vec<loop_p> loop_nest)
> > > +{
> > > +  data_dependence_relation *res = XCNEW (struct data_dependence_relation);
> > > +  DDR_A (res) = a;
> > > +  DDR_B (res) = b;
> > > +  DDR_LOOP_NEST (res).create (0);
> > > +  DDR_SUBSCRIPTS (res).create (0);
> > > +  DDR_DIR_VECTS (res).create (0);
> > > +  DDR_DIST_VECTS (res).create (0);
> > > +
> > > +  if (a == NULL || b == NULL)
> > > +    {
> > > +      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> > > +      return res;
> > > +    }
> > > +
> > > +  /* If the data references do not alias, then they are independent.  */
> > > +  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
> > > +    {
> > > +      DDR_ARE_DEPENDENT (res) = chrec_known;
> > > +      return res;
> > > +    }
> > > +
> > > +  return initialize_data_dependence_relation (res, loop_nest, false);
> > > +}
> > > +
> > > +
> > >  /* Frees memory used by the conflict function F.  */
> > >  
> > >  static void
> > > diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> > > index 685f33d85ae..74f579c9f3f 100644
> > > --- a/gcc/tree-data-ref.h
> > > +++ b/gcc/tree-data-ref.h
> > > @@ -166,14 +166,19 @@ struct data_reference
> > >       and runs to completion.  */
> > >    bool is_conditional_in_stmt;
> > >  
> > > +  /* Alias information for the data reference.  */
> > > +  struct dr_alias alias;
> > > +
> > >    /* Behavior of the memory reference in the innermost loop.  */
> > >    struct innermost_loop_behavior innermost;
> > >  
> > >    /* Subscripts of this data reference.  */
> > >    struct indices indices;
> > >  
> > > -  /* Alias information for the data reference.  */
> > > -  struct dr_alias alias;
> > > +  /* Alternate subscripts initialized lazily and used by data-dependence
> > > +     analysis only when the main indices of two DRs are not comparable.
> > > +     Keep last to keep vec_info_shared::check_datarefs happy.  */
> > > +  struct indices alt_indices;
> > >  };
> > >  
> > >  #define DR_STMT(DR)                (DR)->stmt
> > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> > > index 3aa3e2a6783..20daa31187d 100644
> > > --- a/gcc/tree-vectorizer.c
> > > +++ b/gcc/tree-vectorizer.c
> > > @@ -507,7 +507,8 @@ vec_info_shared::check_datarefs ()
> > >      return;
> > >    gcc_assert (datarefs.length () == datarefs_copy.length ());
> > >    for (unsigned i = 0; i < datarefs.length (); ++i)
> > > -    if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
> > > +    if (memcmp (&datarefs_copy[i], datarefs[i],
> > > +		offsetof (data_reference, alt_indices)) != 0)
> > >        gcc_unreachable ();
> > >  }
> > 
> 
>
  
Richard Sandiford Sept. 17, 2021, 4:36 p.m. UTC | #4
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Fri, 17 Sep 2021, Richard Biener wrote:
>
>> On Fri, 17 Sep 2021, Richard Sandiford wrote:
>> 
>> > Richard Biener <rguenther@suse.de> writes:
>> > > This adds the capability to analyze the dependence of mixed
>> > > pointer/array accesses.  The example is from where using a masked
>> > > load/store creates the pointer-based access when an otherwise
>> > > unconditional access is array based.  Other examples would include
>> > > accesses to an array mixed with accesses from inlined helpers
>> > > that work on pointers.
>> > >
>> > > The idea is quite simple and old - analyze the data-ref indices
>> > > as if the reference was pointer-based.  The following change does
>> > > this by changing dr_analyze_indices to work on the indices
>> > > sub-structure and storing an alternate indices substructure in
>> > > each data reference.  That alternate set of indices is analyzed
>> > > lazily by initialize_data_dependence_relation when it fails to
>> > > match-up the main set of indices of two data references.
>> > > initialize_data_dependence_relation is refactored into a head
>> > > and a tail worker and changed to work on one of the indices
>> > > structures and thus away from using DR_* access macros which
>> > > continue to reference the main indices substructure.
>> > >
>> > > There are quite some vectorization and loop distribution opportunities
>> > > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
>> > > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
>> > > 544.nab_r see amendments in what they report with -fopt-info-loop while
>> > > the rest of the specrate set sees no changes there.  Measuring runtime
>> > > for the set where changes were reported reveals nothing off-noise
>> > > besides 511.povray_r which seems to regress slightly for me
>> > > (on a Zen2 machine with -Ofast -march=native).
>> > >
>> > > Changes from the [RFC] version are properly handling bitfields
>> > > that we cannot take the address of and optimization of refs
>> > > that already are MEM_REFs and thus won't see any change.  I've
>> > > also elided changing the set of vect_masked_stores targets in
>> > > favor of explicitely listing avx (but I did not verify if the
>> > > testcase passes on aarch64-sve or amdgcn).
>> > >
>> > > The improves cases like the following from Povray:
>> > >
>> > >    for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
>> > >      {
>> > >         VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
>> > >         Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
>> > >      }
>> > >
>> > > where there is a plain array access mixed with abstraction
>> > > using T[] or T* arguments.  That should be a not too uncommon
>> > > situation in the wild.  The loop above is now vectorized and was not
>> > > without the change.
>> > >
>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
>> > > built and run SPEC CPU 2017 successfully.
>> > >
>> > > OK?
>> > 
>> > Took a while to page this stuff back in :-/
>> > 
>> > I guess if we're adding alt_indices to the main data_reference,
>> > we'll need to free the access_fns in free_data_ref.  It looked like
>> > the patch as posted might have a memory leak.
>> 
>> Doh, yes - thanks for noticing.
>> 
>> > Perhaps instead we could use local indices structures for the
>> > alt_indices and pass them in to initialize_data_dependence_relation?
>> > Not that that's very elegant…
>> 
>> Yeah, I had that but then since for N data references we possibly
>> call initialize_data_dependence_relation N^2 times we'd do this
>> alternate analysis N^2 times at worst instead of N so it looked worth
>> caching it in the data reference.  Since we have no idea why the
>> first try fails we're going to try this fallback in the majority
>> of cases that we cannot figure out otherwise so I didn't manage
>> to hand-wave the quadraticness away ;)  OTOH one might argue
>> it's just a constant factor ontop of the number of
>> initialize_data_dependence_relation invocations.
>> 
>> So I can probably be convinced either way - guess I'll gather
>> some statistics.
>
> I built SPEC 2017 CPU rate with -Ofast -march=znver2, overall there
> are
>
>  4433976     calls to the first stage initialize_data_dependence_relation
>              (skipping the cases dr_may_alias returned false)
>  360248 (8%) ended up recursing with a set of alt_indices
>  83512       times we computed alt_indices of a DR (that's with the cache)
>  14905 (0.3%) times the recursive invocation ended with != chrec_dont_know
>
> thus when not doing the caching we'd compute alt_indices about 10 times
> more often.  I didn't collect the number of distinct DRs (that's difficult
> at this point), but I'd estimate from the above that we have 3 times
> more "unused" alt_indices than used.
>
> OK, so that didn't really help me avoid flipping a coin ;)

Sounds like a good justification for keeping the caching to me :-)

Richard
  
Richard Biener Sept. 17, 2021, 4:46 p.m. UTC | #5
On September 17, 2021 6:36:10 PM GMT+02:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
>Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On Fri, 17 Sep 2021, Richard Biener wrote:
>>
>>> On Fri, 17 Sep 2021, Richard Sandiford wrote:
>>> 
>>> > Richard Biener <rguenther@suse.de> writes:
>>> > > This adds the capability to analyze the dependence of mixed
>>> > > pointer/array accesses.  The example is from where using a masked
>>> > > load/store creates the pointer-based access when an otherwise
>>> > > unconditional access is array based.  Other examples would include
>>> > > accesses to an array mixed with accesses from inlined helpers
>>> > > that work on pointers.
>>> > >
>>> > > The idea is quite simple and old - analyze the data-ref indices
>>> > > as if the reference was pointer-based.  The following change does
>>> > > this by changing dr_analyze_indices to work on the indices
>>> > > sub-structure and storing an alternate indices substructure in
>>> > > each data reference.  That alternate set of indices is analyzed
>>> > > lazily by initialize_data_dependence_relation when it fails to
>>> > > match-up the main set of indices of two data references.
>>> > > initialize_data_dependence_relation is refactored into a head
>>> > > and a tail worker and changed to work on one of the indices
>>> > > structures and thus away from using DR_* access macros which
>>> > > continue to reference the main indices substructure.
>>> > >
>>> > > There are quite some vectorization and loop distribution opportunities
>>> > > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
>>> > > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
>>> > > 544.nab_r see amendments in what they report with -fopt-info-loop while
>>> > > the rest of the specrate set sees no changes there.  Measuring runtime
>>> > > for the set where changes were reported reveals nothing off-noise
>>> > > besides 511.povray_r which seems to regress slightly for me
>>> > > (on a Zen2 machine with -Ofast -march=native).
>>> > >
>>> > > Changes from the [RFC] version are properly handling bitfields
>>> > > that we cannot take the address of and optimization of refs
>>> > > that already are MEM_REFs and thus won't see any change.  I've
>>> > > also elided changing the set of vect_masked_stores targets in
>>> > > favor of explicitely listing avx (but I did not verify if the
>>> > > testcase passes on aarch64-sve or amdgcn).
>>> > >
>>> > > The improves cases like the following from Povray:
>>> > >
>>> > >    for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
>>> > >      {
>>> > >         VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
>>> > >         Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
>>> > >      }
>>> > >
>>> > > where there is a plain array access mixed with abstraction
>>> > > using T[] or T* arguments.  That should be a not too uncommon
>>> > > situation in the wild.  The loop above is now vectorized and was not
>>> > > without the change.
>>> > >
>>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
>>> > > built and run SPEC CPU 2017 successfully.
>>> > >
>>> > > OK?
>>> > 
>>> > Took a while to page this stuff back in :-/
>>> > 
>>> > I guess if we're adding alt_indices to the main data_reference,
>>> > we'll need to free the access_fns in free_data_ref.  It looked like
>>> > the patch as posted might have a memory leak.
>>> 
>>> Doh, yes - thanks for noticing.
>>> 
>>> > Perhaps instead we could use local indices structures for the
>>> > alt_indices and pass them in to initialize_data_dependence_relation?
>>> > Not that that's very elegant…
>>> 
>>> Yeah, I had that but then since for N data references we possibly
>>> call initialize_data_dependence_relation N^2 times we'd do this
>>> alternate analysis N^2 times at worst instead of N so it looked worth
>>> caching it in the data reference.  Since we have no idea why the
>>> first try fails we're going to try this fallback in the majority
>>> of cases that we cannot figure out otherwise so I didn't manage
>>> to hand-wave the quadraticness away ;)  OTOH one might argue
>>> it's just a constant factor ontop of the number of
>>> initialize_data_dependence_relation invocations.
>>> 
>>> So I can probably be convinced either way - guess I'll gather
>>> some statistics.
>>
>> I built SPEC 2017 CPU rate with -Ofast -march=znver2, overall there
>> are
>>
>>  4433976     calls to the first stage initialize_data_dependence_relation
>>              (skipping the cases dr_may_alias returned false)
>>  360248 (8%) ended up recursing with a set of alt_indices
>>  83512       times we computed alt_indices of a DR (that's with the cache)
>>  14905 (0.3%) times the recursive invocation ended with != chrec_dont_know
>>
>> thus when not doing the caching we'd compute alt_indices about 10 times
>> more often.  I didn't collect the number of distinct DRs (that's difficult
>> at this point), but I'd estimate from the above that we have 3 times
>> more "unused" alt_indices than used.
>>
>> OK, so that didn't really help me avoid flipping a coin ;)
>
>Sounds like a good justification for keeping the caching to me :-)

It also occurred to me that we currently build a (GCable) ADDR_EXPR AND MEM_REF for each computation which likely more than offsets any win by not caching the result. 

Richard. 

>Richard
  
Richard Biener Sept. 20, 2021, 6:50 a.m. UTC | #6
On Fri, 17 Sep 2021, Richard Biener wrote:

> On September 17, 2021 6:36:10 PM GMT+02:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
> >Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> On Fri, 17 Sep 2021, Richard Biener wrote:
> >>
> >>> On Fri, 17 Sep 2021, Richard Sandiford wrote:
> >>> 
> >>> > Richard Biener <rguenther@suse.de> writes:
> >>> > > This adds the capability to analyze the dependence of mixed
> >>> > > pointer/array accesses.  The example is from where using a masked
> >>> > > load/store creates the pointer-based access when an otherwise
> >>> > > unconditional access is array based.  Other examples would include
> >>> > > accesses to an array mixed with accesses from inlined helpers
> >>> > > that work on pointers.
> >>> > >
> >>> > > The idea is quite simple and old - analyze the data-ref indices
> >>> > > as if the reference was pointer-based.  The following change does
> >>> > > this by changing dr_analyze_indices to work on the indices
> >>> > > sub-structure and storing an alternate indices substructure in
> >>> > > each data reference.  That alternate set of indices is analyzed
> >>> > > lazily by initialize_data_dependence_relation when it fails to
> >>> > > match-up the main set of indices of two data references.
> >>> > > initialize_data_dependence_relation is refactored into a head
> >>> > > and a tail worker and changed to work on one of the indices
> >>> > > structures and thus away from using DR_* access macros which
> >>> > > continue to reference the main indices substructure.
> >>> > >
> >>> > > There are quite some vectorization and loop distribution opportunities
> >>> > > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
> >>> > > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
> >>> > > 544.nab_r see amendments in what they report with -fopt-info-loop while
> >>> > > the rest of the specrate set sees no changes there.  Measuring runtime
> >>> > > for the set where changes were reported reveals nothing off-noise
> >>> > > besides 511.povray_r which seems to regress slightly for me
> >>> > > (on a Zen2 machine with -Ofast -march=native).
> >>> > >
> >>> > > Changes from the [RFC] version are properly handling bitfields
> >>> > > that we cannot take the address of and optimization of refs
> >>> > > that already are MEM_REFs and thus won't see any change.  I've
> >>> > > also elided changing the set of vect_masked_stores targets in
> >>> > > favor of explicitely listing avx (but I did not verify if the
> >>> > > testcase passes on aarch64-sve or amdgcn).
> >>> > >
> >>> > > The improves cases like the following from Povray:
> >>> > >
> >>> > >    for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
> >>> > >      {
> >>> > >         VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
> >>> > >         Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
> >>> > >      }
> >>> > >
> >>> > > where there is a plain array access mixed with abstraction
> >>> > > using T[] or T* arguments.  That should be a not too uncommon
> >>> > > situation in the wild.  The loop above is now vectorized and was not
> >>> > > without the change.
> >>> > >
> >>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
> >>> > > built and run SPEC CPU 2017 successfully.
> >>> > >
> >>> > > OK?
> >>> > 
> >>> > Took a while to page this stuff back in :-/
> >>> > 
> >>> > I guess if we're adding alt_indices to the main data_reference,
> >>> > we'll need to free the access_fns in free_data_ref.  It looked like
> >>> > the patch as posted might have a memory leak.
> >>> 
> >>> Doh, yes - thanks for noticing.
> >>> 
> >>> > Perhaps instead we could use local indices structures for the
> >>> > alt_indices and pass them in to initialize_data_dependence_relation?
> >>> > Not that that's very elegant?
> >>> 
> >>> Yeah, I had that but then since for N data references we possibly
> >>> call initialize_data_dependence_relation N^2 times we'd do this
> >>> alternate analysis N^2 times at worst instead of N so it looked worth
> >>> caching it in the data reference.  Since we have no idea why the
> >>> first try fails we're going to try this fallback in the majority
> >>> of cases that we cannot figure out otherwise so I didn't manage
> >>> to hand-wave the quadraticness away ;)  OTOH one might argue
> >>> it's just a constant factor ontop of the number of
> >>> initialize_data_dependence_relation invocations.
> >>> 
> >>> So I can probably be convinced either way - guess I'll gather
> >>> some statistics.
> >>
> >> I built SPEC 2017 CPU rate with -Ofast -march=znver2, overall there
> >> are
> >>
> >>  4433976     calls to the first stage initialize_data_dependence_relation
> >>              (skipping the cases dr_may_alias returned false)
> >>  360248 (8%) ended up recursing with a set of alt_indices
> >>  83512       times we computed alt_indices of a DR (that's with the cache)
> >>  14905 (0.3%) times the recursive invocation ended with != chrec_dont_know
> >>
> >> thus when not doing the caching we'd compute alt_indices about 10 times
> >> more often.  I didn't collect the number of distinct DRs (that's difficult
> >> at this point), but I'd estimate from the above that we have 3 times
> >> more "unused" alt_indices than used.
> >>
> >> OK, so that didn't really help me avoid flipping a coin ;)
> >
> >Sounds like a good justification for keeping the caching to me :-)
> 
> It also occurred to me that we currently build a (GCable) ADDR_EXPR AND MEM_REF for each computation which likely more than offsets any win by not caching the result. 

The following is what I have re-bootstrapped and tested on 
x86_64-unknown-linux-gnu and pushed.

Richard.

From 940c0ef59277c1072ebdfe0d3a63608847ffd84a Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Wed, 8 Sep 2021 14:42:31 +0200
Subject: [PATCH] tree-optimization/65206 - dependence analysis on mixed
 pointer/array
To: gcc-patches@gcc.gnu.org

This adds the capability to analyze the dependence of mixed
pointer/array accesses.  The example is from where using a masked
load/store creates the pointer-based access when an otherwise
unconditional access is array based.  Other examples would include
accesses to an array mixed with accesses from inlined helpers
that work on pointers.

The idea is quite simple and old - analyze the data-ref indices
as if the reference was pointer-based.  The following change does
this by changing dr_analyze_indices to work on the indices
sub-structure and storing an alternate indices substructure in
each data reference.  That alternate set of indices is analyzed
lazily by initialize_data_dependence_relation when it fails to
match-up the main set of indices of two data references.
initialize_data_dependence_relation is refactored into a head
and a tail worker and changed to work on one of the indices
structures and thus away from using DR_* access macros which
continue to reference the main indices substructure.

There are quite some vectorization and loop distribution opportunities
unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
544.nab_r see amendments in what they report with -fopt-info-loop while
the rest of the specrate set sees no changes there.  Measuring runtime
for the set where changes were reported reveals nothing off-noise
besides 511.povray_r which seems to regress slightly for me
(on a Zen2 machine with -Ofast -march=native).

2021-09-08  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/65206
	* tree-data-ref.h (struct data_reference): Add alt_indices,
	order it last.
	* tree-data-ref.c (free_data_ref): Release alt_indices.
	(dr_analyze_indices): Work on struct indices and get DR_REF as tree.
	(create_data_ref): Adjust.
	(initialize_data_dependence_relation): Split into head
	and tail.  When the base objects fail to match up try
	again with pointer-based analysis of indices.
	* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
	not compare the lazily computed alternate set of indices.

	* gcc.dg/torture/20210916.c: New testcase.
	* gcc.dg/vect/pr65206.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
 gcc/testsuite/gcc.dg/vect/pr65206.c     |  22 +++
 gcc/tree-data-ref.c                     | 174 ++++++++++++++++--------
 gcc/tree-data-ref.h                     |   9 +-
 gcc/tree-vectorizer.c                   |   3 +-
 5 files changed, 168 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c

diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
new file mode 100644
index 00000000000..0ea6d45e463
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20210916.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+typedef union tree_node *tree;
+struct tree_base {
+  unsigned : 1;
+  unsigned lang_flag_2 : 1;
+};
+struct tree_type {
+  tree main_variant;
+};
+union tree_node {
+  struct tree_base base;
+  struct tree_type type;
+};
+tree finish_struct_t, finish_struct_x;
+void finish_struct()
+{
+  for (; finish_struct_t->type.main_variant;)
+    finish_struct_x->base.lang_flag_2 = 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
new file mode 100644
index 00000000000..3b6262622c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+#define N 1024
+
+double a[N], b[N];
+
+void foo ()
+{
+  for (int i = 0; i < N; ++i)
+    if (b[i] < 3.)
+      a[i] += b[i];
+}
+
+/* We get a .MASK_STORE because while the load of a[i] does not trap
+   the store would introduce store data races.  Make sure we still
+   can handle the data dependence with zero distance.  */
+
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e061baa7c20..18307a554fc 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "vr-values.h"
 #include "range-op.h"
+#include "tree-ssa-loop-ivopts.h"
 
 static struct datadep_stats
 {
@@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
    DR, analyzed in LOOP and instantiated before NEST.  */
 
 static void
-dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
+dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
 {
-  vec<tree> access_fns = vNULL;
-  tree ref, op;
-  tree base, off, access_fn;
-
   /* If analyzing a basic-block there are no indices to analyze
      and thus no access functions.  */
   if (!nest)
     {
-      DR_BASE_OBJECT (dr) = DR_REF (dr);
-      DR_ACCESS_FNS (dr).create (0);
+      dri->base_object = ref;
+      dri->access_fns.create (0);
       return;
     }
 
-  ref = DR_REF (dr);
+  vec<tree> access_fns = vNULL;
 
   /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
      into a two element array with a constant index.  The base is
@@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
     {
       if (TREE_CODE (ref) == ARRAY_REF)
 	{
-	  op = TREE_OPERAND (ref, 1);
-	  access_fn = analyze_scalar_evolution (loop, op);
+	  tree op = TREE_OPERAND (ref, 1);
+	  tree access_fn = analyze_scalar_evolution (loop, op);
 	  access_fn = instantiate_scev (nest, loop, access_fn);
 	  access_fns.safe_push (access_fn);
 	}
@@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
      analyzed nest, add it as an additional independent access-function.  */
   if (TREE_CODE (ref) == MEM_REF)
     {
-      op = TREE_OPERAND (ref, 0);
-      access_fn = analyze_scalar_evolution (loop, op);
+      tree op = TREE_OPERAND (ref, 0);
+      tree access_fn = analyze_scalar_evolution (loop, op);
       access_fn = instantiate_scev (nest, loop, access_fn);
       if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
 	{
-	  tree orig_type;
 	  tree memoff = TREE_OPERAND (ref, 1);
-	  base = initial_condition (access_fn);
-	  orig_type = TREE_TYPE (base);
+	  tree base = initial_condition (access_fn);
+	  tree orig_type = TREE_TYPE (base);
 	  STRIP_USELESS_TYPE_CONVERSION (base);
+	  tree off;
 	  split_constant_offset (base, &base, &off);
 	  STRIP_USELESS_TYPE_CONVERSION (base);
 	  /* Fold the MEM_REF offset into the evolutions initial
@@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 				 base, memoff);
 	  MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
 	  MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
-	  DR_UNCONSTRAINED_BASE (dr) = true;
+	  dri->unconstrained_base = true;
 	  access_fns.safe_push (access_fn);
 	}
     }
@@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 		    build_int_cst (reference_alias_ptr_type (ref), 0));
     }
 
-  DR_BASE_OBJECT (dr) = ref;
-  DR_ACCESS_FNS (dr) = access_fns;
+  dri->base_object = ref;
+  dri->access_fns = access_fns;
 }
 
 /* Extracts the alias analysis information from the memory reference DR.  */
@@ -1463,6 +1460,8 @@ void
 free_data_ref (data_reference_p dr)
 {
   DR_ACCESS_FNS (dr).release ();
+  if (dr->alt_indices.base_object)
+    dr->alt_indices.access_fns.release ();
   free (dr);
 }
 
@@ -1497,7 +1496,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
 
   dr_analyze_innermost (&DR_INNERMOST (dr), memref,
 			nest != NULL ? loop : NULL, stmt);
-  dr_analyze_indices (dr, nest, loop);
+  dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
   dr_analyze_alias (dr);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3066,41 +3065,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b)
 			     TREE_TYPE (TREE_OPERAND (ref_b, 0)));
 }
 
-/* Initialize a data dependence relation between data accesses A and
-   B.  NB_LOOPS is the number of loops surrounding the references: the
-   size of the classic distance/direction vectors.  */
+/* Initialize a data dependence relation RES in LOOP_NEST.  USE_ALT_INDICES
+   is true when the main indices of A and B were not comparable so we try again
+   with alternate indices computed on an indirect reference.  */
 
 struct data_dependence_relation *
-initialize_data_dependence_relation (struct data_reference *a,
-				     struct data_reference *b,
- 				     vec<loop_p> loop_nest)
+initialize_data_dependence_relation (struct data_dependence_relation *res,
+				     vec<loop_p> loop_nest,
+				     bool use_alt_indices)
 {
-  struct data_dependence_relation *res;
+  struct data_reference *a = DDR_A (res);
+  struct data_reference *b = DDR_B (res);
   unsigned int i;
 
-  res = XCNEW (struct data_dependence_relation);
-  DDR_A (res) = a;
-  DDR_B (res) = b;
-  DDR_LOOP_NEST (res).create (0);
-  DDR_SUBSCRIPTS (res).create (0);
-  DDR_DIR_VECTS (res).create (0);
-  DDR_DIST_VECTS (res).create (0);
-
-  if (a == NULL || b == NULL)
+  struct indices *indices_a = &a->indices;
+  struct indices *indices_b = &b->indices;
+  if (use_alt_indices)
     {
-      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
-      return res;
+      if (TREE_CODE (DR_REF (a)) != MEM_REF)
+	indices_a = &a->alt_indices;
+      if (TREE_CODE (DR_REF (b)) != MEM_REF)
+	indices_b = &b->alt_indices;
     }
-
-  /* If the data references do not alias, then they are independent.  */
-  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
-    {
-      DDR_ARE_DEPENDENT (res) = chrec_known;
-      return res;
-    }
-
-  unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
-  unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
+  unsigned int num_dimensions_a = indices_a->access_fns.length ();
+  unsigned int num_dimensions_b = indices_b->access_fns.length ();
   if (num_dimensions_a == 0 || num_dimensions_b == 0)
     {
       DDR_ARE_DEPENDENT (res) = chrec_dont_know;
@@ -3125,9 +3113,9 @@ initialize_data_dependence_relation (struct data_reference *a,
 
      the a and b accesses have a single ARRAY_REF component reference [0]
      but have two subscripts.  */
-  if (DR_UNCONSTRAINED_BASE (a))
+  if (indices_a->unconstrained_base)
     num_dimensions_a -= 1;
-  if (DR_UNCONSTRAINED_BASE (b))
+  if (indices_b->unconstrained_base)
     num_dimensions_b -= 1;
 
   /* These structures describe sequences of component references in
@@ -3210,6 +3198,10 @@ initialize_data_dependence_relation (struct data_reference *a,
         B: [3, 4]  (i.e. s.e)  */
   while (index_a < num_dimensions_a && index_b < num_dimensions_b)
     {
+      /* The alternate indices form always has a single dimension
+	 with unconstrained base.  */
+      gcc_assert (!use_alt_indices);
+
       /* REF_A and REF_B must be one of the component access types
 	 allowed by dr_analyze_indices.  */
       gcc_checking_assert (access_fn_component_p (ref_a));
@@ -3280,11 +3272,12 @@ initialize_data_dependence_relation (struct data_reference *a,
   /* See whether FULL_SEQ ends at the base and whether the two bases
      are equal.  We do not care about TBAA or alignment info so we can
      use OEP_ADDRESS_OF to avoid false negatives.  */
-  tree base_a = DR_BASE_OBJECT (a);
-  tree base_b = DR_BASE_OBJECT (b);
+  tree base_a = indices_a->base_object;
+  tree base_b = indices_b->base_object;
   bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
 		      && full_seq.start_b + full_seq.length == num_dimensions_b
-		      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
+		      && (indices_a->unconstrained_base
+			  == indices_b->unconstrained_base)
 		      && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
 		      && (types_compatible_p (TREE_TYPE (base_a),
 					      TREE_TYPE (base_b))
@@ -3323,7 +3316,7 @@ initialize_data_dependence_relation (struct data_reference *a,
      both lvalues are distinct from the object's declared type.  */
   if (same_base_p)
     {
-      if (DR_UNCONSTRAINED_BASE (a))
+      if (indices_a->unconstrained_base)
 	full_seq.length += 1;
     }
   else
@@ -3332,8 +3325,41 @@ initialize_data_dependence_relation (struct data_reference *a,
   /* Punt if we didn't find a suitable sequence.  */
   if (full_seq.length == 0)
     {
-      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
-      return res;
+      if (use_alt_indices
+	  || (TREE_CODE (DR_REF (a)) == MEM_REF
+	      && TREE_CODE (DR_REF (b)) == MEM_REF)
+	  || may_be_nonaddressable_p (DR_REF (a))
+	  || may_be_nonaddressable_p (DR_REF (b)))
+	{
+	  /* Fully exhausted possibilities.  */
+	  DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+	  return res;
+	}
+
+      /* Try evaluating both DRs as dereferences of pointers.  */
+      if (!a->alt_indices.base_object
+	  && TREE_CODE (DR_REF (a)) != MEM_REF)
+	{
+	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
+				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
+				 build_int_cst
+				   (reference_alias_ptr_type (DR_REF (a)), 0));
+	  dr_analyze_indices (&a->alt_indices, alt_ref,
+			      loop_preheader_edge (loop_nest[0]),
+			      loop_containing_stmt (DR_STMT (a)));
+	}
+      if (!b->alt_indices.base_object
+	  && TREE_CODE (DR_REF (b)) != MEM_REF)
+	{
+	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
+				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
+				 build_int_cst
+				   (reference_alias_ptr_type (DR_REF (b)), 0));
+	  dr_analyze_indices (&b->alt_indices, alt_ref,
+			      loop_preheader_edge (loop_nest[0]),
+			      loop_containing_stmt (DR_STMT (b)));
+	}
+      return initialize_data_dependence_relation (res, loop_nest, true);
     }
 
   if (!same_base_p)
@@ -3381,8 +3407,8 @@ initialize_data_dependence_relation (struct data_reference *a,
       struct subscript *subscript;
 
       subscript = XNEW (struct subscript);
-      SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
-      SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
+      SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
+      SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
       SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
       SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
       SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
@@ -3393,6 +3419,40 @@ initialize_data_dependence_relation (struct data_reference *a,
   return res;
 }
 
+/* Initialize a data dependence relation between data accesses A and
+   B.  NB_LOOPS is the number of loops surrounding the references: the
+   size of the classic distance/direction vectors.  */
+
+struct data_dependence_relation *
+initialize_data_dependence_relation (struct data_reference *a,
+				     struct data_reference *b,
+				     vec<loop_p> loop_nest)
+{
+  data_dependence_relation *res = XCNEW (struct data_dependence_relation);
+  DDR_A (res) = a;
+  DDR_B (res) = b;
+  DDR_LOOP_NEST (res).create (0);
+  DDR_SUBSCRIPTS (res).create (0);
+  DDR_DIR_VECTS (res).create (0);
+  DDR_DIST_VECTS (res).create (0);
+
+  if (a == NULL || b == NULL)
+    {
+      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+      return res;
+    }
+
+  /* If the data references do not alias, then they are independent.  */
+  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
+    {
+      DDR_ARE_DEPENDENT (res) = chrec_known;
+      return res;
+    }
+
+  return initialize_data_dependence_relation (res, loop_nest, false);
+}
+
+
 /* Frees memory used by the conflict function F.  */
 
 static void
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 685f33d85ae..74f579c9f3f 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -166,14 +166,19 @@ struct data_reference
      and runs to completion.  */
   bool is_conditional_in_stmt;
 
+  /* Alias information for the data reference.  */
+  struct dr_alias alias;
+
   /* Behavior of the memory reference in the innermost loop.  */
   struct innermost_loop_behavior innermost;
 
   /* Subscripts of this data reference.  */
   struct indices indices;
 
-  /* Alias information for the data reference.  */
-  struct dr_alias alias;
+  /* Alternate subscripts initialized lazily and used by data-dependence
+     analysis only when the main indices of two DRs are not comparable.
+     Keep last to keep vec_info_shared::check_datarefs happy.  */
+  struct indices alt_indices;
 };
 
 #define DR_STMT(DR)                (DR)->stmt
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 3aa3e2a6783..20daa31187d 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -507,7 +507,8 @@ vec_info_shared::check_datarefs ()
     return;
   gcc_assert (datarefs.length () == datarefs_copy.length ());
   for (unsigned i = 0; i < datarefs.length (); ++i)
-    if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
+    if (memcmp (&datarefs_copy[i], datarefs[i],
+		offsetof (data_reference, alt_indices)) != 0)
       gcc_unreachable ();
 }
  

Patch

diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
new file mode 100644
index 00000000000..0ea6d45e463
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20210916.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+
+typedef union tree_node *tree;
+struct tree_base {
+  unsigned : 1;
+  unsigned lang_flag_2 : 1;
+};
+struct tree_type {
+  tree main_variant;
+};
+union tree_node {
+  struct tree_base base;
+  struct tree_type type;
+};
+tree finish_struct_t, finish_struct_x;
+void finish_struct()
+{
+  for (; finish_struct_t->type.main_variant;)
+    finish_struct_x->base.lang_flag_2 = 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
new file mode 100644
index 00000000000..3b6262622c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+#define N 1024
+
+double a[N], b[N];
+
+void foo ()
+{
+  for (int i = 0; i < N; ++i)
+    if (b[i] < 3.)
+      a[i] += b[i];
+}
+
+/* We get a .MASK_STORE because while the load of a[i] does not trap
+   the store would introduce store data races.  Make sure we still
+   can handle the data dependence with zero distance.  */
+
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e061baa7c20..3656b7ba80e 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -99,6 +99,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "vr-values.h"
 #include "range-op.h"
+#include "tree-ssa-loop-ivopts.h"
 
 static struct datadep_stats
 {
@@ -1300,22 +1301,18 @@  base_supports_access_fn_components_p (tree base)
    DR, analyzed in LOOP and instantiated before NEST.  */
 
 static void
-dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
+dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
 {
-  vec<tree> access_fns = vNULL;
-  tree ref, op;
-  tree base, off, access_fn;
-
   /* If analyzing a basic-block there are no indices to analyze
      and thus no access functions.  */
   if (!nest)
     {
-      DR_BASE_OBJECT (dr) = DR_REF (dr);
-      DR_ACCESS_FNS (dr).create (0);
+      dri->base_object = ref;
+      dri->access_fns.create (0);
       return;
     }
 
-  ref = DR_REF (dr);
+  vec<tree> access_fns = vNULL;
 
   /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
      into a two element array with a constant index.  The base is
@@ -1338,8 +1335,8 @@  dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
     {
       if (TREE_CODE (ref) == ARRAY_REF)
 	{
-	  op = TREE_OPERAND (ref, 1);
-	  access_fn = analyze_scalar_evolution (loop, op);
+	  tree op = TREE_OPERAND (ref, 1);
+	  tree access_fn = analyze_scalar_evolution (loop, op);
 	  access_fn = instantiate_scev (nest, loop, access_fn);
 	  access_fns.safe_push (access_fn);
 	}
@@ -1370,16 +1367,16 @@  dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
      analyzed nest, add it as an additional independent access-function.  */
   if (TREE_CODE (ref) == MEM_REF)
     {
-      op = TREE_OPERAND (ref, 0);
-      access_fn = analyze_scalar_evolution (loop, op);
+      tree op = TREE_OPERAND (ref, 0);
+      tree access_fn = analyze_scalar_evolution (loop, op);
       access_fn = instantiate_scev (nest, loop, access_fn);
       if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
 	{
-	  tree orig_type;
 	  tree memoff = TREE_OPERAND (ref, 1);
-	  base = initial_condition (access_fn);
-	  orig_type = TREE_TYPE (base);
+	  tree base = initial_condition (access_fn);
+	  tree orig_type = TREE_TYPE (base);
 	  STRIP_USELESS_TYPE_CONVERSION (base);
+	  tree off;
 	  split_constant_offset (base, &base, &off);
 	  STRIP_USELESS_TYPE_CONVERSION (base);
 	  /* Fold the MEM_REF offset into the evolutions initial
@@ -1424,7 +1421,7 @@  dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 				 base, memoff);
 	  MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
 	  MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
-	  DR_UNCONSTRAINED_BASE (dr) = true;
+	  dri->unconstrained_base = true;
 	  access_fns.safe_push (access_fn);
 	}
     }
@@ -1436,8 +1433,8 @@  dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 		    build_int_cst (reference_alias_ptr_type (ref), 0));
     }
 
-  DR_BASE_OBJECT (dr) = ref;
-  DR_ACCESS_FNS (dr) = access_fns;
+  dri->base_object = ref;
+  dri->access_fns = access_fns;
 }
 
 /* Extracts the alias analysis information from the memory reference DR.  */
@@ -1497,7 +1494,7 @@  create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
 
   dr_analyze_innermost (&DR_INNERMOST (dr), memref,
 			nest != NULL ? loop : NULL, stmt);
-  dr_analyze_indices (dr, nest, loop);
+  dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
   dr_analyze_alias (dr);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3066,41 +3063,30 @@  access_fn_components_comparable_p (tree ref_a, tree ref_b)
 			     TREE_TYPE (TREE_OPERAND (ref_b, 0)));
 }
 
-/* Initialize a data dependence relation between data accesses A and
-   B.  NB_LOOPS is the number of loops surrounding the references: the
-   size of the classic distance/direction vectors.  */
+/* Initialize a data dependence relation RES in LOOP_NEST.  USE_ALT_INDICES
+   is true when the main indices of A and B were not comparable so we try again
+   with alternate indices computed on an indirect reference.  */
 
 struct data_dependence_relation *
-initialize_data_dependence_relation (struct data_reference *a,
-				     struct data_reference *b,
- 				     vec<loop_p> loop_nest)
+initialize_data_dependence_relation (struct data_dependence_relation *res,
+				     vec<loop_p> loop_nest,
+				     bool use_alt_indices)
 {
-  struct data_dependence_relation *res;
+  struct data_reference *a = DDR_A (res);
+  struct data_reference *b = DDR_B (res);
   unsigned int i;
 
-  res = XCNEW (struct data_dependence_relation);
-  DDR_A (res) = a;
-  DDR_B (res) = b;
-  DDR_LOOP_NEST (res).create (0);
-  DDR_SUBSCRIPTS (res).create (0);
-  DDR_DIR_VECTS (res).create (0);
-  DDR_DIST_VECTS (res).create (0);
-
-  if (a == NULL || b == NULL)
+  struct indices *indices_a = &a->indices;
+  struct indices *indices_b = &b->indices;
+  if (use_alt_indices)
     {
-      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
-      return res;
+      if (TREE_CODE (DR_REF (a)) != MEM_REF)
+	indices_a = &a->alt_indices;
+      if (TREE_CODE (DR_REF (b)) != MEM_REF)
+	indices_b = &b->alt_indices;
     }
-
-  /* If the data references do not alias, then they are independent.  */
-  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
-    {
-      DDR_ARE_DEPENDENT (res) = chrec_known;
-      return res;
-    }
-
-  unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
-  unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
+  unsigned int num_dimensions_a = indices_a->access_fns.length ();
+  unsigned int num_dimensions_b = indices_b->access_fns.length ();
   if (num_dimensions_a == 0 || num_dimensions_b == 0)
     {
       DDR_ARE_DEPENDENT (res) = chrec_dont_know;
@@ -3125,9 +3111,9 @@  initialize_data_dependence_relation (struct data_reference *a,
 
      the a and b accesses have a single ARRAY_REF component reference [0]
      but have two subscripts.  */
-  if (DR_UNCONSTRAINED_BASE (a))
+  if (indices_a->unconstrained_base)
     num_dimensions_a -= 1;
-  if (DR_UNCONSTRAINED_BASE (b))
+  if (indices_b->unconstrained_base)
     num_dimensions_b -= 1;
 
   /* These structures describe sequences of component references in
@@ -3210,6 +3196,10 @@  initialize_data_dependence_relation (struct data_reference *a,
         B: [3, 4]  (i.e. s.e)  */
   while (index_a < num_dimensions_a && index_b < num_dimensions_b)
     {
+      /* The alternate indices form always has a single dimension
+	 with unconstrained base.  */
+      gcc_assert (!use_alt_indices);
+
       /* REF_A and REF_B must be one of the component access types
 	 allowed by dr_analyze_indices.  */
       gcc_checking_assert (access_fn_component_p (ref_a));
@@ -3280,11 +3270,12 @@  initialize_data_dependence_relation (struct data_reference *a,
   /* See whether FULL_SEQ ends at the base and whether the two bases
      are equal.  We do not care about TBAA or alignment info so we can
      use OEP_ADDRESS_OF to avoid false negatives.  */
-  tree base_a = DR_BASE_OBJECT (a);
-  tree base_b = DR_BASE_OBJECT (b);
+  tree base_a = indices_a->base_object;
+  tree base_b = indices_b->base_object;
   bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
 		      && full_seq.start_b + full_seq.length == num_dimensions_b
-		      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
+		      && (indices_a->unconstrained_base
+			  == indices_b->unconstrained_base)
 		      && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
 		      && (types_compatible_p (TREE_TYPE (base_a),
 					      TREE_TYPE (base_b))
@@ -3323,7 +3314,7 @@  initialize_data_dependence_relation (struct data_reference *a,
      both lvalues are distinct from the object's declared type.  */
   if (same_base_p)
     {
-      if (DR_UNCONSTRAINED_BASE (a))
+      if (indices_a->unconstrained_base)
 	full_seq.length += 1;
     }
   else
@@ -3332,8 +3323,42 @@  initialize_data_dependence_relation (struct data_reference *a,
   /* Punt if we didn't find a suitable sequence.  */
   if (full_seq.length == 0)
     {
-      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
-      return res;
+      if (use_alt_indices
+	  || !param_data_dep_alt_indices
+	  || (TREE_CODE (DR_REF (a)) == MEM_REF
+	      && TREE_CODE (DR_REF (b)) == MEM_REF)
+	  || may_be_nonaddressable_p (DR_REF (a))
+	  || may_be_nonaddressable_p (DR_REF (b)))
+	{
+	  /* Fully exhausted possibilities.  */
+	  DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+	  return res;
+	}
+
+      /* Try evaluating both DRs as dereferences of pointers.  */
+      if (!a->alt_indices.base_object
+	  && TREE_CODE (DR_REF (a)) != MEM_REF)
+	{
+	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
+				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
+				 build_int_cst
+				   (reference_alias_ptr_type (DR_REF (a)), 0));
+	  dr_analyze_indices (&a->alt_indices, alt_ref,
+			      loop_preheader_edge (loop_nest[0]),
+			      loop_containing_stmt (DR_STMT (a)));
+	}
+      if (!b->alt_indices.base_object
+	  && TREE_CODE (DR_REF (b)) != MEM_REF)
+	{
+	  tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
+				 build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
+				 build_int_cst
+				   (reference_alias_ptr_type (DR_REF (b)), 0));
+	  dr_analyze_indices (&b->alt_indices, alt_ref,
+			      loop_preheader_edge (loop_nest[0]),
+			      loop_containing_stmt (DR_STMT (b)));
+	}
+      return initialize_data_dependence_relation (res, loop_nest, true);
     }
 
   if (!same_base_p)
@@ -3381,8 +3406,8 @@  initialize_data_dependence_relation (struct data_reference *a,
       struct subscript *subscript;
 
       subscript = XNEW (struct subscript);
-      SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
-      SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
+      SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
+      SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
       SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
       SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
       SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
@@ -3393,6 +3418,40 @@  initialize_data_dependence_relation (struct data_reference *a,
   return res;
 }
 
+/* Initialize a data dependence relation between data accesses A and
+   B.  NB_LOOPS is the number of loops surrounding the references: the
+   size of the classic distance/direction vectors.  */
+
+struct data_dependence_relation *
+initialize_data_dependence_relation (struct data_reference *a,
+				     struct data_reference *b,
+				     vec<loop_p> loop_nest)
+{
+  data_dependence_relation *res = XCNEW (struct data_dependence_relation);
+  DDR_A (res) = a;
+  DDR_B (res) = b;
+  DDR_LOOP_NEST (res).create (0);
+  DDR_SUBSCRIPTS (res).create (0);
+  DDR_DIR_VECTS (res).create (0);
+  DDR_DIST_VECTS (res).create (0);
+
+  if (a == NULL || b == NULL)
+    {
+      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+      return res;
+    }
+
+  /* If the data references do not alias, then they are independent.  */
+  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
+    {
+      DDR_ARE_DEPENDENT (res) = chrec_known;
+      return res;
+    }
+
+  return initialize_data_dependence_relation (res, loop_nest, false);
+}
+
+
 /* Frees memory used by the conflict function F.  */
 
 static void
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 685f33d85ae..74f579c9f3f 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -166,14 +166,19 @@  struct data_reference
      and runs to completion.  */
   bool is_conditional_in_stmt;
 
+  /* Alias information for the data reference.  */
+  struct dr_alias alias;
+
   /* Behavior of the memory reference in the innermost loop.  */
   struct innermost_loop_behavior innermost;
 
   /* Subscripts of this data reference.  */
   struct indices indices;
 
-  /* Alias information for the data reference.  */
-  struct dr_alias alias;
+  /* Alternate subscripts initialized lazily and used by data-dependence
+     analysis only when the main indices of two DRs are not comparable.
+     Keep last to keep vec_info_shared::check_datarefs happy.  */
+  struct indices alt_indices;
 };
 
 #define DR_STMT(DR)                (DR)->stmt
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 3aa3e2a6783..20daa31187d 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -507,7 +507,8 @@  vec_info_shared::check_datarefs ()
     return;
   gcc_assert (datarefs.length () == datarefs_copy.length ());
   for (unsigned i = 0; i < datarefs.length (); ++i)
-    if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
+    if (memcmp (&datarefs_copy[i], datarefs[i],
+		offsetof (data_reference, alt_indices)) != 0)
       gcc_unreachable ();
 }