vect/aarch64: Fix various sve/cond*.c failures

Message ID mptsffxmhbj.fsf@arm.com
State Committed
Commit 553f8003ba5ecfdf0574a171692843ef838226b4
Headers
Series vect/aarch64: Fix various sve/cond*.c failures |

Commit Message

Richard Sandiford Jan. 26, 2023, 3:44 p.m. UTC
  Quite a few gcc.target/aarch64/sve/cond*.c tests started failing
after g:68e0063397ba820e71adc220b2da0581dce29ffa, but it turns out
that we were cheating passes before the patch.

The tests involve comparing the cost of N wide compares, a pack
sequence, and a narrow COND_EXPR with the cost of a single COND_EXPR
on fewer elements.  The costs for the former included all operations,
but the costs for the latter didn't model the comparison embedded in
the COND_EXPR.  The patch made us include the comparison on both sides,
making it apples-for-apples, but that's enough to tip the balance in
favour of using the wider types.

I think the new choice does reflect the current SVE cost model
correctly.  (Whether and how the model should be tweaked is a
different question.)  This patch therefore changes the tuning
vector length to one that makes the choice more obvious.

That in turn needs a tweak to compare_inside_loop_cost.
The function compares body_cost1/vf1 with body_cost2/vf2,
but for fully-amsked loops, it limits vf to the actual number
of iterations.  This is so that (say) an expensive 16-element
vector body doesn't win over a cheaper 8-element vector body
when there are only 7 elements to process.

However, the limit was applied using known_le, regardless of
the tuning target.  For a heuristic like this, it seems better
to use the likely minimum (which is a concept that was only
added after this code went in).

g:68e0063397ba820e71adc220b2da0581dce29ffa also fixed
vcond_4_costly.c.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
	* tree-vectorizer.cc (vector_costs::compare_inside_loop_cost):
	Use the likely minimum VF when bounding the denominators to
	the estimated number of iterations.

gcc/testsuite/
	* gcc.target/aarch64/sve/cond_asrd_1.c: Tune for a 256-bit
	vector length.
	* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
	* gcc.target/aarch64/sve/cond_cnot_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_unary_5.c: Likewise.
	* gcc.target/aarch64/sve/cond_unary_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_uxt_5.c: Likewise.
	* gcc.target/aarch64/sve/vcond_4_costly.c: Remove XFAILs.
---
 gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c    | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c    | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c    | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c     | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c | 4 ++--
 gcc/tree-vectorizer.cc                                | 6 ++++--
 8 files changed, 12 insertions(+), 10 deletions(-)
  

Comments

Richard Biener Jan. 27, 2023, 7:38 a.m. UTC | #1
On Thu, 26 Jan 2023, Richard Sandiford wrote:

> Quite a few gcc.target/aarch64/sve/cond*.c tests started failing
> after g:68e0063397ba820e71adc220b2da0581dce29ffa, but it turns out
> that we were cheating passes before the patch.
> 
> The tests involve comparing the cost of N wide compares, a pack
> sequence, and a narrow COND_EXPR with the cost of a single COND_EXPR
> on fewer elements.  The costs for the former included all operations,
> but the costs for the latter didn't model the comparison embedded in
> the COND_EXPR.  The patch made us include the comparison on both sides,
> making it apples-for-apples, but that's enough to tip the balance in
> favour of using the wider types.
> 
> I think the new choice does reflect the current SVE cost model
> correctly.  (Whether and how the model should be tweaked is a
> different question.)  This patch therefore changes the tuning
> vector length to one that makes the choice more obvious.
> 
> That in turn needs a tweak to compare_inside_loop_cost.
> The function compares body_cost1/vf1 with body_cost2/vf2,
> but for fully-amsked loops, it limits vf to the actual number
> of iterations.  This is so that (say) an expensive 16-element
> vector body doesn't win over a cheaper 8-element vector body
> when there are only 7 elements to process.
> 
> However, the limit was applied using known_le, regardless of
> the tuning target.  For a heuristic like this, it seems better
> to use the likely minimum (which is a concept that was only
> added after this code went in).
> 
> g:68e0063397ba820e71adc220b2da0581dce29ffa also fixed
> vcond_4_costly.c.
> 
> Tested on aarch64-linux-gnu.  OK to install?

OK.

> Richard
> 
> 
> gcc/
> 	* tree-vectorizer.cc (vector_costs::compare_inside_loop_cost):
> 	Use the likely minimum VF when bounding the denominators to
> 	the estimated number of iterations.
> 
> gcc/testsuite/
> 	* gcc.target/aarch64/sve/cond_asrd_1.c: Tune for a 256-bit
> 	vector length.
> 	* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
> 	* gcc.target/aarch64/sve/cond_cnot_6.c: Likewise.
> 	* gcc.target/aarch64/sve/cond_unary_5.c: Likewise.
> 	* gcc.target/aarch64/sve/cond_unary_6.c: Likewise.
> 	* gcc.target/aarch64/sve/cond_uxt_5.c: Likewise.
> 	* gcc.target/aarch64/sve/vcond_4_costly.c: Remove XFAILs.
> ---
>  gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c    | 2 +-
>  gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c    | 2 +-
>  gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c    | 2 +-
>  gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c   | 2 +-
>  gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c   | 2 +-
>  gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c     | 2 +-
>  gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c | 4 ++--
>  gcc/tree-vectorizer.cc                                | 6 ++++--
>  8 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
> index 478b52ac27c..aac06bd8093 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
>  
>  #include <stdint.h>
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
> index 729d3f4f2ac..f6278916e1a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
>  
>  #include <stdint.h>
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
> index d44e357f44a..ef1b067172f 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
>  
>  #include <stdint.h>
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
> index 17b3f86c8c6..03a6636f2d2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
>  
>  #include <stdint.h>
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
> index 1bd342b65d4..c49a3040b21 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
>  
>  #include <stdint.h>
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
> index 18866286b7f..9a2bd8f152f 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
>  
>  #include <stdint.h>
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
> index 4aa567e3709..76d7a288612 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
> @@ -61,8 +61,8 @@ TEST_CMP (nuge)
>  TEST_CMP (nugt)
>  
>  /* 2 each for: eq, ne, ueq, nueq.  */
> -/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 { xfail *-*-* } } } */
> -/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 } } */
> +/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 } } */
>  
>  /* 2 each for: olt, ult, nult, ogt, ugt, nugt.  */
>  /* { dg-final { scan-assembler-times {\tfcm[lg]t\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index 875acbbf948..89cd0b88b61 100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -1973,9 +1973,11 @@ vector_costs::compare_inside_loop_cost (const vector_costs *other) const
>    HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
>    if (estimated_max_niter != -1)
>      {
> -      if (known_le (estimated_max_niter, this_vf))
> +      if (estimated_poly_value (this_vf, POLY_VALUE_MIN)
> +	  >= estimated_max_niter)
>  	this_vf = estimated_max_niter;
> -      if (known_le (estimated_max_niter, other_vf))
> +      if (estimated_poly_value (other_vf, POLY_VALUE_MIN)
> +	  >= estimated_max_niter)
>  	other_vf = estimated_max_niter;
>      }
>  
>
  

Patch

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
index 478b52ac27c..aac06bd8093 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
index 729d3f4f2ac..f6278916e1a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
index d44e357f44a..ef1b067172f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
index 17b3f86c8c6..03a6636f2d2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
index 1bd342b65d4..c49a3040b21 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
index 18866286b7f..9a2bd8f152f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
index 4aa567e3709..76d7a288612 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
@@ -61,8 +61,8 @@  TEST_CMP (nuge)
 TEST_CMP (nugt)
 
 /* 2 each for: eq, ne, ueq, nueq.  */
-/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 } } */
+/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 } } */
 
 /* 2 each for: olt, ult, nult, ogt, ugt, nugt.  */
 /* { dg-final { scan-assembler-times {\tfcm[lg]t\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index 875acbbf948..89cd0b88b61 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -1973,9 +1973,11 @@  vector_costs::compare_inside_loop_cost (const vector_costs *other) const
   HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
   if (estimated_max_niter != -1)
     {
-      if (known_le (estimated_max_niter, this_vf))
+      if (estimated_poly_value (this_vf, POLY_VALUE_MIN)
+	  >= estimated_max_niter)
 	this_vf = estimated_max_niter;
-      if (known_le (estimated_max_niter, other_vf))
+      if (estimated_poly_value (other_vf, POLY_VALUE_MIN)
+	  >= estimated_max_niter)
 	other_vf = estimated_max_niter;
     }