[4/4] Testsuite updates

Message ID 20240521124822.E5A0313A1E@imap1.dmz-prg2.suse.org
State New
Series [1/4] Avoid requiring VEC_PERM representatives

Checks

Context                                          Check    Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm       success  Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64   success  Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm       fail     Testing failed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64   fail     Testing failed

Commit Message

Richard Biener May 21, 2024, 12:48 p.m. UTC
  The gcc.dg/vect/slp-12a.c case is interesting as we currently split
the 8 store group into lanes 0-5 which we SLP with an unroll factor
of two (on x86-64 with SSE) and the remaining two lanes are using
interleaving vectorization with a final unroll factor of four.  Thus
we're using hybrid SLP within a single store group.  After the change
we discover the same 0-5 lane SLP part as well as two single-lane
parts feeding the full store group.  But that results in a load
permutation that isn't supported (I have WIP patches to rectify that).
So we end up cancelling SLP and vectorizing the whole loop with
interleaving, which is IMO good and results in better code.
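
To make the shape concrete, here is a minimal C sketch of a loop with one
8-lane store group in which lanes 0-5 share an expression shape and lanes
6-7 do not.  It is only loosely modeled on slp-12a.c: the function names,
the constants and N are made up for illustration, and whether a given GCC
revision picks SLP, interleaving or a mix for it depends on the target and
options.

```c
#include <assert.h>

#define N 8

/* Hypothetical kernel, not the actual testcase: each iteration writes one
   store group of eight adjacent lanes plus a separate store to ia[i].  */
void
store_group8 (const unsigned int *in, unsigned int *out, unsigned int *ia)
{
  for (int i = 0; i < N; i++)
    {
      /* Lanes 0-5: all of the form load * constant + constant.  */
      out[i*8 + 0] = in[i*8 + 0] * 3 + 1;
      out[i*8 + 1] = in[i*8 + 1] * 3 + 2;
      out[i*8 + 2] = in[i*8 + 2] * 3 + 3;
      out[i*8 + 3] = in[i*8 + 3] * 3 + 4;
      out[i*8 + 4] = in[i*8 + 4] * 3 + 5;
      out[i*8 + 5] = in[i*8 + 5] * 3 + 6;

      /* Lanes 6-7: a different expression shape; b6 also has a second
         use in the ia[i] store.  */
      unsigned int b6 = (in[i*8 + 6] + 11) * 3;
      out[i*8 + 6] = b6 - 3;
      out[i*8 + 7] = (in[i*8 + 7] + 12) * 2 - 7;
      ia[i] = b6;
    }
}

/* Runs the kernel on in[k] = k and checks a few hand-computed values.
   Returns 1 when every check holds.  */
int
store_group8_self_check (void)
{
  unsigned int in[N*8], out[N*8], ia[N];

  for (int k = 0; k < N*8; k++)
    in[k] = k;

  store_group8 (in, out, ia);

  assert (out[0] == 1);   /* 0 * 3 + 1 */
  assert (out[5] == 21);  /* 5 * 3 + 6 */
  assert (out[6] == 48);  /* (6 + 11) * 3 - 3 */
  assert (out[7] == 31);  /* (7 + 12) * 2 - 7 */
  assert (ia[0] == 51);   /* (6 + 11) * 3 */
  assert (out[8] == 25);  /* 8 * 3 + 1 */
  return 1;
}
```

Compiling such a file with vectorization enabled and
-fdump-tree-vect-details, then searching the dump for "vectorizing stmts
using SLP", is the same kind of check the dg-final directives in the
patch perform.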

This is similar for gcc.target/i386/pr52252-atom.c where interleaving
generates much better code than hybrid SLP.  I'm unsure how to update
the testcase though.

gcc.dg/vect/slp-21.c runs into similar situations.  Note that when
we discard an instance while analyzing SLP operations, we currently
force the full loop to have no SLP because hybrid detection is
broken.  It's probably not worth fixing this at this moment.

For gcc.dg/vect/pr97428.c we are no longer splitting the 16-lane store
group into two but merge the two 8-lane loads into one before doing the
store and thus have only a single SLP instance.  A similar situation
happens in gcc.dg/vect/slp-11c.c but the branches feeding the
single SLP store only have a single lane.  Likewise for
gcc.dg/vect/vect-complex-5.c and gcc.dg/vect/vect-gather-2.c.
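
The slp-11c.c situation can be sketched as a two-lane store group whose
lanes are computed by different operations.  The first statement below is
the one visible in the testcase hunk; the second lane's expression, the
function names and N are assumptions for illustration.  With this series
each lane would form its own single-lane branch feeding the single SLP
store; that describes the intent of the patch, not a guarantee for any
particular target.

```c
#include <assert.h>

#define N 8

/* Hypothetical kernel: a store group of two adjacent lanes per iteration,
   each lane fed by a different sequence of operations.  */
void
two_lane_store (const unsigned int *in, float *out)
{
  for (int i = 0; i < N*4; i++)
    {
      /* Lane 0: convert first, then multiply and add in float.  */
      out[i*2] = ((float) in[i*2] * 2 + 6);
      /* Lane 1 (assumed form): integer multiply-add, then convert.  */
      out[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
    }
}

/* Runs the kernel on in[k] = k and checks a few hand-computed values.
   Returns 1 when every check holds.  */
int
two_lane_store_self_check (void)
{
  unsigned int in[N*8];
  float out[N*8];

  for (int k = 0; k < N*8; k++)
    in[k] = k;

  two_lane_store (in, out);

  assert (out[0] == 6.0f);     /* 0 * 2 + 6 */
  assert (out[1] == 10.0f);    /* 1 * 3 + 7 */
  assert (out[62] == 130.0f);  /* 62 * 2 + 6 */
  assert (out[63] == 196.0f);  /* 63 * 3 + 7 */
  return 1;
}
```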

gcc.dg/vect/slp-cond-1.c has an additional SLP vectorization
with an SLP store group of size two but two single-lane branches.

gcc.target/i386/pr98928.c ICEs in SLP permute optimization
because we don't expect a constant and internal branch to be
merged with a permute node in
vect_optimize_slp_pass::change_vec_perm_layout:4859 (the only
permutes merging two SLP nodes are two-operator nodes right now).
This still requires fixing.

The whole series has been bootstrapped and tested on 
x86_64-unknown-linux-gnu with the gcc.target/i386/pr98928.c FAIL
unfixed.

Comments welcome (and hello ARM CI), RISC-V and other arch
testing appreciated.  Unless there are comments to the contrary
I plan to push patches 1 and 2 tomorrow.

Thanks,
Richard.

	* gcc.dg/vect/pr97428.c: Expect a single store SLP group.
	* gcc.dg/vect/slp-11c.c: Likewise.
	* gcc.dg/vect/vect-complex-5.c: Likewise.
	* gcc.dg/vect/slp-12a.c: Do not expect SLP.
	* gcc.dg/vect/slp-21.c: Likewise.
	* gcc.dg/vect/slp-cond-1.c: Expect one more SLP.
	* gcc.dg/vect/vect-gather-2.c: Expect SLP to be used.
	* gcc.target/i386/pr52252-atom.c: XFAIL test for palignr.
---
 gcc/testsuite/gcc.dg/vect/pr97428.c          |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-11c.c          |  5 +++--
 gcc/testsuite/gcc.dg/vect/slp-12a.c          |  6 +++++-
 gcc/testsuite/gcc.dg/vect/slp-21.c           | 19 +++++--------------
 gcc/testsuite/gcc.dg/vect/slp-cond-1.c       |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-complex-5.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-2.c    |  1 -
 gcc/testsuite/gcc.target/i386/pr52252-atom.c |  3 ++-
 8 files changed, 18 insertions(+), 22 deletions(-)
  

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c b/gcc/testsuite/gcc.dg/vect/pr97428.c
index 60dd984cfd3..3cc9976c00c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97428.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
@@ -44,5 +44,5 @@  void foo_i2(dcmlx4_t dst[], const dcmlx_t src[], int n)
 /* { dg-final { scan-tree-dump "Detected interleaving store of size 16" "vect" } } */
 /* We're not able to peel & apply re-aligning to make accesses well-aligned for !vect_hw_misalign,
    but we could by peeling the stores for alignment and applying re-aligning loads.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { ! vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_hw_misalign } } } } */
 /* { dg-final { scan-tree-dump-not "gap of 6 elements" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c b/gcc/testsuite/gcc.dg/vect/slp-11c.c
index 0f680cd4e60..169b0d10eec 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
@@ -13,7 +13,8 @@  main1 ()
   unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
   float out[N*8];
 
-  /* Different operations - not SLPable.  */
+  /* Different operations - we SLP the store and split the group to two
+     single-lane branches.  */
   for (i = 0; i < N*4; i++)
     {
       out[i*2] = ((float) in[i*2] * 2 + 6) ;
@@ -44,4 +45,4 @@  int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1  "vect"  } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index 973de6ada21..2f98dc9da0b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -40,6 +40,10 @@  main1 ()
       out[i*8 + 3] = b3 - 1;
       out[i*8 + 4] = b4 - 8;
       out[i*8 + 5] = b5 - 7;
+      /* Due to the use in the ia[i] store we keep the feeding expression
+         in the form ((in[i*8 + 6] + 11) * 3 - 3) while other expressions
+	 got associated as for example (in[i*5 + 5] * 4 + 33).  That
+	 causes SLP discovery to fail.  */
       out[i*8 + 6] = b6 - 3;
       out[i*8 + 7] = b7 - 7;
 
@@ -76,5 +80,5 @@  int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 58751688414..dc153a53b47 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -12,6 +12,7 @@  main1 ()
   unsigned short out[N*8], out2[N*8], b0, b1, b2, b3, b4, a0, a1, a2, a3, b5;
   unsigned short in[N*8];
 
+#pragma GCC novector
   for (i = 0; i < N*8; i++)
     {
       in[i] = i;
@@ -202,18 +203,8 @@  int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target { vect_strided4 || vect_extract_even_odd } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided4 || vect_extract_even_odd } } } } } */
-/* Some targets can vectorize the second of the three main loops using
-   hybrid SLP.  For 128-bit vectors, the required 4->3 permutations are:
-
-   { 0, 1, 2, 4, 5, 6, 8, 9 }
-   { 2, 4, 5, 6, 8, 9, 10, 12 }
-   { 5, 6, 8, 9, 10, 12, 13, 14 }
-
-   Not all vect_perm targets support that, and it's a bit too specific to have
-   its own effective-target selector, so we just test targets directly.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided4 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { vect_strided4 || vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  { target  { ! { vect_strided4 || vect_extract_even_odd } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
index 450c7141c96..16ab0cc7605 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
@@ -125,4 +125,4 @@  main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
index addcf60438c..ac562dc475c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
@@ -41,4 +41,4 @@  main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
index 4c23b808333..10e64e64d47 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
@@ -36,6 +36,5 @@  f3 (int *restrict y, int *restrict x, int *restrict indices)
     }
 }
 
-/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */
 /* { dg-final { scan-tree-dump "different gather base" vect { target { ! vect_gather_load_ifn } } } } */
 /* { dg-final { scan-tree-dump "different gather scale" vect { target { ! vect_gather_load_ifn } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom.c b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
index 11f94411575..02736d56d31 100644
--- a/gcc/testsuite/gcc.target/i386/pr52252-atom.c
+++ b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
@@ -25,4 +25,5 @@  matrix_mul (byte *in, byte *out, int size)
     }
 }
 
-/* { dg-final { scan-assembler "palignr" } } */
+/* We are no longer using hybrid SLP.  */
+/* { dg-final { scan-assembler "palignr" { xfail *-*-* } } } */