testsuite: i386: Fix expected vectoriziation in pr105493.c

Message ID 20250114123433.939789-1-christoph.muellner@vrull.eu
State New
Headers
Series testsuite: i386: Fix expected vectoriziation in pr105493.c |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed

Commit Message

Christoph Müllner Jan. 14, 2025, 12:34 p.m. UTC
  As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c.
When looking at the generated code, we can see that the generated code
is vectorized differently, resulting in a reduction from 225 instructions
down to 109. On the performance side, no changes were measured on a 5950X.

This patch adjusts the test condition to fit how the function gets
vectorized after ab18785840d7b8 (and probably further related changes).

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

	PR target/117079

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr105493.c: Fix expected vectorization
---
 gcc/testsuite/gcc.target/i386/pr105493.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Richard Biener Jan. 14, 2025, 12:46 p.m. UTC | #1
On Tue, 14 Jan 2025, Christoph Müllner wrote:

> As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c.
> When looking at the generated code, we can see that the generated code
> is vectorized differently, resulting in a reduction from 225 instructions
> down to 109. On the performance side, no changes were measured on a 5950X.
> 
> This patch adjusts the test condition to fit how the function gets
> vectorized after ab18785840d7b8 (and probably further related changes).

Can you adjust the comment?  We didn't vectorize the first loop in GCC 14
but now we do.  We should probably check that, dumping -vect-details
and checking for both loops being vectorized.

The slp1 scan should then be changed to scan there are no stores to tmp
left, like with maybe

 scan-tree-dump-not "MEM.*tmp.* = " "slp1"

OK with those changes.

Richard.

> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> 
> 	PR target/117079
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/i386/pr105493.c: Fix expected vectorization
> ---
>  gcc/testsuite/gcc.target/i386/pr105493.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c b/gcc/testsuite/gcc.target/i386/pr105493.c
> index c6fd16753cd9..fb0bf8aa0af7 100644
> --- a/gcc/testsuite/gcc.target/i386/pr105493.c
> +++ b/gcc/testsuite/gcc.target/i386/pr105493.c
> @@ -48,4 +48,4 @@ foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
>  
>  /* The first loop should be vectorized, which will eliminate redundant stores
>     and loads.  */
> -/* { dg-final { scan-tree-dump-times "  MEM <vector\\\(4\\\) unsigned int> \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */
> +/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> \\\[\[\^\]\]\*\\\]" "slp1" } } */
>
  
Christoph Müllner Jan. 14, 2025, 1:27 p.m. UTC | #2
On Tue, Jan 14, 2025 at 1:46 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 14 Jan 2025, Christoph Müllner wrote:
>
> > As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c.
> > When looking at the generated code, we can see that the generated code
> > is vectorized differently, resulting in a reduction from 225 instructions
> > down to 109. On the performance side, no changes were measured on a 5950X.
> >
> > This patch adjusts the test condition to fit how the function gets
> > vectorized after ab18785840d7b8 (and probably further related changes).
>
> Can you adjust the comment?  We didn't vectorize the first loop in GCC 14
> but now we do.  We should probably check that, dumping -vect-details
> and checking for both loops being vectorized.
>
> The slp1 scan should then be changed to scan there are no stores to tmp
> left, like with maybe
>
>  scan-tree-dump-not "MEM.*tmp.* = " "slp1"

Thanks for this idea!
The exact proposal won't work because of the "tmp" (we had before
"MEM[(uint8_t *)pix1_90(D) + 4B];"
and similar). The significant difference is that we don't have "MEM("
anymore, but "MEM <vector".
So this works:

/* All loops should be vectorized.  */
/* { dg-final { scan-tree-dump-not "MEM\\\[\\\(" "slp1" } } */
/* { dg-final { scan-tree-dump "MEM <vector\\\(8\\\) unsigned char>
\\\[\[\^\]\]\*\\\]" "slp1" } } */

>
> OK with those changes.
>
> Richard.
>
> > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> >
> >       PR target/117079
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/pr105493.c: Fix expected vectorization
> > ---
> >  gcc/testsuite/gcc.target/i386/pr105493.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c b/gcc/testsuite/gcc.target/i386/pr105493.c
> > index c6fd16753cd9..fb0bf8aa0af7 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr105493.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr105493.c
> > @@ -48,4 +48,4 @@ foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
> >
> >  /* The first loop should be vectorized, which will eliminate redundant stores
> >     and loads.  */
> > -/* { dg-final { scan-tree-dump-times "  MEM <vector\\\(4\\\) unsigned int> \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> \\\[\[\^\]\]\*\\\]" "slp1" } } */
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
  
Richard Biener Jan. 14, 2025, 1:35 p.m. UTC | #3
On Tue, 14 Jan 2025, Christoph Müllner wrote:

> On Tue, Jan 14, 2025 at 1:46 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 14 Jan 2025, Christoph Müllner wrote:
> >
> > > As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c.
> > > When looking at the generated code, we can see that the generated code
> > > is vectorized differently, resulting in a reduction from 225 instructions
> > > down to 109. On the performance side, no changes were measured on a 5950X.
> > >
> > > This patch adjusts the test condition to fit how the function gets
> > > vectorized after ab18785840d7b8 (and probably further related changes).
> >
> > Can you adjust the comment?  We didn't vectorize the first loop in GCC 14
> > but now we do.  We should probably check that, dumping -vect-details
> > and checking for both loops being vectorized.
> >
> > The slp1 scan should then be changed to scan there are no stores to tmp
> > left, like with maybe
> >
> >  scan-tree-dump-not "MEM.*tmp.* = " "slp1"
> 
> Thanks for this idea!
> The exact proposal won't work because of the "tmp" (we had before
> "MEM[(uint8_t *)pix1_90(D) + 4B];"
> and similar). The significant difference is that we don't have "MEM("
> anymore, but "MEM <vector".
> So this works:
> 
> /* All loops should be vectorized.  */
> /* { dg-final { scan-tree-dump-not "MEM\\\[\\\(" "slp1" } } */
> /* { dg-final { scan-tree-dump "MEM <vector\\\(8\\\) unsigned char>
> \\\[\[\^\]\]\*\\\]" "slp1" } } */

But this isn't a very meaningful test?  As the comment suggested
we now eliminate all loads + stores to 'tmp' as we vectorize both
the first loop and the final reduction.

In the dump I'm looking at there isn't any MEM accessing 'tmp'
anymore, so we should be able to assert this?

Richard.


> >
> > OK with those changes.
> >
> > Richard.
> >
> > > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> > >
> > >       PR target/117079
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/i386/pr105493.c: Fix expected vectorization
> > > ---
> > >  gcc/testsuite/gcc.target/i386/pr105493.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c b/gcc/testsuite/gcc.target/i386/pr105493.c
> > > index c6fd16753cd9..fb0bf8aa0af7 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr105493.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr105493.c
> > > @@ -48,4 +48,4 @@ foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
> > >
> > >  /* The first loop should be vectorized, which will eliminate redundant stores
> > >     and loads.  */
> > > -/* { dg-final { scan-tree-dump-times "  MEM <vector\\\(4\\\) unsigned int> \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */
> > > +/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> \\\[\[\^\]\]\*\\\]" "slp1" } } */
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
  
Christoph Müllner Jan. 14, 2025, 2:52 p.m. UTC | #4
On Tue, Jan 14, 2025 at 2:35 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 14 Jan 2025, Christoph Müllner wrote:
>
> > On Tue, Jan 14, 2025 at 1:46 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 14 Jan 2025, Christoph Müllner wrote:
> > >
> > > > As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c.
> > > > When looking at the generated code, we can see that the generated code
> > > > is vectorized differently, resulting in a reduction from 225 instructions
> > > > down to 109. On the performance side, no changes were measured on a 5950X.
> > > >
> > > > This patch adjusts the test condition to fit how the function gets
> > > > vectorized after ab18785840d7b8 (and probably further related changes).
> > >
> > > Can you adjust the comment?  We didn't vectorize the first loop in GCC 14
> > > but now we do.  We should probably check that, dumping -vect-details
> > > and checking for both loops being vectorized.
> > >
> > > The slp1 scan should then be changed to scan there are no stores to tmp
> > > left, like with maybe
> > >
> > >  scan-tree-dump-not "MEM.*tmp.* = " "slp1"
> >
> > Thanks for this idea!
> > The exact proposal won't work because of the "tmp" (we had before
> > "MEM[(uint8_t *)pix1_90(D) + 4B];"
> > and similar). The significant difference is that we don't have "MEM("
> > anymore, but "MEM <vector".
> > So this works:
> >
> > /* All loops should be vectorized.  */
> > /* { dg-final { scan-tree-dump-not "MEM\\\[\\\(" "slp1" } } */
> > /* { dg-final { scan-tree-dump "MEM <vector\\\(8\\\) unsigned char>
> > \\\[\[\^\]\]\*\\\]" "slp1" } } */
>
> But this isn't a very meaningful test?  As the comment suggested
> we now eliminate all loads + stores to 'tmp' as we vectorize both
> the first loop and the final reduction.
>
> In the dump I'm looking at there isn't any MEM accessing 'tmp'
> anymore, so we should be able to assert this?

Understood now. I was misled by ivtmp_291 and friends and I thought
this might lead to an issue, but that's not the case (and should not happen).
Testing for the absence of any stores to tmp is the right thing to do.

Thanks!

>
> Richard.
>
>
> > >
> > > OK with those changes.
> > >
> > > Richard.
> > >
> > > > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> > > >
> > > >       PR target/117079
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >       * gcc.target/i386/pr105493.c: Fix expected vectorization
> > > > ---
> > > >  gcc/testsuite/gcc.target/i386/pr105493.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c b/gcc/testsuite/gcc.target/i386/pr105493.c
> > > > index c6fd16753cd9..fb0bf8aa0af7 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/pr105493.c
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr105493.c
> > > > @@ -48,4 +48,4 @@ foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
> > > >
> > > >  /* The first loop should be vectorized, which will eliminate redundant stores
> > > >     and loads.  */
> > > > -/* { dg-final { scan-tree-dump-times "  MEM <vector\\\(4\\\) unsigned int> \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */
> > > > +/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> \\\[\[\^\]\]\*\\\]" "slp1" } } */
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
  

Patch

diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c b/gcc/testsuite/gcc.target/i386/pr105493.c
index c6fd16753cd9..fb0bf8aa0af7 100644
--- a/gcc/testsuite/gcc.target/i386/pr105493.c
+++ b/gcc/testsuite/gcc.target/i386/pr105493.c
@@ -48,4 +48,4 @@  foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
 
 /* The first loop should be vectorized, which will eliminate redundant stores
    and loads.  */
-/* { dg-final { scan-tree-dump-times "  MEM <vector\\\(4\\\) unsigned int> \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */
+/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> \\\[\[\^\]\]\*\\\]" "slp1" } } */