x86: Refine V4BF/V2BF FMA testcase

Message ID 20240905085655.1918785-1-admin@levyhsu.com
State New
Series x86: Refine V4BF/V2BF FMA testcase

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed

Commit Message

Levy Hsu Sept. 5, 2024, 8:55 a.m. UTC
Simple testcase fix, ok for trunk?

This patch removes the specific register checks to account for possible
register spills, and disables the scan-assembler checks in 32-bit mode.
This adjustment is necessary because V4BF operations in 32-bit mode
require duplicated instructions, which changes the expected instruction
counts and leads to spurious test failures.  It fixes the failures seen
when testing with --target_board='unix{-m32\ -march=cascadelake}'.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Remove specific
	register checks to account for potential register spills.  Restrict
	the scan-assembler-times checks to non-ia32 targets, since V4BF
	operations need multiple instructions in 32-bit mode.
---
 .../gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c     | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
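Why the relaxed pattern tolerates spills can be checked outside DejaGnu. The C sketch below is a minimal illustration, assuming glibc's POSIX `<regex.h>` behaves close enough to Tcl's `regexp` for these two patterns; Tcl's non-capturing `(?:...)` is rewritten as a plain `(...)` group. The sample assembly (including the `-24(%esp)` spill slot) and the names `old_pattern`, `new_pattern`, `sample_asm`, `count_matches` and `check_sample` are hypothetical, not taken from a real test run.

```c
#include <assert.h>
#include <regex.h>

/* Old pattern, translated from Tcl to POSIX ERE.  It requires three
   %xmm register operands on the same line.  */
static const char *const old_pattern =
  "vfmadd132nepbf16[ \t]+[^{\n]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+"
  "[^\n\r]*%xmm[0-9]+(\n|[ \t]+#)";

/* Relaxed pattern from the patch: the mnemonic followed by any xmm
   register later on the same line.  */
static const char *const new_pattern = "vfmadd132nepbf16[^\n\r]*xmm[0-9]";

/* Hypothetical assembly: one all-register FMA, and one whose third
   operand has been spilled to a stack slot.  */
static const char *const sample_asm =
  "\tvfmadd132nepbf16\t%xmm2, %xmm1, %xmm0\n"
  "\tvfmadd132nepbf16\t-24(%esp), %xmm1, %xmm0\n";

/* Count non-overlapping matches of PATTERN in TEXT, similar in spirit to
   how scan-assembler-times counts occurrences.  Returns -1 if PATTERN
   fails to compile.  */
static int
count_matches (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int count = 0;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  while (regexec (&re, text, 1, &m, 0) == 0)
    {
      count++;
      /* Both patterns begin with a literal mnemonic, so a match is never
         empty and advancing to its end always makes progress.  */
      text += m.rm_eo;
    }

  regfree (&re);
  return count;
}

/* The strict pattern finds only the all-register instruction; the
   relaxed one also finds the spilled form.  */
static void
check_sample (void)
{
  assert (count_matches (old_pattern, sample_asm) == 1);
  assert (count_matches (new_pattern, sample_asm) == 2);
}
```

The trade-off is that the relaxed pattern no longer checks operand kinds at all: it only confirms that the mnemonic and some xmm register appear on the same line, which is what makes it robust to spills.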
  

Comments

Jiang, Haochen Sept. 6, 2024, 2:34 a.m. UTC | #1
> From: Levy Hsu <admin@levyhsu.com>
> Sent: Thursday, September 5, 2024 4:55 PM
> To: gcc-patches@gcc.gnu.org
> 
> Simple testcase fix, ok for trunk?
> 
> This patch removes specific register checks to account for possible
> register spills and disables tests in 32-bit mode. This adjustment
> is necessary because V4BF operations in 32-bit mode require duplicating
> instructions, which lead to unintended test failures. It fixed the
> case when testing with --target_board='unix{-m32\ -march=cascadelake}'
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Remove specific
>         register checks to account for potential register spills. Exclude tests
>         in 32-bit mode to prevent incorrect failure reports due to the need for
>         multiple instruction executions in handling V4BF operations.
> ---
>  .../gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c     | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> index 72e17e99603..17c32c1d36b 100644
> --- a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> @@ -1,9 +1,9 @@
>  /* { dg-do compile } */

You could simply add { target { ! ia32 } } here, rather than to each
scan-assembler-times line.

I don't think we need to run this test for -m32 because of V4BF. Actually,
the better choice is to split the testcase into two parts; for V2BF, I
suppose it could be run under -m32.

Thx,
Haochen

>  /* { dg-options "-mavx10.2 -O2" } */
> -/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
> 
>  typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
>  typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));
> --
> 2.31.1
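Jiang's two suggestions can be sketched as testcase headers. This is an illustration only, not a committed version of the file. File A moves the ia32 exclusion onto the `dg-do` line, so the whole test is skipped on ia32 and the scan lines need no selector. File B is a hypothetical V2BF-only testcase with no target restriction; its count of 1 per pattern assumes the current file contains exactly one V2BF function per FMA form, which is inferred from the total of 2 rather than stated in the patch.

```c
/* ---- File A (sketch): V4BF part, skipped entirely on ia32.  ---- */
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-mavx10.2 -O2" } */
/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 } } */
/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 } } */
/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 } } */
/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 } } */

/* ---- File B (hypothetical): V2BF only, also run under -m32.  ---- */
/* { dg-do compile } */
/* { dg-options "-mavx10.2 -O2" } */
/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 1 } } */
/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 1 } } */
/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 1 } } */
/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 1 } } */
```

If the file were split as in File B, File A's counts would presumably also drop to 1, since it would then hold only the V4BF functions.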
  
Hongtao Liu Sept. 6, 2024, 2:51 a.m. UTC | #2
On Fri, Sep 6, 2024 at 10:34 AM Jiang, Haochen <haochen.jiang@intel.com> wrote:
>
> > --- a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> > @@ -1,9 +1,9 @@
> >  /* { dg-do compile } */
>
> You could simply add { target { ! ia32 } } here, but not each line of
> scan-assembler-times.
It can be compiled for target ia32. I guess for ia32 each FMA instruction
would be scanned 3 times (1 from the original 32-bit vector FMA, and 2 from
splitting the 64-bit vector FMA into two 32-bit vector FMAs).
So it would be better to scan for 2 FMAs on ! ia32 and 3 FMAs on ia32?
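Hongtao's alternative keeps the test running on ia32 and scans a different count per target. A sketch of the directives under that proposal, assuming his guessed count of 3 per FMA form on ia32 is correct (it has not been checked against actual -m32 output here):

```c
/* { dg-do compile } */
/* { dg-options "-mavx10.2 -O2" } */
/* ! ia32: one V2BF and one V4BF FMA of each form, as his count implies.  */
/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
/* ia32 (assumed): the V4BF FMA is split into two V2BF FMAs, so 1 + 2 = 3.  */
/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 3 { target ia32 } } } */
/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 3 { target ia32 } } } */
/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 3 { target ia32 } } } */
/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 3 { target ia32 } } } */
```

Compared with skipping ia32, this keeps some -m32 coverage, but the ia32 counts encode the current splitting strategy and would need updating if code generation for V4BF on ia32 changes.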

  

Patch

diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
index 72e17e99603..17c32c1d36b 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
@@ -1,9 +1,9 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx10.2 -O2" } */
-/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
 
 typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
 typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));