[vect]: gate COMPLEX_MUL on FP_CONTRACT_FAST [PR125431]

Message ID patch-20589-tamar@arm.com
State New
Series [vect]: gate COMPLEX_MUL on FP_CONTRACT_FAST [PR125431]

Checks

Context                                     Check  Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm  fail   Patch failed to apply

Commit Message

Tamar Christina June 3, 2026, 7:50 a.m. UTC
The checks for FP_CONTRACT_FAST were in the wrong place for complex_mul.

In their old location they only blocked FMA, not MUL.  They also did not
really reject forming the FMA; they just created an invalid collection of
nodes that failed analysis later on.

However, complex multiplication is also a contraction, since it computes

real = a*c - b*d
imag = a*d + b*c
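
To see why the plain multiply already counts as a contraction, the sketch
below evaluates the real part both ways on one input.  It is an illustration
only: Cplx, mul_unfused, mul_fused and check_sample are hypothetical names,
and the fused variant shows just one possible contraction order, not
necessarily what any particular complex-multiply instruction does.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for this sketch; not part of GCC.
struct Cplx { float re, im; };

// Unfused: each product is rounded to float before the add/subtract.
// The volatile temporaries keep the compiler from contracting the
// expressions itself, whatever -ffp-contract mode is in effect.
Cplx mul_unfused (Cplx x, Cplx y)
{
  volatile float ac = x.re * y.re, bd = x.im * y.im;
  volatile float ad = x.re * y.im, bc = x.im * y.re;
  return { ac - bd, ad + bc };
}

// One possible contracted form (an assumption of this sketch): the second
// product of each component is rounded, the first is fused into the
// add/subtract and so is never rounded on its own.
Cplx mul_fused (Cplx x, Cplx y)
{
  volatile float bd = x.im * y.im, bc = x.im * y.re;
  return { std::fma (x.re, y.re, -bd), std::fma (x.re, y.im, bc) };
}

// Input chosen so that a*c = 1 + 2^-11 + 2^-24 needs rounding in float,
// while b*d = 1 + 2^-11 is exact.
void check_sample ()
{
  Cplx x = { 1.0f + std::ldexp (1.0f, -12), 1.0f };
  Cplx y = { 1.0f + std::ldexp (1.0f, -12), 1.0f + std::ldexp (1.0f, -11) };

  // Unfused: a*c rounds to 1 + 2^-11, so the difference cancels to zero.
  assert (mul_unfused (x, y).re == 0.0f);
  // Fused: the 2^-24 tail of a*c survives the subtraction.
  assert (mul_fused (x, y).re == std::ldexp (1.0f, -24));
}
```

The two results differ in the real part, which is the kind of change in
rounding that -ffp-contract=off is meant to rule out.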

This moves the checks up to the earliest possible location, where they now
actually return, and adds the missing check for FMS.
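
The shape of the gate can be sketched outside the vectorizer as follows.
This is a simplified stand-alone model, not the code in
tree-vect-slp-patterns.cc: FpContract, Match, Candidate, match_complex_mul
and check_gate are hypothetical names invented for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for this sketch; GCC's real types differ.
enum class FpContract { Off, On, Fast };
enum class Match { None, ComplexMul, ComplexFma };

struct Candidate
{
  bool is_float;        // element type is floating point
  bool has_accumulate;  // an addend is present, so FMA rather than MUL
};

// Gate first, then classify: with anything other than "fast", a
// floating-point candidate is rejected before any pattern is formed.
// Non-float candidates pass the gate, mirroring the FLOAT_TYPE_P test.
Match match_complex_mul (const Candidate &c, FpContract mode)
{
  if (mode != FpContract::Fast && c.is_float)
    return Match::None;
  return c.has_accumulate ? Match::ComplexFma : Match::ComplexMul;
}

// Quick self-check of the three interesting cases.
void check_gate ()
{
  // A plain float multiply is now rejected with contraction off.
  assert (match_complex_mul ({true, false}, FpContract::Off) == Match::None);
  // The same candidate is accepted with -ffp-contract=fast.
  assert (match_complex_mul ({true, false}, FpContract::Fast)
	  == Match::ComplexMul);
  // Integer candidates are not gated at all.
  assert (match_complex_mul ({false, false}, FpContract::Off)
	  == Match::ComplexMul);
}
```

Putting the test at the top means both the MUL and the FMA case take the
same early exit, rather than only the accumulate branch being guarded.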

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf and x86_64-pc-linux-gnu (-m32 and -m64)
with no issues.

Pushed.

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/125431
	* tree-vect-slp-patterns.cc (complex_mul_pattern::matches,
	complex_fms_pattern::matches): Gate on FP contraction.

gcc/testsuite/ChangeLog:

	PR tree-optimization/125431
	* gfortran.dg/vect/pr125431.f90: New test.


Comments

Richard Biener June 3, 2026, 8:01 a.m. UTC | #1
On Wed, 3 Jun 2026, Tamar Christina wrote:

> The checks for FP_CONTRACT_FAST were in the wrong place for complex_mul.
> 
> The location it was in would only block FMA but not MUL.  It would also not
> really reject the forming of the FMA, it would just create an invalid collection
> of nodes which would fail analysis later on.
> 
> However complex multiplication is also a contraction, since it's doing
> 
> real = a*c - b*d
> imag = a*d + b*c
> 
> This moves the checks up to earliest possible location and actually just returns
> and adds the missing check for FMS.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Pushed.

Thanks - can you also backport this to 16 and 15 (this Friday is the RC
for 15.3), but leave a short while first to figure out where the
"loop vectorized" scan will fail?

  
Tamar Christina June 3, 2026, 8:32 a.m. UTC | #2
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: 03 June 2026 09:01
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rdapp.gcc@gmail.com
> Subject: Re: [PATCH][vect]: gate COMPLEX_MUL on FP_CONTRACT_FAST
> [PR125431]
> 
> Thanks - can you also see to backport to 16 and 15 (this friday is RC
> for 15.3), but leave a short while to figure where the
> "loop vectorized" scan will fail?

Sure, will do :)

Cheers,
Tamar

  

Patch

diff --git a/gcc/testsuite/gfortran.dg/vect/pr125431.f90 b/gcc/testsuite/gfortran.dg/vect/pr125431.f90
new file mode 100644
index 0000000000000000000000000000000000000000..fe11b5eface9978c916783619677a10a7759c43d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/pr125431.f90
@@ -0,0 +1,16 @@ 
+! { dg-do compile }
+! { dg-additional-options "-O2 -ffp-contract=off" }
+! { dg-additional-options "-march=armv8.3-a" { target { aarch64*-*-* } } }
+
+subroutine foo(a,b,c)
+
+  complex :: a(6,6)
+  complex :: b(6,6)
+  complex :: c(6,6)
+
+  c = MATMUL(a, b)
+
+end subroutine foo
+
+! { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_float } }  } }
+! { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" } }
diff --git a/gcc/tree-vect-slp-patterns.cc b/gcc/tree-vect-slp-patterns.cc
index c0031a129a445160998f70bf9c18e1ef70464eed..181281c2154bedfc8e8c6b1bde1226e194ba99e5 100644
--- a/gcc/tree-vect-slp-patterns.cc
+++ b/gcc/tree-vect-slp-patterns.cc
@@ -1038,6 +1038,11 @@  complex_mul_pattern::matches (complex_operation_t op,
   if (op != MINUS_PLUS)
     return IFN_LAST;
 
+  /* It's only valid to form FMAs and MUL with -ffp-contract=fast.  */
+  if (flag_fp_contract_mode != FP_CONTRACT_FAST
+      && FLOAT_TYPE_P (SLP_TREE_VECTYPE (*node)))
+    return IFN_LAST;
+
   auto childs = *ops;
   auto l0node = SLP_TREE_CHILDREN (childs[0]);
 
@@ -1050,11 +1055,8 @@  complex_mul_pattern::matches (complex_operation_t op,
   auto_vec<slp_tree> left_op, right_op;
   slp_tree add0 = NULL;
 
-  /* Check if we may be a multiply add.  It's only valid to form FMAs
-     with -ffp-contract=fast.  */
+  /* Check if we may be a multiply add.  */
   if (!mul0
-      && (flag_fp_contract_mode == FP_CONTRACT_FAST
-	  || !FLOAT_TYPE_P (SLP_TREE_VECTYPE (*node)))
       && vect_match_expression_p (l0node[0], PLUS_EXPR))
     {
       auto vals = SLP_TREE_CHILDREN (l0node[0]);
@@ -1285,6 +1287,11 @@  complex_fms_pattern::matches (complex_operation_t op,
   if (!vect_match_expression_p (root, MINUS_EXPR))
     return IFN_LAST;
 
+  /* It's only valid to form FMSs with -ffp-contract=fast.  */
+  if (flag_fp_contract_mode != FP_CONTRACT_FAST
+      && FLOAT_TYPE_P (SLP_TREE_VECTYPE (*ref_node)))
+    return IFN_LAST;
+
   /* TODO: Support invariants here, with the new layout CADD now
 	   can match before we get a chance to try CFMS.  */
   auto nodes = SLP_TREE_CHILDREN (root);