[v1,1/2] Match: Support branch form for unsigned SAT_ADD

Message ID 20240520110048.2827900-1-pan2.li@intel.com
State New
Headers
Series [v1,1/2] Match: Support branch form for unsigned SAT_ADD |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed

Commit Message

Li, Pan2 May 20, 2024, 11 a.m. UTC
  From: Pan Li <pan2.li@intel.com>

This patch would like to support the branch form for unsigned
SAT_ADD.  For example as below:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  return (uint64_t) (x + y) >= x ? (x + y) : -1;
}

Different to the branchless version,  we leverage the simplify to
convert the branch version of SAT_ADD into branchless if and only
if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
the ability to choose branch or branchless implementation of .SAT_ADD.
For example,  some target can take care of branches code more optimally.

When the target implement the IFN_SAT_ADD for unsigned and before this
patch:
uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  uint64_t _2;
  __complex__ long unsigned int _6;
  long unsigned int _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
  _1 = REALPART_EXPR <_6>;
  _7 = IMAGPART_EXPR <_6>;
  if (_7 == 0)
    goto <bb 4>; [65.00%]
  else
    goto <bb 3>; [35.00%]
;;    succ:       4
;;                3

;;   basic block 3, loop depth 0
;;    pred:       2
;;    succ:       4

;;   basic block 4, loop depth 0
;;    pred:       3
;;                2
  # _2 = PHI <18446744073709551615(3), _1(2)>
  return _2;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _9;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
  return _9;
;;    succ:       EXIT
}

The below test suites are passed for this patch:
* The x86 bootstrap test.
* The x86 fully regression test.
* The riscv fully regression test.

gcc/ChangeLog:

	* match.pd: Add new simplify to convert branch SAT_ADD into
	branchless,  if and only if backend implement the IFN.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/match.pd | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
  

Comments

Tamar Christina May 20, 2024, 11:50 a.m. UTC | #1
Hi Pan,

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Monday, May 20, 2024 12:01 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com; Pan Li
> <pan2.li@intel.com>
> Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> 
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to support the branch form for unsigned
> SAT_ADD.  For example as below:
> 
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> }
> 
> Different to the branchless version,  we leverage the simplify to
> convert the branch version of SAT_ADD into branchless if and only
> if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> the ability to choose branch or branchless implementation of .SAT_ADD.
> For example,  some target can take care of branches code more optimally.
> 
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
> uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   uint64_t _2;
>   __complex__ long unsigned int _6;
>   long unsigned int _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
>   _1 = REALPART_EXPR <_6>;
>   _7 = IMAGPART_EXPR <_6>;
>   if (_7 == 0)
>     goto <bb 4>; [65.00%]
>   else
>     goto <bb 3>; [35.00%]
> ;;    succ:       4
> ;;                3
> 
> ;;   basic block 3, loop depth 0
> ;;    pred:       2
> ;;    succ:       4
> 
> ;;   basic block 4, loop depth 0
> ;;    pred:       3
> ;;                2
>   # _2 = PHI <18446744073709551615(3), _1(2)>
>   return _2;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _9;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
>   return _9;
> ;;    succ:       EXIT
> }
> 
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.
> 
> gcc/ChangeLog:
> 
> 	* match.pd: Add new simplify to convert branch SAT_ADD into
> 	branchless,  if and only if backend implement the IFN.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/match.pd | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..0547b57b3a3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> 
> +#if GIMPLE
> +
> +/* Simplify the branch version of SAT_ADD into branchless if and only if
> +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> +   ability to choose branch or branchless implementation of .SAT_ADD.  */
> +
> +(simplify
> + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> OPTIMIZE_FOR_BOTH))
> +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +(simplify
> + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> OPTIMIZE_FOR_BOTH))
> +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +#endif

Thanks, this looks good to me!

I'll leave it up to Richard to approve,
Richard: The reason for the direct_internal_fn_supported_p is because some
targets said that they currently handle the branch version better due to the lack
of some types.  At the time I reason it's just a target expansion bug but didn't hear anything.

To be honest, it feels to me like we should do this unconditionally, and just have the targets
that get faster branch version to handle it during expand? Since the patch series provides
a canonicalized version now.

This means we can also better support targets that have the vector optab but not the scalar one
as the above check would fail for these targets.

What do you think?

Thanks,
Tamar

> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
  
Richard Biener May 22, 2024, 1:23 p.m. UTC | #2
On Mon, May 20, 2024 at 1:50 PM Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Pan,
>
> > -----Original Message-----
> > From: pan2.li@intel.com <pan2.li@intel.com>
> > Sent: Monday, May 20, 2024 12:01 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>; richard.guenther@gmail.com; Pan Li
> > <pan2.li@intel.com>
> > Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch would like to support the branch form for unsigned
> > SAT_ADD.  For example as below:
> >
> > uint64_t
> > sat_add (uint64_t x, uint64_t y)
> > {
> >   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> > }
> >
> > Different to the branchless version,  we leverage the simplify to
> > convert the branch version of SAT_ADD into branchless if and only
> > if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> > the ability to choose branch or branchless implementation of .SAT_ADD.
> > For example,  some target can take care of branches code more optimally.
> >
> > When the target implement the IFN_SAT_ADD for unsigned and before this
> > patch:
> > uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   uint64_t _2;
> >   __complex__ long unsigned int _6;
> >   long unsigned int _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
> >   _1 = REALPART_EXPR <_6>;
> >   _7 = IMAGPART_EXPR <_6>;
> >   if (_7 == 0)
> >     goto <bb 4>; [65.00%]
> >   else
> >     goto <bb 3>; [35.00%]
> > ;;    succ:       4
> > ;;                3
> >
> > ;;   basic block 3, loop depth 0
> > ;;    pred:       2
> > ;;    succ:       4
> >
> > ;;   basic block 4, loop depth 0
> > ;;    pred:       3
> > ;;                2
> >   # _2 = PHI <18446744073709551615(3), _1(2)>
> >   return _2;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _9;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
> >   return _9;
> > ;;    succ:       EXIT
> > }
> >
> > The below test suites are passed for this patch:
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> > * The riscv fully regression test.
> >
> > gcc/ChangeLog:
> >
> >       * match.pd: Add new simplify to convert branch SAT_ADD into
> >       branchless,  if and only if backend implement the IFN.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/match.pd | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0f9c34fa897..0547b57b3a3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +#if GIMPLE
> > +
> > +/* Simplify the branch version of SAT_ADD into branchless if and only if
> > +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> > +   ability to choose branch or branchless implementation of .SAT_ADD.  */

This comment or part of the description above should say this simplifies

   (x + y) >= x ? (x + y) : -1

as

  (x + y) | (-(typeof(x))((x + y) < x))

> > +(simplify
> > + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> > +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> > +
> > +(simplify
> > + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> > +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +   (bit_ior @2 (negate (convert (lt @2 @0))))))

and this should probably be (gt @2 @0)?

This misses INTEGER_TYPE_P constraints and it's supposed to be only
for TYPE_UNSIGNED?

> > +
> > +#endif
>
> Thanks, this looks good to me!
>
> I'll leave it up to Richard to approve,
> Richard: The reason for the direct_internal_fn_supported_p is because some
> targets said that they currently handle the branch version better due to the lack
> of some types.  At the time I reason it's just a target expansion bug but didn't hear anything.
>
> To be honest, it feels to me like we should do this unconditionally, and just have the targets
> that get faster branch version to handle it during expand? Since the patch series provides
> a canonicalized version now.

I'm not sure this is a good canonical form.

__imag .ADD_OVERFLOW (x, y) ? __real .ADD_OVERFLOW (x, y) : -1

would be better IMO.  It can be branch-less by using a COND_EXPR.

> This means we can also better support targets that have the vector optab but not the scalar one
> as the above check would fail for these targets.
>
> What do you think?
>
> Thanks,
> Tamar
>
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > --
> > 2.34.1
>
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..0547b57b3a3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3094,6 +3094,24 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_add @0 @1)
  (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
 
+#if GIMPLE
+
+/* Simplify the branch version of SAT_ADD into branchless if and only if
+   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
+   ability to choose branch or branchless implementation of .SAT_ADD.  */
+
+(simplify
+ (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
+  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+   (bit_ior @2 (negate (convert (lt @2 @0))))))
+
+(simplify
+ (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
+  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+   (bit_ior @2 (negate (convert (lt @2 @0))))))
+
+#endif
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)