[1/2] Match: Simplify unsigned scalar sat_sub(x -1) to (x - x != 0)

Message ID 20241023053823.14653-1-xuli1@eswincomputing.com
State New
Headers
Series [1/2] Match: Simplify unsigned scalar sat_sub(x -1) to (x - x != 0) |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed

Commit Message

Li Xu Oct. 23, 2024, 5:38 a.m. UTC
  From: xuli <xuli1@eswincomputing.com>

When the imm operand op1=1 in the unsigned scalar sat_sub form2 below,
we can simplify (x != 0 ? x + max : 0) to (x - x != 0), thereby eliminating
a branch instruction.

Form2:
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
{                                       \
  return x >= (T)IMM ? x - (T)IMM : 0;  \
}

Take below form 2 as example:
DEF_SAT_U_SUB_IMM_FMT_2(uint8_t, 1)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  uint8_t _1;
  uint8_t _3;

  <bb 2> [local count: 1073741824]:
  if (x_2(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870912]:
  _3 = x_2(D) + 255;

  <bb 4> [local count: 1073741824]:
  # _1 = PHI <x_2(D)(2), _3(3)>
  return _1;

}

Assembly code:
sat_u_sub_imm1_uint8_t_fmt_2:
	beq	a0,zero,.L2
	addiw	a0,a0,-1
	andi	a0,a0,0xff
.L2:
	ret

After this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  _Bool _1;
  unsigned char _2;
  uint8_t _4;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) != 0;
  _2 = (unsigned char) _1;
  _4 = x_3(D) - _2;
  return _4;

}

Assembly code:
sat_u_sub_imm1_uint8_t_fmt_2:
	snez	a5,a0
	subw	a0,a0,a5
	andi	a0,a0,0xff
	ret

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:

	* match.pd: Simplify (x != 0 ? x + max : 0) to (x - x != 0).
---
 gcc/match.pd | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
  

Comments

Andrew Pinski Oct. 23, 2024, 5:45 a.m. UTC | #1
On Tue, Oct 22, 2024 at 10:39 PM Li Xu <xuli1@eswincomputing.com> wrote:
>
> From: xuli <xuli1@eswincomputing.com>
>
> When the imm operand op1=1 in the unsigned scalar sat_sub form2 below,
> we can simplify (x != 0 ? x + max : 0) to (x - x != 0), thereby eliminating
> a branch instruction.
>
> Form2:
> T __attribute__((noinline))             \
> sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
> {                                       \
>   return x >= (T)IMM ? x - (T)IMM : 0;  \
> }
>
> Take below form 2 as example:
> DEF_SAT_U_SUB_IMM_FMT_2(uint8_t, 1)
>
> Before this patch:
> __attribute__((noinline))
> uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
> {
>   uint8_t _1;
>   uint8_t _3;
>
>   <bb 2> [local count: 1073741824]:
>   if (x_2(D) != 0)
>     goto <bb 3>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
>
>   <bb 3> [local count: 536870912]:
>   _3 = x_2(D) + 255;
>
>   <bb 4> [local count: 1073741824]:
>   # _1 = PHI <x_2(D)(2), _3(3)>
>   return _1;
>
> }
>
> Assembly code:
> sat_u_sub_imm1_uint8_t_fmt_2:
>         beq     a0,zero,.L2
>         addiw   a0,a0,-1
>         andi    a0,a0,0xff
> .L2:
>         ret
>
> After this patch:
> __attribute__((noinline))
> uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
> {
>   _Bool _1;
>   unsigned char _2;
>   uint8_t _4;
>
>   <bb 2> [local count: 1073741824]:
>   _1 = x_3(D) != 0;
>   _2 = (unsigned char) _1;
>   _4 = x_3(D) - _2;
>   return _4;
>
> }
>
> Assembly code:
> sat_u_sub_imm1_uint8_t_fmt_2:
>         snez    a5,a0
>         subw    a0,a0,a5
>         andi    a0,a0,0xff
>         ret
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
>
> Signed-off-by: Li Xu <xuli1@eswincomputing.com>
> gcc/ChangeLog:
>
>         * match.pd: Simplify (x != 0 ? x + max : 0) to (x - x != 0).
> ---
>  gcc/match.pd | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0455dfa6993..9dcb9555cc0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3383,6 +3383,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    }
>    (if (wi::eq_p (sum, wi::uhwi (0, precision)))))))
>
> +/* The boundary condition for case 10: IMM = 1:
> +   SAT_U_SUB = X >= IMM ? (X - IMM) : 0.
> +   simplify (X != 0 ? X + max : 0) to (X - X != 0).  */
> +(simplify
> + (cond (ne @0 integer_zerop) (plus @0 INTEGER_CST@1) integer_zerop)

I think you could use integer_all_onesp@1 here rather than the check
you are doing on the INTEGER_CST.
Then you only need the INTEGRAL_TYPE_P check.

You should also add a few testcases for this. See
`gcc.dg/tree-ssa/phiopt*.c` for similar testcases.

Thanks,
Andrew Pinski


> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +     && types_match (type, @0))
> +  (with
> +   {
> +      unsigned precision = TYPE_PRECISION (type);
> +      wide_int max = wi::mask (precision, false, precision);
> +      wide_int c1 = wi::to_wide (@1);
> +   }
> +   (if (wi::eq_p (c1, max))
> +   (minus @0 (convert (ne @0 { build_zero_cst (type); })))))))
> +
>  /* Signed saturation sub, case 1:
>     T minus = (T)((UT)X - (UT)Y);
>     SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
> --
> 2.17.1
>
  
Li, Pan2 Oct. 23, 2024, 6:55 a.m. UTC | #2
Nit: for title, should be SAT_SUB(x, 1) instead of SAT_SUB(x - 1)?

Pan

-----Original Message-----
From: Andrew Pinski <pinskia@gmail.com> 
Sent: Wednesday, October 23, 2024 1:46 PM
To: Li Xu <xuli1@eswincomputing.com>
Cc: gcc-patches@gcc.gnu.org; kito.cheng@gmail.com; richard.guenther@gmail.com; Tamar.Christina@arm.com; juzhe.zhong@rivai.ai; Li, Pan2 <pan2.li@intel.com>; jeffreyalaw@gmail.com; rdapp.gcc@gmail.com
Subject: Re: [PATCH 1/2] Match: Simplify unsigned scalar sat_sub(x -1) to (x - x != 0)

On Tue, Oct 22, 2024 at 10:39 PM Li Xu <xuli1@eswincomputing.com> wrote:
>
> From: xuli <xuli1@eswincomputing.com>
>
> When the imm operand op1=1 in the unsigned scalar sat_sub form2 below,
> we can simplify (x != 0 ? x + max : 0) to (x - x != 0), thereby eliminating
> a branch instruction.
>
> Form2:
> T __attribute__((noinline))             \
> sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
> {                                       \
>   return x >= (T)IMM ? x - (T)IMM : 0;  \
> }
>
> Take below form 2 as example:
> DEF_SAT_U_SUB_IMM_FMT_2(uint8_t, 1)
>
> Before this patch:
> __attribute__((noinline))
> uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
> {
>   uint8_t _1;
>   uint8_t _3;
>
>   <bb 2> [local count: 1073741824]:
>   if (x_2(D) != 0)
>     goto <bb 3>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
>
>   <bb 3> [local count: 536870912]:
>   _3 = x_2(D) + 255;
>
>   <bb 4> [local count: 1073741824]:
>   # _1 = PHI <x_2(D)(2), _3(3)>
>   return _1;
>
> }
>
> Assembly code:
> sat_u_sub_imm1_uint8_t_fmt_2:
>         beq     a0,zero,.L2
>         addiw   a0,a0,-1
>         andi    a0,a0,0xff
> .L2:
>         ret
>
> After this patch:
> __attribute__((noinline))
> uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
> {
>   _Bool _1;
>   unsigned char _2;
>   uint8_t _4;
>
>   <bb 2> [local count: 1073741824]:
>   _1 = x_3(D) != 0;
>   _2 = (unsigned char) _1;
>   _4 = x_3(D) - _2;
>   return _4;
>
> }
>
> Assembly code:
> sat_u_sub_imm1_uint8_t_fmt_2:
>         snez    a5,a0
>         subw    a0,a0,a5
>         andi    a0,a0,0xff
>         ret
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
>
> Signed-off-by: Li Xu <xuli1@eswincomputing.com>
> gcc/ChangeLog:
>
>         * match.pd: Simplify (x != 0 ? x + max : 0) to (x - x != 0).
> ---
>  gcc/match.pd | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0455dfa6993..9dcb9555cc0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3383,6 +3383,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    }
>    (if (wi::eq_p (sum, wi::uhwi (0, precision)))))))
>
> +/* The boundary condition for case 10: IMM = 1:
> +   SAT_U_SUB = X >= IMM ? (X - IMM) : 0.
> +   simplify (X != 0 ? X + max : 0) to (X - X != 0).  */
> +(simplify
> + (cond (ne @0 integer_zerop) (plus @0 INTEGER_CST@1) integer_zerop)

I think you could use integer_all_onesp@1 here rather than the check
you are doing on the INTEGER_CST.
Then you only need the INTEGRAL_TYPE_P check.

You should also add a few testcases for this. See
`gcc.dg/tree-ssa/phiopt*.c` for similar testcases.

Thanks,
Andrew Pinski


> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +     && types_match (type, @0))
> +  (with
> +   {
> +      unsigned precision = TYPE_PRECISION (type);
> +      wide_int max = wi::mask (precision, false, precision);
> +      wide_int c1 = wi::to_wide (@1);
> +   }
> +   (if (wi::eq_p (c1, max))
> +   (minus @0 (convert (ne @0 { build_zero_cst (type); })))))))
> +
>  /* Signed saturation sub, case 1:
>     T minus = (T)((UT)X - (UT)Y);
>     SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
> --
> 2.17.1
>
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 0455dfa6993..9dcb9555cc0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3383,6 +3383,22 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (wi::eq_p (sum, wi::uhwi (0, precision)))))))
 
+/* The boundary condition for case 10: IMM = 1:
+   SAT_U_SUB = X >= IMM ? (X - IMM) : 0.
+   simplify (X != 0 ? X + max : 0) to (X - X != 0).  */
+(simplify
+ (cond (ne @0 integer_zerop) (plus @0 INTEGER_CST@1) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+     && types_match (type, @0))
+  (with
+   {
+      unsigned precision = TYPE_PRECISION (type);
+      wide_int max = wi::mask (precision, false, precision);
+      wide_int c1 = wi::to_wide (@1);
+   }
+   (if (wi::eq_p (c1, max))
+   (minus @0 (convert (ne @0 { build_zero_cst (type); })))))))
+
 /* Signed saturation sub, case 1:
    T minus = (T)((UT)X - (UT)Y);
    SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;