match: Delay folding of 1/x into `(x+1u)<2u?x:0` until late [PR113301]

Message ID 20240111042343.1172849-1-quic_apinski@quicinc.com
State Committed
Commit 7f56a90269b393fcc55ef08e0990fafb7b1c24b4
Headers
Series match: Delay folding of 1/x into `(x+1u)<2u?x:0` until late [PR113301] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed

Commit Message

Andrew Pinski Jan. 11, 2024, 4:23 a.m. UTC
  Since currently ranger does not work with the complexity of COND_EXPR in
some cases so delaying the simplification of `1/x` for signed types
help code generation.
tree-ssa/divide-8.c is a new testcase where this can help.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR tree-optimization/113301

gcc/ChangeLog:

	* match.pd (`1/x`): Delay signed case until late.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/divide-8.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
---
 gcc/match.pd                             | 12 +++++++-----
 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c | 13 +++++++++++++
 2 files changed, 20 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
  

Comments

Richard Biener Jan. 11, 2024, 10:05 a.m. UTC | #1
On Thu, Jan 11, 2024 at 5:24 AM Andrew Pinski <quic_apinski@quicinc.com> wrote:
>
> Since currently ranger does not work with the complexity of COND_EXPR in
> some cases so delaying the simplification of `1/x` for signed types
> help code generation.
> tree-ssa/divide-8.c is a new testcase where this can help.

Huh.  It looks like this pattern was added for PR95424 (and not say,
for vectorization).  IMO the COND_EXPR isn't a good canonical
representation and the optimization should have been done
during either instruction selection or RTL expansion.

Of course ranger should be better in handling this as well and
we should be able to canonicalize (unsigned)(x+1)<=2 ? x : 0
to 1/x.

The patch works towards this, keeping it as possible vectorization
enabler.

Thus OK.

Thanks,
Richard.

> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
>         PR tree-optimization/113301
>
> gcc/ChangeLog:
>
>         * match.pd (`1/x`): Delay signed case until late.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/divide-8.c: New test.
>
> Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
> ---
>  gcc/match.pd                             | 12 +++++++-----
>  gcc/testsuite/gcc.dg/tree-ssa/divide-8.c | 13 +++++++++++++
>  2 files changed, 20 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d75babd86c2..81a389057cf 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -560,7 +560,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X.
>     But not for 1 / 0 so that we can get proper warnings and errors,
>     and not for 1-bit integers as they are edge cases better handled
> -   elsewhere.  */
> +   elsewhere.  Delay the conversion of the signed division until late
> +   because `1 / X` is simplier to handle than the resulting COND_EXPR.  */
>  (simplify
>   (trunc_div integer_onep@0 @1)
>   (if (INTEGRAL_TYPE_P (type)
> @@ -569,10 +570,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>        && (!flag_non_call_exceptions || tree_expr_nonzero_p (@1)))
>    (if (TYPE_UNSIGNED (type))
>     (convert (eq:boolean_type_node @1 { build_one_cst (type); }))
> -   (with { tree utype = unsigned_type_for (type); }
> -    (cond (le (plus (convert:utype @1) { build_one_cst (utype); })
> -             { build_int_cst (utype, 2); })
> -     @1 { build_zero_cst (type); })))))
> +   (if (!canonicalize_math_p ())
> +    (with { tree utype = unsigned_type_for (type); }
> +     (cond (le (plus (convert:utype @1) { build_one_cst (utype); })
> +               { build_int_cst (utype, 2); })
> +      @1 { build_zero_cst (type); }))))))
>
>  /* Combine two successive divisions.  Note that combining ceil_div
>     and floor_div is trickier and combining round_div even more so.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
> new file mode 100644
> index 00000000000..b8149088177
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized" } */
> +/* PR tree-optimization/113301 */
> +/* We should figure out that 1/(x+1) range is [-1,1]
> +   and then /2 is always 0. */
> +
> +void link_error(void);
> +void func(int x){
> +    int c=(1/(x+1))/2;
> +    if (c != 0)
> +      link_error();
> +}
> +/* { dg-final { scan-tree-dump-not "link_error " "optimized" } } */
> --
> 2.39.3
>
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index d75babd86c2..81a389057cf 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -560,7 +560,8 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X.
    But not for 1 / 0 so that we can get proper warnings and errors,
    and not for 1-bit integers as they are edge cases better handled
-   elsewhere.  */
+   elsewhere.  Delay the conversion of the signed division until late
+   because `1 / X` is simplier to handle than the resulting COND_EXPR.  */
 (simplify
  (trunc_div integer_onep@0 @1)
  (if (INTEGRAL_TYPE_P (type)
@@ -569,10 +570,11 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && (!flag_non_call_exceptions || tree_expr_nonzero_p (@1)))
   (if (TYPE_UNSIGNED (type))
    (convert (eq:boolean_type_node @1 { build_one_cst (type); }))
-   (with { tree utype = unsigned_type_for (type); }
-    (cond (le (plus (convert:utype @1) { build_one_cst (utype); })
-	      { build_int_cst (utype, 2); })
-     @1 { build_zero_cst (type); })))))
+   (if (!canonicalize_math_p ())
+    (with { tree utype = unsigned_type_for (type); }
+     (cond (le (plus (convert:utype @1) { build_one_cst (utype); })
+		{ build_int_cst (utype, 2); })
+      @1 { build_zero_cst (type); }))))))
 
 /* Combine two successive divisions.  Note that combining ceil_div
    and floor_div is trickier and combining round_div even more so.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
new file mode 100644
index 00000000000..b8149088177
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+/* PR tree-optimization/113301 */
+/* We should figure out that 1/(x+1) range is [-1,1]
+   and then /2 is always 0. */
+
+void link_error(void);
+void func(int x){
+    int c=(1/(x+1))/2;
+    if (c != 0)
+      link_error();
+}
+/* { dg-final { scan-tree-dump-not "link_error " "optimized" } } */