[v4,3/4] tree-optimization/116024 - simplify C1-X cmp C2 for wrapping signed types

Message ID 20240930080408.2501963-4-artemiy@synopsys.com
State New
Headers
Series tree-optimization/116024 - match.pd: add 4 int-compare simplifications |

Checks

Context Check Description
rivoscibot/toolchain-ci-rivos-lint success Lint passed
rivoscibot/toolchain-ci-rivos-apply-patch success Patch applied
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gc-lp64d-non-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc-lp64d-non-multilib success Build passed
rivoscibot/toolchain-ci-rivos-test success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed

Commit Message

Artemiy Volkov Sept. 30, 2024, 8:04 a.m. UTC
  Implement a match.pd transformation inverting the sign of X in
C1 - X cmp C2, where C1 and C2 are integer constants and X is
of a wrapping signed type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of
the comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
	- C1 - X < C2 means that C1 - X spans the values of -INF,
	  -INF + 1, ..., C2 - 1;
        - Therefore, X is one of C1 - -INF, C1 - (-INF + 1), ...,
	  C1 - C2 + 1;
	- Subtracting (C1 + 1), X - (C1 + 1) is one of - (-INF) - 1,
          - (-INF) - 2, ..., -C2;
        - Using the fact that - (-INF) - 1 is +INF, derive that
          X - (C1 + 1) spans the values +INF, +INF - 1, ..., -C2;
        - Thus, the original expression can be simplified to
          X - (C1 + 1) > -C2 - 1.

(c) Similarly, C1 - X <= C2 is equivalent to X - (C1 + 1) >= -C2 - 1.

(d) The >= and > cases are negations of (b) and (c), respectively.

(e) In all cases, the expression -C2 - 1 can be shortened to
bit_not (C2).

This transformation allows to occasionally save load-immediate /
subtraction instructions, e.g. the following statement:

10 - (int)f() >= 20;

now compiles to

addi    a0,a0,-11
slti    a0,a0,-20

instead of

li      a5,10
sub     a0,a5,a0
slti    t0,a0,20
xori    a0,t0,1

on 32-bit RISC-V when compiled with -fwrapv.

Additional examples can be found in the newly added test file.  This
patch has been bootstrapped and regtested on aarch64, x86_64, and i386,
and additionally regtested on riscv32.

gcc/ChangeLog:

        PR tree-optimization/116024
        * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr116024-1-fwrapv.c: New test.

Signed-off-by: Artemiy Volkov <artemiy@synopsys.com>
---
 gcc/match.pd                                  | 21 +++++-
 .../gcc.dg/tree-ssa/pr116024-1-fwrapv.c       | 65 +++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 46195a603d0..3b973887470 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9041,7 +9041,26 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	 (cmp (plus @1 (minus @2 @0)) @2))
        (if (cmp == LT_EXPR || cmp == GE_EXPR)
 	 (cmp (plus @1 (minus @2
-		   (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2)))))))
+		   (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2)))
+/* For wrapping signed types (-fwrapv), transform like so (using < as example):
+	 C1 - X < C2
+      ==>  C1 - X = { -INF, -INF + 1, ..., C2 - 1 }
+      ==>  X = { C1 - (-INF), C1 - (-INF + 1), ..., C1 - C2 + 1 }
+      ==>  X - (C1 + 1) = { - (-INF) - 1, - (-INF) - 2, ..., -C2 }
+      ==>  X - (C1 + 1) = { +INF, +INF - 1, ..., -C2 }
+      ==>  X - (C1 + 1) > -C2 - 1
+      ==>  X - (C1 + 1) > bit_not (C2)
+
+      Similarly,
+	 C1 - X <= C2 ==> X - (C1 + 1) >= bit_not (C2);
+	 C1 - X >= C2 ==> X - (C1 + 1) <= bit_not (C2);
+	 C1 - X > C2 ==> X - (C1 + 1) < bit_not (C2).  */
+   (if (!TYPE_UNSIGNED (TREE_TYPE (@1))
+	&& TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1)))
+     (if (cmp == EQ_EXPR || cmp == NE_EXPR)
+	(cmp @1 (minus @0 @2))
+     (rcmp (minus @1 (plus @0 { build_one_cst (TREE_TYPE (@1)); }))
+	     (bit_not @2))))))))
 
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
new file mode 100644
index 00000000000..24e1abef774
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
@@ -0,0 +1,65 @@ 
+/* PR tree-optimization/116024 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-details -fwrapv" } */
+
+#include <stdint.h>
+
+uint32_t f(void);
+
+int32_t i2(void)
+{
+  int32_t l = 10 - (int32_t)f();
+  return l <= 20; // f() - 11 >= -21
+}
+
+int32_t i2a(void)
+{
+  int32_t l = 10 - (int32_t)f();
+  return l < 30; // f() - 11 > -31
+}
+
+int32_t i2b(void)
+{
+  int32_t l = 200 - (int32_t)f();
+  return l <= 100; // f() - 201 >= -101
+}
+
+int32_t i2c(void)
+{
+  int32_t l = 300 - (int32_t)f();
+  return l < 100; // f() - 301 > -101
+}
+
+int32_t i2d(void)
+{
+  int32_t l = 1000 - (int32_t)f();
+  return l >= 2000; // f() - 1001 <= -2001
+}
+
+int32_t i2e(void)
+{
+  int32_t l = 1000 - (int32_t)f();
+  return l > 3000; // f() - 1001 < -3001
+}
+
+int32_t i2f(void)
+{
+  int32_t l = 20000 - (int32_t)f();
+  return l >= 10000; // f() - 20001 <= -10001
+}
+
+int32_t i2g(void)
+{
+  int32_t l = 30000 - (int32_t)f();
+  return l > 10000; // f() - 30001 < -10001
+}
+
+/* { dg-final { scan-tree-dump-times "Removing dead stmt:.*?- _" 8 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -11.*\n.*>= -21" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -11.*\n.*>= -30" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -201.*\n.*>= -101" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -301.*\n.*>= -100" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -1001.*\n.*< -2000" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -1001.*\n.*< -3001" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -20001.*\n.*< -10000" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -30001.*\n.*< -10001" 1 "forwprop1" } } */