[4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants

Message ID fbe8f1b2-1b4c-1e2f-4197-3c4a0407e971@yahoo.co.jp
State New
Series [1/4] xtensa: Improve shift operations more

Commit Message

Takayuki 'January June' Suwa June 12, 2022, 6:41 a.m. UTC
  This patch offers several insn-and-split patterns for bitwise AND with
register and constant that cannot fit into a "MOVI Ax, simm12" instruction,
but can be represented as:

i.   1's least significant N bits and the others 0's (17 <= N <= 31)
ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
iii. M 1's sequence of bits and trailing N 0's bits
	(1 <= M <= 16, 1 <= N <= 30)

And also offers shortcuts for conditional branch if each of the abovementioned
operations is (not) equal to zero.
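
As a reading aid, the three rewrites can be modeled in plain C. This is only
a sketch of the arithmetic, not the code GCC emits: the function names are
made up for illustration, and the bounds in the comments are taken from the
pattern conditions in the patch (for form ii that bound applies to the
number of cleared low bits, as in the exact_log2 test).

```c
#include <assert.h>
#include <stdint.h>

/* Form i: mask == 2^n - 1 (the n low bits set), 17 <= n <= 31.
   Shift left by 32 - n, then logical shift right by the same amount.  */
static uint32_t and_low_ones(uint32_t x, int n)
{
    int s = 32 - n;
    return (x << s) >> s;
}

/* Form ii: mask == -(2^k) (the k low bits cleared), 12 <= k <= 31.
   Logical shift right by k, then shift left by k.  */
static uint32_t and_high_ones(uint32_t x, int k)
{
    return (x >> k) << k;
}

/* Form iii: m one bits sitting above n trailing zero bits.
   Extract the m-bit field at bit n, then shift it back into place.  */
static uint32_t and_shifted_mask(uint32_t x, int m, int n)
{
    uint32_t field = (x >> n) & ((1u << m) - 1u);
    return field << n;
}

/* Compare each rewrite with a plain AND for one sample value.  */
static void check_and_rewrites(void)
{
    uint32_t x = 0xDEADBEEFu;
    assert(and_low_ones(x, 20) == (x & 0x000FFFFFu));
    assert(and_high_ones(x, 12) == (x & 0xFFFFF000u));
    assert(and_shifted_mask(x, 4, 12) == (x & 0x0000F000u));
}
```

Each helper uses two shift-style operations and needs no literal-pool load,
which is the saving the patch is after.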

gcc/ChangeLog:

	* config/xtensa/predicates.md (shifted_mask_operand):
	New predicate.
	* config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
	New insn-and-split pattern.
	(*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
	*masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
	*masktrue_const_shifted_mask): Ditto.
---
  gcc/config/xtensa/predicates.md |  11 +++
  gcc/config/xtensa/xtensa.md     | 165 ++++++++++++++++++++++++++++++++
  2 files changed, 176 insertions(+)
  

Comments

Max Filippov June 13, 2022, 3:49 a.m. UTC | #1
Hi Suwa-san,

On Sat, Jun 11, 2022 at 11:43 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> This patch offers several insn-and-split patterns for bitwise AND with
> register and constant that cannot fit into a "MOVI Ax, simm12" instruction,
> but can be represented as:
>
> i.   1's least significant N bits and the others 0's (17 <= N <= 31)
> ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
> iii. M 1's sequence of bits and trailing N 0's bits
>         (1 <= M <= 16, 1 <= N <= 30)
>
> And also offers shortcuts for conditional branch if each of the abovementioned
> operations is (not) equal to zero.
>
> gcc/ChangeLog:
>
>         * config/xtensa/predicates.md (shifted_mask_operand):
>         New predicate.
>         * config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
>         New insn-and-split pattern.
>         (*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
>         *masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
>         *masktrue_const_shifted_mask): Ditto.
> ---
>   gcc/config/xtensa/predicates.md |  11 +++
>   gcc/config/xtensa/xtensa.md     | 165 ++++++++++++++++++++++++++++++++
>   2 files changed, 176 insertions(+)

This change produces a bunch of regression test failures in big-endian
configuration:

FAIL: gcc.c-torture/execute/20020108-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/20020108-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/20020108-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20020108-1.c   -Os  execution test
FAIL: gcc.c-torture/execute/20020108-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20040629-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/20040629-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/20040629-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20040629-1.c   -Os  execution test
FAIL: gcc.c-torture/execute/20040629-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20040705-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/20040705-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/20040705-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20040705-1.c   -Os  execution test
FAIL: gcc.c-torture/execute/20040705-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20040705-2.c   -O1  execution test
FAIL: gcc.c-torture/execute/20040705-2.c   -O2  execution test
FAIL: gcc.c-torture/execute/20040705-2.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20040705-2.c   -Os  execution test
FAIL: gcc.c-torture/execute/20040705-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20040709-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/20040709-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/20040709-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/20040709-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20040709-1.c   -Os  execution test
FAIL: gcc.c-torture/execute/20040709-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20040709-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/20180921-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/20180921-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/20180921-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/20180921-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20180921-1.c   -Os  execution test
FAIL: gcc.c-torture/execute/20180921-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20180921-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/pr60454.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr60454.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr60454.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr60454.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr60454.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr61306-2.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr64718.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr65215-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr65215-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr65215-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr65215-1.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr65215-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr65215-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/pr65215-3.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr65215-4.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr65215-4.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr65215-4.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr65215-4.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr65215-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr65215-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/pr79388.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr79388.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr79388.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr79388.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr79388.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr93908.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr93908.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr93908.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr93908.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr93908.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr93908.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c   -O1  execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c   -O2  execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c   -Os  execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c   -O1  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c   -O2  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c   -O3 -g  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c   -Os  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/20050826-1.c execution test
FAIL: gcc.dg/sso/s3.c   -Wno-scalar-storage-order -O1 -fno-inline  output pattern test
FAIL: gcc.dg/sso/t2.c   -Wno-scalar-storage-order -O1 -fno-inline  output pattern test
FAIL: gcc.dg/sso/t2.c   -Wno-scalar-storage-order -O2  output pattern test
FAIL: gcc.dg/sso/t2.c   -Wno-scalar-storage-order -O3 -finline-functions  output pattern test
FAIL: gcc.dg/sso/t2.c   -Wno-scalar-storage-order -Os  output pattern test
FAIL: gcc.dg/sso/t2.c   -Wno-scalar-storage-order -Og -g  output pattern test
FAIL: gcc.dg/torture/pr30665-2.c   -O2  execution test
FAIL: gcc.dg/torture/pr30665-2.c   -O3 -g  execution test
FAIL: gcc.dg/torture/pr30665-2.c   -Os  execution test
FAIL: gcc.dg/torture/pr30665-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/pr30665-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/pr69714.c   -O1  execution test
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  execution test
FAIL: gcc.dg/tree-ssa/pr80803.c execution test
FAIL: gcc.dg/tree-ssa/pr80898-2.c execution test

E.g. for the test gcc.c-torture/execute/struct-ini-2.c
the following assembly code is generated now:

       .file   "struct-ini-2.c"
       .text
       .literal_position
       .literal .LC0, x
       .literal .LC2, 8192
       .literal .LC3, abort@PLT
       .literal .LC4, exit@PLT
       .align  4
       .global main
       .type   main, @function
main:
       entry   sp, 32
       l32r    a8, .LC0
       l16ui   a8, a8, 0
       l32r    a9, .LC2
       extui   a10, a8, 16, 4
       slli    a10, a10, 12
       extui   a9, a9, 0, 16
       beq     a10, a9, .L2
       l32r    a8, .LC3
       callx8  a8
.L2:
       movi    a9, 0xf0
       and     a9, a8, a9
       movi.n  a10, 0x30
       beq     a9, a10, .L3
       l32r    a8, .LC3
       callx8  a8
.L3:
       extui   a8, a8, 0, 4
       beqi    a8, 4, .L4
       l32r    a8, .LC3
       callx8  a8
.L4:
       movi.n  a10, 0
       l32r    a8, .LC4
       callx8  a8
       .size   main, .-main
       .global x
       .data
       .align  4
       .type   x, @object
       .size   x, 4
x:
       .byte   32
       .byte   52
       .zero   2
       .ident  "GCC: (GNU) 13.0.0 20220612 (experimental)"

and the following code was generated before this change:

       .file   "struct-ini-2.c"
       .text
       .literal_position
       .literal .LC0, x
       .literal .LC1, -4096
       .literal .LC2, 8192
       .literal .LC3, abort@PLT
       .literal .LC4, exit@PLT
       .align  4
       .global main
       .type   main, @function
main:
       entry   sp, 32
       l32r    a8, .LC0
       l16ui   a8, a8, 0
       l32r    a9, .LC2
       l32r    a10, .LC1
       and     a10, a8, a10
       extui   a9, a9, 0, 16
       beq     a10, a9, .L2
       l32r    a8, .LC3
       callx8  a8
.L2:
       movi    a9, 0xf0
       and     a9, a8, a9
       movi.n  a10, 0x30
       beq     a9, a10, .L3
       l32r    a8, .LC3
       callx8  a8
.L3:
       extui   a8, a8, 0, 4
       beqi    a8, 4, .L4
       l32r    a8, .LC3
       callx8  a8
.L4:
       movi.n  a10, 0
       l32r    a8, .LC4
       callx8  a8
       .size   main, .-main
       .global x
       .data
       .align  4
       .type   x, @object
       .size   x, 4
x:
       .byte   32
       .byte   52
       .zero   2
       .ident  "GCC: (GNU) 13.0.0 20220612 (experimental)"
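
The two listings differ only in how a10 is computed before the first beq.
A small C model of both sequences makes the mismatch concrete. It assumes
that l16ui on the big-endian target loads the bytes 32, 52 of x as 0x2034,
and the helper names are invented for this sketch. One plausible reading,
not confirmed anywhere in this thread, is that the bit position in the
generated zero_extract is interpreted from the other end of the word on
big-endian, which would turn the intended position 12 into 16.

```c
#include <assert.h>
#include <stdint.h>

/* EXTUI: extract `width` bits of v starting at bit `shift` (bit 0 = LSB).  */
static uint32_t extui(uint32_t v, int shift, int width)
{
    return (v >> shift) & ((1u << width) - 1u);
}

/* Before the change: l32r a10, .LC1 (-4096); and a10, a8, a10.  */
static uint32_t field_before(uint32_t a8)
{
    return a8 & (uint32_t)-4096;
}

/* After the change: extui a10, a8, 16, 4; slli a10, a10, 12.  */
static uint32_t field_after(uint32_t a8)
{
    return extui(a8, 16, 4) << 12;
}

static void compare_sequences(void)
{
    /* Assumed register contents: bytes 32, 52 loaded big-endian by l16ui.  */
    uint32_t a8 = 0x2034u;

    assert(field_before(a8) == 8192u);  /* matches .LC2, beq is taken */
    assert(field_after(a8) == 0u);      /* no match, falls through to abort */

    /* Extracting at bit 12 instead of 16 would reproduce the old result.  */
    assert((extui(a8, 12, 4) << 12) == 8192u);
}
```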
  

Patch

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index bcc83ada0ae..24c77f343a0 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -52,6 +52,17 @@ 
  	    (match_test "xtensa_mask_immediate (INTVAL (op))"))
         (match_operand 0 "register_operand")))

+(define_predicate "shifted_mask_operand"
+  (and (match_code "const_int")
+       (match_test "!xtensa_simm12b (INTVAL (op))"))
+{
+  HOST_WIDE_INT mask = INTVAL (op);
+  int shift = ctz_hwi (mask);
+
+  return IN_RANGE (shift, 1, 31)
+	 && xtensa_mask_immediate ((uint32_t)mask >> shift);
+})
+
  (define_predicate "extui_fldsz_operand"
    (and (match_code "const_int")
         (match_test "IN_RANGE (INTVAL (op), 1, 16)")))
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 090a2939684..286a1d8c38e 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -645,6 +645,78 @@ 
     (set_attr "mode"	"SI")
     (set_attr "length"	"6")])

+(define_insn_and_split "*andsi3_const_pow2_minus_one"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+	(and:SI (match_operand:SI 1 "register_operand" "r")
+		(match_operand:SI 2 "const_int_operand" "i")))]
+  "IN_RANGE (exact_log2 (INTVAL (operands[2]) + 1), 17, 31)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(ashift:SI (match_dup 1)
+		   (match_dup 2)))
+   (set (match_dup 0)
+	(lshiftrt:SI (match_dup 0)
+		     (match_dup 2)))]
+{
+  operands[2] = GEN_INT (32 - floor_log2 (INTVAL (operands[2]) + 1));
+}
+  [(set_attr "type"	"arith")
+   (set_attr "mode"	"SI")
+   (set (attr "length")
+	(if_then_else (match_test "TARGET_DENSITY
+				   && INTVAL (operands[2]) == 0x7FFFFFFF")
+		      (const_int 5)
+		      (const_int 6)))])
+
+(define_insn_and_split "*andsi3_const_negative_pow2"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+	(and:SI (match_operand:SI 1 "register_operand" "r")
+		(match_operand:SI 2 "const_int_operand" "i")))]
+  "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 31)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(lshiftrt:SI (match_dup 1)
+		     (match_dup 2)))
+   (set (match_dup 0)
+	(ashift:SI (match_dup 0)
+		   (match_dup 2)))]
+{
+  operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+  [(set_attr "type"	"arith")
+   (set_attr "mode"	"SI")
+   (set_attr "length"	"6")])
+
+(define_insn_and_split "*andsi3_const_shifted_mask"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+	(and:SI (match_operand:SI 1 "register_operand" "r")
+		(match_operand:SI 2 "shifted_mask_operand" "i")))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0)
+	(zero_extract:SI (match_dup 1)
+			 (match_dup 3)
+			 (match_dup 2)))
+   (set (match_dup 0)
+	(ashift:SI (match_dup 0)
+		   (match_dup 2)))]
+{
+  HOST_WIDE_INT mask = INTVAL (operands[2]);
+  int shift = ctz_hwi (mask);
+  operands[2] = GEN_INT (shift);
+  operands[3] = GEN_INT (floor_log2 (((uint32_t)mask >> shift) + 1));
+}
+  [(set_attr "type"	"arith")
+   (set_attr "mode"	"SI")
+   (set (attr "length")
+	(if_then_else (match_test "TARGET_DENSITY
+				   && ctz_hwi (INTVAL (operands[2])) == 1")
+		      (const_int 5)
+		      (const_int 6)))])
+
  (define_insn "iorsi3"
    [(set (match_operand:SI 0 "register_operand" "=a")
  	(ior:SI (match_operand:SI 1 "register_operand" "%r")
@@ -1648,6 +1720,99 @@ 
     (set_attr "mode"	"none")
     (set_attr "length"	"3")])

+(define_insn_and_split "*masktrue_const_pow2_minus_one"
+  [(set (pc)
+	(if_then_else (match_operator 4 "boolean_operator"
+			[(zero_extract:SI (match_operand:SI 1 "register_operand" "r")
+					  (match_operand:SI 2 "const_int_operand" "i")
+					  (const_int 0))
+			 (const_int 0)])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))
+   (clobber (match_scratch:SI 0 "=&a"))]
+  "IN_RANGE (INTVAL (operands[2]), 17, 31)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(ashift:SI (match_dup 1)
+		   (match_dup 2)))
+   (set (pc)
+	(if_then_else (match_op_dup 4
+			[(match_dup 0)
+			 (const_int 0)])
+		      (label_ref (match_dup 3))
+		      (pc)))]
+{
+  operands[2] = GEN_INT (32 - INTVAL (operands[2]));
+}
+  [(set_attr "type"	"jump")
+   (set_attr "mode"	"none")
+   (set (attr "length")
+	(if_then_else (match_test "TARGET_DENSITY
+				   && INTVAL (operands[2]) == 31")
+		      (const_int 5)
+		      (const_int 6)))])
+
+(define_insn_and_split "*masktrue_const_negative_pow2"
+  [(set (pc)
+	(if_then_else (match_operator 4 "boolean_operator"
+			[(and:SI (match_operand:SI 1 "register_operand" "r")
+				 (match_operand:SI 2 "const_int_operand" "i"))
+			 (const_int 0)])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))
+   (clobber (match_scratch:SI 0 "=&a"))]
+  "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 30)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(lshiftrt:SI (match_dup 1)
+		     (match_dup 2)))
+   (set (pc)
+	(if_then_else (match_op_dup 4
+			[(match_dup 0)
+			 (const_int 0)])
+		      (label_ref (match_dup 3))
+		      (pc)))]
+{
+  operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+  [(set_attr "type"	"jump")
+   (set_attr "mode"	"none")
+   (set_attr "length"	"6")])
+
+(define_insn_and_split "*masktrue_const_shifted_mask"
+  [(set (pc)
+	(if_then_else (match_operator 4 "boolean_operator"
+			[(and:SI (match_operand:SI 1 "register_operand" "r")
+				 (match_operand:SI 2 "shifted_mask_operand" "i"))
+			 (const_int 0)])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))
+   (clobber (match_scratch:SI 0 "=&a"))]
+  ""
+  "#"
+  "reload_completed"
+  [(set (match_dup 0)
+	(zero_extract:SI (match_dup 1)
+			 (match_dup 5)
+			 (match_dup 2)))
+   (set (pc)
+	(if_then_else (match_op_dup 4
+			[(match_dup 0)
+			 (const_int 0)])
+		      (label_ref (match_dup 3))
+		      (pc)))]
+{
+  HOST_WIDE_INT mask = INTVAL (operands[2]);
+  int shift = ctz_hwi (mask);
+  operands[2] = GEN_INT (shift);
+  operands[5] = GEN_INT (floor_log2 (((uint32_t)mask >> shift) + 1));
+}
+  [(set_attr "type"	"jump")
+   (set_attr "mode"	"none")
+   (set_attr "length"	"6")])
+

  ;; Zero-overhead looping support.
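
For readers who want to try constants against the shifted_mask_operand
predicate from the patch above, here is a rough stand-alone C model. It is
a sketch under assumptions: fits_simm12 and is_low_mask are hypothetical
stand-ins for xtensa_simm12b and xtensa_mask_immediate, whose behavior is
assumed here to be "fits a signed 12-bit immediate" and "equals 2^k - 1
with 1 <= k <= 16" (the latter matching the 1..16 range of
extui_fldsz_operand). The zero check replaces the ctz_hwi corner case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed behavior of xtensa_simm12b: signed 12-bit immediate range.  */
static bool fits_simm12(int64_t v)
{
    return v >= -2048 && v <= 2047;
}

/* Assumed behavior of xtensa_mask_immediate: v == 2^k - 1, 1 <= k <= 16.  */
static bool is_low_mask(uint32_t v)
{
    for (int k = 1; k <= 16; k++)
        if (v == (1u << k) - 1u)
            return true;
    return false;
}

/* Count trailing zero bits of a nonzero 32-bit value.  */
static int count_trailing_zeros(uint32_t v)
{
    int n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Model of the predicate: not a MOVI immediate, at least one trailing
   zero bit, and a 1..16 bit run of ones once those zeros are shifted out.  */
static bool is_shifted_mask(int64_t mask)
{
    uint32_t m = (uint32_t)mask;

    if (fits_simm12(mask) || m == 0)
        return false;

    int shift = count_trailing_zeros(m);
    return shift >= 1 && shift <= 31 && is_low_mask(m >> shift);
}

static void check_predicate_model(void)
{
    assert(is_shifted_mask(0xF000));    /* 4 ones above 12 zeros */
    assert(!is_shifted_mask(0xF0));     /* fits MOVI, left alone */
    assert(!is_shifted_mask(-4096));    /* 20 ones: too wide, form ii */
}
```

Under this model 0xF0 is rejected because it fits MOVI, which agrees with
the "movi a9, 0xf0" that appears unchanged in both listings in the review.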