[v2] xtensa: Prepare the transition from Reload to LRA

Message ID b1609279-d845-30a1-1ec6-ed0ca6c60a68@yahoo.co.jp
State New
Headers
Series [v2] xtensa: Prepare the transition from Reload to LRA |

Commit Message

Takayuki 'January June' Suwa Oct. 18, 2022, 2:57 a.m. UTC
  On 2022/10/16 14:03, Max Filippov wrote:
> Hi Suwa-san,
Hi!

> This change results in a few new regressions in the following tests caused by ICE even when running without -mlra option:
> 
> +FAIL: gcc.c-torture/execute/pr92904.c   -O1  (internal compiler error: in extract_insn, at recog.cc:2791)
> 
> The backtraces look like this in all of them:
> 
> gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: error:
> unrecognizable insn:
> (insn 10501 7 10502 2 (set (reg:SI 5913)
>        (const_int 1431655765 [0x55555555]))
> "gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c":239:9 -1
>     (nil))
> during RTL pass: subreg3
> gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: internal compiler error: in extract_insn, at recog.cc:2791

"expand" pass generates the below from referencing to the struct:

  ;; MEM <long long int> [(union Y *)&u] = 6148914691236517205;
  (set (reg:DI X) (mem:DI (symbol_ref:SI ("*.LC_u"))))

and then "fwprop1" transforms it by dereference:

  (set (reg:DI X) (const_int 0x5555555555555555))

finally "subreg3" (but not "split1") splits it into the two that don't satisfy the constraint:

  (set (reg:SI X0) (const_int 0x55555555))
  (set (reg:SI X1) (const_int 0x55555555))

> There's also the following runtime failures, but only on call0 configuration:
> 
> +FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test

both assembler outputs with and without this patch are identical on my side, but perhaps it can break runtime init and/or libraries due to my silly mistake:

-+      if (HARD_REGISTER_P (x)
++      if (! HARD_REGISTER_P (x)

===
This patch provides the first step in the transition from Reload to LRA
in Xtensa.

gcc/ChangeLog:

	* config/xtensa/xtensa-proto.h
	(xtensa_split1_finished_p, xtensa_split_DI_reg_imm): New prototypes.
	* config/xtensa/xtensa.cc
	(xtensa_split1_finished_p, xtensa_split_DI_reg_imm, xtensa_lra_p):
	New functions.
	(TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p.
	(xt_true_regnum): Rework.
	* gcc/config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS):
	Rename from CALL_USED_REGISTERS, and remove what correspond to
	FIXED_REGISTERS.
	* gcc/config/xtensa/constraints.md (Y):
	Use !xtensa_split1_finished_p() instead of can_create_pseudo_p().
	* gcc/config/xtensa/predicates.md (move_operand): Ditto.
	* gcc/config/xtensa/xtensa.md: Add two new split patterns:
	  - splits DImode immediate load into two SImode ones
	  - puts out-of-constraint SImode constants into the constant pool
	* gcc/config/xtensa/xtensa.opt (-mlra): New target-specific option
	for testing purpose.
---
 gcc/config/xtensa/constraints.md  |  2 +-
 gcc/config/xtensa/predicates.md   |  2 +-
 gcc/config/xtensa/xtensa-protos.h |  2 +
 gcc/config/xtensa/xtensa.cc       | 69 ++++++++++++++++++++++++++-----
 gcc/config/xtensa/xtensa.h        |  6 +--
 gcc/config/xtensa/xtensa.md       | 36 ++++++++++++----
 gcc/config/xtensa/xtensa.opt      |  4 ++
 7 files changed, 98 insertions(+), 23 deletions(-)
  

Comments

Max Filippov Oct. 18, 2022, 3:14 a.m. UTC | #1
On Mon, Oct 17, 2022 at 7:57 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
> On 2022/10/16 14:03, Max Filippov wrote:
> > There's also the following runtime failures, but only on call0 configuration:
> >
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
>
> both assembler outputs with and without this patch are identical on my side

Interesting. In -O1 test I see the following difference that is going to affect
the return value of the corresponding functions:

--- gcc-13-3308-gb4a4c6382b14-call0-le/20010122-1.s      2022-10-17
20:07:32.390363204 -0700
+++ gcc-13-3309-g851636ecd015-call0-le/20010122-1.s      2022-10-17
20:06:36.613785546 -0700
@@ -143,13 +143,10 @@
test2:
       addi    sp, sp, -16
       s32i.n  a0, sp, 12
-       s32i.n  a12, sp, 8
-       mov.n   a12, a0
       l32r    a2, .LC6
       callx0  a2
-       mov.n   a2, a12
+       mov.n   a2, a0
       l32i.n  a0, sp, 12
-       l32i.n  a12, sp, 8
       addi    sp, sp, 16
       ret.n
       .size   test2, .-test2
@@ -161,13 +158,10 @@
test3:
       addi    sp, sp, -16
       s32i.n  a0, sp, 12
-       s32i.n  a12, sp, 8
-       mov.n   a12, a0
       l32r    a2, .LC7
       callx0  a2
-       mov.n   a2, a12
+       mov.n   a2, a0
       l32i.n  a0, sp, 12
-       l32i.n  a12, sp, 8
       addi    sp, sp, 16
       ret.n
       .size   test3, .-test3
@@ -258,14 +252,11 @@
test8:
       addi    sp, sp, -16
       s32i.n  a0, sp, 12
-       s32i.n  a12, sp, 8
-       mov.n   a12, a0
       l32r    a2, .LC12
       callx0  a2
       l32r    a2, .LC13
-       s32i.n  a12, a2, 0
+       s32i.n  a0, a2, 0
       l32i.n  a0, sp, 12
-       l32i.n  a12, sp, 8
       addi    sp, sp, 16
       ret.n
       .size   test8, .-test8
  
Max Filippov Oct. 18, 2022, 12:16 p.m. UTC | #2
Hi Suwa-san,

v2 fixes the regressions caused by ICEs, but not the runtime failures.

On Mon, Oct 17, 2022 at 8:14 PM Max Filippov <jcmvbkbc@gmail.com> wrote:
> On Mon, Oct 17, 2022 at 7:57 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
> > On 2022/10/16 14:03, Max Filippov wrote:
> > > There's also the following runtime failures, but only on call0 configuration:
> > >
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> >
> > both assembler outputs with and without this patch are identical on my side
>
> Interesting. In -O1 test I see the following difference that is going to affect
> the return value of the corresponding functions:
>
> --- gcc-13-3308-gb4a4c6382b14-call0-le/20010122-1.s      2022-10-17
> 20:07:32.390363204 -0700
> +++ gcc-13-3309-g851636ecd015-call0-le/20010122-1.s      2022-10-17
> 20:06:36.613785546 -0700
> @@ -143,13 +143,10 @@
> test2:
>        addi    sp, sp, -16
>        s32i.n  a0, sp, 12
> -       s32i.n  a12, sp, 8
> -       mov.n   a12, a0
>        l32r    a2, .LC6
>        callx0  a2
> -       mov.n   a2, a12
> +       mov.n   a2, a0
>        l32i.n  a0, sp, 12
> -       l32i.n  a12, sp, 8
>        addi    sp, sp, 16
>        ret.n
>        .size   test2, .-test2
> @@ -161,13 +158,10 @@
> test3:
>        addi    sp, sp, -16
>        s32i.n  a0, sp, 12
> -       s32i.n  a12, sp, 8
> -       mov.n   a12, a0
>        l32r    a2, .LC7
>        callx0  a2
> -       mov.n   a2, a12
> +       mov.n   a2, a0
>        l32i.n  a0, sp, 12
> -       l32i.n  a12, sp, 8
>        addi    sp, sp, 16
>        ret.n
>        .size   test3, .-test3
> @@ -258,14 +252,11 @@
> test8:
>        addi    sp, sp, -16
>        s32i.n  a0, sp, 12
> -       s32i.n  a12, sp, 8
> -       mov.n   a12, a0
>        l32r    a2, .LC12
>        callx0  a2
>        l32r    a2, .LC13
> -       s32i.n  a12, a2, 0
> +       s32i.n  a0, a2, 0
>        l32i.n  a0, sp, 12
> -       l32i.n  a12, sp, 8
>        addi    sp, sp, 16
>        ret.n
>        .size   test8, .-test8

I've noticed that this is related to the following hunk:

-#define CALL_USED_REGISTERS                                            \
+#define CALL_REALLY_USED_REGISTERS                                     \
 {                                                                      \
-  1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
-  1, 1, 1,                                                             \
+  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
+  0, 0, 1,                                                             \
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                      \
   1,                                                                   \
 }

And the following change on top of v2 fixes this regression for me:

diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 6b60e5960625..897f87f735da 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -244,7 +244,7 @@ along with GCC; see the file COPYING3.  If not see

#define CALL_REALLY_USED_REGISTERS                                     \
{                                                                      \
-  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
+  1, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
  0, 0, 1,                                                             \
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                      \
  1,                                                                   \
  

Patch

diff --git a/gcc/config/xtensa/constraints.md b/gcc/config/xtensa/constraints.md
index e4c314b267c..cd200d6d15a 100644
--- a/gcc/config/xtensa/constraints.md
+++ b/gcc/config/xtensa/constraints.md
@@ -121,7 +121,7 @@ 
  (ior (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
 	   (match_test "TARGET_AUTO_LITPOOLS"))
       (and (match_code "const_int")
-	   (match_test "can_create_pseudo_p ()"))))
+	   (match_test "! xtensa_split1_finished_p ()"))))
 
 ;; Memory constraints.  Do not use define_memory_constraint here.  Doing so
 ;; causes reload to force some constants into the constant pool, but since
diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index 0590c0f81a9..c11e8634dbe 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -149,7 +149,7 @@ 
      (ior (and (match_code "const_int")
 	       (match_test "(GET_MODE_CLASS (mode) == MODE_INT
 			     && xtensa_simm12b (INTVAL (op)))
-			    || can_create_pseudo_p ()"))
+			    || ! xtensa_split1_finished_p ()"))
 	  (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
 	       (match_test "(TARGET_CONST16 || TARGET_AUTO_LITPOOLS)
 			    && CONSTANT_P (op)
diff --git a/gcc/config/xtensa/xtensa-protos.h b/gcc/config/xtensa/xtensa-protos.h
index 459e2aac9fc..bc75ad9698a 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -58,6 +58,8 @@  extern char *xtensa_emit_call (int, rtx *);
 extern char *xtensa_emit_sibcall (int, rtx *);
 extern bool xtensa_tls_referenced_p (rtx);
 extern enum rtx_code xtensa_shlrd_which_direction (rtx, rtx);
+extern bool xtensa_split1_finished_p (void);
+extern void xtensa_split_DI_reg_imm (rtx *);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, int);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 828c7642b7c..950eb5a59be 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -56,6 +56,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "hw-doloop.h"
 #include "rtl-iter.h"
 #include "insn-attr.h"
+#include "tree-pass.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -199,6 +200,7 @@  static void xtensa_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 				    HOST_WIDE_INT delta,
 				    HOST_WIDE_INT vcall_offset,
 				    tree function);
+static bool xtensa_lra_p (void);
 
 static rtx xtensa_delegitimize_address (rtx);
 
@@ -295,7 +297,7 @@  static rtx xtensa_delegitimize_address (rtx);
 #define TARGET_CANNOT_FORCE_CONST_MEM xtensa_cannot_force_const_mem
 
 #undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P xtensa_lra_p
 
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P	xtensa_legitimate_address_p
@@ -492,21 +494,30 @@  xtensa_mask_immediate (HOST_WIDE_INT v)
 int
 xt_true_regnum (rtx x)
 {
-  if (GET_CODE (x) == REG)
+  if (REG_P (x))
     {
-      if (reg_renumber
-	  && REGNO (x) >= FIRST_PSEUDO_REGISTER
-	  && reg_renumber[REGNO (x)] >= 0)
+      if (! HARD_REGISTER_P (x)
+	  && reg_renumber
+	  && (lra_in_progress || reg_renumber[REGNO (x)] >= 0))
 	return reg_renumber[REGNO (x)];
       return REGNO (x);
     }
-  if (GET_CODE (x) == SUBREG)
+  if (SUBREG_P (x))
     {
       int base = xt_true_regnum (SUBREG_REG (x));
-      if (base >= 0 && base < FIRST_PSEUDO_REGISTER)
-        return base + subreg_regno_offset (REGNO (SUBREG_REG (x)),
-                                           GET_MODE (SUBREG_REG (x)),
-                                           SUBREG_BYTE (x), GET_MODE (x));
+
+      if (base >= 0
+	  && HARD_REGISTER_NUM_P (base))
+	{
+	  struct subreg_info info;
+
+	  subreg_get_info (lra_in_progress
+			   ? (unsigned) base : REGNO (SUBREG_REG (x)),
+			   GET_MODE (SUBREG_REG (x)),
+			   SUBREG_BYTE (x), GET_MODE (x), &info);
+	  if (info.representable_p)
+	    return base + info.offset;
+	}
     }
   return -1;
 }
@@ -2477,6 +2488,36 @@  xtensa_shlrd_which_direction (rtx op0, rtx op1)
 }
 
 
+/* Return true after "split1" pass has been finished.  */
+
+bool
+xtensa_split1_finished_p (void)
+{
+  return cfun && (cfun->curr_properties & PROP_rtl_split_insns);
+}
+
+
+/* Split a DImode pair of reg (operand[0]) and const_int (operand[1]) into
+   two SImode pairs, the low-part (operands[0] and [1]) and the high-part
+   (operands[2] and [3]).  */
+
+void
+xtensa_split_DI_reg_imm (rtx *operands)
+{
+  rtx lowpart, highpart;
+
+  if (WORDS_BIG_ENDIAN)
+    split_double (operands[1], &highpart, &lowpart);
+  else
+    split_double (operands[1], &lowpart, &highpart);
+
+  operands[3] = highpart;
+  operands[2] = gen_highpart (SImode, operands[0]);
+  operands[1] = lowpart;
+  operands[0] = gen_lowpart (SImode, operands[0]);
+}
+
+
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
 static bool
@@ -5119,4 +5160,12 @@  xtensa_delegitimize_address (rtx op)
   return op;
 }
 
+/* Implement TARGET_LRA_P.  */
+
+static bool
+xtensa_lra_p (void)
+{
+  return TARGET_LRA;
+}
+
 #include "gt-xtensa.h"
diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 16e3d55e896..6b60e596062 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -242,10 +242,10 @@  along with GCC; see the file COPYING3.  If not see
 
    Proper values are computed in TARGET_CONDITIONAL_REGISTER_USAGE.  */
 
-#define CALL_USED_REGISTERS						\
+#define CALL_REALLY_USED_REGISTERS					\
 {									\
-  1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,			\
-  1, 1, 1,								\
+  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,			\
+  0, 0, 1,								\
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,			\
   1,									\
 }
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 608110c20bc..2e7f76ada5c 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -940,14 +940,9 @@ 
 	 because of offering further optimization opportunities.  */
       if (register_operand (operands[0], DImode))
 	{
-	  rtx lowpart, highpart;
-
-	  if (TARGET_BIG_ENDIAN)
-	    split_double (operands[1], &highpart, &lowpart);
-	  else
-	    split_double (operands[1], &lowpart, &highpart);
-	  emit_insn (gen_movsi (gen_lowpart (SImode, operands[0]), lowpart));
-	  emit_insn (gen_movsi (gen_highpart (SImode, operands[0]), highpart));
+	  xtensa_split_DI_reg_imm (operands);
+	  emit_move_insn (operands[0], operands[1]);
+	  emit_move_insn (operands[2], operands[3]);
 	  DONE;
 	}
 
@@ -981,6 +976,19 @@ 
     }
 })
 
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+	(match_operand:DI 1 "const_int_operand"))]
+  "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+   && ! xtensa_split1_finished_p ()"
+  [(set (match_dup 0)
+	(match_dup 1))
+   (set (match_dup 2)
+	(match_dup 3))]
+{
+  xtensa_split_DI_reg_imm (operands);
+})
+
 ;; 32-bit Integer moves
 
 (define_expand "movsi"
@@ -1017,6 +1025,18 @@ 
    (set_attr "mode"	"SI")
    (set_attr "length"	"2,2,2,2,2,2,3,3,3,3,6,3,3,3,3,3")])
 
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operand:SI 1 "const_int_operand"))]
+  "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+   && ! xtensa_split1_finished_p ()
+   && ! xtensa_simm12b (INTVAL (operands[1]))"
+  [(set (match_dup 0)
+	(match_dup 1))]
+{
+  operands[1] = force_const_mem (SImode, operands[1]);
+})
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
 	(match_operand:SI 1 "constantpool_operand"))]
diff --git a/gcc/config/xtensa/xtensa.opt b/gcc/config/xtensa/xtensa.opt
index 08338e39060..00d2db4eae1 100644
--- a/gcc/config/xtensa/xtensa.opt
+++ b/gcc/config/xtensa/xtensa.opt
@@ -34,6 +34,10 @@  mextra-l32r-costs=
 Target RejectNegative Joined UInteger Var(xtensa_extra_l32r_costs) Init(0)
 Set extra memory access cost for L32R instruction, in clock-cycle units.
 
+mlra
+Target Mask(LRA)
+Use LRA instead of reload (transitional).
+
 mtarget-align
 Target
 Automatically align branch targets to reduce branch penalties.