From patchwork Tue Nov  1 06:19:15 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lulu Cheng <chenglulu@loongson.cn>
X-Patchwork-Id: 59705
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 85EF038576A5
	for <patchwork@sourceware.org>; Tue,  1 Nov 2022 06:20:01 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from loongson.cn (mail.loongson.cn [114.242.206.163])
 by sourceware.org (Postfix) with ESMTP id 58672385840E
 for <gcc-patches@gcc.gnu.org>; Tue,  1 Nov 2022 06:19:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 58672385840E
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=loongson.cn
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn
Received: from loongson.cn (unknown [10.2.5.5])
 by gateway (Coremail) with SMTP id _____8BxHdlwumBjPLsDAA--.13384S3;
 Tue, 01 Nov 2022 14:19:29 +0800 (CST)
Received: from 5.5.5 (unknown [10.2.5.5])
 by localhost.localdomain (Coremail) with SMTP id
 AQAAf8DxbuBlumBj4FwJAA--.28510S2;
 Tue, 01 Nov 2022 14:19:21 +0800 (CST)
From: Lulu Cheng <chenglulu@loongson.cn>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH v2] LoongArch: Optimize immediate load.
Date: Tue,  1 Nov 2022 14:19:15 +0800
Message-Id: <20221101061915.1677615-1-chenglulu@loongson.cn>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
X-CM-TRANSID: AQAAf8DxbuBlumBj4FwJAA--.28510S2
X-CM-SenderInfo: xfkh0wpoxo3qxorr0wxvrqhubq/
X-Coremail-Antispam: 1Uk129KBjvAXoW3Zw4DCFyxCF4fXr1ftFykAFb_yoW8JFWDGo
 Wrua9rZw1rGryj9a1DKrnxXrW0vw4Duw4kZa9xAw45uF1kJr45Kry7Cw4fCFyayFZ5Ww43
 C3s2g3yDJ3yxJwn5n29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7KY7ZEXasCq-sGcSsGvf
 J3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU0xBIdaVrnRJU
 UUkFb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2IYs7xG6rWj6s
 0DM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0
 Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr0_Cr1l84
 ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AKxVW8Jr0_Cr1U
 M2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx1l5I8CrVACY4
 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8
 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2Ij64vIr41l4I
 8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AK
 xVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcV
 AFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE42xK8VAvwI8I
 cIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r
 1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07UWHqcUUUUU=
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Cc: xuchenghua@loongson.cn, Lulu Cheng <chenglulu@loongson.cn>, i@xen0n.name
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

v1 -> v2:
1. Change the code format.
2. Fix bugs in the code.

Both regression tests and spec2006 passed.

The problem mentioned in the link does not move the four immediate load
instructions out of the loop. It has been optimized. Now, as in the test case,
four immediate load instructions are generated outside the loop.
(https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html)


--------------------------------------------------------------------
Fixed an issue where the compiler would not take four 64-bit immediate
load instructions out of the loop.

gcc/ChangeLog:

	* config/loongarch/constraints.md (x): New constraint.
	* config/loongarch/loongarch.cc (struct loongarch_integer_op):
	Define a new member curr_value, that records the value of
	the number stored in the destination register immediately
	after the current instruction has run.
	(loongarch_build_integer): Adds a method to load the immediate
	32-bit to 63-bit field.
	(loongarch_move_integer): Same as above.
	* config/loongarch/loongarch.h (HWIT_UC_0xFFFFFFFF):
	(HI32_OPERAND): NEW macro.
	* config/loongarch/loongarch.md (load_hi32):New template.
	* config/loongarch/predicates.md (const_hi32_operand): Determines
	whether the value is an immediate number that has a value of only
	the higher 32 bits.
	(hi32_mask_operand): Immediately counts the mask of 32 to 61 bits.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/imm-load.c: New test.
---
 gcc/config/loongarch/constraints.md           |   7 +-
 gcc/config/loongarch/loongarch.cc             | 102 +++++++++++-------
 gcc/config/loongarch/loongarch.h              |   6 ++
 gcc/config/loongarch/loongarch.md             |  31 ++++++
 gcc/config/loongarch/predicates.md            |   8 ++
 gcc/testsuite/gcc.target/loongarch/imm-load.c |  25 +++++
 6 files changed, 142 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index d3addd02c0a..43b55d74c5a 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -46,7 +46,7 @@
 ;; "u" "A signed 52bit constant and low 32-bit is zero (for logic instructions)"
 ;; "v" "A signed 64-bit constant and low 44-bit is zero (for logic instructions)."
 ;; "w" "Matches any valid memory."
-;; "x" <-----unused
+;; "x" "A signed 64-bit constant and low 32-bit is zero (for logic instructions)."
 ;; "y" <-----unused
 ;; "z" FCC_REGS
 ;; "A" <-----unused
@@ -139,6 +139,11 @@ (define_constraint "v"
   (and (match_code "const_int")
        (match_test "LU52I_OPERAND (ival)")))
 
+(define_constraint "x"
+  "A signed 64-bit constant and low 32-bit is zero (for logic instructions)."
+  (and (match_code "const_int")
+       (match_test "HI32_OPERAND (ival)")))
+
 (define_register_constraint "z" "FCC_REGS"
   "A floating-point condition code register.")
 
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 1a637431503..acde9efce40 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -140,6 +140,9 @@ struct loongarch_address_info
    METHOD_LU52I:
      Load 52-63 bit of the immediate number.
 
+   METHOD_LD_HI32:
+     Load 32-63 bit of the immediate number.
+
    METHOD_INSV:
      immediate like 0xfff00000fffffxxx
    */
@@ -148,20 +151,26 @@ enum loongarch_load_imm_method
   METHOD_NORMAL,
   METHOD_LU32I,
   METHOD_LU52I,
+  METHOD_LD_HI32,
   METHOD_INSV
 };
 
 struct loongarch_integer_op
 {
   enum rtx_code code;
+  /* Current Immediate Count The immediate count of the load instruction.  */
   HOST_WIDE_INT value;
+  /* Represent the result of the immediate count of the load instruction at
+     each step.  */
+  HOST_WIDE_INT curr_value;
   enum loongarch_load_imm_method method;
 };
 
 /* The largest number of operations needed to load an integer constant.
-   The worst accepted case for 64-bit constants is LU12I.W,LU32I.D,LU52I.D,ORI
-   or LU12I.W,LU32I.D,LU52I.D,ADDI.D DECL_ASSEMBLER_NAME.  */
-#define LARCH_MAX_INTEGER_OPS 4
+   The worst accepted case for 64-bit constants is LU12I.W,
+   LOAD_HI32(LU32I.D,LU52I.D),ORI or LU12I.W,LOAD_HI32(LU32I.D,LU52I.D),
+   ADDI.D DECL_ASSEMBLER_NAME.  */
+#define LARCH_MAX_INTEGER_OPS 3
 
 /* Arrays that map GCC register numbers to debugger register numbers.  */
 int loongarch_dwarf_regno[FIRST_PSEUDO_REGISTER];
@@ -1475,24 +1484,27 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
     {
       /* The value of the lower 32 bit be loaded with one instruction.
 	 lu12i.w.  */
-      codes[0].code = UNKNOWN;
-      codes[0].method = METHOD_NORMAL;
-      codes[0].value = low_part;
+      codes[cost].code = UNKNOWN;
+      codes[cost].method = METHOD_NORMAL;
+      codes[cost].value = low_part;
+      codes[cost].curr_value = low_part;
       cost++;
     }
   else
     {
       /* lu12i.w + ior.  */
-      codes[0].code = UNKNOWN;
-      codes[0].method = METHOD_NORMAL;
-      codes[0].value = low_part & ~(IMM_REACH - 1);
+      codes[cost].code = UNKNOWN;
+      codes[cost].method = METHOD_NORMAL;
+      codes[cost].value = low_part & ~(IMM_REACH - 1);
+      codes[cost].curr_value = codes[cost].value;
       cost++;
       HOST_WIDE_INT iorv = low_part & (IMM_REACH - 1);
       if (iorv != 0)
 	{
-	  codes[1].code = IOR;
-	  codes[1].method = METHOD_NORMAL;
-	  codes[1].value = iorv;
+	  codes[cost].code = IOR;
+	  codes[cost].method = METHOD_NORMAL;
+	  codes[cost].value = iorv;
+	  codes[cost].curr_value = low_part;
 	  cost++;
 	}
     }
@@ -1515,23 +1527,34 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
 	{
 	  codes[cost].method = METHOD_LU52I;
 	  codes[cost].value = value & LU52I_B;
+	  codes[cost].curr_value = codes[cost].value
+	    | (codes[cost-1].curr_value & 0xfffffffffffff);
 	  return cost + 1;
 	}
 
-      codes[cost].method = METHOD_LU32I;
-      codes[cost].value = (value & LU32I_B) | (sign51 ? LU52I_B : 0);
-      cost++;
-
-      /* Determine whether the 52-61 bits are sign-extended from the low order,
-	 and if not, load the 52-61 bits.  */
-      if (!lu52i[(value & (HOST_WIDE_INT_1U << 51)) >> 51])
+      if (lu52i[sign51])
 	{
-	  codes[cost].method = METHOD_LU52I;
-	  codes[cost].value = value & LU52I_B;
+	  /* Determine whether the 52-63 bits are sign-extended from the low
+	     order.  If so, the 52-63 bits of the immediate number do not need
+	     to be loaded.  */
+	  codes[cost].method = METHOD_LU32I;
+	  codes[cost].value = (value & LU32I_B) | (sign51 ? LU52I_B : 0);
+	  codes[cost].curr_value = codes[cost].value
+	    | (codes[cost-1].curr_value & 0xffffffff);
+	  cost++;
+	}
+      else
+	{
+	  /* If the higher 32 bits of the 64bit immediate need to be loaded
+	     separately by two instructions, a false immediate load instruction
+	     load_hi32 is used to load them.  */
+	  codes[cost].method = METHOD_LD_HI32;
+	  codes[cost].value = value & 0xffffffff00000000;
+	  codes[cost].curr_value = codes[cost].value
+	    | (codes[cost-1].curr_value & 0xffffffff);
 	  cost++;
 	}
     }
-
   gcc_assert (cost <= LARCH_MAX_INTEGER_OPS);
 
   return cost;
@@ -2911,29 +2934,36 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned HOST_WIDE_INT value)
       else
 	x = force_reg (mode, x);
 
+      set_unique_reg_note (get_last_insn (), REG_EQUAL,
+			   GEN_INT (codes[i-1].curr_value));
+
       switch (codes[i].method)
 	{
 	case METHOD_NORMAL:
+	  /* mov or ior.  */
 	  x = gen_rtx_fmt_ee (codes[i].code, mode, x,
 			      GEN_INT (codes[i].value));
 	  break;
 	case METHOD_LU32I:
-	  emit_insn (
-	    gen_rtx_SET (x,
-			 gen_rtx_IOR (DImode,
-				      gen_rtx_ZERO_EXTEND (
-					DImode, gen_rtx_SUBREG (SImode, x, 0)),
-				      GEN_INT (codes[i].value))));
+	  gcc_assert (mode == DImode);
+	  /* lu32i_d */
+	  x = gen_rtx_IOR (mode, gen_rtx_ZERO_EXTEND (mode,
+						gen_rtx_SUBREG (SImode, x, 0)),
+			   GEN_INT (codes[i].value));
 	  break;
 	case METHOD_LU52I:
-	  emit_insn (gen_lu52i_d (x, x, GEN_INT (0xfffffffffffff),
-				  GEN_INT (codes[i].value)));
+	  gcc_assert (mode == DImode);
+	  /* lu52i_d */
+	  x = gen_rtx_IOR (mode, gen_rtx_AND (mode, x,
+					      GEN_INT (0xfffffffffffff)),
+			   GEN_INT (codes[i].value));
 	  break;
-	case METHOD_INSV:
-	  emit_insn (
-	    gen_rtx_SET (gen_rtx_ZERO_EXTRACT (DImode, x, GEN_INT (20),
-					       GEN_INT (32)),
-			 gen_rtx_REG (DImode, 0)));
+	case METHOD_LD_HI32:
+	  /* Load the high 32 bits of the immediate number.  */
+	  gcc_assert (mode == DImode);
+	  /* load_hi32 */
+	  x = gen_rtx_IOR (mode, gen_rtx_AND (mode, x, GEN_INT (0xffffffff)),
+			   GEN_INT (codes[i].value));
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -4891,7 +4921,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part,
    'd'	Print CONST_INT OP in decimal.
    'F'	Print the FPU branch condition for comparison OP.
    'G'	Print a DBAR insn if the memory model requires a release.
-   'H'  Print address 52-61bit relocation associated with OP.
+   'H'  Print address 52-63bit relocation associated with OP.
    'h'  Print the high-part relocation associated with OP.
    'i'	Print i if the operand is not a register.
    'L'  Print the low-part relocation associated with OP.
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f4a9c329fef..cfc046f546e 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -605,6 +605,12 @@ enum reg_class
 #define LU52I_OPERAND(VALUE) \
   (((VALUE) | (HWIT_UC_0xFFF << 52)) == (HWIT_UC_0xFFF << 52))
 
+/* True if VALUE can be loaded into a register using load_hi32.  */
+
+#define HWIT_UC_0xFFFFFFFF HOST_WIDE_INT_UC(0xffffffff)
+#define HI32_OPERAND(VALUE) \
+  (((VALUE) | (HWIT_UC_0xFFFFFFFF << 32)) == (HWIT_UC_0xFFFFFFFF << 32))
+
 /* Return a value X with the low 12 bits clear, and such that
    VALUE - X is a signed 12-bit value.  */
 
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 8f0d587c2d9..8329bee41f6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1882,6 +1882,36 @@ (define_expand "mov<mode>cc"
   DONE;
 })
 
+
+;; Load immediate to the 32-63 bits of the source register.
+(define_insn_and_split "load_hi32"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(ior:DI
+	  (and:DI (match_operand:DI 1 "register_operand" "0")
+		  (match_operand 2 "hi32_mask_operand"))
+	(match_operand 3 "const_hi32_operand" "x")))]
+  "TARGET_64BIT"
+  "#"
+  ""
+  [(set (match_dup 0)
+	(ior:DI
+	  (zero_extend:DI
+	    (subreg:SI (match_dup 1) 0))
+	  (match_dup 4)))
+   (set (match_dup 0)
+	(ior:DI
+	  (and:DI (match_dup 0)
+		  (match_dup 6))
+	  (match_dup 5)))]
+{
+  operands[4] = GEN_INT (INTVAL (operands[3]) << 12 >> 12);
+  operands[5] = GEN_INT (INTVAL (operands[3]) & 0xfff0000000000000);
+  operands[6] = GEN_INT (0xfffffffffffff);
+}
+  [(set_attr "insn_count" "2")])
+
+;; Load immediately counts to 32-51 bits of the source register,
+;; with high bit symbol extensions.
 (define_insn "lu32i_d"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(ior:DI
@@ -1893,6 +1923,7 @@ (define_insn "lu32i_d"
   [(set_attr "type" "arith")
    (set_attr "mode" "DI")])
 
+;; Load immediately counts to bits 52-63 of the source register.
 (define_insn "lu52i_d"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(ior:DI
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index 8bd0c1376c9..29d81ff0250 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -35,6 +35,10 @@ (define_predicate "const_lu52i_operand"
   (and (match_code "const_int")
        (match_test "LU52I_OPERAND (INTVAL (op))")))
 
+(define_predicate "const_hi32_operand"
+  (and (match_code "const_int")
+       (match_test "HI32_OPERAND (INTVAL (op))")))
+
 (define_predicate "const_arith_operand"
   (and (match_code "const_int")
        (match_test "IMM12_OPERAND (INTVAL (op))")))
@@ -103,6 +107,10 @@ (define_predicate "lu52i_mask_operand"
   (and (match_code "const_int")
        (match_test "UINTVAL (op) == 0xfffffffffffff")))
 
+(define_predicate "hi32_mask_operand"
+  (and (match_code "const_int")
+       (match_test "UINTVAL (op) == 0xffffffff")))
+
 (define_predicate "low_bitmask_operand"
   (and (match_code "const_int")
        (match_test "low_bitmask_len (mode, INTVAL (op)) > 12")))
diff --git a/gcc/testsuite/gcc.target/loongarch/imm-load.c b/gcc/testsuite/gcc.target/loongarch/imm-load.c
new file mode 100644
index 00000000000..91ceb33d058
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/imm-load.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -O2 -fdump-rtl-loop2_invariant" } */
+
+extern long long b[10];
+static inline long long
+repeat_bytes (void)
+{
+  long long r = 0x0101010101010101;
+
+  return r;
+}
+
+static inline long long
+highbit_mask (long long m)
+{
+  return m & repeat_bytes ();
+}
+
+void test(long long *a)
+{
+  for (int i = 0; i < 10; i++)
+    b[i] = highbit_mask (a[i]);
+
+}
+/* { dg-final { scan-rtl-dump-times "moved without introducing a new temporary register" 4 "loop2_invariant" } } */