From patchwork Thu Nov 17 09:59:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lulu Cheng
X-Patchwork-Id: 60743
From: Lulu Cheng
To: gcc-patches@gcc.gnu.org
Cc: xry111@xry111.site, i@xen0n.name, xuchenghua@loongson.cn, Lulu Cheng
Subject: [PATCH v4] LoongArch: Optimize immediate load.
Date: Thu, 17 Nov 2022 17:59:09 +0800
Message-Id: <20221117095909.2896386-1-chenglulu@loongson.cn>
X-Mailer: git-send-email 2.31.1
List-Id: Gcc-patches mailing list
Sender: "Gcc-patches"

v1 -> v2:
1. Change the code format.
2. Fix bugs in the code.

v2 -> v3:
Fix an implementation that invoked undefined behavior.

v3 -> v4:
Move the immediate decomposition from the expand pass to the split pass.

Both the regression tests and SPEC2006 pass.

The problem described at the link below, where the four immediate load
instructions were not moved out of the loop, has been fixed. As the new
test case shows, the four immediate load instructions are now generated
outside the loop.
(https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html)

--------------------------------------------------------------------

The loop2_invariant pass hoists loop-invariant instructions out of loops.
If an immediate is decomposed during the expand pass, some of the
resulting instructions do not meet the hoisting conditions, so the
immediate decomposition is moved to the split pass.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (enum loongarch_load_imm_method):
	Remove the unused member METHOD_INSV.
	(struct loongarch_integer_op): Define a new member curr_value
	that records the value held in the destination register
	immediately after the current instruction has run.
	(loongarch_build_integer): Assign a value to the curr_value
	member.
	(loongarch_move_integer): Add a REG_EQUAL note to the immediate
	load instructions.
	* config/loongarch/loongarch.md (*movdi_32bit): Redefine as
	define_insn_and_split.
	(*movdi_64bit): Likewise.
	(*movsi_internal): Likewise.
	(*movhi_internal): Likewise.
	* config/loongarch/predicates.md (move_operand): Return true for
	any CONST_INT, so that the immediate is not decomposed during the
	expand pass.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/imm-load.c: New test.
	* gcc.target/loongarch/imm-load1.c: New test.
---
 gcc/config/loongarch/loongarch.cc             | 62 ++++++++++---------
 gcc/config/loongarch/loongarch.md             | 44 +++++++++++--
 gcc/config/loongarch/predicates.md            |  2 +-
 gcc/testsuite/gcc.target/loongarch/imm-load.c | 10 +++
 .../gcc.target/loongarch/imm-load1.c          | 26 ++++++++
 5 files changed, 110 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load1.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 8ee32c90573..9e0d6c7c3ea 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -139,22 +139,21 @@ struct loongarch_address_info
 
    METHOD_LU52I:
      Load 52-63 bit of the immediate number.
-
-   METHOD_INSV:
-     immediate like 0xfff00000fffffxxx
-   */
+*/
 enum loongarch_load_imm_method
 {
   METHOD_NORMAL,
   METHOD_LU32I,
-  METHOD_LU52I,
-  METHOD_INSV
+  METHOD_LU52I
 };
 
 struct loongarch_integer_op
 {
   enum rtx_code code;
   HOST_WIDE_INT value;
+  /* Represent the result of the immediate count of the load instruction at
+     each step.  */
+  HOST_WIDE_INT curr_value;
   enum loongarch_load_imm_method method;
 };
 
@@ -1475,24 +1474,27 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
     {
       /* The value of the lower 32 bit be loaded with one instruction.
          lu12i.w.  */
-      codes[0].code = UNKNOWN;
-      codes[0].method = METHOD_NORMAL;
-      codes[0].value = low_part;
+      codes[cost].code = UNKNOWN;
+      codes[cost].method = METHOD_NORMAL;
+      codes[cost].value = low_part;
+      codes[cost].curr_value = low_part;
       cost++;
     }
   else
     {
       /* lu12i.w + ior.  */
-      codes[0].code = UNKNOWN;
-      codes[0].method = METHOD_NORMAL;
-      codes[0].value = low_part & ~(IMM_REACH - 1);
+      codes[cost].code = UNKNOWN;
+      codes[cost].method = METHOD_NORMAL;
+      codes[cost].value = low_part & ~(IMM_REACH - 1);
+      codes[cost].curr_value = codes[cost].value;
       cost++;
       HOST_WIDE_INT iorv = low_part & (IMM_REACH - 1);
       if (iorv != 0)
         {
-          codes[1].code = IOR;
-          codes[1].method = METHOD_NORMAL;
-          codes[1].value = iorv;
+          codes[cost].code = IOR;
+          codes[cost].method = METHOD_NORMAL;
+          codes[cost].value = iorv;
+          codes[cost].curr_value = low_part;
           cost++;
         }
     }
@@ -1515,11 +1517,14 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
         {
           codes[cost].method = METHOD_LU52I;
           codes[cost].value = value & LU52I_B;
+          codes[cost].curr_value = value;
           return cost + 1;
         }
 
       codes[cost].method = METHOD_LU32I;
       codes[cost].value = (value & LU32I_B) | (sign51 ? LU52I_B : 0);
+      codes[cost].curr_value = (value & 0xfffffffffffff)
+        | (sign51 ? LU52I_B : 0);
       cost++;
 
       /* Determine whether the 52-61 bits are sign-extended from the low order,
@@ -1528,6 +1533,7 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
         {
           codes[cost].method = METHOD_LU52I;
           codes[cost].value = value & LU52I_B;
+          codes[cost].curr_value = value;
           cost++;
         }
     }
@@ -2911,6 +2917,9 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned HOST_WIDE_INT value)
       else
         x = force_reg (mode, x);
 
+      set_unique_reg_note (get_last_insn (), REG_EQUAL,
+                           GEN_INT (codes[i-1].curr_value));
+
       switch (codes[i].method)
         {
         case METHOD_NORMAL:
@@ -2918,22 +2927,17 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned HOST_WIDE_INT value)
                               GEN_INT (codes[i].value));
           break;
         case METHOD_LU32I:
-          emit_insn (
-            gen_rtx_SET (x,
-                         gen_rtx_IOR (DImode,
-                                      gen_rtx_ZERO_EXTEND (
-                                        DImode, gen_rtx_SUBREG (SImode, x, 0)),
-                                      GEN_INT (codes[i].value))));
+          gcc_assert (mode == DImode);
+          x = gen_rtx_IOR (DImode,
+                           gen_rtx_ZERO_EXTEND (DImode,
+                                                gen_rtx_SUBREG (SImode, x, 0)),
+                           GEN_INT (codes[i].value));
           break;
         case METHOD_LU52I:
-          emit_insn (gen_lu52i_d (x, x, GEN_INT (0xfffffffffffff),
-                                  GEN_INT (codes[i].value)));
-          break;
-        case METHOD_INSV:
-          emit_insn (
-            gen_rtx_SET (gen_rtx_ZERO_EXTRACT (DImode, x, GEN_INT (20),
-                                               GEN_INT (32)),
-                         gen_rtx_REG (DImode, 0)));
+          gcc_assert (mode == DImode);
+          x = gen_rtx_IOR (DImode,
+                           gen_rtx_AND (DImode, x, GEN_INT (0xfffffffffffff)),
+                           GEN_INT (codes[i].value));
           break;
         default:
           gcc_unreachable ();
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 2fda5381904..f61db66d535 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1718,23 +1718,41 @@ (define_expand "movdi"
   DONE;
 })
 
-(define_insn "*movdi_32bit"
+(define_insn_and_split "*movdi_32bit"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m")
         (match_operand:DI 1 "move_operand" "r,i,w,r,*J*r,*m,*f,*f"))]
   "!TARGET_64BIT
    && (register_operand (operands[0], DImode)
        || reg_or_0_operand (operands[1], DImode))"
   { return loongarch_output_move (operands[0], operands[1]); }
+  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
+  (operands[0]))"
+  [(const_int 0)]
+  "
+{
+  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
+  DONE;
+}
+  "
   [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore")
    (set_attr "mode" "DI")])
 
-(define_insn "*movdi_64bit"
+(define_insn_and_split "*movdi_64bit"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m")
         (match_operand:DI 1 "move_operand" "r,Yd,w,rJ,*r*J,*m,*f,*f"))]
   "TARGET_64BIT
    && (register_operand (operands[0], DImode)
        || reg_or_0_operand (operands[1], DImode))"
   { return loongarch_output_move (operands[0], operands[1]); }
+  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
+  (operands[0]))"
+  [(const_int 0)]
+  "
+{
+  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
+  DONE;
+}
+  "
   [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore")
    (set_attr "mode" "DI")])
 
@@ -1749,12 +1767,21 @@ (define_expand "movsi"
   DONE;
 })
 
-(define_insn "*movsi_internal"
+(define_insn_and_split "*movsi_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m,*r,*z")
         (match_operand:SI 1 "move_operand" "r,Yd,w,rJ,*r*J,*m,*f,*f,*z,*r"))]
   "(register_operand (operands[0], SImode)
     || reg_or_0_operand (operands[1], SImode))"
   { return loongarch_output_move (operands[0], operands[1]); }
+  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
+  (operands[0]))"
+  [(const_int 0)]
+  "
+{
+  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
+  DONE;
+}
+  "
   [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore,mftg,mgtf")
    (set_attr "mode" "SI")])
 
@@ -1774,12 +1801,21 @@ (define_expand "movhi"
   DONE;
 })
 
-(define_insn "*movhi_internal"
+(define_insn_and_split "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,r,m,r,k")
         (match_operand:HI 1 "move_operand" "r,Yd,I,m,rJ,k,rJ"))]
   "(register_operand (operands[0], HImode)
     || reg_or_0_operand (operands[1], HImode))"
   { return loongarch_output_move (operands[0], operands[1]); }
+  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
+  (operands[0]))"
+  [(const_int 0)]
+  "
+{
+  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
+  DONE;
+}
+  "
   [(set_attr "move_type" "move,const,const,load,store,load,store")
    (set_attr "mode" "HI")])
 
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index 8bd0c1376c9..58c3dc2261c 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -226,7 +226,7 @@ (define_predicate "move_operand"
   switch (GET_CODE (op))
     {
     case CONST_INT:
-      return !splittable_const_int_operand (op, mode);
+      return true;
 
     case CONST:
     case SYMBOL_REF:
diff --git a/gcc/testsuite/gcc.target/loongarch/imm-load.c b/gcc/testsuite/gcc.target/loongarch/imm-load.c
new file mode 100644
index 00000000000..c04ca33996f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/imm-load.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -O2 -fdump-rtl-split1" } */
+
+long int
+test (void)
+{
+  return 0x1234567890abcdef;
+}
+/* { dg-final { scan-rtl-dump-times "scanning new insn with uid" 6 "split1" } } */
+
diff --git a/gcc/testsuite/gcc.target/loongarch/imm-load1.c b/gcc/testsuite/gcc.target/loongarch/imm-load1.c
new file mode 100644
index 00000000000..2ff02971239
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/imm-load1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -O2" } */
+/* { dg-final { scan-assembler "test:.*lu52i\.d.*\n\taddi\.w.*\n\.L2:" } } */
+
+
+extern long long b[10];
+static inline long long
+repeat_bytes (void)
+{
+  long long r = 0x0101010101010101;
+
+  return r;
+}
+
+static inline long long
+highbit_mask (long long m)
+{
+  return m & repeat_bytes ();
+}
+
+void test(long long *a)
+{
+  for (int i = 0; i < 10; i++)
+    b[i] = highbit_mask (a[i]);
+
+}
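
Illustration (not part of the patch): the curr_value member added to
struct loongarch_integer_op records the register contents after each step
of an immediate load. The standalone sketch below models that idea for the
full four-instruction sequence (lu12i.w, ori, lu32i.d, lu52i.d). The names
imm_step, sext and decompose_imm64 are hypothetical and do not exist in
GCC, and the instruction semantics are my reading of the LoongArch ISA
manual. Unlike loongarch_build_integer, the sketch always emits all four
steps and never skips one that sign extension would make redundant.

```c
#include <stdint.h>

/* One step of the immediate load.  Hypothetical type for illustration;
   GCC itself uses struct loongarch_integer_op.  */
struct imm_step
{
  const char *insn;     /* Instruction mnemonic.  */
  uint64_t field;       /* Immediate field encoded in the instruction.  */
  uint64_t curr_value;  /* Register contents after this step has run.  */
};

/* Sign-extend the low BITS bits of V to 64 bits.  */
static uint64_t
sext (uint64_t v, unsigned bits)
{
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  v &= (sign << 1) - 1;
  return (v ^ sign) - sign;
}

/* Split VALUE into four steps and record the register value after each
   one.  Always returns 4; no redundant step is elided.  */
static unsigned
decompose_imm64 (uint64_t value, struct imm_step steps[4])
{
  uint64_t reg, field;

  /* lu12i.w: bits 12..31, result sign-extended from bit 31.  */
  field = (value >> 12) & 0xfffff;
  reg = sext (field << 12, 32);
  steps[0] = (struct imm_step) { "lu12i.w", field, reg };

  /* ori: bits 0..11 are ORed in.  */
  field = value & 0xfff;
  reg |= field;
  steps[1] = (struct imm_step) { "ori", field, reg };

  /* lu32i.d: bits 32..51, sign-extended from bit 51; low 32 bits kept.  */
  field = (value >> 32) & 0xfffff;
  reg = (sext (field, 20) << 32) | (reg & 0xffffffffULL);
  steps[2] = (struct imm_step) { "lu32i.d", field, reg };

  /* lu52i.d: bits 52..63; low 52 bits kept.  */
  field = (value >> 52) & 0xfff;
  reg = (field << 52) | (reg & 0xfffffffffffffULL);
  steps[3] = (struct imm_step) { "lu52i.d", field, reg };

  return 4;
}
```

For the constant in imm-load.c, 0x1234567890abcdef, the recorded values are
0xffffffff90abc000, 0xffffffff90abcdef, 0x0004567890abcdef and
0x1234567890abcdef. Each intermediate value is a constant in its own right,
which is what a REG_EQUAL note on the corresponding instruction can carry.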