From patchwork Wed Dec 7 12:00:08 2022
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 61656
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, guojiufu@linux.ibm.com, ebotcazou@libertysurf.fr, steven@gcc.gnu.org, rguenther@suse.de, jeffreyalaw@gmail.com
Subject: [PATCH V3] Use reg mode to move sub blocks for parameters and returns
Date: Wed, 7 Dec 2022 20:00:08 +0800
Message-Id: <20221207120008.126895-1-guojiufu@linux.ibm.com>
From: Jiufu Guo

Hi,

When assigning a parameter to a variable, or assigning a variable to the
return value with struct type, a "block move" is used to expand the
assignment.  It would be better to use the register mode according to the
target/ABI to move the blocks if the parameter/return value is passed
through registers.  This would in turn create more opportunities for other
optimization passes (cse/dse/xprop).

Take the following example code (like the code in PR65421):

typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
A ret_arg (A a) {return a;} // just empty fun body
void st_arg (A a, A *p) {*p = a;} // only 3 stfd(s)

This patch checks the "from" and "to" of an assignment in
"expand_assignment".  If either is a parameter or a return value that may
be passed via registers, the register mode is used to move sub-blocks for
the assignment.
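As a rough illustration of the idea only (a user-level C analogy, not the
RTL-level code in this patch), copying a struct in register-sized pieces
could look like the sketch below.  The helper name copy_by_sub_blocks and
the 8-byte SUB_SIZE are assumptions made for this example.  When the size
is not a multiple of the piece size, the sketch falls back to one plain
block copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct SA { double a[3]; } A;

/* Hypothetical piece size standing in for the size of the register
   mode (for example one 8-byte register on a 64-bit target).  */
enum { SUB_SIZE = sizeof (uint64_t) };

/* Copy SIZE bytes from FROM to TO.  Return true if the copy was done
   in SUB_SIZE pieces, false if it fell back to a single block copy.  */
static bool
copy_by_sub_blocks (void *to, const void *from, size_t size)
{
  /* Not a whole number of pieces: use the ordinary block copy.  */
  if (size % SUB_SIZE != 0)
    {
      memcpy (to, from, size);
      return false;
    }

  for (size_t i = 0; i < size / SUB_SIZE; i++)
    {
      uint64_t temp;  /* Plays the role of a temporary register.  */
      memcpy (&temp, (const char *) from + i * SUB_SIZE, SUB_SIZE);
      memcpy ((char *) to + i * SUB_SIZE, &temp, SUB_SIZE);
    }
  return true;
}
```

With the struct A above (three doubles, 24 bytes on typical 64-bit
targets), the loop performs three load/store pairs, which is the shape
that later passes can simplify more easily than an opaque block move.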
This patch is based on the discussions of the previous versions:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606498.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607480.html

Compared with the previous version, this patch uses a flag to mark whether
a parameter is passed via registers and stored to the stack.  This flag is
checked when expanding an assignment, which avoids an inaccurate guess
during expansion.

In this version, testcase pr65421-1.c is updated to remove unnecessary
checks.

Bootstrap and regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu)

        PR target/65421

gcc/ChangeLog:

        * cfgexpand.cc (expand_used_vars): Update to mark DECL_USEDBY_RETURN_P
        for returns.
        * expr.cc (move_sub_blocks): New function.
        (expand_assignment): Update to call move_sub_blocks for returns or
        parameters.
        * function.cc (assign_parm_setup_block): Update to mark
        DECL_STACK_REGS_P for parameter.
        * tree-core.h (struct tree_decl_common): Add comment.
        * tree.h (DECL_USEDBY_RETURN_P): New define.
        (DECL_STACK_REGS_P): New define.

gcc/testsuite/ChangeLog:

        * gcc.target/powerpc/pr65421-1.c: New test.
        * gcc.target/powerpc/pr65421.c: New test.

---
 gcc/cfgexpand.cc                             | 14 ++++
 gcc/expr.cc                                  | 81 ++++++++++++++++++++
 gcc/function.cc                              |  3 +
 gcc/tree-core.h                              |  4 +-
 gcc/tree.h                                   |  9 +++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 15 ++++
 gcc/testsuite/gcc.target/powerpc/pr65421.c   | 24 ++++++
 7 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29ffffc03..09b8ec64cea 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
     frame_phase = off ? align - off : 0;
   }
 
+  /* Collect VARs on returns.  */
+  if (DECL_RESULT (current_function_decl))
+    {
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+        if (greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)))
+          {
+            tree val = gimple_return_retval (ret);
+            if (val && VAR_P (val))
+              DECL_USEDBY_RETURN_P (val) = 1;
+          }
+    }
+
   /* Set TREE_USED on all variables in the local_decls.  */
   FOR_EACH_LOCAL_DECL (cfun, i, var)
     TREE_USED (var) = 1;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..d61669d5662 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -5559,6 +5559,55 @@ mem_ref_refers_to_non_mem_p (tree ref)
   return non_mem_decl_p (base);
 }
 
+/* Sub routine of expand_assignment, invoked when assigning from a
+   parameter or assigning to a return val on struct type which may
+   be passed through registers.  The mode of register is used to
+   move the content for the assignment.
+
+   This routine generates code for expression FROM which is BLKmode,
+   and move the generated content to TO_RTX by sub-blocks in SUB_MODE.  */
+
+static void
+move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode, bool nontemporal)
+{
+  HOST_WIDE_INT size, sub_size;
+  int len;
+
+  gcc_assert (MEM_P (to_rtx));
+
+  size = MEM_SIZE (to_rtx).to_constant ();
+  sub_size = GET_MODE_SIZE (sub_mode).to_constant ();
+  len = size / sub_size;
+
+  /* It would be not profitable to move through sub-modes, if the size does
+     not meet register mode.  */
+  if ((size % sub_size) != 0)
+    {
+      push_temp_slots ();
+      rtx result = store_expr (from, to_rtx, 0, nontemporal, false);
+      preserve_temp_slots (result);
+      pop_temp_slots ();
+      return;
+    }
+
+  push_temp_slots ();
+
+  rtx from_rtx;
+  from_rtx = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+  for (int i = 0; i < len; i++)
+    {
+      rtx temp = gen_reg_rtx (sub_mode);
+      rtx src = adjust_address (from_rtx, sub_mode, sub_size * i);
+      rtx dest = adjust_address (to_rtx, sub_mode, sub_size * i);
+      emit_move_insn (temp, src);
+      emit_move_insn (dest, temp);
+    }
+
+  preserve_temp_slots (to_rtx);
+  pop_temp_slots ();
+  return;
+}
+
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
 
@@ -6045,6 +6094,38 @@ expand_assignment (tree to, tree from, bool nontemporal)
       return;
     }
 
+  /* If it is assigning from a struct param which may be passed via registers,
+     it would be better to use the register's mode to move sub-blocks for the
+     assignment.  */
+  if (TREE_CODE (from) == PARM_DECL && mode == BLKmode
+      && DECL_STACK_REGS_P (from))
+    {
+      rtx parm = DECL_INCOMING_RTL (from);
+      gcc_assert (REG_P (parm) || GET_CODE (parm) == PARALLEL);
+
+      machine_mode sub_mode;
+      if (REG_P (parm))
+        sub_mode = word_mode;
+      else
+        sub_mode = GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0));
+
+      move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
+      return;
+    }
+
+  /* If it is assigning to a struct var which will be returned, and the
+     function is returning via registers, it would be better to use the
+     register's mode to move sub-blocks for the assignment.  */
+  if (VAR_P (to) && DECL_USEDBY_RETURN_P (to) && mode == BLKmode
+      && TREE_CODE (from) != CONSTRUCTOR
+      && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) == PARALLEL)
+    {
+      rtx ret = DECL_RTL (DECL_RESULT (current_function_decl));
+      machine_mode sub_mode = GET_MODE (XEXP (XVECEXP (ret, 0, 0), 0));
+      move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
+      return;
+    }
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
diff --git a/gcc/function.cc b/gcc/function.cc
index dc333c27e92..a118f98a203 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2991,6 +2991,9 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
 
       mem = validize_mem (copy_rtx (stack_parm));
 
+      if (MEM_P (mem))
+        DECL_STACK_REGS_P (parm) = 1;
+
       /* Handle values in multiple non-contiguous locations.  */
       if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem))
         emit_group_store (mem, entry_parm, data->arg.type, size);
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e146b133dbd..1a2d6c4ca62 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1808,7 +1808,9 @@ struct GTY(()) tree_decl_common {
      In VAR_DECL, PARM_DECL and RESULT_DECL, this is DECL_HAS_VALUE_EXPR_P.  */
   unsigned decl_flag_2 : 1;
-  /* In FIELD_DECL, this is DECL_PADDING_P.  */
+  /* In FIELD_DECL, this is DECL_PADDING_P
+     In VAR_DECL, this is DECL_USEDBY_RETURN_P
+     In PARM_DECL, this is DECL_STACK_REGS_P.  */
   unsigned decl_flag_3 : 1;
   /* Logically, these two would go in a theoretical base shared by var and
      parm decl. */
diff --git a/gcc/tree.h b/gcc/tree.h
index 4a19de1c94d..f9c2e8491a0 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3007,6 +3007,15 @@ extern void decl_value_expr_insert (tree, tree);
 #define DECL_PADDING_P(NODE) \
   (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_3)
 
+/* Used in a VAR_DECL to indicate that it is used by a return stmt.  */
+#define DECL_USEDBY_RETURN_P(NODE) \
+  (VAR_DECL_CHECK (NODE)->decl_common.decl_flag_3)
+
+/* Used in a PARM_DECL to indicate that it is struct parameter passed
+   by registers totally and stored to stack during setup.  */
+#define DECL_STACK_REGS_P(NODE) \
+  (PARM_DECL_CHECK (NODE)->decl_common.decl_flag_3)
+
 /* Used in a FIELD_DECL to indicate whether this field is not a
    flexible array member. This is only valid for the last array type
    field of a structure.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
new file mode 100644
index 00000000000..ceae0119c19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
@@ -0,0 +1,15 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+typedef struct SA
+{
+  double a[2];
+  long l;
+} A;
+
+/* std 3 param regs to return slot */
+A ret_arg (A a) {return a;}
+/* { dg-final { scan-assembler-times {\mstd 4,0\(3\)\s} 1 } } */
+/* { dg-final { scan-assembler-times {\mstd 5,8\(3\)\s} 1 } } */
+/* { dg-final { scan-assembler-times {\mstd 6,16\(3\)\s} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c b/gcc/testsuite/gcc.target/powerpc/pr65421.c
new file mode 100644
index 00000000000..db7c465b5a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
@@ -0,0 +1,24 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+typedef struct SA
+{
+  double a[3];
+} A;
+
+/* 3 lfd */
+A ret_arg_pt (A *a){return *a;}
+
+/* blr */
+A ret_arg (A a) {return a;}
+
+/* 3 stfd */
+void st_arg (A a, A *p) {*p = a;}
+
+/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
+
+/* count insns */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */
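
A note on the flag sharing in tree.h: the two new macros reuse
decl_flag_3, which should only be safe because each accessor first
checks the kind of node it is given.  The stand-alone sketch below mimics
that pattern with simplified, hypothetical types (struct decl and
decl_check are invented for this example); it is not GCC's actual tree
representation or its real *_CHECK machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the tree codes involved.  */
enum decl_code { FIELD_DECL, VAR_DECL, PARM_DECL };

/* Hypothetical node type: one bit whose meaning depends on CODE.  */
struct decl
{
  enum decl_code code;
  unsigned decl_flag_3 : 1;
};

/* Return NODE after verifying it has the EXPECTED code, loosely in the
   spirit of a checking accessor.  */
static struct decl *
decl_check (struct decl *node, enum decl_code expected)
{
  assert (node != NULL && node->code == expected);
  return node;
}

/* Three names for the same bit, each valid for one kind of node.  */
#define DECL_PADDING_P(NODE) \
  (decl_check ((NODE), FIELD_DECL)->decl_flag_3)
#define DECL_USEDBY_RETURN_P(NODE) \
  (decl_check ((NODE), VAR_DECL)->decl_flag_3)
#define DECL_STACK_REGS_P(NODE) \
  (decl_check ((NODE), PARM_DECL)->decl_flag_3)
```

In this sketch, using DECL_STACK_REGS_P on a node whose code is VAR_DECL
trips the assertion, which is the property that lets one bit carry three
unrelated meanings.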