From patchwork Mon May 29 03:52:17 2023
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 70201
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, jeffreyalaw@gmail.com, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, guojiufu@linux.ibm.com
Subject: [RFC] light expander sra for parameters and returns
Date: Mon, 29 May 2023 11:52:17 +0800
Message-Id: <20230529035217.2126346-1-guojiufu@linux.ibm.com>
X-Mailer: git-send-email 2.39.1
From: Jiufu Guo
Reply-To: Jiufu Guo

Hi,

Previously, I was investigating PRs 69143/65421/108073, which are related
to struct parameters and returns.  I had been looking into those issues
case by case and drafting a patch for each of them, which would let us
enhance the code incrementally.  However, done that way, the patches
interact with each other and implement different code for similar issues
(because of the different paths taken in gimple/rtl).  A common fix for
those issues may be possible.

A few other related PRs (such as meta-bug PR101926) also exist.  Since
those PRs occur on different targets with different symptoms (and
different root causes), I would expect one method to help some of them,
but it may be hard to handle all of them in a single fix.

While investigating and discussing the issues, I recalled a suggestion
from Richard: it would be nice to perform some SRA-like analysis for the
accesses on the structs (parameters/returns).
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
This may be a fairly common method for those issues.  With this idea, I
drafted the patch below.

I also thought about directly using tree-sra.cc, e.g. enhancing it and
rerunning it at the end of the GIMPLE passes.  But since some of the
issues are introduced inside the expander, the patch below also has to
cooperate with other parts of the expander.  And since we already have
tree-sra in the GIMPLE passes, this patch only needs to take extra care
of parameters and returns; other decls are already handled well by
tree-sra.

The steps of this patch are:
1. Collect struct-type parameters and returns, then scan the function for
   the accesses on them, and figure out which accesses would be profitable
   to scalarize (i.e. to use the registers of the parameter/return).  For
   now, only reads of parameters and writes to returns are checked; an
   example of the kind of access this targets is shown below.
2. When/after the scalar registers are determined/expanded for the return
   value or a parameter, compute the corresponding scalar register(s) for
   each access of the return/parameter, and prepare the scalar RTLs for
   those accesses.
3. When using/expanding the access expressions, use the computed/prepared
   scalars directly.
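As a concrete illustration of the kind of access this analysis targets
(my own sketch, modeled on the pr65421-2.c testcase below; it is not part
of the patch itself):

  /* Under the ELFv2 ABI, FLOATS is a homogeneous aggregate passed and
     returned in three floating-point registers.  Without this patch the
     expander may store the incoming registers to the stack and reload
     them; with it, "a" is only read and the return value only written,
     so ret_arg can stay entirely in registers and reduce to a plain
     blr.  */
  typedef struct FLOATS
  {
    double a[3];
  } FLOATS;

  FLOATS ret_arg (FLOATS a) { return a; }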
This patch is tested on ppc64, both LE and BE.  To continue, I would ask
for comments and suggestions first, and then update/enhance the patch
accordingly.  Thanks in advance!

BR,
Jeff (Jiufu)

---
 gcc/cfgexpand.cc                             | 567 ++++++++++++++++++-
 gcc/expr.cc                                  |  15 +-
 gcc/function.cc                              |  26 +-
 gcc/opts.cc                                  |   8 +-
 gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 +
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
 8 files changed, 675 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 85a93a547c0..95c29b6b6fe 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -97,6 +97,564 @@ static bool defer_stack_allocation (tree, bool);
 static void record_alignment_for_reg_var (unsigned int);
 
+/* For light SRA in the expander on parameters and returns.  */
+namespace {
+
+struct access
+{
+  /* Each access on the aggregate is described by its OFFSET/SIZE and BASE.  */
+  HOST_WIDE_INT offset;
+  HOST_WIDE_INT size;
+  tree base;
+  bool writing;
+
+  /* The context expression of this access.  */
+  tree expr;
+
+  /* The rtx for the access: link to incoming/returning register(s).  */
+  rtx rtx_val;
+};
+
+typedef struct access *access_p;
+
+/* Expr (tree) -> Access (access_p) map.  */
+static hash_map<tree, access_p> *expr_access_vec;
+
+/* Base (tree) -> Vector (vec<access_p> *) map.  */
+static hash_map<tree, auto_vec<access_p> > *base_access_vec;
+
+/* Return a vector of pointers to accesses for the variable given in BASE or
+   NULL if there is none.  */
+
+static vec<access_p> *
+get_base_access_vector (tree base)
+{
+  return base_access_vec->get (base);
+}
+
+/* Remove DECL from candidates for SRA.  */
+static void
+disqualify_candidate (tree decl)
+{
+  decl = get_base_address (decl);
+  base_access_vec->remove (decl);
+}
+
+/* Create and insert access for EXPR.  Return created access, or NULL if it
+   is not possible.  */
+static struct access *
+create_access (tree expr, bool write)
+{
+  poly_int64 poffset, psize, pmax_size;
+  bool reverse;
+
+  tree base
+    = get_ref_base_and_extent (expr, &poffset, &psize, &pmax_size, &reverse);
+
+  if (!DECL_P (base))
+    return NULL;
+
+  vec<access_p> *access_vec = get_base_access_vector (base);
+  if (!access_vec)
+    return NULL;
+
+  /* TODO: support reverse.  */
+  if (reverse)
+    {
+      disqualify_candidate (expr);
+      return NULL;
+    }
+
+  HOST_WIDE_INT offset, size, max_size;
+  if (!poffset.is_constant (&offset) || !psize.is_constant (&size)
+      || !pmax_size.is_constant (&max_size))
+    return NULL;
+
+  if (size != max_size || size == 0 || offset < 0 || size < 0
+      || offset + size > tree_to_shwi (DECL_SIZE (base)))
+    return NULL;
+
+  struct access *access = XNEWVEC (struct access, 1);
+
+  memset (access, 0, sizeof (struct access));
+  access->base = base;
+  access->offset = offset;
+  access->size = size;
+  access->expr = expr;
+  access->writing = write;
+  access->rtx_val = NULL_RTX;
+
+  access_vec->safe_push (access);
+
+  return access;
+}
+
+/* Return true if VAR is a candidate for SRA.  */
+static bool
+add_sra_candidate (tree var)
+{
+  tree type = TREE_TYPE (var);
+
+  if (!AGGREGATE_TYPE_P (type) || TREE_THIS_VOLATILE (var)
+      || !COMPLETE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type))
+      || tree_to_shwi (TYPE_SIZE (type)) == 0
+      || TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (va_list_type_node))
+    return false;
+
+  base_access_vec->get_or_insert (var);
+
+  return true;
+}
+
+/* Callback of walk_tree/walk_gimple_op: disqualify operands whose address
+   is taken.  */
+static tree
+visit_addr (tree *tp, int *, void *)
+{
+  tree op = *tp;
+  if (op && DECL_P (op))
+    disqualify_candidate (op);
+
+  return NULL;
+}
+
+/* Scan expression EXPR and create access structures for all accesses to
+   candidates for scalarization.  Return the created access or NULL if none
+   is created.  */
+static struct access *
+build_access_from_expr (tree expr, bool write)
+{
+  if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
+    expr = TREE_OPERAND (expr, 0);
+
+  if (TREE_CODE (expr) == BIT_FIELD_REF || storage_order_barrier_p (expr)
+      || TREE_THIS_VOLATILE (expr))
+    {
+      disqualify_candidate (expr);
+      return NULL;
+    }
+
+  switch (TREE_CODE (expr))
+    {
+    case MEM_REF: {
+      tree op = TREE_OPERAND (expr, 0);
+      if (TREE_CODE (op) == ADDR_EXPR)
+        disqualify_candidate (TREE_OPERAND (op, 0));
+      break;
+    }
+    case ADDR_EXPR:
+    case IMAGPART_EXPR:
+    case REALPART_EXPR:
+      disqualify_candidate (TREE_OPERAND (expr, 0));
+      break;
+    case VAR_DECL:
+    case PARM_DECL:
+    case RESULT_DECL:
+    case COMPONENT_REF:
+    case ARRAY_REF:
+    case ARRAY_RANGE_REF:
+      return create_access (expr, write);
+      break;
+    default:
+      break;
+    }
+
+  return NULL;
+}
+
+/* Scan the function, look for interesting expressions, and create access
+   structures for them.  */
+static void
+scan_function (void)
+{
+  basic_block bb;
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
+           gsi_next (&gsi))
+        {
+          gphi *phi = gsi.phi ();
+          for (size_t i = 0; i < gimple_phi_num_args (phi); i++)
+            {
+              tree t = gimple_phi_arg_def (phi, i);
+              walk_tree (&t, visit_addr, NULL, NULL);
+            }
+        }
+
+      for (gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
+           !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
+        {
+          gimple *stmt = gsi_stmt (gsi);
+          switch (gimple_code (stmt))
+            {
+            case GIMPLE_RETURN: {
+              tree r = gimple_return_retval (as_a <greturn *> (stmt));
+              if (r && VAR_P (r) && r != DECL_RESULT (current_function_decl))
+                build_access_from_expr (r, true);
+            }
+            break;
+            case GIMPLE_ASSIGN:
+              if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt))
+                {
+                  tree lhs = gimple_assign_lhs (stmt);
+                  tree rhs = gimple_assign_rhs1 (stmt);
+                  if (TREE_CODE (rhs) == CONSTRUCTOR)
+                    disqualify_candidate (lhs);
+                  else
+                    {
+                      build_access_from_expr (rhs, false);
+                      build_access_from_expr (lhs, true);
+                    }
+                }
+              break;
+            default:
+              walk_gimple_op (stmt, visit_addr, NULL);
+              break;
+            }
+        }
+    }
+}
+
+/* Collect the parameters and returns whose types are suitable for
+   scalarization.  */
+static bool
+collect_light_sra_candidates (void)
+{
+  bool ret = false;
+
+  /* Collect parameters.  */
+  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
+       parm = DECL_CHAIN (parm))
+    ret |= add_sra_candidate (parm);
+
+  /* Collect VARs on returns.  */
+  if (DECL_RESULT (current_function_decl))
+    {
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+        if (greturn *r = safe_dyn_cast <greturn *> (*gsi_last_bb (e->src)))
+          {
+            tree val = gimple_return_retval (r);
+            if (val && VAR_P (val))
+              ret |= add_sra_candidate (val);
+          }
+    }
+
+  return ret;
+}
+
+/* For now, only scalarize parameters that are only read and return
+   values that are only written.  */
+bool
+check_access_vec (tree const &base, auto_vec<access_p> const &access_vec,
+                  auto_vec<tree> *unqualify_vec)
+{
+  bool read = false;
+  bool write = false;
+  for (unsigned int j = 0; j < access_vec.length (); j++)
+    {
+      struct access *access = access_vec[j];
+      if (access->writing)
+        write = true;
+      else
+        read = true;
+
+      if (write && read)
+        break;
+    }
+  if ((write && read) || (!write && !read))
+    unqualify_vec->safe_push (base);
+
+  return true;
+}
+
+/* Analyze all the accesses, remove the unprofitable candidates,
+   and build the expr->access map.  */
+static void
+analyze_accesses ()
+{
+  auto_vec<tree> unqualify_vec;
+  base_access_vec->traverse<auto_vec<tree> *, check_access_vec> (
+    &unqualify_vec);
+
+  tree base;
+  unsigned i;
+  FOR_EACH_VEC_ELT (unqualify_vec, i, base)
+    disqualify_candidate (base);
+}
+
+static void
+prepare_expander_sra ()
+{
+  if (optimize <= 0)
+    return;
+
+  base_access_vec = new hash_map<tree, auto_vec<access_p> >;
+  expr_access_vec = new hash_map<tree, access_p>;
+
+  if (collect_light_sra_candidates ())
+    {
+      scan_function ();
+      analyze_accesses ();
+    }
+}
+
+static void
+free_expander_sra ()
+{
+  if (optimize <= 0 || !expr_access_vec)
+    return;
+  delete expr_access_vec;
+  expr_access_vec = 0;
+  delete base_access_vec;
+  base_access_vec = 0;
+}
+} /* namespace */
+
+/* Check if there is an SRA access for EXPR.  Return the corresponding
+   scalar rtx for the access, or NULL_RTX if there is none.  */
+rtx
+get_scalar_rtx_for_aggregate_expr (tree expr)
+{
+  if (!expr_access_vec)
+    return NULL_RTX;
+  access_p *access = expr_access_vec->get (expr);
+  return access ? (*access)->rtx_val : NULL_RTX;
+}
+
+extern rtx
+expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int);
+
+/* Compute/set RTX registers for the accesses on BASE.  */
+void
+set_scalar_rtx_for_aggregate_access (tree base, rtx regs)
+{
+  if (!base_access_vec)
+    return;
+  vec<access_p> *access_vec = get_base_access_vector (base);
+  if (!access_vec)
+    return;
+
+  /* Go through each access, and compute the corresponding rtx (regs or
+     subregs) for the expression.  */
+  int n = access_vec->length ();
+  int cur_access_index = 0;
+  for (; cur_access_index < n; cur_access_index++)
+    {
+      access_p acc = (*access_vec)[cur_access_index];
+      machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr));
+      /* A non-BLKmode access spanning multiple registers is not
+         supported.  */
+      if (expr_mode != BLKmode
+          && known_gt (acc->size, GET_MODE_BITSIZE (word_mode)))
+        break;
+
+      int start_index = -1;
+      int end_index = -1;
+      HOST_WIDE_INT left_margin_bits = 0;
+      HOST_WIDE_INT right_margin_bits = 0;
+      int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
+      for (; cur_index < XVECLEN (regs, 0); cur_index++)
+        {
+          rtx slot = XVECEXP (regs, 0, cur_index);
+          HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
+          HOST_WIDE_INT size
+            = GET_MODE_BITSIZE (GET_MODE (XEXP (slot, 0))).to_constant ();
+          if (off <= acc->offset && off + size > acc->offset)
+            {
+              start_index = cur_index;
+              left_margin_bits = acc->offset - off;
+            }
+          if (off + size >= acc->offset + acc->size)
+            {
+              end_index = cur_index;
+              right_margin_bits = off + size - (acc->offset + acc->size);
+              break;
+            }
+        }
+      /* The access touches padding or is out of bounds.  */
+      if (start_index < 0 || end_index < 0)
+        break;
+
+      /* Need a PARALLEL for possibly multiple registers.  */
+      if (expr_mode == BLKmode || end_index > start_index)
+        {
+          /* Cannot support an access starting in the middle of a
+             register.  */
+          if (left_margin_bits != 0)
+            break;
+
+          int len = end_index - start_index + 1;
+          const int margin = 3; /* Extra space for SI, HI, QI.  */
+          rtx *tmps = XALLOCAVEC (rtx, len + (right_margin_bits ? margin : 0));
+
+          HOST_WIDE_INT start_off
+            = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1));
+          int pos = 0;
+          for (; pos < len - (right_margin_bits ? 1 : 0); pos++)
+            {
+              int index = start_index + pos;
+              rtx orig_reg = XEXP (XVECEXP (regs, 0, index), 0);
+              machine_mode mode = GET_MODE (orig_reg);
+              rtx reg = NULL_RTX;
+              if (HARD_REGISTER_P (orig_reg))
+                {
+                  /* Reads from a parameter hard register need to go
+                     through a temporary.  */
+                  gcc_assert (!acc->writing);
+                  reg = gen_reg_rtx (mode);
+                  emit_move_insn (reg, orig_reg);
+                }
+              else
+                reg = orig_reg;
+
+              HOST_WIDE_INT off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1));
+              tmps[pos]
+                = gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off - start_off));
+            }
+
+          /* Some fields occupy only part of a register.  */
+          if (right_margin_bits != 0)
+            {
+              if (acc->writing)
+                break;
+
+              gcc_assert ((right_margin_bits % BITS_PER_UNIT) == 0);
+              HOST_WIDE_INT off_byte
+                = UINTVAL (XEXP (XVECEXP (regs, 0, end_index), 1)) - start_off;
+              rtx orig_reg = XEXP (XVECEXP (regs, 0, end_index), 0);
+              machine_mode orig_mode = GET_MODE (orig_reg);
+              gcc_assert (GET_MODE_CLASS (orig_mode) == MODE_INT);
+
+              machine_mode mode_aux[] = {SImode, HImode, QImode};
+              HOST_WIDE_INT reg_size
+                = GET_MODE_BITSIZE (orig_mode).to_constant ();
+              HOST_WIDE_INT off_bits = 0;
+              for (unsigned long j = 0;
+                   j < sizeof (mode_aux) / sizeof (mode_aux[0]); j++)
+                {
+                  HOST_WIDE_INT submode_bitsize
+                    = GET_MODE_BITSIZE (mode_aux[j]).to_constant ();
+                  if (reg_size - right_margin_bits - off_bits
+                      >= submode_bitsize)
+                    {
+                      rtx reg = gen_reg_rtx (orig_mode);
+                      emit_move_insn (reg, orig_reg);
+
+                      poly_uint64 lowpart_off
+                        = subreg_lowpart_offset (mode_aux[j], orig_mode);
+                      int lowpart_off_bits
+                        = lowpart_off.to_constant () * BITS_PER_UNIT;
+                      int shift_bits = lowpart_off_bits >= off_bits
+                                         ? (lowpart_off_bits - off_bits)
+                                         : (off_bits - lowpart_off_bits);
+                      if (shift_bits > 0)
+                        reg = expand_shift (RSHIFT_EXPR, orig_mode, reg,
+                                            shift_bits, NULL, 1);
+                      rtx subreg = gen_lowpart (mode_aux[j], reg);
+                      rtx off = GEN_INT (off_byte);
+                      tmps[pos++]
+                        = gen_rtx_EXPR_LIST (mode_aux[j], subreg, off);
+                      off_byte += submode_bitsize / BITS_PER_UNIT;
+                      off_bits += submode_bitsize;
+                    }
+                }
+            }
+
+          /* Currently, PARALLELs with register elements for params/returns
+             use BLKmode.  */
+          acc->rtx_val = gen_rtx_PARALLEL (TYPE_MODE (TREE_TYPE (acc->expr)),
+                                           gen_rtvec_v (pos, tmps));
+          continue;
+        }
+
+      /* The access corresponds to one register.  */
+      if (end_index == start_index && left_margin_bits == 0
+          && right_margin_bits == 0)
+        {
+          rtx orig_reg = XEXP (XVECEXP (regs, 0, start_index), 0);
+          rtx reg = NULL_RTX;
+          if (HARD_REGISTER_P (orig_reg))
+            {
+              /* Reads from a parameter hard register need to go through
+                 a temporary.  */
+              gcc_assert (!acc->writing);
+              reg = gen_reg_rtx (GET_MODE (orig_reg));
+              emit_move_insn (reg, orig_reg);
+            }
+          else
+            reg = orig_reg;
+          if (GET_MODE (orig_reg) != expr_mode)
+            reg = gen_lowpart (expr_mode, reg);
+
+          acc->rtx_val = reg;
+          continue;
+        }
+
+      /* The access is to a field which is part of one register.  */
+      scalar_int_mode imode;
+      if (!acc->writing && end_index == start_index
+          && int_mode_for_size (acc->size, 1).exists (&imode))
+        {
+          /* Get and copy the original register inside the parameter.  */
+          rtx orig_reg = XEXP (XVECEXP (regs, 0, start_index), 0);
+          machine_mode mode = GET_MODE (orig_reg);
+          gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
+          rtx reg = gen_reg_rtx (mode);
+          emit_move_insn (reg, orig_reg);
+
+          /* Shift to the expected part.  */
+          poly_uint64 lowpart_off = subreg_lowpart_offset (imode, mode);
+          int lowpart_off_bits = lowpart_off.to_constant () * BITS_PER_UNIT;
+          int shift_bits = lowpart_off_bits >= left_margin_bits
+                             ? (lowpart_off_bits - left_margin_bits)
+                             : (left_margin_bits - lowpart_off_bits);
+          if (shift_bits > 0)
+            reg = expand_shift (RSHIFT_EXPR, mode, reg, shift_bits, NULL, 1);
+
+          /* Move the corresponding part's subreg to the result.  */
+          rtx subreg = gen_lowpart (imode, reg);
+          rtx result = gen_reg_rtx (imode);
+          emit_move_insn (result, subreg);
+
+          if (expr_mode != imode)
+            result = gen_lowpart (expr_mode, result);
+
+          acc->rtx_val = result;
+          continue;
+        }
+
+      break;
+    }
+
+  /* Some access expressions were not scalarized.  */
+  if (cur_access_index != n)
+    disqualify_candidate (base);
+  else
+    {
+      /* Add elements to the expr->access map.  */
+      for (int j = 0; j < n; j++)
+        {
+          access_p access = (*access_vec)[j];
+          expr_access_vec->put (access->expr, access);
+        }
+    }
+}
+
+void
+set_scalar_rtx_for_returns ()
+{
+  tree res = DECL_RESULT (current_function_decl);
+  gcc_assert (res);
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+    if (greturn *r = safe_dyn_cast <greturn *> (*gsi_last_bb (e->src)))
+      {
+        tree val = gimple_return_retval (r);
+        if (val && VAR_P (val))
+          set_scalar_rtx_for_aggregate_access (val, DECL_RTL (res));
+      }
+}
+
 /* Return an expression tree corresponding to the RHS of GIMPLE
    statement STMT.  */
 
@@ -3778,7 +4336,8 @@ expand_return (tree retval)
 
   /* If we are returning the RESULT_DECL, then the value has already
      been stored into it, so we don't have to do anything special.  */
-  if (TREE_CODE (retval_rhs) == RESULT_DECL)
+  if (TREE_CODE (retval_rhs) == RESULT_DECL
+      || get_scalar_rtx_for_aggregate_expr (retval_rhs))
     expand_value_return (result_rtl);
 
   /* If the result is an aggregate that is being returned in one (or more)
@@ -4422,6 +4981,9 @@ expand_debug_expr (tree exp)
   int unsignedp = TYPE_UNSIGNED (TREE_TYPE (exp));
   addr_space_t as;
   scalar_int_mode op0_mode, op1_mode, addr_mode;
+  rtx x = get_scalar_rtx_for_aggregate_expr (exp);
+  if (x)
+    return NULL_RTX; /* Optimized out.  */
 
   switch (TREE_CODE_CLASS (TREE_CODE (exp)))
     {
@@ -6630,6 +7192,8 @@ pass_expand::execute (function *fun)
 	    avoid_deep_ter_for_debug (gsi_stmt (gsi), 0);
 	}
 
+  prepare_expander_sra ();
+
   /* Mark arrays indexed with non-constant indices with TREE_ADDRESSABLE.  */
   auto_bitmap forced_stack_vars;
   discover_nonconstant_array_refs (forced_stack_vars);
 
@@ -7062,6 +7626,7 @@ pass_expand::execute (function *fun)
       loop_optimizer_finalize ();
     }
 
+  free_expander_sra ();
   timevar_pop (TV_POST_EXPAND);
 
   return 0;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 56b51876f80..b970f98e689 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -100,6 +100,7 @@ static void do_tablejump (rtx, machine_mode, rtx, rtx, rtx,
 static rtx const_vector_from_tree (tree);
 static tree tree_expr_size (const_tree);
 static void convert_mode_scalar (rtx, rtx, int);
+rtx get_scalar_rtx_for_aggregate_expr (tree);
 
 
 /* This is run to set up which modes can be used
@@ -5623,11 +5624,12 @@ expand_assignment (tree to, tree from, bool nontemporal)
      Assignment of an array element at a constant index, and assignment of
      an array element in an unaligned packed structure field, has the same
      problem.  Same for (partially) storing into a non-memory object.  */
-  if (handled_component_p (to)
-      || (TREE_CODE (to) == MEM_REF
-	  && (REF_REVERSE_STORAGE_ORDER (to)
-	      || mem_ref_refers_to_non_mem_p (to)))
-      || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE)
+  if (!get_scalar_rtx_for_aggregate_expr (to)
+      && (handled_component_p (to)
+	  || (TREE_CODE (to) == MEM_REF
+	      && (REF_REVERSE_STORAGE_ORDER (to)
+		  || mem_ref_refers_to_non_mem_p (to)))
+	  || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE))
     {
       machine_mode mode1;
       poly_int64 bitsize, bitpos;
@@ -8995,6 +8997,9 @@ expand_expr_real (tree exp, rtx target, machine_mode tmode,
       ret = CONST0_RTX (tmode);
       return ret ? ret : const0_rtx;
     }
+  rtx x = get_scalar_rtx_for_aggregate_expr (exp);
+  if (x)
+    return x;
 
   ret = expand_expr_real_1 (exp, target, tmode, modifier, alt_rtl,
 			    inner_reference_p);
diff --git a/gcc/function.cc b/gcc/function.cc
index 82102ed78d7..262d3f17e72 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2742,6 +2742,9 @@ assign_parm_find_stack_rtl (tree parm, struct assign_parm_data_one *data)
   data->stack_parm = stack_parm;
 }
 
+extern void
+set_scalar_rtx_for_aggregate_access (tree, rtx);
+
 /* A subroutine of assign_parms.  Adjust DATA->ENTRY_RTL such that it's
    always valid and contiguous.  */
 
@@ -3117,8 +3120,21 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
 	  emit_move_insn (mem, entry_parm);
 	}
       else
-	move_block_from_reg (REGNO (entry_parm), mem,
-			     size_stored / UNITS_PER_WORD);
+	{
+	  int regno = REGNO (entry_parm);
+	  int nregs = size_stored / UNITS_PER_WORD;
+	  move_block_from_reg (regno, mem, nregs);
+
+	  rtx *tmps = XALLOCAVEC (rtx, nregs);
+	  machine_mode mode = word_mode;
+	  for (int i = 0; i < nregs; i++)
+	    tmps[i] = gen_rtx_EXPR_LIST (
+	      VOIDmode, gen_rtx_REG (mode, regno + i),
+	      GEN_INT (GET_MODE_SIZE (mode).to_constant () * i));
+
+	  rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
+	  set_scalar_rtx_for_aggregate_access (parm, regs);
+	}
     }
   else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type))
     {
@@ -3718,6 +3734,10 @@ assign_parms (tree fndecl)
       else
	set_decl_incoming_rtl (parm, data.entry_parm, false);
 
+      rtx incoming = DECL_INCOMING_RTL (parm);
+      if (GET_CODE (incoming) == PARALLEL)
+	set_scalar_rtx_for_aggregate_access (parm, incoming);
+
       assign_parm_adjust_stack_rtl (&data);
 
       if (assign_parm_setup_block_p (&data))
@@ -5037,6 +5057,7 @@ stack_protect_epilogue (void)
    the function's parameters, which must be run at any return statement.  */
 
 bool currently_expanding_function_start;
+extern void set_scalar_rtx_for_returns ();
 void
 expand_function_start (tree subr)
 {
@@ -5138,6 +5159,7 @@ expand_function_start (tree subr)
 	{
 	  gcc_assert (GET_CODE (hard_reg) == PARALLEL);
 	  set_parm_rtl (res, gen_group_rtx (hard_reg));
+	  set_scalar_rtx_for_returns ();
 	}
     }
 
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 86b94d62b58..5e129a1cc49 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1559,6 +1559,10 @@ public:
   vec m_values;
 };
 
+#ifdef __GNUC__
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wformat-truncation"
+#endif
 /* Print help for a specific front-end, etc.  */
 static void
 print_filtered_help (unsigned int include_flags,
@@ -1913,7 +1917,9 @@ print_filtered_help (unsigned int include_flags,
       printf ("\n\n");
     }
 }
-
+#ifdef __GNUC__
+#pragma GCC diagnostic pop
+#endif
 
 /* Display help for a specified type of option.
    The options must have ALL of the INCLUDE_FLAGS set
    ANY of the flags in the ANY_FLAGS set
diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
index 769585052b5..c8995cae707 100644
--- a/gcc/testsuite/g++.target/powerpc/pr102024.C
+++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
@@ -5,7 +5,7 @@
 // Test that a zero-width bit field in an otherwise homogeneous aggregate
 // generates a psabi warning and passes arguments in GPRs.
 
-// { dg-final { scan-assembler-times {\mstd\M} 4 } }
+// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
 
 struct a_thing
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
new file mode 100644
index 00000000000..7dd1a4a326a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; } DF;
+typedef struct SF {float a[4]; int i1; int i2; } SF;
+
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
+short __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == 2)return a.s2+a.s3;return 0;}
+int __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == 2)return a.i2+a.i1;return 0;}
+double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == 2)return arg.a[3];else return 0.0;}
+float __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 2)return arg.a[2]; return 0;}
+float __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 2)return arg.a[1];return 0;}
+
+DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4};
+SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2};
+
+int main()
+{
+  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
+	&& foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
+    __builtin_abort ();
+  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
+	&& foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
+    __builtin_abort ();
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
new file mode 100644
index 00000000000..4e1f87f7939
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
@@ -0,0 +1,6 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+
+typedef struct LARGE {double a[4]; int arr[32];} LARGE;
+LARGE foo (LARGE a){return a;}
+/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
new file mode 100644
index 00000000000..8a8e1a0e996
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
@@ -0,0 +1,32 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+typedef struct FLOATS
+{
+  double a[3];
+} FLOATS;
+
+/* 3 lfd after returns also optimized */
+/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */
+
+/* 3 stfd */
+void st_arg (FLOATS a, FLOATS *p) {*p = a;}
+/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
+
+/* blr */
+FLOATS ret_arg (FLOATS a) {return a;}
+
+typedef struct MIX
+{
+  double a[2];
+  long l;
+} MIX;
+
+/* std 3 param regs to return slot */
+MIX ret_arg1 (MIX a) {return a;}
+/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
+
+/* count insns */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */