From patchwork Thu Nov 17 06:15:49 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 60736
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
 guojiufu@linux.ibm.com, rguenther@suse.de, jeffreyalaw@gmail.com
Subject: [PATCH V2] Use subscalar mode to move struct block for parameter
Date: Thu, 17 Nov 2022 14:15:49 +0800
Message-Id: <20221117061549.178481-1-guojiufu@linux.ibm.com>
X-Mailer: git-send-email 2.17.1
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Jiufu Guo via Gcc-patches
From: Jiufu Guo
Reply-To: Jiufu Guo

Hi,

As mentioned in the previous version of this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
suboptimal code is generated for "assigning from parameter" and
"assigning to return value".
This patch improves the assignment from parameters, as in the cases below:

/////case1.c
typedef struct SA {double a[3];long l; } A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

////case2.c
typedef struct SA {double a[3];} A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

For this patch, bootstrap and regtest pass on ppc64{,le} and x86_64.

* Besides asking for review of this patch, I would also like comments
on improving "assigning to returns".

On some targets (ppc64), for the case below:
////case3.c
typedef struct SA {double a[3]; long l; } A;
A ret_arg_pt (A *a) {return *a;}

The optimized GIMPLE code looks like:
  <retval> = *a_2(D);
  return <retval>;
Here, <retval> (aka. RESULT_DECL) is MEM, and "aggregate_value_p" returns
true for <retval>.

* While for the case below, the generated code is still suboptimal.
////case4.c
typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a) {return *a;}

The optimized GIMPLE code looks like:
  D.3951 = *a_2(D);
  return D.3951;
The "return/assign" stmts use D.3951 (VAR_DECL) instead of
"<retval> (RESULT_DECL)".  The mode of D.3951/<retval> is BLK.  The RTL of
D.3951 is MEM, and the RTL of <retval> is PARALLEL.  For PARALLEL,
aggregate_value_p returns false.

In function expand_assignment, there is code:
  if (TREE_CODE (to) == RESULT_DECL
      && (REG_P (to_rtx) || GET_CODE (to_rtx) == PARALLEL))
This code can handle "<retval>", but cannot handle "D.3951".

I'm thinking that one way to handle this issue is to update the GIMPLE
sequence to:
  "<retval> = *a_2(D); return <retval>;"
Or, to collect the VARs which are used by return stmts, and for assignments
to those VARs, use a sub scalar mode for the block move.

Thanks for any comments and suggestions!

BR,
Jeff (Jiufu)

---
 gcc/expr.cc | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..420f9cf3662 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6045,6 +6045,46 @@ expand_assignment (tree to, tree from, bool nontemporal)
       return;
     }
 
+  if (TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
+      && TYPE_MODE (TREE_TYPE (from)) == BLKmode
+      && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
+          || REG_P (DECL_INCOMING_RTL (from))))
+    {
+      rtx parm = DECL_INCOMING_RTL (from);
+
+      push_temp_slots ();
+      machine_mode mode;
+      mode = GET_CODE (parm) == PARALLEL
+               ? GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0))
+               : word_mode;
+      int mode_size = GET_MODE_SIZE (mode).to_constant ();
+      int size = INTVAL (expr_size (from));
+
+      /* Whether and how the parameter is moved in a submode depends on
+         its size and position.  A heuristic limit is used here.  */
+      int hurstc_num = 8;
+      if (size < mode_size || (size % mode_size) != 0
+          || size > (mode_size * hurstc_num))
+        result = store_expr (from, to_rtx, 0, nontemporal, false);
+      else
+        {
+          rtx from_rtx
+            = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+          for (int i = 0; i < size / mode_size; i++)
+            {
+              rtx temp = gen_reg_rtx (mode);
+              rtx src = adjust_address (from_rtx, mode, mode_size * i);
+              rtx dest = adjust_address (to_rtx, mode, mode_size * i);
+              emit_move_insn (temp, src);
+              emit_move_insn (dest, temp);
+            }
+          result = to_rtx;
+        }
+      preserve_temp_slots (result);
+      pop_temp_slots ();
+      return;
+    }
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
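
For illustration only (not part of the patch): the size check in the hunk
above can be sketched as a standalone C helper.  The name
subscalar_move_count is hypothetical, and the 8-byte element mode used in
the notes below is an assumption about an LP64 target such as ppc64; the
limit of 8 pieces is the hurstc_num value from the patch.

```c
#include <assert.h>

/* Hypothetical helper mirroring the heuristic in the hunk above.
   Returns the number of MODE_SIZE-byte register moves the new path
   would emit, or 0 when the condition in the patch falls back to
   store_expr.  SIZE and MODE_SIZE are in bytes; MODE_SIZE must be
   positive.  */
static int
subscalar_move_count (int size, int mode_size)
{
  const int hurstc_num = 8;	/* Same heuristic limit as in the patch.  */

  if (size < mode_size
      || (size % mode_size) != 0
      || size > mode_size * hurstc_num)
    return 0;			/* Fall back to the generic block store.  */

  return size / mode_size;	/* One load/store pair per piece.  */
}

/* Self-check on the struct sizes from case1.c and case2.c, assuming
   8-byte double and long and an 8-byte element mode.  */
static void
check_cases (void)
{
  assert (subscalar_move_count (32, 8) == 4);	/* {double a[3]; long l;}  */
  assert (subscalar_move_count (24, 8) == 3);	/* {double a[3];}  */
}
```

Under these assumptions, case1.c (32 bytes) would be moved in four pieces
and case2.c (24 bytes) in three, while a 72-byte struct exceeds the limit
of 8 * 8 = 64 bytes and would still go through store_expr.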