[PATCH-1,rs6000] Put constant into pseudo at expand when it needs two insns [PR86106]
Message ID | 22e83da3-a81f-dd61-c04b-a39b459a965f@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6FA493858032 for <patchwork@sourceware.org>; Thu, 16 Mar 2023 05:35:10 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6FA493858032 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1678944910; bh=3sJfzULXcqdhF5WBjho3+r0vNrMFlvqpCfWhT5NCR18=; h=Date:Subject:To:Cc:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=wObmuXEznKnhwT3CACC8Y/kzTMGDF3pCNZMo4C9CMQWekW8Ix7IGDH9Py/w/g+xIS dbXr4NKi9/WjAwNQkc9VLNTlbNRH9svmSIraeVLo5MUi2/Mh0iEWSB2bo4ksXiwyLq nD7BJ0CoyGSLoBq+LR879+Dnpx1S9Hn6yfde5yJI= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id CDAEF3858D38 for <gcc-patches@gcc.gnu.org>; Thu, 16 Mar 2023 05:34:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CDAEF3858D38 Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32G5Xei1032452; Thu, 16 Mar 2023 05:34:41 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3pbt9q3qes-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 16 Mar 2023 05:34:40 +0000 Received: from m0098417.ppops.net (m0098417.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 32G4w6pV030734; Thu, 16 Mar 2023 05:34:40 GMT Received: from ppma04fra.de.ibm.com (6a.4a.5195.ip4.static.sl-reverse.com [149.81.74.106]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3pbt9q3qe7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 16 Mar 2023 05:34:40 +0000 Received: from pps.filterd (ppma04fra.de.ibm.com [127.0.0.1]) by ppma04fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 32G17jAi009829; Thu, 16 Mar 2023 05:34:38 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma04fra.de.ibm.com (PPS) with ESMTPS id 3pbs5sr6gb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 16 Mar 2023 05:34:38 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 32G5YY4M12780064 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 16 Mar 2023 05:34:34 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CB85420049; Thu, 16 Mar 2023 05:34:34 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 508FE20043; Thu, 16 Mar 2023 05:34:33 +0000 (GMT) Received: from [9.200.144.91] (unknown [9.200.144.91]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 16 Mar 2023 05:34:33 +0000 (GMT) Message-ID: <22e83da3-a81f-dd61-c04b-a39b459a965f@linux.ibm.com> Date: Thu, 16 Mar 2023 13:34:32 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: [PATCH-1, rs6000] Put constant into pseudo at expand when it needs two insns [PR86106] To: gcc-patches <gcc-patches@gcc.gnu.org> Cc: Segher Boessenkool <segher@kernel.crashing.org>, David <dje.gcc@gmail.com>, "Kewen.Lin" <linkw@linux.ibm.com>, Peter Bergner <bergner@linux.ibm.com> Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: ScT0nWzI9beO1hLbVQl-KfWGQAPdTh1M X-Proofpoint-GUID: 7qi5Kbtq-SfNDHcaDtd492p3o8H7HDZX X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-16_03,2023-03-15_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 priorityscore=1501 mlxscore=0 malwarescore=0 spamscore=0 lowpriorityscore=0 suspectscore=0 mlxlogscore=999 bulkscore=0 impostorscore=0 adultscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303150002 definitions=main-2303160046 X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: HAO CHEN GUI via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: HAO CHEN GUI <guihaoc@linux.ibm.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
[PATCH-1,rs6000] Put constant into pseudo at expand when it needs two insns [PR86106]
|
|
Commit Message
HAO CHEN GUI
March 16, 2023, 5:34 a.m. UTC
Hi, Currently, rs6000 directly expands to 2 insns if an integer constant is the second operand and it needs two insns. For example, addi/addis and ori/oris. It may not benefit when the constant is used for more than 2 times in an extended basic block, just like the case in PR shows. One possible solution is to force the constant in pseudo at expand and let propagation pass and combine pass decide if the pseudo should be replaced with the constant or not by comparing the rtx/insn cost. It generates a constant move if the constant is forced to a pseudo. There is one constant move if it's used only once. The combine pass can combine the constant move and add/ior/xor insn and eliminate the move as the insn cost reduces. There are multiple moves if the constant is used for several times. In an extended basic block, these constant moves are merged to one by propagation pass. The combine pass can't replace the pseudo with the constant as it is no cost saving. In an extreme case, the constant is used twice in an extended basic block. The cost(latency) is unchanged between putting constant in pseudo and generating 2 insns. The dependence of instructions reduces but one more register is used. In other case, it should be always optimal to put constant in a pseudo. This patch changes the expander of integer add and force constant to a pseudo when it needs 2 insn. Also a combine and split pattern is defined. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Thanks Gui Haochen ChangeLog 2023-03-14 Haochen Gui <guihaoc@linux.ibm.com> gcc/ * config/rs6000/predicates.md (add_2insn_cint_operand): New predicate which returns true when op is a 32-bit but not a 16-bit signed integer constant. * config/rs6000/rs6000.md (add<mode>3): Put the second operand into register when it's a constant and need 2 add insns. (*add<mode>_2insn): New insn_and_split for 2-insn add. patch.diff
Comments
On Thu, Mar 16, 2023 at 6:35 AM HAO CHEN GUI via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi, > Currently, rs6000 directly expands to 2 insns if an integer constant is the > second operand and it needs two insns. For example, addi/addis and ori/oris. > It may not benefit when the constant is used for more than 2 times in an > extended basic block, just like the case in PR shows. > > One possible solution is to force the constant in pseudo at expand and let > propagation pass and combine pass decide if the pseudo should be replaced > with the constant or not by comparing the rtx/insn cost. > > It generates a constant move if the constant is forced to a pseudo. There > is one constant move if it's used only once. The combine pass can combine > the constant move and add/ior/xor insn and eliminate the move as the insn > cost reduces. There are multiple moves if the constant is used for several > times. In an extended basic block, these constant moves are merged to one by > propagation pass. The combine pass can't replace the pseudo with the constant > as it is no cost saving. > > In an extreme case, the constant is used twice in an extended basic block. > The cost(latency) is unchanged between putting constant in pseudo and > generating 2 insns. The dependence of instructions reduces but one more > register is used. In other case, it should be always optimal to put constant > in a pseudo. > > This patch changes the expander of integer add and force constant to a > pseudo when it needs 2 insn. Also a combine and split pattern is defined. So this is one way around the lack of CSE/PRE of constant operands. I'd argue that a better spot for this _might_ be LRA (split the constant out if there's a free register available), postreload-[g]cse (CSE the constants) and then maybe cprop_hardreg to combine back single-use constants? I'm not sure if careful constraints massaging like adding magic letters to alternatives with constants to pessimize them for LRA, making them more expensive than spilling the constant to a register but avoid secondary reloads with spilling a register to the stack to make room for the constant, is possible - but in theory a special constraint modifier for this purpose could be invented. Richard. > Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > > Thanks > Gui Haochen > > ChangeLog > 2023-03-14 Haochen Gui <guihaoc@linux.ibm.com> > > gcc/ > * config/rs6000/predicates.md (add_2insn_cint_operand): New predicate > which returns true when op is a 32-bit but not a 16-bit signed > integer constant. > * config/rs6000/rs6000.md (add<mode>3): Put the second operand into > register when it's a constant and need 2 add insns. > (*add<mode>_2insn): New insn_and_split for 2-insn add. > > > patch.diff > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index a1764018545..09e59a48cd3 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -282,6 +282,13 @@ (define_predicate "s32bit_cint_operand" > (and (match_code "const_int") > (match_test "(0x80000000 + UINTVAL (op)) >> 32 == 0"))) > > +;; Return 1 if op is a 32-bit but not 16-bit constant signed integer > +(define_predicate "add_2insn_cint_operand" > + (and (match_code "const_int") > + (and (match_operand 0 "s32bit_cint_operand") > + (and (not (match_operand 0 "short_cint_operand")) > + (not (match_operand 0 "upper16_cint_operand")))))) > + > ;; Return 1 if op is a constant 32-bit unsigned > (define_predicate "c32bit_cint_operand" > (and (match_code "const_int") > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 6011f5bf76a..dba41e3df90 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -1796,12 +1796,44 @@ (define_expand "add<mode>3" > /* The ordering here is important for the prolog expander. > When space is allocated from the stack, adding 'low' first may > produce a temporary deallocation (which would be bad). */ > - emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest))); > - emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low))); > - DONE; > + if (!can_create_pseudo_p ()) > + { > + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest))); > + emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low))); > + DONE; > + } > + > + operands[2] = force_reg (<MODE>mode, operands[2]); > } > }) > > +/* The ordering here is important for the prolog expander. > + When space is allocated from the stack, adding 'low' first may > + produce a temporary deallocation (which would be bad). */ > + > +(define_insn_and_split "*add<mode>_2insn" > + [(set (match_operand:GPR 0 "gpc_reg_operand" "=b") > + (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%b") > + (match_operand:GPR 2 "add_2insn_cint_operand" "n")))] > + "!TARGET_PREFIXED" > + "#" > + "&& 1" > + [(set (match_dup 0) > + (plus:GPR (match_dup 1) > + (match_dup 3))) > + (set (match_dup 0) > + (plus:GPR (match_dup 0) > + (match_dup 4)))] > +{ > + HOST_WIDE_INT val = INTVAL (operands[2]); > + HOST_WIDE_INT low = sext_hwi (val, 16); > + HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode); > + > + operands[3] = GEN_INT (rest); > + operands[4] = GEN_INT (low); > +} > + [(set_attr "length" "8")]) > + > (define_insn "*add<mode>3" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r") > (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
Hi Richard, 在 2023/3/16 15:57, Richard Biener 写道: > So this is one way around the lack of CSE/PRE of constant operands. I'd > argue that a better spot for this _might_ be LRA (split the constant out if > there's a free register available), postreload-[g]cse (CSE the constants) and > then maybe cprop_hardreg to combine back single-use constants? > > I'm not sure if careful constraints massaging like adding magic letters to > alternatives with constants to pessimize them for LRA, making them > more expensive than spilling the constant to a register but avoid > secondary reloads with spilling a register to the stack to make room > for the constant, is possible - but in theory a special constraint modifier > for this purpose could be invented. Thanks so much for your advice. cse/gcse doesn't take cost of constant set (the def insn of the constant) into consideration. So it won't replace the register with a constant as it costs 1 insn with the register and costs 2 insn with the constant. Finally, the single- use constants can't be back to 2 insn. Not sure if I understand it correctly. Looking forward to your advice. Thanks Gui Haochen
On Thu, Mar 16, 2023 at 10:04 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote: > > Hi Richard, > > 在 2023/3/16 15:57, Richard Biener 写道: > > So this is one way around the lack of CSE/PRE of constant operands. I'd > > argue that a better spot for this _might_ be LRA (split the constant out if > > there's a free register available), postreload-[g]cse (CSE the constants) and > > then maybe cprop_hardreg to combine back single-use constants? > > > > I'm not sure if careful constraints massaging like adding magic letters to > > alternatives with constants to pessimize them for LRA, making them > > more expensive than spilling the constant to a register but avoid > > secondary reloads with spilling a register to the stack to make room > > for the constant, is possible - but in theory a special constraint modifier > > for this purpose could be invented. > > Thanks so much for your advice. > > cse/gcse doesn't take cost of constant set (the def insn of the constant) into > consideration. So it won't replace the register with a constant as it costs 1 > insn with the register and costs 2 insn with the constant. I think it does (and should) cost the constant set (IIRC we had some improvements there, or at least proposed, during this stage1). But sure - this is why your "trick" works. > Finally, the single- > use constants can't be back to 2 insn. And that's because of the issue you point out above? > Not sure if I understand it correctly. > Looking forward to your advice. My main point is that CSEing constants has impacts on register pressure and thus should probably be done after or within register allocation. RTL expansion itself is probably a bad time to pro-actively split out constants even more if, as you say, nothing puts them back. Richard. > Thanks > Gui Haochen
Hi Richard, 在 2023/3/16 18:36, Richard Biener 写道: > On Thu, Mar 16, 2023 at 10:04 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote: >> >> Hi Richard, >> >> 在 2023/3/16 15:57, Richard Biener 写道: >>> So this is one way around the lack of CSE/PRE of constant operands. I'd >>> argue that a better spot for this _might_ be LRA (split the constant out if >>> there's a free register available), postreload-[g]cse (CSE the constants) and >>> then maybe cprop_hardreg to combine back single-use constants? >>> >>> I'm not sure if careful constraints massaging like adding magic letters to >>> alternatives with constants to pessimize them for LRA, making them >>> more expensive than spilling the constant to a register but avoid >>> secondary reloads with spilling a register to the stack to make room >>> for the constant, is possible - but in theory a special constraint modifier >>> for this purpose could be invented. >> >> Thanks so much for your advice. >> >> cse/gcse doesn't take cost of constant set (the def insn of the constant) into >> consideration. So it won't replace the register with a constant as it costs 1 >> insn with the register and costs 2 insn with the constant. > > I think it does (and should) cost the constant set (IIRC we had some > improvements > there, or at least proposed, during this stage1). But sure - this is why your > "trick" works. > It's doable if post-reload gsc costs the constant set. I will draft a patch to test it. >> Finally, the single- >> use constants can't be back to 2 insn. > > And that's because of the issue you point out above? No. my original concern is the constant can't be back. If post-reload gsc doen't cost the constant set, the insn with a register always cost less than two insns with immediates. Commonly the constant set itself costs 2 insn also. > >> Not sure if I understand it correctly. >> Looking forward to your advice. > > My main point is that CSEing constants has impacts on register pressure > and thus should probably be done after or within register allocation. RTL > expansion itself is probably a bad time to pro-actively split out constants > even more if, as you say, nothing puts them back. > I agree. Thank a lot. > Richard. > >> Thanks >> Gui Haochen Gui Haochen
Hi Richard, 在 2023/3/16 15:57, Richard Biener 写道: > I'm not sure if careful constraints massaging like adding magic letters to > alternatives with constants to pessimize them for LRA, making them > more expensive than spilling the constant to a register but avoid > secondary reloads with spilling a register to the stack to make room > for the constant, is possible - but in theory a special constraint modifier > for this purpose could be invented. I have made some tests on constraint modifiers. They all seems not work. By checking the code, I found that the no reloading is always better than reloading in LRA. So there is no way to spill the constant to register in LRA. /* If this alternative can be made to work by reloading, and it needs less reloading than the others checked so far, record it as the chosen goal for reloading. */ if ((best_losers != 0 && losers == 0) || (((best_losers == 0 && losers == 0) || (best_losers != 0 && losers != 0)) && (best_overall > overall || (best_overall == overall ... // set goal_alt Looking forward to your advice. Thanks Gui Haochen
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index a1764018545..09e59a48cd3 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -282,6 +282,13 @@ (define_predicate "s32bit_cint_operand" (and (match_code "const_int") (match_test "(0x80000000 + UINTVAL (op)) >> 32 == 0"))) +;; Return 1 if op is a 32-bit but not 16-bit constant signed integer +(define_predicate "add_2insn_cint_operand" + (and (match_code "const_int") + (and (match_operand 0 "s32bit_cint_operand") + (and (not (match_operand 0 "short_cint_operand")) + (not (match_operand 0 "upper16_cint_operand")))))) + ;; Return 1 if op is a constant 32-bit unsigned (define_predicate "c32bit_cint_operand" (and (match_code "const_int") diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 6011f5bf76a..dba41e3df90 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -1796,12 +1796,44 @@ (define_expand "add<mode>3" /* The ordering here is important for the prolog expander. When space is allocated from the stack, adding 'low' first may produce a temporary deallocation (which would be bad). */ - emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest))); - emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low))); - DONE; + if (!can_create_pseudo_p ()) + { + emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest))); + emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low))); + DONE; + } + + operands[2] = force_reg (<MODE>mode, operands[2]); } }) +/* The ordering here is important for the prolog expander. + When space is allocated from the stack, adding 'low' first may + produce a temporary deallocation (which would be bad). */ + +(define_insn_and_split "*add<mode>_2insn" + [(set (match_operand:GPR 0 "gpc_reg_operand" "=b") + (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%b") + (match_operand:GPR 2 "add_2insn_cint_operand" "n")))] + "!TARGET_PREFIXED" + "#" + "&& 1" + [(set (match_dup 0) + (plus:GPR (match_dup 1) + (match_dup 3))) + (set (match_dup 0) + (plus:GPR (match_dup 0) + (match_dup 4)))] +{ + HOST_WIDE_INT val = INTVAL (operands[2]); + HOST_WIDE_INT low = sext_hwi (val, 16); + HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode); + + operands[3] = GEN_INT (rest); + operands[4] = GEN_INT (low); +} + [(set_attr "length" "8")]) + (define_insn "*add<mode>3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r") (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")