Message ID | 16fa34b8-ad8a-20f2-b285-3b3f5bf5d5b2@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E2248385B539 for <patchwork@sourceware.org>; Fri, 17 Feb 2023 16:59:30 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E2248385B539 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1676653170; bh=Zg1I2liHiIfciw1oc4GjgbN1Io4O7VyLbbXJWvWWdC0=; h=Date:Subject:To:Cc:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=UcXUkG1ZE5VNX4YZQ4MfdPlulu3MGJoVGCI4bRPmtq4j7jbPs3/JOo/C7LlXuhqBd J1I1cpMB1mDSOuqa98uzaVUqUEvpCFoNBo4OQZlWKakERIxMQs7wFo3Kl8EKFD4Bhr zVn7gDsL9FjNxnyLLIDQnlI1qiQ2Ihp89kikk3fE= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 92828385781F for <gcc-patches@gcc.gnu.org>; Fri, 17 Feb 2023 16:58:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 92828385781F Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 31HGEnfg012552; Fri, 17 Feb 2023 16:58:47 GMT Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com [169.53.41.122]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3ntcx3s16e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 16:58:47 +0000 Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1]) by ppma04dal.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 31HFomur000986; Fri, 17 Feb 2023 16:58:46 GMT Received: from smtprelay06.dal12v.mail.ibm.com ([9.208.130.100]) by ppma04dal.us.ibm.com (PPS) with ESMTPS id 3np2n80tuw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 16:58:46 +0000 Received: from smtpav05.wdc07v.mail.ibm.com (smtpav05.wdc07v.mail.ibm.com [10.39.53.232]) by smtprelay06.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 31HGwiej8454906 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 17 Feb 2023 16:58:45 GMT Received: from smtpav05.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D8F6A58061; Fri, 17 Feb 2023 16:58:44 +0000 (GMT) Received: from smtpav05.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7691458063; Fri, 17 Feb 2023 16:58:43 +0000 (GMT) Received: from [9.43.123.201] (unknown [9.43.123.201]) by smtpav05.wdc07v.mail.ibm.com (Postfix) with ESMTP; Fri, 17 Feb 2023 16:58:43 +0000 (GMT) Message-ID: <16fa34b8-ad8a-20f2-b285-3b3f5bf5d5b2@linux.ibm.com> Date: Fri, 17 Feb 2023 22:28:41 +0530 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.7.2 Content-Language: en-US Subject: [PATCH] rs6000: fmr gets used instead of faster xxlor [PR93571] To: gcc-patches <gcc-patches@gcc.gnu.org> Cc: bergner@linux.ibm.com, ", Segher Boessenkool" <segher@kernel.crashing.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: YPMe7Ruv6DCYe3cuf9dfVM0miCHtSoZV X-Proofpoint-ORIG-GUID: YPMe7Ruv6DCYe3cuf9dfVM0miCHtSoZV X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.170.22 definitions=2023-02-17_11,2023-02-17_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 mlxlogscore=911 impostorscore=0 phishscore=0 spamscore=0 lowpriorityscore=0 mlxscore=0 clxscore=1011 bulkscore=0 priorityscore=1501 suspectscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2302170148 X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Ajit Agarwal via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Ajit Agarwal <aagarwa1@linux.ibm.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
rs6000: fmr gets used instead of faster xxlor [PR93571]
|
|
Commit Message
Ajit Agarwal
Feb. 17, 2023, 4:58 p.m. UTC
Hello All: This patch replaces fmr instruction (6 cycles) with xxlor instruction ( 2 cycles) Bootstrapped and regtested on powerpc64-linux-gnu. copyright assignment form is still in the process of being sent. Thanks & Regards Ajit rs6000: fmr gets used instead of faster xxlor [PR93571] This patch replaces 6 cycles fmr instruction with xxlor 2 cycles. 2023-02-17 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> gcc/ChangeLog: * config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction. --- gcc/config/rs6000/rs6000.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Fri, Feb 17, 2023 at 8:59 AM Ajit Agarwal via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > Hello All: > > This patch replaces fmr instruction (6 cycles) with xxlor instruction ( 2 cycles) > Bootstrapped and regtested on powerpc64-linux-gnu. I don't think this can be unconditionally replaced. xxlor only exists in newer Power ISA. Thanks, Andrew Pinski > > copyright assignment form is still in the process of being sent. > > Thanks & Regards > Ajit > > rs6000: fmr gets used instead of faster xxlor [PR93571] > > This patch replaces 6 cycles fmr instruction with xxlor > 2 cycles. > > 2023-02-17 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> > > gcc/ChangeLog: > > * config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction. > --- > gcc/config/rs6000/rs6000.md | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 4a7812fa592..dfd6c73ffcb 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -8436,7 +8436,7 @@ > "@ > stfd%U0%X0 %1,%0 > lfd%U1%X1 %0,%1 > - fmr %0,%1 > + xxlor %0,%1,%1 > lxsd %0,%1 > stxsd %1,%0 > lxsdx %x0,%y1 > -- > 2.31.1 > >
Hi! On Fri, Feb 17, 2023 at 10:28:41PM +0530, Ajit Agarwal wrote: > This patch replaces fmr instruction (6 cycles) with xxlor instruction ( 2 cycles) > Bootstrapped and regtested on powerpc64-linux-gnu. You tested this on a CPU that does have VSX. It is incorrect on other (older) CPUs. > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -8436,7 +8436,7 @@ > "@ > stfd%U0%X0 %1,%0 > lfd%U1%X1 %0,%1 > - fmr %0,%1 > + xxlor %0,%1,%1 > lxsd %0,%1 > stxsd %1,%0 > lxsdx %x0,%y1 This is the *mov<mode>_hardfloat64 pattern. You can add some magic to your Git config so that will show in the patch: in .git/config: [diff "md"] xfuncname = "^\\(define.*$" (As it says in .gitattributes: # Make diff on MD files use "(define" as a function marker. # Use together with git config diff.md.xfuncname '^\(define.*$' # which is run by contrib/gcc-git-customization.sh too. *.md diff=md ) The third alternative to this insn, the fmr one, has "d" as both input and output constraint, and has "*" as isa attribute, so it will be used on any CPU that has floating point registers. The eight alternative (the existing xxlor one) has "wa" constraints (via <f64_vsx>) so it implicitly requires VSX to be enabled. You need to do something similar for what you want, but you also need to still allow fmr. Segher
Hello All: This patch replaces fmr instruction 6 cycles with 2 cycles xxlor instruction for p7 and p8 architecture. I have implemented with switch and cases otherwise it is difficult to accommodate xxlor with p7 and p8 and fmr for other architectures. Bootstrapped and regtested. Thanks & Regards Ajit rs6000: fmr gets used instead of faster xxlor [PR93571] This patch replaces 6 cycles fmr instruction with xxlor 2 cycles in p8 and p7 architecture. 2023-02-21 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> gcc/ChangeLog: * config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction. --- gcc/config/rs6000/rs6000.md | 49 ++++++++++++++++++++++--------------- 1 file changed, 29 insertions(+), 20 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index dfd6c73ffcb..ef587033367 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -8433,26 +8433,35 @@ (define_insn "*mov<mode>_hardfloat64" "TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], <MODE>mode) || gpc_reg_operand (operands[1], <MODE>mode))" - "@ - stfd%U0%X0 %1,%0 - lfd%U1%X1 %0,%1 - xxlor %0,%1,%1 - lxsd %0,%1 - stxsd %1,%0 - lxsdx %x0,%y1 - stxsdx %x1,%y0 - xxlor %x0,%x1,%x1 - xxlxor %x0,%x0,%x0 - li %0,0 - std%U0%X0 %1,%0 - ld%U1%X1 %0,%1 - mr %0,%1 - mt%0 %1 - mf%1 %0 - nop - mfvsrd %0,%x1 - mtvsrd %x0,%1 - #" +{ + switch (which_alternative) { + case 0 : return "stfd%U0%X0 %1,%0"; + case 1 : return "lfd%U1%X1 %0,%1"; + case 2 : if ((TARGET_VSX || TARGET_P8_VECTOR) + && !TARGET_P9_VECTOR + && !TARGET_POWER10) + return "xxlor %0,%1,%1"; + else + return "fmr %0,%1"; + + case 3 : return "lxsd %0,%1"; + case 4 : return "stxsd %1,%0"; + case 5 : return "lxsdx %x0,%y1"; + case 6 : return "stxsdx %x1,%y0"; + case 7 : return "xxlor %x0,%x1,%x1"; + case 8 : return "xxlxor %x0,%x0,%x0"; + case 9 : return "li %0,0"; + case 10 : return "std%U0%X0 %1,%0"; + case 11 : return "ld%U1%X1 %0,%1"; + case 12 : return "mr %0,%1"; + case 13 : return "mt%0 %1"; + case 14 : return "mf%1 %0"; + case 15 : return "nop"; + case 16: return "mfvsrd %0,%x1"; + case 17 : return "mtvsrd %x0,%1"; + } + return "unreachable"; +} [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, integer,
Hi! On Tue, Feb 21, 2023 at 02:18:25PM +0530, Ajit Agarwal wrote: > This patch replaces fmr instruction 6 cycles with 2 cycles xxlor instruction > for p7 and p8 architecture. > > I have implemented with switch and cases otherwise it is difficult to accommodate > xxlor with p7 and p8 and fmr for other architectures. Please domn't use a switch, it isn't needed. Instead use the "isa" attribute (with p7v here), and put the preferred alternative first. > rs6000: fmr gets used instead of faster xxlor [PR93571] rs6000: Use xxlor instead of fmr where possible > This patch replaces 6 cycles fmr instruction with xxlor > 2 cycles in p8 and p7 architecture. No, it also does it on all later architectures. Do you have any actual timings (i.e. from hardware, not documentation)? > * config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction. Line too long. And, that is not what the patch does. Changelog should be totally boring just saying what the patch changes. If the patch changes things other than what thechangelog says your reviewer will think something went missin somewhere :-) > - "@ > - stfd%U0%X0 %1,%0 > - lfd%U1%X1 %0,%1 > - xxlor %0,%1,%1 That is not what is currently in trunk, so your patch cannot apply. > + switch (which_alternative) { > + case 0 : return "stfd%U0%X0 %1,%0"; > + case 1 : return "lfd%U1%X1 %0,%1"; Formatting is all incorrect. We dom't need or want a switch at all, but correct would be: switch (which_alternative) { case 0: return "stfd%U0%X0 %1,%0"; case 1: return "lfd%U1%X1 %0,%1"; etc. > + case 2 : if ((TARGET_VSX || TARGET_P8_VECTOR) > + && !TARGET_P9_VECTOR > + && !TARGET_POWER10) > + return "xxlor %0,%1,%1"; > + else > + return "fmr %0,%1"; Ah, so you are excluding p9 and p10 here. Hrm. That should be written TARGET_VSX && !TARGET_P9_VECTOR, none of the rest is needed; but is that a good idea at all? Please use %xN for VSX arguments whenever possible. If this alternative allows only the low numbered vector registers, that is a hint that you probably should write this differently (and %xN is harmless then). > + return "unreachable"; No, never do that. There is "gcc_unreachable ()" if you need it. So, let's first do actual timings, and see if it is better on p9 and p10 as well (or at least not worse). Segher
Hello Segher: On 21/02/23 4:34 pm, Segher Boessenkool wrote: > Hi! > > On Tue, Feb 21, 2023 at 02:18:25PM +0530, Ajit Agarwal wrote: >> This patch replaces fmr instruction 6 cycles with 2 cycles xxlor instruction >> for p7 and p8 architecture. >> >> I have implemented with switch and cases otherwise it is difficult to accommodate >> xxlor with p7 and p8 and fmr for other architectures. > > Please domn't use a switch, it isn't needed. Instead use the "isa" > attribute (with p7v here), and put the preferred alternative first. I am not sure how this is possible without switch and using only "isa". > >> rs6000: fmr gets used instead of faster xxlor [PR93571] > > rs6000: Use xxlor instead of fmr where possible > >> This patch replaces 6 cycles fmr instruction with xxlor >> 2 cycles in p8 and p7 architecture. > > No, it also does it on all later architectures. > > Do you have any actual timings (i.e. from hardware, not documentation)? > >> * config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction. > > Line too long. And, that is not what the patch does. Changelog should > be totally boring just saying what the patch changes. If the patch > changes things other than what thechangelog says your reviewer will > think something went missin somewhere :-) I will correct this. > >> - "@ >> - stfd%U0%X0 %1,%0 >> - lfd%U1%X1 %0,%1 >> - xxlor %0,%1,%1 > > That is not what is currently in trunk, so your patch cannot apply. > >> + switch (which_alternative) { >> + case 0 : return "stfd%U0%X0 %1,%0"; >> + case 1 : return "lfd%U1%X1 %0,%1"; > > Formatting is all incorrect. We dom't need or want a switch at all, but > correct would be: > switch (which_alternative) > { > case 0: > return "stfd%U0%X0 %1,%0"; > case 1: > return "lfd%U1%X1 %0,%1"; > > etc. I will correct that. > >> + case 2 : if ((TARGET_VSX || TARGET_P8_VECTOR) >> + && !TARGET_P9_VECTOR >> + && !TARGET_POWER10) >> + return "xxlor %0,%1,%1"; >> + else >> + return "fmr %0,%1"; > > Ah, so you are excluding p9 and p10 here. Hrm. That should be written > TARGET_VSX && !TARGET_P9_VECTOR, none of the rest is needed; but is that > a good idea at all? > > Please use %xN for VSX arguments whenever possible. If this alternative > allows only the low numbered vector registers, that is a hint that you > probably should write this differently (and %xN is harmless then). > >> + return "unreachable"; > > No, never do that. There is "gcc_unreachable ()" if you need it. > I will also correct this. > So, let's first do actual timings, and see if it is better on p9 and > p10 as well (or at least not worse). > > > Segher Thanks & Regards Ajit
On Tue, Feb 21, 2023 at 06:00:52PM +0530, Ajit Agarwal wrote: > On 21/02/23 4:34 pm, Segher Boessenkool wrote: > > Please domn't use a switch, it isn't needed. Instead use the "isa" > > attribute (with p7v here), and put the preferred alternative first. > > I am not sure how this is possible without switch and using only "isa". You have the "p7v" "xxlor" alternative earlier than the "*" "fmr" alternative. You can have an "xxlor" for contraints "d", but probably the best (and certainly the easiest) is to just move the existing xxlor to before fmr. Oh, the existing xxlor alternative is implicitly isa p7v, the "wa" constraint causes that. It may be nicer to mark it explicitly p7v as well, nicer for the reader. Btw, please update the other similar patterns at the same time? There are eight patterns with fmr in rs6000.md (the four in dfp.md should probably not be touched); not all are similar so should be in separate patches, if changed at all, but a bunch are completely analogous so should not diverge. (It is fine to first do this one pattern only, until we have worked out all kinks, but all should be committed at the same time). Thanks, Segher
On 21/02/23 7:39 pm, Segher Boessenkool wrote: > On Tue, Feb 21, 2023 at 06:00:52PM +0530, Ajit Agarwal wrote: >> On 21/02/23 4:34 pm, Segher Boessenkool wrote: >>> Please domn't use a switch, it isn't needed. Instead use the "isa" >>> attribute (with p7v here), and put the preferred alternative first. >> >> I am not sure how this is possible without switch and using only "isa". > > You have the "p7v" "xxlor" alternative earlier than the "*" "fmr" > alternative. You can have an "xxlor" for contraints "d", but probably > the best (and certainly the easiest) is to just move the existing > xxlor to before fmr. > > Oh, the existing xxlor alternative is implicitly isa p7v, the "wa" > constraint causes that. It may be nicer to mark it explicitly p7v as > well, nicer for the reader. > If I do the above, for power9 it selects xxlor instead of fmr. > Btw, please update the other similar patterns at the same time? There > are eight patterns with fmr in rs6000.md (the four in dfp.md should > probably not be touched); not all are similar so should be in separate > patches, if changed at all, but a bunch are completely analogous so > should not diverge. > > (It is fine to first do this one pattern only, until we have worked out > all kinks, but all should be committed at the same time). > > Thanks, > > > Segher
Hello All: Here is the patch that uses xxlor instead of fmr where possible. Performance results shows that fmr is better in power9 and power10 architectures whereas xxlor is better in power7 and power 8 architectures. Bootstrapped and regtested powepc64-linux-gnu. Thanks & Regards Ajit rs6000: Use xxlor instead of fmr where possible This patch replaces fmr with xxlor instruction for power7 and power8 architectures whereas for power9 and power10 replaces xxlor with fmr instruction. Perf measurement results: Power9 fmr: 201,847,661 cycles. Power9 xxlor: 201,877,78 cycles. Power8 fmr: 201,057,795 cycles. Power8 xxlor: 201,004,671 cycles. 2023-02-24 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> gcc/ChangeLog: * config/rs6000/rs6000.md (*movdf_hardfloat64): Use xxlor for power7 and power8 and fmr for power9 and power10. --- gcc/config/rs6000/rs6000.md | 46 +++++++++++++++++++++++-------------- 1 file changed, 29 insertions(+), 17 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 81bffb04ceb..1253b8622a7 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -354,7 +354,7 @@ (define_attr "cpu" (const (symbol_ref "(enum attr_cpu) rs6000_tune"))) ;; The ISA we implement. -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10" +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p7p8,p10" (const_string "any")) ;; Is this alternative enabled for the current CPU/ISA/etc.? @@ -402,6 +402,11 @@ (define_attr "enabled" "" (and (eq_attr "isa" "p10") (match_test "TARGET_POWER10")) (const_int 1) + + (and (eq_attr "isa" "p7p8") + (match_test "TARGET_VSX && !TARGET_P9_VECTOR")) + (const_int 1) + ] (const_int 0))) ;; If this instruction is microcoded on the CELL processor @@ -8436,27 +8441,29 @@ (define_insn "*mov<mode>_softfloat32" (define_insn "*mov<mode>_hardfloat64" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" - "=m, d, d, <f64_p9>, wY, - <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, - YZ, r, !r, *c*l, !r, - *h, r, <f64_dm>, wa") + "=m, d, <f64_vsx>, <f64_p9>, wY, + <f64_av>, Z, wa, <f64_vsx>, !r, + YZ, r, !r, *c*l, !r, + *h, r, <f64_dm>, d, wn, + wa") (match_operand:FMOVE64 1 "input_operand" - "d, m, d, wY, <f64_p9>, - Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, + "d, m, <f64_vsx>, wY, <f64_p9>, + Z, <f64_av>, wa, <zero_fp>, <zero_fp>, r, YZ, r, r, *h, - 0, <f64_dm>, r, eP"))] + 0, <f64_dm>, r, d, wn, + eP"))] "TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], <MODE>mode) || gpc_reg_operand (operands[1], <MODE>mode))" "@ stfd%U0%X0 %1,%0 lfd%U1%X1 %0,%1 - fmr %0,%1 + xxlor %x0,%x1,%x1 lxsd %0,%1 stxsd %1,%0 lxsdx %x0,%y1 stxsdx %x1,%y0 - xxlor %x0,%x1,%x1 + fmr %0,%1 xxlxor %x0,%x0,%x0 li %0,0 std%U0%X0 %1,%0 @@ -8467,23 +8474,28 @@ (define_insn "*mov<mode>_hardfloat64" nop mfvsrd %0,%x1 mtvsrd %x0,%1 + fmr %0,%1 + fmr %0,%1 #" [(set_attr "type" - "fpstore, fpload, fpsimple, fpload, fpstore, + "fpstore, fpload, veclogical, fpload, fpstore, fpload, fpstore, veclogical, veclogical, integer, store, load, *, mtjmpr, mfjmpr, - *, mfvsr, mtvsr, vecperm") + *, mfvsr, mtvsr, fpsimple, fpsimple, + vecperm") (set_attr "size" "64") (set_attr "isa" - "*, *, *, p9v, p9v, - p7v, p7v, *, *, *, - *, *, *, *, *, - *, p8v, p8v, p10") + "*, *, p7p8, p9v, p9v, + p7v, p7v, *, *, *, + *, *, *, *, *, + *, p8v, p8v, *, *, + p10") (set_attr "prefixed" "*, *, *, *, *, *, *, *, *, *, *, *, *, *, *, - *, *, *, *")]) + *, *, *, *, *, + *")]) ;; STD LD MR MT<SPR> MF<SPR> G-const ;; H-const F-const Special
Hi! For future patches: please don't send patches as replies to existing threads. Just start a new thread for a new patch (series). You can mark it as [PATCH v2] in the subject, if you want. On Fri, Feb 24, 2023 at 01:41:49PM +0530, Ajit Agarwal wrote: > Here is the patch that uses xxlor instead of fmr where possible. > Performance results shows that fmr is better in power9 and > power10 architectures whereas xxlor is better in power7 and > power 8 architectures. And fmr is the only option before p7. > rs6000: Use xxlor instead of fmr where possible > > This patch replaces fmr with xxlor instruction for power7 > and power8 architectures whereas for power9 and power10 > replaces xxlor with fmr instruction. Saying "this patch" in a commit message reads strangely. Just "Replace fmr with" etc.? The second part is just wrong, you cannot replace xxlor by fmr in general. > Perf measurement results: > > Power9 fmr: 201,847,661 cycles. > Power9 xxlor: 201,877,78 cycles. > Power8 fmr: 201,057,795 cycles. > Power8 xxlor: 201,004,671 cycles. What is this measuring? 100M insns back-to-back, each dependent on the previous one? What are the results on p7 and p10? These numbers show there is no difference on p8 either. Did you paste the wrong numbers maybe? > * config/rs6000/rs6000.md (*movdf_hardfloat64): Use xxlor > for power7 and power8 and fmr for power9 and power10. Please don't break lines early. Changelogs lines can be 80 columns wide, just like source code lines. > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -354,7 +354,7 @@ (define_attr "cpu" > (const (symbol_ref "(enum attr_cpu) rs6000_tune"))) > > ;; The ISA we implement. > -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10" > +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p7p8,p10" p78v, and sort it after p8v please. > + (and (eq_attr "isa" "p7p8") > + (match_test "TARGET_VSX && !TARGET_P9_VECTOR")) > + (const_int 1) Okay. > (define_insn "*mov<mode>_hardfloat64" > [(set (match_operand:FMOVE64 0 "nonimmediate_operand" > - "=m, d, d, <f64_p9>, wY, > - <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, > - YZ, r, !r, *c*l, !r, > - *h, r, <f64_dm>, wa") > + "=m, d, <f64_vsx>, <f64_p9>, wY, > + <f64_av>, Z, wa, <f64_vsx>, !r, > + YZ, r, !r, *c*l, !r, > + *h, r, <f64_dm>, d, wn, > + wa") > (match_operand:FMOVE64 1 "input_operand" (You posted this mail as wrapping. That means the patch cannot be applied non-manually, and that replies to your mail will be mangled. Just get a Real mail client, and configure it correctly :-) ) > - "d, m, d, wY, <f64_p9>, > - Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, > + "d, m, <f64_vsx>, wY, <f64_p9>, > + Z, <f64_av>, wa, <zero_fp>, <zero_fp>, > r, YZ, r, r, *h, > - 0, <f64_dm>, r, eP"))] > + 0, <f64_dm>, r, d, wn, > + eP"))] No. It is impossible to figure out what you changed here by just reading it. There is no requirement there should be exactly five alternatives per line, and/or that there should be the same number everywhere. If the indentation was incorrect, and you want to fix that, do that in a separate *earlier* patch in the series, please. > "TARGET_POWERPC64 && TARGET_HARD_FLOAT > && (gpc_reg_operand (operands[0], <MODE>mode) > || gpc_reg_operand (operands[1], <MODE>mode))" > "@ > stfd%U0%X0 %1,%0 > lfd%U1%X1 %0,%1 > - fmr %0,%1 > + xxlor %x0,%x1,%x1 > lxsd %0,%1 > stxsd %1,%0 > lxsdx %x0,%y1 > stxsdx %x1,%y0 > - xxlor %x0,%x1,%x1 > + fmr %0,%1 > xxlxor %x0,%x0,%x0 > li %0,0 > std%U0%X0 %1,%0 > @@ -8467,23 +8474,28 @@ (define_insn "*mov<mode>_hardfloat64" > nop > mfvsrd %0,%x1 > mtvsrd %x0,%1 > + fmr %0,%1 > + fmr %0,%1 > #" > [(set_attr "type" > - "fpstore, fpload, fpsimple, fpload, fpstore, > + "fpstore, fpload, veclogical, fpload, fpstore, > fpload, fpstore, veclogical, veclogical, integer, > store, load, *, mtjmpr, mfjmpr, > - *, mfvsr, mtvsr, vecperm") > + *, mfvsr, mtvsr, fpsimple, fpsimple, > + vecperm") > (set_attr "size" "64") > (set_attr "isa" > - "*, *, *, p9v, p9v, > - p7v, p7v, *, *, *, > - *, *, *, *, *, > - *, p8v, p8v, p10") > + "*, *, p7p8, p9v, p9v, > + p7v, p7v, *, *, *, > + *, *, *, *, *, > + *, p8v, p8v, *, *, > + p10") So, you swapped the xxlor and fmr entries, and added two nextra fmr entries at the end?! Segher
Hello Segher: On 24/02/23 8:41 pm, Segher Boessenkool wrote: > Hi! > > For future patches: please don't send patches as replies to existing > threads. Just start a new thread for a new patch (series). You can > mark it as [PATCH v2] in the subject, if you want. > > On Fri, Feb 24, 2023 at 01:41:49PM +0530, Ajit Agarwal wrote: >> Here is the patch that uses xxlor instead of fmr where possible. >> Performance results shows that fmr is better in power9 and >> power10 architectures whereas xxlor is better in power7 and >> power 8 architectures. > > And fmr is the only option before p7. > >> rs6000: Use xxlor instead of fmr where possible >> >> This patch replaces fmr with xxlor instruction for power7 >> and power8 architectures whereas for power9 and power10 >> replaces xxlor with fmr instruction. > > Saying "this patch" in a commit message reads strangely. Just "Replace > fmr with" etc.? > I will correct this. > The second part is just wrong, you cannot replace xxlor by fmr in > general. > >> Perf measurement results: >> >> Power9 fmr: 201,847,661 cycles. >> Power9 xxlor: 201,877,78 cycles. >> Power8 fmr: 201,057,795 cycles. >> Power8 xxlor: 201,004,671 cycles. > > What is this measuring? 100M insns back-to-back, each dependent on the > previous one? > Yes. > What are the results on p7 and p10? > > These numbers show there is no difference on p8 either. Did you paste > the wrong numbers maybe? > I will measure it again and update with a new patch. >> * config/rs6000/rs6000.md (*movdf_hardfloat64): Use xxlor >> for power7 and power8 and fmr for power9 and power10. > > Please don't break lines early. Changelogs lines can be 80 columns > wide, just like source code lines. > >> --- a/gcc/config/rs6000/rs6000.md >> +++ b/gcc/config/rs6000/rs6000.md >> @@ -354,7 +354,7 @@ (define_attr "cpu" >> (const (symbol_ref "(enum attr_cpu) rs6000_tune"))) >> >> ;; The ISA we implement. >> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10" >> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p7p8,p10" > > p78v, and sort it after p8v please. > >> + (and (eq_attr "isa" "p7p8") >> + (match_test "TARGET_VSX && !TARGET_P9_VECTOR")) >> + (const_int 1) > > Okay. > >> (define_insn "*mov<mode>_hardfloat64" >> [(set (match_operand:FMOVE64 0 "nonimmediate_operand" >> - "=m, d, d, <f64_p9>, wY, >> - <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, >> - YZ, r, !r, *c*l, !r, >> - *h, r, <f64_dm>, wa") >> + "=m, d, <f64_vsx>, <f64_p9>, wY, >> + <f64_av>, Z, wa, <f64_vsx>, !r, >> + YZ, r, !r, *c*l, !r, >> + *h, r, <f64_dm>, d, wn, >> + wa") >> (match_operand:FMOVE64 1 "input_operand" > > (You posted this mail as wrapping. That means the patch cannot be > applied non-manually, and that replies to your mail will be mangled. > Just get a Real mail client, and configure it correctly :-) ) > I am using Thunderbird as mail client and the settings are all correct. I have set the mailnews.wrapLength 0. >> - "d, m, d, wY, <f64_p9>, >> - Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, >> + "d, m, <f64_vsx>, wY, <f64_p9>, >> + Z, <f64_av>, wa, <zero_fp>, <zero_fp>, >> r, YZ, r, r, *h, >> - 0, <f64_dm>, r, eP"))] >> + 0, <f64_dm>, r, d, wn, >> + eP"))] > > No. It is impossible to figure out what you changed here by just > reading it. > > There is no requirement there should be exactly five alternatives per > line, and/or that there should be the same number everywhere. > > If the indentation was incorrect, and you want to fix that, do that in a > separate *earlier* patch in the series, please. > I will Keep indentation as same. >> "TARGET_POWERPC64 && TARGET_HARD_FLOAT >> && (gpc_reg_operand (operands[0], <MODE>mode) >> || gpc_reg_operand (operands[1], <MODE>mode))" >> "@ >> stfd%U0%X0 %1,%0 >> lfd%U1%X1 %0,%1 >> - fmr %0,%1 >> + xxlor %x0,%x1,%x1 >> lxsd %0,%1 >> stxsd %1,%0 >> lxsdx %x0,%y1 >> stxsdx %x1,%y0 >> - xxlor %x0,%x1,%x1 >> + fmr %0,%1 >> xxlxor %x0,%x0,%x0 >> li %0,0 >> std%U0%X0 %1,%0 >> @@ -8467,23 +8474,28 @@ (define_insn "*mov<mode>_hardfloat64" >> nop >> mfvsrd %0,%x1 >> mtvsrd %x0,%1 >> + fmr %0,%1 >> + fmr %0,%1 >> #" >> [(set_attr "type" >> - "fpstore, fpload, fpsimple, fpload, fpstore, >> + "fpstore, fpload, veclogical, fpload, fpstore, >> fpload, fpstore, veclogical, veclogical, integer, >> store, load, *, mtjmpr, mfjmpr, >> - *, mfvsr, mtvsr, vecperm") >> + *, mfvsr, mtvsr, fpsimple, fpsimple, >> + vecperm") >> (set_attr "size" "64") >> (set_attr "isa" >> - "*, *, *, p9v, p9v, >> - p7v, p7v, *, *, *, >> - *, *, *, *, *, >> - *, p8v, p8v, p10") >> + "*, *, p7p8, p9v, p9v, >> + p7v, p7v, *, *, *, >> + *, *, *, *, *, >> + *, p8v, p8v, *, *, >> + p10") > > So, you swapped the xxlor and fmr entries, and added two nextra fmr > entries at the end?! > I have moved xxlor <f64_vsx> "p7p8" before any other constraints with fmr "*". I have added first constraints as xxlor <f64_vsx> with "p7p8" then wa fmr "*" and wn,d "*" as fmr at end. > > Segher Thanks & Regards Ajit
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 4a7812fa592..dfd6c73ffcb 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -8436,7 +8436,7 @@ "@ stfd%U0%X0 %1,%0 lfd%U1%X1 %0,%1 - fmr %0,%1 + xxlor %0,%1,%1 lxsd %0,%1 stxsd %1,%0 lxsdx %x0,%y1