Message ID | 20220915083052.74903-1-guojiufu@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B25203857028 for <patchwork@sourceware.org>; Thu, 15 Sep 2022 08:31:30 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B25203857028 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1663230690; bh=xwtYU7HlAnsGD2zz4uWj7jyHBiRylK8KfmxpHJBK7iI=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=GEegz0LqCJaqMocWo5kIyBHvumQhheaULLolI6JW8nIvfFVv7LCatO3nbnJpkXWzM B+MGWgq6UpTf6EfeAdgk8y0Bzs1iMy3S03pg/ymbn4lII9/LNjvxbxmk91tHh/5Nh6 JxzOC8YTBYwlT3ALFIi58FOmGvj6m4A/Iv9ZiB5c= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 819073858C74; Thu, 15 Sep 2022 08:30:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 819073858C74 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28F7iO6R029142; Thu, 15 Sep 2022 08:30:59 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3jkywxhdgm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 15 Sep 2022 08:30:58 +0000 Received: from m0098421.ppops.net (m0098421.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 28F7juPT002232; Thu, 15 Sep 2022 08:30:58 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3jkywxhdfu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 15 Sep 2022 08:30:58 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 28F8L3hU006715; Thu, 15 Sep 2022 08:30:56 GMT Received: from b06cxnps4074.portsmouth.uk.ibm.com (d06relay11.portsmouth.uk.ibm.com [9.149.109.196]) by ppma04ams.nl.ibm.com with ESMTP id 3jjytx238a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 15 Sep 2022 08:30:56 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 28F8Us0g37618080 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 15 Sep 2022 08:30:54 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2B5DA11C054; Thu, 15 Sep 2022 08:30:54 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 614E911C04C; Thu, 15 Sep 2022 08:30:53 +0000 (GMT) Received: from pike.rch.stglabs.ibm.com (unknown [9.5.12.127]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 15 Sep 2022 08:30:53 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH]rs6000: Load high and low part of 64bit constant independently Date: Thu, 15 Sep 2022 16:30:52 +0800 Message-Id: <20220915083052.74903-1-guojiufu@linux.ibm.com> X-Mailer: git-send-email 2.17.1 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: K-p_hXnBlHKdlSqXa9KlGLE-JgbDcAaR X-Proofpoint-GUID: OfDnVfhgJe6fvYtOEKXRXjhLXUZPk13P X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-15_04,2022-09-14_04,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 suspectscore=0 clxscore=1015 impostorscore=0 mlxscore=0 malwarescore=0 mlxlogscore=999 spamscore=0 adultscore=0 phishscore=0 bulkscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2208220000 definitions=main-2209150043 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Jiufu Guo <guojiufu@linux.ibm.com> Cc: dje.gcc@gmail.com, segher@kernel.crashing.org, linkw@gcc.gnu.org Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
rs6000: Load high and low part of 64bit constant independently
|
|
Commit Message
Jiufu Guo
Sept. 15, 2022, 8:30 a.m. UTC
Hi, For a complicate 64bit constant, blow is one instruction-sequence to build: lis 9,0x800a ori 9,9,0xabcd sldi 9,9,32 oris 9,9,0xc167 ori 9,9,0xfa16 while we can also use below sequence to build: lis 9,0xc167 lis 10,0x800a ori 9,9,0xfa16 ori 10,10,0xabcd rldimi 9,10,32,0 This sequence is using 2 registers to build high and low part firstly, and then merge them. In parallel aspect, this sequence would be faster. (Ofcause, using 1 more register with potential register pressure). Bootstrap and regtest pass on ppc64le. Is this ok for trunk? BR, Jeff(Jiufu) gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit constant build. gcc/testsuite/ChangeLog: * gcc.target/powerpc/parall_5insn_const.c: New test. --- gcc/config/rs6000/rs6000.cc | 45 +++++++++++-------- .../gcc.target/powerpc/parall_5insn_const.c | 27 +++++++++++ 2 files changed, 53 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
Comments
Hi Jeff, Sorry for the late review. on 2022/9/15 16:30, Jiufu Guo wrote: > Hi, > > For a complicate 64bit constant, blow is one instruction-sequence to > build: > lis 9,0x800a > ori 9,9,0xabcd > sldi 9,9,32 > oris 9,9,0xc167 > ori 9,9,0xfa16 > > while we can also use below sequence to build: > lis 9,0xc167 > lis 10,0x800a > ori 9,9,0xfa16 > ori 10,10,0xabcd > rldimi 9,10,32,0 > This sequence is using 2 registers to build high and low part firstly, > and then merge them. > In parallel aspect, this sequence would be faster. (Ofcause, using 1 more > register with potential register pressure). > > Bootstrap and regtest pass on ppc64le. > Is this ok for trunk? > > > BR, > Jeff(Jiufu) > > > gcc/ChangeLog: > > * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit > constant build. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/parall_5insn_const.c: New test. > > --- > gcc/config/rs6000/rs6000.cc | 45 +++++++++++-------- > .../gcc.target/powerpc/parall_5insn_const.c | 27 +++++++++++ > 2 files changed, 53 insertions(+), 19 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index a656cb32a47..759c6309677 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -10180,26 +10180,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) > } > else > { > - temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); > - > - emit_move_insn (copy_rtx (temp), > - GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); > - if (ud3 != 0) > - emit_move_insn (copy_rtx (temp), > - gen_rtx_IOR (DImode, copy_rtx (temp), > - GEN_INT (ud3))); > + if (can_create_pseudo_p ()) > + { > + /* lis A,U4; ori A,U3; lis B,U2; ori B,U1; rldimi A,B,32,0. */ Nit: A, B are supposed to be H, L? > + rtx H = gen_reg_rtx (DImode); > + rtx L = gen_reg_rtx (DImode); > + HOST_WIDE_INT num = (ud2 << 16) | ud1; > + rs6000_emit_set_long_const (L, (num ^ 0x80000000) - 0x80000000); > + num = (ud4 << 16) | ud3; > + rs6000_emit_set_long_const (H, (num ^ 0x80000000) - 0x80000000); > + emit_insn (gen_rotldi3_insert_3 (dest, H, GEN_INT (32), L, > + GEN_INT (0xffffffff))); > + } > + else > + { > + /* lis A, U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1. */ ~~~ unexpected space? > + emit_move_insn (dest, > + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); > + if (ud3 != 0) > + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3))); > > - emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, > - gen_rtx_ASHIFT (DImode, copy_rtx (temp), > - GEN_INT (32))); > - if (ud2 != 0) > - emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, > - gen_rtx_IOR (DImode, copy_rtx (temp), > - GEN_INT (ud2 << 16))); > - if (ud1 != 0) > - emit_move_insn (dest, > - gen_rtx_IOR (DImode, copy_rtx (temp), > - GEN_INT (ud1))); > + emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32))); > + if (ud2 != 0) > + emit_move_insn (dest, > + gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16))); > + if (ud1 != 0) > + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1))); > + } > } > } > > diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > new file mode 100644 > index 00000000000..ed8ccc73378 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > @@ -0,0 +1,27 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power7 -save-temps" } */ Why do we need power7 here? > +/* { dg-require-effective-target has_arch_ppc64 } */ > + > +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */ > +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */ > +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ > + > +void __attribute__ ((noinline)) foo (unsigned long long *a) > +{ > + /* 2lis+2ori+1rldimi for each constant. */ Nit: seems better to read with "/* 2 lis + 2 ori + 1 rldimi for ..." BR, Kewen
Hi Kewen, Thanks for your review on this patch! "Kewen.Lin" <linkw@linux.ibm.com> writes: > Hi Jeff, > > Sorry for the late review. > > on 2022/9/15 16:30, Jiufu Guo wrote: >> Hi, >> >> For a complicate 64bit constant, blow is one instruction-sequence to >> build: >> lis 9,0x800a >> ori 9,9,0xabcd >> sldi 9,9,32 >> oris 9,9,0xc167 >> ori 9,9,0xfa16 >> >> while we can also use below sequence to build: >> lis 9,0xc167 >> lis 10,0x800a >> ori 9,9,0xfa16 >> ori 10,10,0xabcd >> rldimi 9,10,32,0 >> This sequence is using 2 registers to build high and low part firstly, >> and then merge them. >> In parallel aspect, this sequence would be faster. (Ofcause, using 1 more >> register with potential register pressure). >> >> Bootstrap and regtest pass on ppc64le. >> Is this ok for trunk? >> >> >> BR, >> Jeff(Jiufu) >> >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit >> constant build. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/parall_5insn_const.c: New test. >> >> --- >> gcc/config/rs6000/rs6000.cc | 45 +++++++++++-------- >> .../gcc.target/powerpc/parall_5insn_const.c | 27 +++++++++++ >> 2 files changed, 53 insertions(+), 19 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index a656cb32a47..759c6309677 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -10180,26 +10180,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) >> } >> else >> { >> - temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); >> - >> - emit_move_insn (copy_rtx (temp), >> - GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); >> - if (ud3 != 0) >> - emit_move_insn (copy_rtx (temp), >> - gen_rtx_IOR (DImode, copy_rtx (temp), >> - GEN_INT (ud3))); >> + if (can_create_pseudo_p ()) >> + { >> + /* lis A,U4; ori A,U3; lis B,U2; ori B,U1; rldimi A,B,32,0. */ > > Nit: A, B are supposed to be H, L? Yes, thanks for this catch! It should be /* lis H,U4; ori H,U3; lis L,U2; ori L,U1; rldimi L,H,32,0. */ > >> + rtx H = gen_reg_rtx (DImode); >> + rtx L = gen_reg_rtx (DImode); >> + HOST_WIDE_INT num = (ud2 << 16) | ud1; >> + rs6000_emit_set_long_const (L, (num ^ 0x80000000) - 0x80000000); >> + num = (ud4 << 16) | ud3; >> + rs6000_emit_set_long_const (H, (num ^ 0x80000000) - 0x80000000); >> + emit_insn (gen_rotldi3_insert_3 (dest, H, GEN_INT (32), L, >> + GEN_INT (0xffffffff))); >> + } >> + else >> + { >> + /* lis A, U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1. */ > ~~~ unexpected space? Thanks for the catch! > >> + emit_move_insn (dest, >> + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); >> + if (ud3 != 0) >> + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3))); >> >> - emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, >> - gen_rtx_ASHIFT (DImode, copy_rtx (temp), >> - GEN_INT (32))); >> - if (ud2 != 0) >> - emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, >> - gen_rtx_IOR (DImode, copy_rtx (temp), >> - GEN_INT (ud2 << 16))); >> - if (ud1 != 0) >> - emit_move_insn (dest, >> - gen_rtx_IOR (DImode, copy_rtx (temp), >> - GEN_INT (ud1))); >> + emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32))); >> + if (ud2 != 0) >> + emit_move_insn (dest, >> + gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16))); >> + if (ud1 != 0) >> + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1))); >> + } >> } >> } >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c >> new file mode 100644 >> index 00000000000..ed8ccc73378 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c >> @@ -0,0 +1,27 @@ >> +/* { dg-do run } */ >> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -save-temps" } */ > > Why do we need power7 here? power8/9 are also ok for this case. Actually, O just want to avoid to use new p10 instruction, like "pli", and then selected an old arch option. > >> +/* { dg-require-effective-target has_arch_ppc64 } */ >> + >> +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */ >> +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */ >> +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ >> + >> +void __attribute__ ((noinline)) foo (unsigned long long *a) >> +{ >> + /* 2lis+2ori+1rldimi for each constant. */ > > Nit: seems better to read with "/* 2 lis + 2 ori + 1 rldimi for ..." Thanks for your sugguestion! BR, Jeff (Jiufu) I updated it as below: gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit constant build. gcc/testsuite/ChangeLog: * gcc.target/powerpc/parall_5insn_const.c: New test. --- gcc/config/rs6000/rs6000.cc | 45 +++++++++++-------- .../gcc.target/powerpc/parall_5insn_const.c | 27 +++++++++++ 2 files changed, 53 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index a656cb32a47..759c6309677 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -10180,26 +10180,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) } else { - temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); - - emit_move_insn (copy_rtx (temp), - GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); - if (ud3 != 0) - emit_move_insn (copy_rtx (temp), - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud3))); + if (can_create_pseudo_p ()) + { + /* lis H,U4; ori H,U3; lis L,U2; ori L,U1; rldimi L,H,32,0. */ + rtx H = gen_reg_rtx (DImode); + rtx L = gen_reg_rtx (DImode); + HOST_WIDE_INT num = (ud2 << 16) | ud1; + rs6000_emit_set_long_const (L, (num ^ 0x80000000) - 0x80000000); + num = (ud4 << 16) | ud3; + rs6000_emit_set_long_const (H, (num ^ 0x80000000) - 0x80000000); + emit_insn (gen_rotldi3_insert_3 (dest, H, GEN_INT (32), L, + GEN_INT (0xffffffff))); + } + else + { + /* lis A,U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1. */ + emit_move_insn (dest, + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); + if (ud3 != 0) + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3))); - emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, - gen_rtx_ASHIFT (DImode, copy_rtx (temp), - GEN_INT (32))); - if (ud2 != 0) - emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud2 << 16))); - if (ud1 != 0) - emit_move_insn (dest, - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud1))); + emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32))); + if (ud2 != 0) + emit_move_insn (dest, + gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16))); + if (ud1 != 0) + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1))); + } } } diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c new file mode 100644 index 00000000000..ed8ccc73378 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c @@ -0,1 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8 -save-temps" } */ +/* { dg-require-effective-target has_arch_ppc64 } */ + +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ + +void __attribute__ ((noinline)) foo (unsigned long long *a) +{ + /* 2 lis + 2 ori + 1 rldimi for each constant. */ + *a++ = 0x800aabcdc167fa16ULL; + *a++ = 0x7543a876867f616ULL; +} + +long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL}; +int +main () +{ + long long res[2]; + + foo (res); + if (__builtin_memcmp (res, A, sizeof (res)) != 0) + __builtin_abort (); + + return 0; +}
Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > Hi Kewen, > > Thanks for your review on this patch! > > "Kewen.Lin" <linkw@linux.ibm.com> writes: > >> Hi Jeff, >> >> Sorry for the late review. >> >> on 2022/9/15 16:30, Jiufu Guo wrote: >>> Hi, >>> >>> For a complicate 64bit constant, blow is one instruction-sequence to >>> build: >>> lis 9,0x800a >>> ori 9,9,0xabcd >>> sldi 9,9,32 >>> oris 9,9,0xc167 >>> ori 9,9,0xfa16 >>> >>> while we can also use below sequence to build: >>> lis 9,0xc167 >>> lis 10,0x800a >>> ori 9,9,0xfa16 >>> ori 10,10,0xabcd >>> rldimi 9,10,32,0 >>> This sequence is using 2 registers to build high and low part firstly, >>> and then merge them. >>> In parallel aspect, this sequence would be faster. (Ofcause, using 1 more >>> register with potential register pressure). >>> >>> Bootstrap and regtest pass on ppc64le. >>> Is this ok for trunk? >>> >>> >>> BR, >>> Jeff(Jiufu) >>> >>> >>> gcc/ChangeLog: >>> >>> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit >>> constant build. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/powerpc/parall_5insn_const.c: New test. >>> >>> --- cut... > @@ -0,1 +1,27 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8 -save-temps" } */ maybe, I could use power7. Any comments? > +/* { dg-require-effective-target has_arch_ppc64 } */ > + > +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */ > +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */ > +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ > + > +void __attribute__ ((noinline)) foo (unsigned long long *a) > +{ > + /* 2 lis + 2 ori + 1 rldimi for each constant. */ > + *a++ = 0x800aabcdc167fa16ULL; > + *a++ = 0x7543a876867f616ULL; > +} > + > +long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL}; > +int > +main () > +{ > + long long res[2]; > + > + foo (res); > + if (__builtin_memcmp (res, A, sizeof (res)) != 0) > + __builtin_abort (); > + > + return 0; > +}
Hi! On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu Guo wrote: > "Kewen.Lin" <linkw@linux.ibm.com> writes: > > on 2022/9/15 16:30, Jiufu Guo wrote: > >> For a complicate 64bit constant, blow is one instruction-sequence to > >> build: > >> lis 9,0x800a > >> ori 9,9,0xabcd > >> sldi 9,9,32 > >> oris 9,9,0xc167 > >> ori 9,9,0xfa16 > >> > >> while we can also use below sequence to build: > >> lis 9,0xc167 > >> lis 10,0x800a > >> ori 9,9,0xfa16 > >> ori 10,10,0xabcd > >> rldimi 9,10,32,0 > >> This sequence is using 2 registers to build high and low part firstly, > >> and then merge them. > >> In parallel aspect, this sequence would be faster. (Ofcause, using 1 more > >> register with potential register pressure). And crucially this patch only uses two registers if can_create_pseudo_p. Please mention that. > >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit > >> constant build. If you don't give details of what this does, just say "Update." please. But update to what? "Generate more parallel code if can_create_pseudo_p." maybe? > >> + rtx H = gen_reg_rtx (DImode); > >> + rtx L = gen_reg_rtx (DImode); Please don't use all-uppercase variable names, those are for macros. In fact, don't use uppercase in variable (and function etc.) names at all, unless there is a really good reason to. Just call it "high" and "low", or "hi" and "lo", or something? > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > >> @@ -0,0 +1,27 @@ > >> +/* { dg-do run } */ > >> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -save-temps" } */ > > > > Why do we need power7 here? > power8/9 are also ok for this case. Actually, O just want to > avoid to use new p10 instruction, like "pli", and then selected > an old arch option. Why does it need _at least_ p7, is the question (as I understand it). To prohibit pli etc. you can do -mno-prefixed (which works on all older CPUs just as well), or skip the test if prefixed insns are enabled, or scan for the then generated code as well. The first option is by far the simplest. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > @@ -0,1 +1,27 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8 -save-temps" } */ > +/* { dg-require-effective-target has_arch_ppc64 } */ p8 here makes no sense, we also want good and correct code generated for older CPUs, so we should not preevent testing on those. For that reason you shouldn't use -mcpu= when not needed. Like here :-)
Hi Segher! Thanks for your helpful comments! Segher Boessenkool <segher@kernel.crashing.org> writes: > Hi! > > On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu Guo wrote: >> "Kewen.Lin" <linkw@linux.ibm.com> writes: >> > on 2022/9/15 16:30, Jiufu Guo wrote: >> >> For a complicate 64bit constant, blow is one instruction-sequence to >> >> build: >> >> lis 9,0x800a >> >> ori 9,9,0xabcd >> >> sldi 9,9,32 >> >> oris 9,9,0xc167 >> >> ori 9,9,0xfa16 >> >> >> >> while we can also use below sequence to build: >> >> lis 9,0xc167 >> >> lis 10,0x800a >> >> ori 9,9,0xfa16 >> >> ori 10,10,0xabcd >> >> rldimi 9,10,32,0 >> >> This sequence is using 2 registers to build high and low part firstly, >> >> and then merge them. >> >> In parallel aspect, this sequence would be faster. (Ofcause, using 1 more >> >> register with potential register pressure). > > And crucially this patch only uses two registers if can_create_pseudo_p. > Please mention that. OK. > >> >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit >> >> constant build. > > If you don't give details of what this does, just say "Update." please. > But update to what? > > "Generate more parallel code if can_create_pseudo_p." maybe? Thanks. > >> >> + rtx H = gen_reg_rtx (DImode); >> >> + rtx L = gen_reg_rtx (DImode); > > Please don't use all-uppercase variable names, those are for macros. In > fact, don't use uppercase in variable (and function etc.) names at all, > unless there is a really good reason to. > > Just call it "high" and "low", or "hi" and "lo", or something? Thanks. > >> >> --- /dev/null >> >> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c >> >> @@ -0,0 +1,27 @@ >> >> +/* { dg-do run } */ >> >> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -save-temps" } */ >> > >> > Why do we need power7 here? >> power8/9 are also ok for this case. Actually, O just want to >> avoid to use new p10 instruction, like "pli", and then selected >> an old arch option. > > Why does it need _at least_ p7, is the question (as I understand it). > > To prohibit pli etc. you can do -mno-prefixed (which works on all older > CPUs just as well), or skip the test if prefixed insns are enabled, or > scan for the then generated code as well. The first option is by far > the simplest. Thanks for your suggestion! -mno-prefixed is great, it meets the intention here. > >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c >> @@ -0,1 +1,27 @@ >> +/* { dg-do run } */ >> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -save-temps" } */ >> +/* { dg-require-effective-target has_arch_ppc64 } */ > > p8 here makes no sense, we also want good and correct code generated for > older CPUs, so we should not preevent testing on those. For that reason > you shouldn't use -mcpu= when not needed. Like here :-) Yeap, thanks! BR, Jeff (Jiufu)
Hi Segher, on 2022/11/25 23:46, Segher Boessenkool wrote: > Hi! > > On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu Guo wrote: >> "Kewen.Lin" <linkw@linux.ibm.com> writes: >>> on 2022/9/15 16:30, Jiufu Guo wrote: >>>> For a complicate 64bit constant, blow is one instruction-sequence to >>>> build: >>>> lis 9,0x800a >>>> ori 9,9,0xabcd >>>> sldi 9,9,32 >>>> oris 9,9,0xc167 >>>> ori 9,9,0xfa16 >>>> >>>> while we can also use below sequence to build: >>>> lis 9,0xc167 >>>> lis 10,0x800a >>>> ori 9,9,0xfa16 >>>> ori 10,10,0xabcd >>>> rldimi 9,10,32,0 >>>> This sequence is using 2 registers to build high and low part firstly, >>>> and then merge them. >>>> In parallel aspect, this sequence would be faster. (Ofcause, using 1 more >>>> register with potential register pressure). > > And crucially this patch only uses two registers if can_create_pseudo_p. > Please mention that. > >>>> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit >>>> constant build. > > If you don't give details of what this does, just say "Update." please. > But update to what? > > "Generate more parallel code if can_create_pseudo_p." maybe? > >>>> + rtx H = gen_reg_rtx (DImode); >>>> + rtx L = gen_reg_rtx (DImode); > > Please don't use all-uppercase variable names, those are for macros. In > fact, don't use uppercase in variable (and function etc.) names at all, > unless there is a really good reason to. > > Just call it "high" and "low", or "hi" and "lo", or something? > >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c >>>> @@ -0,0 +1,27 @@ >>>> +/* { dg-do run } */ >>>> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -save-temps" } */ >>> >>> Why do we need power7 here? >> power8/9 are also ok for this case. Actually, O just want to >> avoid to use new p10 instruction, like "pli", and then selected >> an old arch option. > > Why does it need _at least_ p7, is the question (as I understand it). > Yeah, that's what I was intended to ask, since those insns to be scanned don't actually require Power7 or later. > To prohibit pli etc. you can do -mno-prefixed (which works on all older > CPUs just as well), or skip the test if prefixed insns are enabled, or > scan for the then generated code as well. The first option is by far > the simplest. Yeah, using -mno-prefixed is perfect here, nice! BR, Kewen
Hi Kewen/Segher, Thanks a lot for your review! I updated the patch accordingly as below for message/code/testcase: For a complicate 64bit constant, blow is one instruction-sequence to build: lis 9,0x800a ori 9,9,0xabcd sldi 9,9,32 oris 9,9,0xc167 ori 9,9,0xfa16 while we can also use below sequence to build: lis 9,0xc167 lis 10,0x800a ori 9,9,0xfa16 ori 10,10,0xabcd rldimi 9,10,32,0 This sequence is using 2 registers to build high and low part firstly, and then merge them. In parallel aspect, this sequence would be faster. (Ofcause, using 1 more register with potential register pressure). The instruction sequence with two registers for parallel version can be generated only if can_create_pseudo_p. Otherwise, the one register sequence is generated. Bootstrap and regtest pass on ppc64le. Is this ok for trunk? gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Generate more parallel code if can_create_pseudo_p. gcc/testsuite/ChangeLog: * gcc.target/powerpc/parall_5insn_const.c: New test. --- gcc/config/rs6000/rs6000.cc | 45 +++++++++++-------- .../gcc.target/powerpc/parall_5insn_const.c | 27 +++++++++++ 2 files changed, 53 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index eb7ad5e954f..877b314a57a 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -10337,26 +10337,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) } else { - temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); - - emit_move_insn (copy_rtx (temp), - GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); - if (ud3 != 0) - emit_move_insn (copy_rtx (temp), - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud3))); + if (can_create_pseudo_p ()) + { + /* lis H,U4; ori H,U3; lis L,U2; ori L,U1; rldimi L,H,32,0. */ + rtx high = gen_reg_rtx (DImode); + rtx low = gen_reg_rtx (DImode); + HOST_WIDE_INT num = (ud2 << 16) | ud1; + rs6000_emit_set_long_const (low, (num ^ 0x80000000) - 0x80000000); + num = (ud4 << 16) | ud3; + rs6000_emit_set_long_const (high, (num ^ 0x80000000) - 0x80000000); + emit_insn (gen_rotldi3_insert_3 (dest, high, GEN_INT (32), low, + GEN_INT (0xffffffff))); + } + else + { + /* lis A,U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1. */ + emit_move_insn (dest, + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); + if (ud3 != 0) + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3))); - emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, - gen_rtx_ASHIFT (DImode, copy_rtx (temp), - GEN_INT (32))); - if (ud2 != 0) - emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud2 << 16))); - if (ud1 != 0) - emit_move_insn (dest, - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud1))); + emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32))); + if (ud2 != 0) + emit_move_insn (dest, + gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16))); + if (ud1 != 0) + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1))); + } } } diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c new file mode 100644 index 00000000000..e3a9a7264cf --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mno-prefixed -save-temps" } */ +/* { dg-require-effective-target has_arch_ppc64 } */ + +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ + +void __attribute__ ((noinline)) foo (unsigned long long *a) +{ + /* 2 lis + 2 ori + 1 rldimi for each constant. */ + *a++ = 0x800aabcdc167fa16ULL; + *a++ = 0x7543a876867f616ULL; +} + +long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL}; +int +main () +{ + long long res[2]; + + foo (res); + if (__builtin_memcmp (res, A, sizeof (res)) != 0) + __builtin_abort (); + + return 0; +}
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index a656cb32a47..759c6309677 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -10180,26 +10180,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) } else { - temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); - - emit_move_insn (copy_rtx (temp), - GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); - if (ud3 != 0) - emit_move_insn (copy_rtx (temp), - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud3))); + if (can_create_pseudo_p ()) + { + /* lis A,U4; ori A,U3; lis B,U2; ori B,U1; rldimi A,B,32,0. */ + rtx H = gen_reg_rtx (DImode); + rtx L = gen_reg_rtx (DImode); + HOST_WIDE_INT num = (ud2 << 16) | ud1; + rs6000_emit_set_long_const (L, (num ^ 0x80000000) - 0x80000000); + num = (ud4 << 16) | ud3; + rs6000_emit_set_long_const (H, (num ^ 0x80000000) - 0x80000000); + emit_insn (gen_rotldi3_insert_3 (dest, H, GEN_INT (32), L, + GEN_INT (0xffffffff))); + } + else + { + /* lis A, U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1. */ + emit_move_insn (dest, + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); + if (ud3 != 0) + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3))); - emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, - gen_rtx_ASHIFT (DImode, copy_rtx (temp), - GEN_INT (32))); - if (ud2 != 0) - emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud2 << 16))); - if (ud1 != 0) - emit_move_insn (dest, - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud1))); + emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32))); + if (ud2 != 0) + emit_move_insn (dest, + gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16))); + if (ud1 != 0) + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1))); + } } } diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c new file mode 100644 index 00000000000..ed8ccc73378 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mdejagnu-cpu=power7 -save-temps" } */ +/* { dg-require-effective-target has_arch_ppc64 } */ + +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ + +void __attribute__ ((noinline)) foo (unsigned long long *a) +{ + /* 2lis+2ori+1rldimi for each constant. */ + *a++ = 0x800aabcdc167fa16ULL; + *a++ = 0x7543a876867f616ULL; +} + +long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL}; +int +main () +{ + long long res[2]; + + foo (res); + if (__builtin_memcmp (res, A, sizeof (res)) != 0) + __builtin_abort (); + + return 0; +}