From: Michael Meissner
Date: Wed, 2 Mar 2022 15:54:29 -0500
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt
Subject: [PATCH, V2] Optimize signed DImode -> TImode on power10, PR target/104698

Optimize signed DImode -> TImode on power10.

Compared with the patch I submitted on February 25th, this version updates the comments based on feedback from Will Schmidt and adds 2 test cases: one tests the conversion when the target registers are GPRs, and the other when the target registers are Altivec registers.

On power10, GCC tries to optimize the signed conversion from DImode to TImode by using the vextsd2q instruction. However, when the value lives in GPRs, generating this instruction requires 3 direct moves (1 from the GPR registers to the Altivec registers, and 2 from the Altivec registers back to the GPR registers).

This patch generates the shift right algebraic immediate instruction (sradi) to do the conversion if the target/source registers are GPR registers, as GCC does on earlier systems. If the target/source registers are Altivec registers, it still generates the vextsd2q instruction.
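As a rough illustration of the GPR path (a sketch, not code from the patch), the C below models a TImode value held in a pair of 64-bit GPRs. The low register receives the DImode value unchanged, and an arithmetic shift right by 63, which is what sradi performs, fills the high register with copies of the sign bit. The names `gpr_pair`, `extend_di_ti_gpr`, and `check_extend_di_ti` are hypothetical, and the example assumes GCC or Clang on a 64-bit target, so that `__int128_t` is available and `>>` on a negative signed value is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a TImode value held in two 64-bit GPRs.  */
typedef struct
{
  uint64_t hi;   /* upper doubleword */
  uint64_t lo;   /* lower doubleword */
} gpr_pair;

/* GPR path: move the DImode value into the low register, then shift it
   right arithmetically by 63 (the job sradi does), so the high register
   becomes all zeros for a non-negative value or all ones for a negative
   one.  GCC defines >> on a negative signed value as an arithmetic
   shift.  */
static gpr_pair
extend_di_ti_gpr (int64_t src)
{
  gpr_pair r;
  r.lo = (uint64_t) src;
  r.hi = (uint64_t) (src >> 63);
  return r;
}

/* Check the move + shift sequence against the compiler's own
   DImode -> TImode sign extension.  */
static void
check_extend_di_ti (int64_t src)
{
  __int128_t wide = src;
  gpr_pair r = extend_di_ti_gpr (src);

  assert (r.lo == (uint64_t) wide);
  assert (r.hi == (uint64_t) (wide >> 64));
}
```

Called with -5, for example, the sketch yields lo = 0xfffffffffffffffb and hi = 0xffffffffffffffff, the same 128-bit value as the C conversion. The vextsd2q path computes that same quadword inside a single Altivec register.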
I have tested this by doing bootstraps on power9 and power10 systems using --with-cpu=power9 or --with-cpu=power10 on each system. After doing the bootstraps, I ran the regression tests and there were no regressions. I verified that the two new tests pass.

After it goes into the master branch, I would like to update the GCC 11 branch, since the bug first showed up there. Can I check this in to the GCC 11 branch after a burn-in period? I do not plan to backport it to the GCC 10 branch, since GCC 10 did not generate the vextsd2q instruction.

2022-03-02  Michael Meissner

gcc/
	PR target/104698
	* config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
	(extendditi2): Convert from define_expand to
	define_insn_and_split.  Replace with code to deal with both GPR
	registers and with altivec registers.

gcc/testsuite/
	PR target/104698
	* gcc.target/powerpc/pr104698-1.c: New test.
	* gcc.target/powerpc/pr104698-2.c: New test.

---
 gcc/config/rs6000/vsx.md                      | 78 ++++++++++++++-----
 gcc/testsuite/gcc.target/powerpc/pr104698-1.c | 30 +++++++
 gcc/testsuite/gcc.target/powerpc/pr104698-2.c | 33 ++++++++
 3 files changed, 120 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr104698-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr104698-2.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b53de103872..8263eaed923 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5023,15 +5023,63 @@ (define_expand "vsignextend_si_v2di"
   DONE;
 })
 
-;; ISA 3.1 vector sign extend
-;; Move DI value from GPR to TI mode in VSX register, word 1.
-(define_insn "mtvsrdd_diti_w1"
-  [(set (match_operand:TI 0 "register_operand" "=wa")
-        (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
-                   UNSPEC_MTVSRD_DITI_W1))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrdd %x0,0,%1"
-  [(set_attr "type" "vecmove")])
+;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
+;; power10.  On earlier systems, the machine independent code will generate a
+;; shift right to sign extend the 64-bit value to 128-bit.
+;;
+;; If the register allocator prefers to use GPR registers, we will use a shift
+;; right instruction to sign extend the 64-bit value to 128-bit.
+;;
+;; If the register allocator prefers to use Altivec registers on power10,
+;; generate the vextsd2q instruction.
+(define_insn_and_split "extendditi2"
+  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
+        (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
+   (clobber (reg:DI CA_REGNO))]
+  "TARGET_POWERPC64 && TARGET_POWER10"
+  "#"
+  "&& reload_completed"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  int dest_regno = reg_or_subregno (dest);
+
+  /* Handle conversion to GPR registers.  Load up the low part and then do
+     a sign extension to the upper part.  */
+  if (INT_REGNO_P (dest_regno))
+    {
+      rtx dest_hi = gen_highpart (DImode, dest);
+      rtx dest_lo = gen_lowpart (DImode, dest);
+
+      emit_move_insn (dest_lo, src);
+      emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
+      DONE;
+    }
+
+  /* For conversion to an Altivec register, generate either a splat operation
+     or a load rightmost double word instruction.  Both instructions get the
+     DImode value into the lower 64 bits, and then we do the vextsd2q
+     instruction.  */
+
+  else if (ALTIVEC_REGNO_P (dest_regno))
+    {
+      if (MEM_P (src))
+        emit_insn (gen_vsx_lxvrdx (dest, src));
+      else
+        {
+          rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
+          emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
+        }
+
+      emit_insn (gen_extendditi2_vector (dest, dest));
+      DONE;
+    }
+
+  else
+    gcc_unreachable ();
+}
+  [(set_attr "length" "8")])
 
 ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
 (define_insn "extendditi2_vector"
@@ -5042,18 +5090,6 @@ (define_insn "extendditi2_vector"
   "vextsd2q %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_expand "extendditi2"
-  [(set (match_operand:TI 0 "gpc_reg_operand")
-        (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
-  "TARGET_POWER10"
-  {
-    /* Move 64-bit src from GPR to vector reg and sign extend to 128-bits.  */
-    rtx temp = gen_reg_rtx (TImode);
-    emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
-    emit_insn (gen_extendditi2_vector (operands[0], temp));
-    DONE;
-  })
-
 ;; ISA 3.0 Binary Floating-Point Support
 
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr104698-1.c b/gcc/testsuite/gcc.target/powerpc/pr104698-1.c
new file mode 100644
index 00000000000..cd17b6b616d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr104698-1.c
@@ -0,0 +1,30 @@
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* PR target/104698 involved a regression where on power10, conversion from
+   long long to __int128_t generated mtvsrdd, vextsd2q, mfvsrd, and mfvsrld
+   instructions instead of just a GPR sign extension.  This test makes sure the
+   result is kept in the GPR registers.  */
+
+__int128_t convert_1 (long long a)
+{
+  return a;       /* sradi.  */
+}
+
+/* Like convert_1, but make sure a normal offsettable load is done.  The
+   pattern in vsx.md has support for generating lxvrdx if it is coming from
+   memory.  Make sure that when a GPR is used, a normal load with offset is
+   still done.  */
+
+__int128_t convert_2 (long long *p)
+{
+  return p[2];    /* ld and sradi.  */
+}
+
+/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
+/* { dg-final { scan-assembler-not {\mmfvsrld\M} } } */
+/* { dg-final { scan-assembler-not {\mmtvsrdd\M} } } */
+/* { dg-final { scan-assembler-not {\mvextsd2q\M} } } */
+/* { dg-final { scan-assembler-times {\mld\M} 1 } } */
+/* { dg-final { scan-assembler-times {\msradi\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr104698-2.c b/gcc/testsuite/gcc.target/powerpc/pr104698-2.c
new file mode 100644
index 00000000000..6966fce2ba9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr104698-2.c
@@ -0,0 +1,33 @@
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* PR target/104698 involved GCC generating vextsd2q to convert long long to
+   __int128_t when the long long value was in the GPR registers.  This test
+   verifies that if the result is in the Altivec registers, we still want to
+   generate vextsd2q.  We use __int128_t division (vdivsq) to force the result
+   of the conversion to be in an Altivec register.  */
+
+void do_div_1 (__int128_t *p, __int128_t *q, long long r)
+{
+  *p = *q / r;            /* mtvsrdd, vextsd2q, vdivsq.  */
+}
+
+/* Test the optimization in vsx.md to use lxvrdx instead of ld and mtvsrdd if
+   the value is coming from memory.  */
+
+void do_div_2 (__int128_t *p, __int128_t *q, long long *r)
+{
+  *p = *q / r[2];         /* lxvrdx, vextsd2q, vdivsq.  */
+}
+
+/* { dg-final { scan-assembler-not {\mld\M} } } */
+/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
+/* { dg-final { scan-assembler-not {\mmfvsrld\M} } } */
+/* { dg-final { scan-assembler-not {\msradi\M} } } */
+/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvdivsq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 } } */