From patchwork Wed Apr 6 18:21:26 2022
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 52685
Date: Wed, 6 Apr 2022 14:21:26 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, David Edelsohn, Peter Bergner, Will Schmidt
Subject: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.
From: Michael Meissner
Reply-To: Michael Meissner

From bf51c49f1481001c7b3223474d261dcbf9365eda Mon Sep 17 00:00:00 2001
From: Michael Meissner
Date: Fri, 1 Apr 2022 22:27:13 -0400
Subject: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.

This patch adds a zero_extendditi2 pattern so that if we are extending DImode
to TImode, and we want the result in a vector register, the compiler can
generate MTVSRDD.

In addition, the patterns for generating lxvr{b,h,w,d}x were tuned to allow
loading to GPR registers.  This avoids loading into a vector register and
then doing direct moves when the value is ultimately needed in a GPR.

In updating the insn counts for two tests due to these changes, I noticed the
tests were done at -O0.  I changed this so that the tests are now done at the
normal -O2 optimization level.

I have tested this patch with bootstrap builds and regression testsuite runs
on:

	Little endian power10, --with-cpu=power10
	Little endian power9, --with-cpu=power9
	Big endian power8, --with-cpu=power8 (both 64/32-bit tests done).

There were no regressions.  Can I check this into the master branch?

2022-04-06   Michael Meissner

gcc/
	* config/rs6000/vsx.md (vsx_lxvr<wd>x): Add support for loading to
	GPR registers.
	(vsx_stxvr<wd>x): Add support for storing from GPR registers.
	(zero_extendditi2): New insn.

gcc/testsuite/
	* gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2 instead of
	-O0 and update insn counts.
	* gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
	* gcc.target/powerpc/zero-extend-di-ti.c: New test.
---
 gcc/config/rs6000/vsx.md                      | 82 +++++++++++++++++--
 .../powerpc/vsx-load-element-extend-int.c     | 36 ++++----
 .../powerpc/vsx-load-element-extend-short.c   | 35 ++++----
 .../gcc.target/powerpc/zero-extend-di-ti.c    | 62 ++++++++++++++
 4 files changed, 164 insertions(+), 51 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index c091e5e2f47..ad971e3a1de 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1315,14 +1315,32 @@ (define_expand "vsx_store_<mode>"
   }
 })
 
-;; Load rightmost element from load_data
-;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
-(define_insn "vsx_lxvr<wd>x"
-  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
-        (zero_extend:TI (match_operand:INT_ISA3 1 "memory_operand" "Z")))]
-  "TARGET_POWER10"
-  "lxvr<wd>x %x0,%y1"
-  [(set_attr "type" "vecload")])
+;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
+;; Support TImode being in a GPR register to prevent generating lxvr{b,h,w,d}x
+;; and then two direct moves if we ultimately need the value in a GPR register.
+(define_insn_and_split "vsx_lxvr<wd>x"
+  [(set (match_operand:TI 0 "register_operand" "=r,wa")
+        (zero_extend:TI (match_operand:INT_ISA3 1 "memory_operand" "m,Z")))]
+  "TARGET_POWERPC64 && TARGET_POWER10"
+  "@
+   #
+   lxvr<wd>x %x0,%y1"
+  "&& reload_completed && int_reg_operand (operands[0], TImode)"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (const_int 0))]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+
+  operands[2] = gen_lowpart (DImode, op0);
+  operands[3] = (<MODE>mode == DImode
+                 ? op1
+                 : gen_rtx_ZERO_EXTEND (DImode, op1));
+
+  operands[4] = gen_highpart (DImode, op0);
+}
+  [(set_attr "type" "load,vecload")
+   (set_attr "num_insns" "2,*")])
 
 ;; Store rightmost element into store_data
 ;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
@@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di"
   DONE;
 })
 
+;; Zero extend DI to TI.  If we don't have the MTVSRDD instruction (and LXVRDX
+;; in the case of power10), we use the machine independent code.  If we are
+;; loading up GPRs, we fall back to the old code.
+(define_insn_and_split "zero_extendditi2"
+  [(set (match_operand:TI 0 "register_operand" "=r,r, wa,&wa")
+        (zero_extend:TI (match_operand:DI 1 "register_operand" "r,wa,r, wa")))]
+  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
+  "@
+   #
+   #
+   mtvsrdd %x0,0,%1
+   #"
+  "&& reload_completed
+   && (int_reg_operand (operands[0], TImode)
+       || vsx_register_operand (operands[1], DImode))"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  int dest_regno = reg_or_subregno (dest);
+
+  /* Handle conversion to GPR registers.  Load up the low part and then
+     zero out the upper part.  */
+  if (INT_REGNO_P (dest_regno))
+    {
+      rtx dest_hi = gen_highpart (DImode, dest);
+      rtx dest_lo = gen_lowpart (DImode, dest);
+
+      emit_move_insn (dest_lo, src);
+      emit_move_insn (dest_hi, const0_rtx);
+      DONE;
+    }
+
+  /* For setting a VSX register from another VSX register, clear the result
+     register, and use XXPERMDI to shift the value into the lower 64-bits.  */
+  rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
+  rtx dest_di = gen_rtx_REG (DImode, dest_regno);
+
+  emit_move_insn (dest_v2di, CONST0_RTX (V2DImode));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vsx_concat_v2di (dest_v2di, dest_di, src));
+  else
+    emit_insn (gen_vsx_concat_v2di (dest_v2di, src, dest_di));
+  DONE;
+}
+  [(set_attr "type" "integer,mfvsr,vecmove,vecperm")
+   (set_attr "length" "8, 8, *, 8")])
+
 ;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
 ;; power10.  On earlier systems, the machine independent code will generate a
 ;; shift left to sign extend the 64-bit value to 128-bit.
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
index c40e1a3a0f7..1f1281d6b75 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
@@ -6,33 +6,25 @@
 /* { dg-do compile { target { ! power10_hw } } } */
 /* { dg-require-effective-target power10_ok } */
 /* { dg-require-effective-target int128 } */
-
-/* Deliberately set optization to zero for this test to confirm
-   the lxvr*x instruction is generated.  At higher optimization levels
-   the instruction we are looking for is sometimes replaced by other
-   load instructions.  */
-/* { dg-options "-mdejagnu-cpu=power10 -O0 -save-temps" } */
-
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
 /* { dg-final { scan-assembler-times {\mlxvrwx\M} 2 } } */
 
 #define NUM_VEC_ELEMS 4
 #define ITERS 16
 
-/*
-Codegen at time of writing is a single lxvrwx for the zero
-extended test, and a lwax,mtvsrdd,vextsd2q for the sign
-extended test.
-
-0000000010000c90 :
-    10000c90:	aa 1a 24 7d 	lwax    r9,r4,r3
-    10000c94:	67 4b 40 7c 	mtvsrdd vs34,0,r9
-    10000c98:	02 16 5b 10 	vextsd2q v2,v2
-    10000c9c:	20 00 80 4e 	blr
-
-0000000010000cb0 :
-    10000cb0:	9b 18 44 7c 	lxvrwx  vs34,r4,r3
-    10000cb4:	20 00 80 4e 	blr
-*/
+/* Codegen at time of writing is a single lxvrwx for the zero extended test,
+   and lxvrwx + vexts* sign extension instructions for the sign extended
+   test.
+
+   0000000000000000 :
+      0:   9b 18 44 7c     lxvrwx  vs34,r4,r3
+      4:   02 16 5a 10     vextsw2d v2,v2
+      8:   02 16 5b 10     vextsd2q v2,v2
+      c:   20 00 80 4e     blr
+
+   0000000000000020 :
+     20:   9b 18 44 7c     lxvrwx  vs34,r4,r3
+     24:   20 00 80 4e     blr  */
 
 #include
 #include
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c
index 837ba79c9ab..a7721318812 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c
@@ -6,33 +6,26 @@
 /* { dg-do compile { target { ! power10_hw } } } */
 /* { dg-require-effective-target power10_ok } */
 /* { dg-require-effective-target int128 } */
-
-/* Deliberately set optization to zero for this test to confirm
-   the lxvr*x instruction is generated.  At higher optimization levels
-   the instruction we are looking for is sometimes replaced by other
-   load instructions.  */
-/* { dg-options "-mdejagnu-cpu=power10 -O0 -save-temps" } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
 
 /* { dg-final { scan-assembler-times {\mlxvrhx\M} 2 } } */
 
 #define NUM_VEC_ELEMS 8
 #define ITERS 16
 
-/*
-Codegen at time of writing uses lxvrhx for the zero
-extension test and lhax,mtvsrdd,vextsd2q for the
-sign extended test.
-
-0000000010001810 :
-    10001810:	ae 1a 24 7d 	lhax    r9,r4,r3
-    10001814:	67 4b 40 7c 	mtvsrdd vs34,0,r9
-    10001818:	02 16 5b 10 	vextsd2q v2,v2
-    1000181c:	20 00 80 4e 	blr
-
-0000000010001830 :
-    10001830:	5b 18 44 7c 	lxvrhx  vs34,r4,r3
-    10001834:	20 00 80 4e 	blr
-*/
+/* Codegen at time of writing is a single lxvrhx for the zero extended test,
+   and lxvrhx + vexts* sign extension instructions for the sign extended
+   test.
+
+   0000000000000000 :
+      0:   5b 18 44 7c     lxvrhx  vs34,r4,r3
+      4:   02 16 59 10     vextsh2d v2,v2
+      8:   02 16 5b 10     vextsd2q v2,v2
+      c:   20 00 80 4e     blr
+
+   0000000000000020 :
+     20:   5b 18 44 7c     lxvrhx  vs34,r4,r3
+     24:   20 00 80 4e     blr  */
 
 #include
 #include
diff --git a/gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c b/gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
new file mode 100644
index 00000000000..9b3b9c4dbd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
@@ -0,0 +1,62 @@
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* This test checks that the various optimizations and code paths are used
+   when zero extending DImode to TImode on power10.  */
+
+__uint128_t
+gpr_to_gpr (unsigned long long a)
+{
+  /* li 4,0.  */
+  return a;
+}
+
+__uint128_t
+mem_to_gpr (unsigned long long *p)
+{
+  /* ld 3,0(3); li 4,0.  */
+  return *p;
+}
+
+__uint128_t
+vsx_to_gpr (__uint128_t *p, double d)
+{
+  /* fctiduz 1,1; li 4,0; mfvsrd 3,1.  */
+  return (unsigned long long)d;
+}
+
+void
+gpr_to_vsx (__uint128_t *p, unsigned long long a)
+{
+  /* mtvsrdd 0,0,4; stxv 0,0(3).  */
+  __uint128_t b = a;
+  __asm__ (" # %x0" : "+wa" (b));
+  *p = b;
+}
+
+void
+mem_to_vsx (__uint128_t *p, unsigned long long *q)
+{
+  /* lxvrdx 0,0,4; stxv 0,0(3).  */
+  __uint128_t a = *q;
+  __asm__ (" # %x0" : "+wa" (a));
+  *p = a;
+}
+
+void
+vsx_to_vsx (__uint128_t *p, double d)
+{
+  /* fctiduz 1,1; xxspltib 0,0; xxpermdi 0,0,1,0; stxv 0,0(3).  */
+  __uint128_t a = (unsigned long long)d;
+  __asm__ (" # %x0" : "+wa" (a));
+  *p = a;
+}
+
+/* { dg-final { scan-assembler-times {\mli\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mld\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmfvsrd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
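
For readers who want the semantics of the GPR split in zero_extendditi2 spelled out, the following is a minimal sketch in portable C, not part of the patch. It assumes a compiler that provides unsigned __int128 (GCC or Clang on a 64-bit target); the names u128, zext_di_ti_model and zext_di_ti_model_check are hypothetical and exist only for illustration. It models "move the source into the low doubleword, then zero the high doubleword" and compares the result with the language-level conversion.

```c
/* Illustrative model only; not part of the patch.  */
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* assumes compiler support for __int128 */

/* Model of the GPR split: low doubleword = source, high doubleword = 0.  */
u128
zext_di_ti_model (uint64_t src)
{
  uint64_t lo = src;   /* corresponds to emit_move_insn (dest_lo, src) */
  uint64_t hi = 0;     /* corresponds to emit_move_insn (dest_hi, const0_rtx) */
  return ((u128) hi << 64) | lo;
}

/* Check that the model agrees with the plain C conversion.  */
void
zext_di_ti_model_check (uint64_t src)
{
  assert (zext_di_ti_model (src) == (u128) src);
  assert ((uint64_t) (zext_di_ti_model (src) >> 64) == 0);
}
```

Calling zext_di_ti_model_check with values such as 0 or UINT64_MAX exercises the property that the new test's gpr_to_gpr and mem_to_gpr functions rely on: the upper 64 bits of the result are always zero.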