From patchwork Fri May 13 16:13:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.ibm.com>
X-Patchwork-Id: 53960
Date: Fri, 13 May 2022 12:13:37 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner <meissner@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
Subject: [PATCH] Add zero_extendditi2.  Improve lxvr*x code generation.
Message-ID: <Yn6DsT5zDrzeuvsV@toto.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>,
 gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>,
 David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
From: Michael Meissner <meissner@linux.ibm.com>
Reply-To: Michael Meissner <meissner@linux.ibm.com>

Add zero_extendditi2.  Improve lxvr*x code generation.

This patch adds zero_extendditi2 so that, when extending a DImode value held
in a GPR to a TImode value in a vector register, the compiler can generate
MTVSRDD.

In addition, the patterns for generating lxvr{b,h,w,d}x were tuned to allow
loading into GPRs.  This avoids loading the value into a vector register
and then needing direct moves to copy it to GPRs when the register
allocator has already chosen GPRs for the result.
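To make the equivalence concrete, here is a hedged C model (not GCC code) of what lxvr{b,h,w,d}x computes and of the GPR alternative the tuned pattern now permits. It assumes lxvr*x loads a 1, 2, 4 or 8 byte element in the target's byte order and zero-extends it into the rightmost bits of the 128-bit register. The names `u128_model`, `load_elem`, `lxvr_model` and `gpr_pair_load_model` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit (TImode) value as two doublewords.  */
typedef struct
{
  uint64_t hi;  /* most significant doubleword */
  uint64_t lo;
} u128_model;

/* Read an unsigned element of SIZE bytes (1, 2, 4 or 8) from MEM,
   interpreting the bytes in little- or big-endian order.  */
uint64_t
load_elem (const unsigned char *mem, size_t size, int little_endian)
{
  uint64_t v = 0;
  assert (size == 1 || size == 2 || size == 4 || size == 8);
  for (size_t i = 0; i < size; i++)
    {
      /* Walk from the most significant byte to the least.  */
      size_t idx = little_endian ? size - 1 - i : i;
      v = (v << 8) | mem[idx];
    }
  return v;
}

/* Model of lxvr{b,h,w,d}x: the element lands in the rightmost position
   of the 128-bit register and all other bits are zero.  */
u128_model
lxvr_model (const unsigned char *mem, size_t size, int little_endian)
{
  u128_model r = { 0, load_elem (mem, size, little_endian) };
  return r;
}

/* GPR alternative: a zero-extending scalar load into the low register
   of the pair, then clearing the high register.  */
u128_model
gpr_pair_load_model (const unsigned char *mem, size_t size,
                     int little_endian)
{
  u128_model r;
  r.lo = load_elem (mem, size, little_endian);
  r.hi = 0;
  return r;
}
```

Both functions return the same value for every size and byte order, which is why the GPR alternative can use an ordinary load plus a clear instead of lxvr*x followed by direct moves.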

While updating the insn counts for two tests affected by these changes, I
noticed the tests were compiled at -O0.  I changed them to compile at the
normal -O2 optimization level.

This patch will be needed for an upcoming patch for PR target/103109.

I have built this patch on little endian power10, little endian power9,
and big endian power8 systems.  There were no regressions with this
patch.  Can I install this on the GCC 13 trunk?

2022-05-13   Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/vsx.md (vsx_lxvr<wd>x): Add support for loading into
	GPRs.
	(zero_extendditi2): New insn.

gcc/testsuite/
	* gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2
	instead of -O0 and update insn counts.
	* gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
	* gcc.target/powerpc/zero-extend-di-ti.c: New test.
---
 gcc/config/rs6000/vsx.md                      | 82 +++++++++++++++++--
 .../powerpc/vsx-load-element-extend-int.c     | 36 ++++----
 .../powerpc/vsx-load-element-extend-short.c   | 35 ++++----
 .../gcc.target/powerpc/zero-extend-di-ti.c    | 62 ++++++++++++++
 4 files changed, 164 insertions(+), 51 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index c091e5e2f47..ad971e3a1de 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1315,14 +1315,32 @@ (define_expand "vsx_store_<mode>"
     }
 })
 
-;; Load rightmost element from load_data
-;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
-(define_insn "vsx_lxvr<wd>x"
-  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
-	(zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "Z")))]
-  "TARGET_POWER10"
-  "lxvr<wd>x %x0,%y1"
-  [(set_attr "type" "vecload")])
+;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
+;; Allow the TImode result to be in GPRs, to avoid generating lxvr{b,h,w,d}x
+;; followed by two direct moves when the value is ultimately needed in GPRs.
+(define_insn_and_split "vsx_lxvr<wd>x"
+  [(set (match_operand:TI 0 "register_operand" "=r,wa")
+	(zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "m,Z")))]
+  "TARGET_POWERPC64 && TARGET_POWER10"
+  "@
+   #
+   lxvr<wd>x %x0,%y1"
+  "&& reload_completed && int_reg_operand (operands[0], TImode)"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (const_int 0))]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+
+  operands[2] = gen_lowpart (DImode, op0);
+  operands[3] = (<MODE>mode == DImode
+		 ? op1
+		 : gen_rtx_ZERO_EXTEND (DImode, op1));
+
+  operands[4] = gen_highpart (DImode, op0);
+}
+  [(set_attr "type" "load,vecload")
+   (set_attr "num_insns" "2,*")])
 
 ;; Store rightmost element into store_data
 ;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
@@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di"
   DONE;
 })
 
+;; Zero extend DI to TI.  If we don't have the MTVSRDD instruction (and LXVRDX
+;; in the case of power10), we use the machine independent code.  If we are
+;; loading up GPRs, we fall back to the old code.
+(define_insn_and_split "zero_extendditi2"
+  [(set (match_operand:TI 0 "register_operand"                 "=r,r, wa,&wa")
+	(zero_extend:TI (match_operand:DI 1 "register_operand"  "r,wa,r,  wa")))]
+  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
+  "@
+   #
+   #
+   mtvsrdd %x0,0,%1
+   #"
+  "&& reload_completed
+   && (int_reg_operand (operands[0], TImode)
+       || vsx_register_operand (operands[1], DImode))"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  int dest_regno = reg_or_subregno (dest);
+
+  /* Handle conversion to GPRs.  Move the source into the low part and then
+     zero out the upper part.  */
+  if (INT_REGNO_P (dest_regno))
+    {
+      rtx dest_hi = gen_highpart (DImode, dest);
+      rtx dest_lo = gen_lowpart (DImode, dest);
+
+      emit_move_insn (dest_lo, src);
+      emit_move_insn (dest_hi, const0_rtx);
+      DONE;
+    }
+
+  /* For setting a VSX register from another VSX register, clear the result
+     register, and use XXPERMDI to shift the value into the lower 64 bits.  */
+  rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
+  rtx dest_di = gen_rtx_REG (DImode, dest_regno);
+
+  emit_move_insn (dest_v2di, CONST0_RTX (V2DImode));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_vsx_concat_v2di (dest_v2di, dest_di, src));
+  else
+    emit_insn (gen_vsx_concat_v2di (dest_v2di, src, dest_di));
+  DONE;
+}
+  [(set_attr "type"   "integer,mfvsr,vecmove,vecperm")
+   (set_attr "length" "8,      8,    *,      8")])
+
 ;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
 ;; power10.  On earlier systems, the machine independent code will generate a
 ;; shift left to sign extend the 64-bit value to 128-bit.
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
index c40e1a3a0f7..1f1281d6b75 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
@@ -6,33 +6,25 @@
 /* { dg-do compile { target { ! power10_hw } } } */
 /* { dg-require-effective-target power10_ok } */
 /* { dg-require-effective-target int128 } */
-
-/* Deliberately set optization to zero for this test to confirm
-   the lxvr*x instruction is generated. At higher optimization levels
-   the instruction we are looking for is sometimes replaced by other
-   load instructions. */
-/* { dg-options "-mdejagnu-cpu=power10 -O0 -save-temps" } */
-
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
 /* { dg-final { scan-assembler-times {\mlxvrwx\M} 2 } } */
 
 #define NUM_VEC_ELEMS 4
 #define ITERS 16
 
-/*
-Codegen at time of writing is a single lxvrwx for the zero
-extended test, and a lwax,mtvsrdd,vextsd2q for the sign
-extended test.
-
-0000000010000c90 <test_sign_extended_load>:
-    10000c90:	aa 1a 24 7d 	lwax    r9,r4,r3
-    10000c94:	67 4b 40 7c 	mtvsrdd vs34,0,r9
-    10000c98:	02 16 5b 10 	vextsd2q v2,v2
-    10000c9c:	20 00 80 4e 	blr
-
-0000000010000cb0 <test_zero_extended_unsigned_load>:
-    10000cb0:	9b 18 44 7c 	lxvrwx  vs34,r4,r3
-    10000cb4:	20 00 80 4e 	blr
-*/
+/* Codegen at time of writing is a single lxvrwx for the zero extended test,
+   and lxvrwx followed by the vextsw2d and vextsd2q sign extension
+   instructions for the sign extended test.
+
+   0000000000000000 <test_sign_extended_load>:
+	0:   9b 18 44 7c     lxvrwx  vs34,r4,r3
+	4:   02 16 5a 10     vextsw2d v2,v2
+	8:   02 16 5b 10     vextsd2q v2,v2
+	c:   20 00 80 4e     blr
+
+   0000000000000020 <test_zero_extended_unsigned_load>:
+	20:   9b 18 44 7c     lxvrwx  vs34,r4,r3
+	24:   20 00 80 4e     blr  */
 
 #include <altivec.h>
 #include <stdio.h>
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c
index 837ba79c9ab..a7721318812 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c
@@ -6,33 +6,26 @@
 /* { dg-do compile { target { ! power10_hw } } } */
 /* { dg-require-effective-target power10_ok } */
 /* { dg-require-effective-target int128 } */
-
-/* Deliberately set optization to zero for this test to confirm
-   the lxvr*x instruction is generated. At higher optimization levels
-   the instruction we are looking for is sometimes replaced by other
-   load instructions.  */
-/* { dg-options "-mdejagnu-cpu=power10 -O0 -save-temps" } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
 
 /* { dg-final { scan-assembler-times {\mlxvrhx\M} 2 } } */
 
 #define NUM_VEC_ELEMS 8
 #define ITERS 16
 
-/*
-Codegen at time of writing uses lxvrhx for the zero
-extension test and lhax,mtvsrdd,vextsd2q for the
-sign extended test.
-
-0000000010001810 <test_sign_extended_load>:
-    10001810:	ae 1a 24 7d 	lhax    r9,r4,r3
-    10001814:	67 4b 40 7c 	mtvsrdd vs34,0,r9
-    10001818:	02 16 5b 10 	vextsd2q v2,v2
-    1000181c:	20 00 80 4e 	blr
-
-0000000010001830 <test_zero_extended_unsigned_load>:
-    10001830:	5b 18 44 7c 	lxvrhx  vs34,r4,r3
-    10001834:	20 00 80 4e 	blr
-*/
+/* Codegen at time of writing is a single lxvrhx for the zero extended test,
+   and lxvrhx followed by the vextsh2d and vextsd2q sign extension
+   instructions for the sign extended test.
+
+	0000000000000000 <test_sign_extended_load>:
+	   0:   5b 18 44 7c     lxvrhx  vs34,r4,r3
+	   4:   02 16 59 10     vextsh2d v2,v2
+	   8:   02 16 5b 10     vextsd2q v2,v2
+	   c:   20 00 80 4e     blr
+
+	0000000000000020 <test_zero_extended_unsigned_load>:
+	  20:   5b 18 44 7c     lxvrhx  vs34,r4,r3
+	  24:   20 00 80 4e     blr  */
 
 #include <altivec.h>
 #include <stdio.h>
diff --git a/gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c b/gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
new file mode 100644
index 00000000000..9b3b9c4dbd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
@@ -0,0 +1,62 @@
+/* { dg-require-effective-target int128     } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test the various code paths for zero extending DImode to TImode on
+   power10.  */
+
+__uint128_t
+gpr_to_gpr (unsigned long long a)
+{
+  /* li 4,0.  */
+  return a;
+}
+
+__uint128_t
+mem_to_gpr (unsigned long long *p)
+{
+  /* ld 3,0(3); li 4,0.  */
+  return *p;
+}
+
+__uint128_t
+vsx_to_gpr (__uint128_t *p, double d)
+{
+  /* fctiduz 1,1; li 4,0; mfvsrd 3,1.  */
+  return (unsigned long long)d;
+}
+
+void
+gpr_to_vsx (__uint128_t *p, unsigned long long a)
+{
+  /* mtvsrdd 0,0,4; stxv 0,0(3).  */
+  __uint128_t b = a;
+  __asm__ (" # %x0" : "+wa" (b));
+  *p = b;
+}
+
+void
+mem_to_vsx (__uint128_t *p, unsigned long long *q)
+{
+  /* lxvrdx 0,0,4; stxv 0,0(3).  */
+  __uint128_t a = *q;
+  __asm__ (" # %x0" : "+wa" (a));
+  *p = a;
+}
+
+void
+vsx_to_vsx (__uint128_t *p, double d)
+{
+  /* fctiduz 1,1; xxspltib 0,0; xxpermdi 0,0,1,0; stxv 0,0(3).  */
+  __uint128_t a = (unsigned long long)d;
+  __asm__ (" # %x0" : "+wa" (a));
+  *p = a;
+}
+
+/* { dg-final { scan-assembler-times {\mli\M}       3 } } */
+/* { dg-final { scan-assembler-times {\mld\M}       1 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mmfvsrd\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mmtvsrdd\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M}     3 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
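As a cross-check of the VSX-to-VSX sequence expected in the comments of zero-extend-di-ti.c (`fctiduz 1,1; xxspltib 0,0; xxpermdi 0,0,1,0`), here is a small C model of the instruction semantics. It assumes the ISA's left-to-right numbering, where doubleword 0 is the most significant half of a VSX register, and that fctiduz leaves its result in doubleword 0 with doubleword 1 undefined. The names `vsr_model`, `xxspltib_model`, `xxpermdi_model` and `zext_vsx_to_vsx` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register: dw[0] is the most
   significant doubleword, following the ISA's left-to-right numbering.  */
typedef struct
{
  uint64_t dw[2];
} vsr_model;

/* xxspltib XT,IMM8: every byte of XT becomes IMM8.  */
vsr_model
xxspltib_model (uint8_t imm8)
{
  uint64_t splat = 0x0101010101010101ULL * imm8;
  vsr_model r = { { splat, splat } };
  return r;
}

/* xxpermdi XT,XA,XB,DM: dw[0] comes from XA and dw[1] from XB, each
   selected by one bit of the 2-bit DM field (the high bit selects
   within XA, the low bit within XB).  */
vsr_model
xxpermdi_model (vsr_model xa, vsr_model xb, unsigned dm)
{
  vsr_model r;
  r.dw[0] = xa.dw[(dm >> 1) & 1];
  r.dw[1] = xb.dw[dm & 1];
  return r;
}

/* VSX-to-VSX zero extension as listed in the test comment: vs1 holds
   the converted value in dw[0] (dw[1] is undefined, modeled by JUNK),
   vs0 is cleared with xxspltib 0, then xxpermdi vs0,vs0,vs1,0.  */
vsr_model
zext_vsx_to_vsx (uint64_t value, uint64_t junk)
{
  vsr_model vs1 = { { value, junk } };
  vsr_model vs0 = xxspltib_model (0);
  return xxpermdi_model (vs0, vs1, 0);
}
```

`zext_vsx_to_vsx (v, junk)` returns `{ 0, v }` whatever `junk` is, the same layout `mtvsrdd 0,0,4` produces in `gpr_to_vsx`; the clear must happen before the permute, which matches the early-clobber `&wa` on that alternative.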