From patchwork Fri May 13 16:17:23 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.ibm.com>
X-Patchwork-Id: 53961
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 3C0A63856DEF
	for <patchwork@sourceware.org>; Fri, 13 May 2022 16:18:00 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3C0A63856DEF
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1652458680;
	bh=wLp7D+iwBB31FARs6O5EhSAn//OGvx6fgexQTa+NVpQ=;
	h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=yAWK5h7RU2z0hYmmesbJhR9BNOIBZ6GEpd60qHTcOr971mLNcwckA5wGGv3421AX5
	 u0wCIT8gfjgul5y/BcPMBFzfCr9PdYDGKEUqIqiHNTf9NfTSNgMN4+rdl/DdeyC4Qz
	 5MQG5atWn0mHDZSfIfMyNcqK89V/O0ycI/xiCBX8=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id 67A943857368
 for <gcc-patches@gcc.gnu.org>; Fri, 13 May 2022 16:17:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 67A943857368
Received: from pps.filterd (m0098399.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id
 24DFBpwT023632;
 Fri, 13 May 2022 16:17:28 GMT
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3g1srpsaww-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 13 May 2022 16:17:28 +0000
Received: from m0098399.ppops.net (m0098399.ppops.net [127.0.0.1])
 by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 24DFsnXw028590;
 Fri, 13 May 2022 16:17:27 GMT
Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com
 [169.62.189.10])
 by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3g1srpsawn-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 13 May 2022 16:17:27 +0000
Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1])
 by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 24DG8QCk024540;
 Fri, 13 May 2022 16:17:27 GMT
Received: from b03cxnp08027.gho.boulder.ibm.com
 (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19])
 by ppma02dal.us.ibm.com with ESMTP id 3fwgdb665e-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 13 May 2022 16:17:26 +0000
Received: from b03ledav003.gho.boulder.ibm.com
 (b03ledav003.gho.boulder.ibm.com [9.17.130.234])
 by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 24DGHPOB15335850
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Fri, 13 May 2022 16:17:25 GMT
Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 7E3D36A04F;
 Fri, 13 May 2022 16:17:25 +0000 (GMT)
Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id E12E06A04D;
 Fri, 13 May 2022 16:17:24 +0000 (GMT)
Received: from toto.the-meissners.org (unknown [9.65.255.130])
 by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTPS;
 Fri, 13 May 2022 16:17:24 +0000 (GMT)
Date: Fri, 13 May 2022 12:17:23 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner <meissner@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
Subject: [PATCH] Optimize multiply/add of DImode extended to TImode, PR
 target/103109.
Message-ID: <Yn6Ek+CgdLQiXV73@toto.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>,
 gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>,
 David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
MIME-Version: 1.0
Content-Disposition: inline
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: ajaTINxECsvdJVRaIlT2vPkmwYw6IFLI
X-Proofpoint-ORIG-GUID: 8JgbZn8PAngXl2VCfTSRtjQIor0YiCLH
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.205,Aquarius:18.0.858,Hydra:6.0.486,FMLib:17.11.64.514
 definitions=2022-05-13_08,2022-05-13_01,2022-02-23_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 priorityscore=1501 mlxscore=0
 suspectscore=0 bulkscore=0 adultscore=0 spamscore=0 clxscore=1015
 phishscore=0 mlxlogscore=998 impostorscore=0 malwarescore=0
 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2202240000 definitions=main-2205130069
X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT,
 RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Michael Meissner via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Michael Meissner <meissner@linux.ibm.com>
Reply-To: Michael Meissner <meissner@linux.ibm.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Optimize multiply/add of DImode extended to TImode, PR target/103109.

On power9 and power10 systems, we have instructions that support doing
64-bit integers converted to 128-bit integers and producing 128-bit
results.  This patch adds support to generate these instructions.

Previously GCC had define_expands to handle conversion of the 64-bit
extend to 128-bit and multiply.  This patch changes these define_expands
to define_insn_and_split and then it provides combiner patterns to
generate thes multiply/add instructions.

To support using this optimization on power9, this patch extends the sign
extend DImode to TImode to also run on power9 (added for PR
target/104698).

This patch needs the previous patch to add unsigned DImode to TImode
conversion so that the combiner can combine the extend, multiply, and add
instructions.

I have built this patch on little endian power10, little endian power9, and big
endian power8 systems.  There were no regressions when I ran it.  Can I install
this patch into the GCC 13 master branch?

2022-05-13   Michael Meissner  <meissner@linux.ibm.com>

gcc/
	PR target/103109
	* config/rs6000/rs6000.md (su_int32): New code attribute.
	(<u>mul<mode><dmode>3): Convert from define_expand to
	define_insn_and_split.
	(maddld<mode>4): Add generator function.
	(<u>mulditi3_<u>adddi3): New insn.
	(<u>mulditi3_add_const): New insn.
	(<u>mulditi3_<u>adddi3_upper): New insn.

gcc/testsuite/
	PR target/103109
	* gcc.target/powerpc/pr103109.c: New test.
---
 gcc/config/rs6000/rs6000.md                 | 128 +++++++++++++++++++-
 gcc/testsuite/gcc.target/powerpc/pr103109.c |  62 ++++++++++
 2 files changed, 184 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103109.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2aba70393d8..83eacec57ba 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -667,6 +667,9 @@ (define_code_attr uns [(fix		"")
 		       (float		"")
 		       (unsigned_float	"uns")])
 
+(define_code_attr su_int32 [(sign_extend "s32bit_cint_operand")
+			    (zero_extend "c32bit_cint_operand")])
+
 ; Various instructions that come in SI and DI forms.
 ; A generic w/d attribute, for things like cmpw/cmpd.
 (define_mode_attr wd [(QI    "b")
@@ -3190,13 +3193,16 @@ (define_insn "<su>mulsi3_highpart_64"
   "mulhw<u> %0,%1,%2"
   [(set_attr "type" "mul")])
 
-(define_expand "<u>mul<mode><dmode>3"
-  [(set (match_operand:<DMODE> 0 "gpc_reg_operand")
+(define_insn_and_split "<u>mul<mode><dmode>3"
+  [(set (match_operand:<DMODE> 0 "gpc_reg_operand" "=&r")
 	(mult:<DMODE> (any_extend:<DMODE>
-			(match_operand:GPR 1 "gpc_reg_operand"))
+		       (match_operand:GPR 1 "gpc_reg_operand" "r"))
 		      (any_extend:<DMODE>
-			(match_operand:GPR 2 "gpc_reg_operand"))))]
+		       (match_operand:GPR 2 "gpc_reg_operand" "r"))))]
   "!(<MODE>mode == SImode && TARGET_POWERPC64)"
+  "#"
+  "&& 1"
+  [(pc)]
 {
   rtx l = gen_reg_rtx (<MODE>mode);
   rtx h = gen_reg_rtx (<MODE>mode);
@@ -3205,9 +3211,10 @@ (define_expand "<u>mul<mode><dmode>3"
   emit_move_insn (gen_lowpart (<MODE>mode, operands[0]), l);
   emit_move_insn (gen_highpart (<MODE>mode, operands[0]), h);
   DONE;
-})
+}
+  [(set_attr "length" "8")])
 
-(define_insn "*maddld<mode>4"
+(define_insn "maddld<mode>4"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(plus:GPR (mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
 			    (match_operand:GPR 2 "gpc_reg_operand" "r"))
@@ -3216,6 +3223,115 @@ (define_insn "*maddld<mode>4"
   "maddld %0,%1,%2,%3"
   [(set_attr "type" "mul")])
 
+(define_insn_and_split "*<u>mulditi3_<u>adddi3"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=&r")
+	(plus:TI
+	 (mult:TI
+	  (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
+	  (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
+	 (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"))))]
+  "TARGET_MADDLD && TARGET_POWERPC64"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx dest_hi = gen_highpart (DImode, dest);
+  rtx dest_lo = gen_lowpart (DImode, dest);
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];
+  rtx tmp_hi, tmp_lo;
+
+  if (can_create_pseudo_p ())
+    {
+      tmp_hi = gen_reg_rtx (DImode);
+      tmp_lo = gen_reg_rtx (DImode);
+    }
+  else
+    {
+      tmp_hi = dest_hi;
+      tmp_lo = dest_lo;
+    }
+
+  emit_insn (gen_<u>mulditi3_<u>adddi3_upper (tmp_hi, op1, op2, op3));
+  emit_insn (gen_maddlddi4 (tmp_lo, op1, op2, op3));
+
+  if (can_create_pseudo_p ())
+    {
+      emit_move_insn (dest_hi, tmp_hi);
+      emit_move_insn (dest_lo, tmp_lo);
+    }
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize 128-bit multiply with zero/sign extend and adding a constant.  We
+;; force the constant into a register to generate li, maddhd, and maddld,
+;; instead of mulld, mulhd, addic, and addze.  We can't combine this pattern
+;; with the pattern that handles registers, since constants don't have a sign
+;; or zero extend around them.
+(define_insn_and_split "*<u>mulditi3_add_const"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=&r")
+	(plus:TI
+	 (mult:TI
+	  (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
+	  (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
+	 (match_operand 3 "<su_int32>" "r")))]
+  "TARGET_MADDLD && TARGET_POWERPC64
+"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx dest_hi = gen_highpart (DImode, dest);
+  rtx dest_lo = gen_lowpart (DImode, dest);
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx op3 = force_reg (DImode, operands[3]);
+  rtx tmp_hi, tmp_lo;
+
+  if (can_create_pseudo_p ())
+    {
+      tmp_hi = gen_reg_rtx (DImode);
+      tmp_lo = gen_reg_rtx (DImode);
+    }
+  else
+    {
+      tmp_hi = dest_hi;
+      tmp_lo = dest_lo;
+    }
+
+  emit_insn (gen_<u>mulditi3_<u>adddi3_upper (tmp_hi, op1, op2, op3));
+  emit_insn (gen_maddlddi4 (tmp_lo, op1, op2, op3));
+
+  if (can_create_pseudo_p ())
+    {
+      emit_move_insn (dest_hi, tmp_hi);
+      emit_move_insn (dest_lo, tmp_lo);
+    }
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "mul")
+   (set_attr "size" "64")])
+
+(define_insn "<u>mulditi3_<u>adddi3_upper"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+	(truncate:DI
+	 (lshiftrt:TI
+	  (plus:TI
+	   (mult:TI
+	    (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
+	    (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
+	   (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r")))
+	  (const_int 64))))]
+  "TARGET_MADDLD && TARGET_POWERPC64"
+  "maddhd<u> %0,%1,%2,%3"
+  [(set_attr "type" "mul")
+   (set_attr "size" "64")])
+
 (define_insn "udiv<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
         (udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103109.c b/gcc/testsuite/gcc.target/powerpc/pr103109.c
new file mode 100644
index 00000000000..ae2cfb9eda7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103109.c
@@ -0,0 +1,62 @@
+/* { dg-require-effective-target int128     } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* This test makes sure that GCC generates the maddhd, maddhdu, and maddld
+   power9 instructions when doing some forms of 64-bit integers converted to
+   128-bit integers and used with multiply/add operations.  */
+
+__int128_t
+s_mult_add (long long a,
+	    long long b,
+	    long long c)
+{
+  /* maddhd, maddld.  */
+  return ((__int128_t)a * (__int128_t)b) + (__int128_t)c;
+}
+
+/* Test 32-bit constants that are loaded into GPRs instead of doing the
+   mulld/mulhd and then addic/addime or addc/addze.  */
+__int128_t
+s_mult_add_m10 (long long a,
+		long long b)
+{
+  /* maddhd, maddld.  */
+  return ((__int128_t)a * (__int128_t)b) - 10;
+}
+
+__int128_t
+s_mult_add_70000 (long long a,
+		  long long b)
+{
+  /* maddhd, maddld.  */
+  return ((__int128_t)a * (__int128_t)b) + 70000;
+}
+
+__uint128_t
+u_mult_add (unsigned long long a,
+	    unsigned long long b,
+	    unsigned long long c)
+{
+  /* maddhd, maddld.  */
+  return ((__uint128_t)a * (__uint128_t)b) + (__uint128_t)c;
+}
+
+__uint128_t
+u_mult_add_0x80000000 (unsigned long long a,
+		       unsigned long long b)
+{
+  /* maddhd, maddld.  */
+  return ((__uint128_t)a * (__uint128_t)b) + 0x80000000UL;
+}
+
+/* { dg-final { scan-assembler-not   {\maddc\M}      } } */
+/* { dg-final { scan-assembler-not   {\madde\M}      } } */
+/* { dg-final { scan-assembler-not   {\maddid\M}     } } */
+/* { dg-final { scan-assembler-not   {\maddme\M}     } } */
+/* { dg-final { scan-assembler-not   {\maddze\M}     } } */
+/* { dg-final { scan-assembler-not   {\mmulhd\M}     } } */
+/* { dg-final { scan-assembler-not   {\mmulld\M}     } } */
+/* { dg-final { scan-assembler-times {\mmaddhd\M}  3 } } */
+/* { dg-final { scan-assembler-times {\mmaddhdu\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mmaddld\M}  5 } } */