From patchwork Fri May 13 16:17:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 53961 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3C0A63856DEF for ; Fri, 13 May 2022 16:18:00 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3C0A63856DEF DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1652458680; bh=wLp7D+iwBB31FARs6O5EhSAn//OGvx6fgexQTa+NVpQ=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=yAWK5h7RU2z0hYmmesbJhR9BNOIBZ6GEpd60qHTcOr971mLNcwckA5wGGv3421AX5 u0wCIT8gfjgul5y/BcPMBFzfCr9PdYDGKEUqIqiHNTf9NfTSNgMN4+rdl/DdeyC4Qz 5MQG5atWn0mHDZSfIfMyNcqK89V/O0ycI/xiCBX8= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 67A943857368 for ; Fri, 13 May 2022 16:17:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 67A943857368 Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24DFBpwT023632; Fri, 13 May 2022 16:17:28 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3g1srpsaww-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 13 May 2022 16:17:28 +0000 Received: from m0098399.ppops.net (m0098399.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 24DFsnXw028590; Fri, 13 May 2022 16:17:27 GMT Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3g1srpsawn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 13 May 2022 16:17:27 +0000 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 24DG8QCk024540; Fri, 13 May 2022 16:17:27 GMT Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19]) by ppma02dal.us.ibm.com with ESMTP id 3fwgdb665e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 13 May 2022 16:17:26 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 24DGHPOB15335850 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 13 May 2022 16:17:25 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7E3D36A04F; Fri, 13 May 2022 16:17:25 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E12E06A04D; Fri, 13 May 2022 16:17:24 +0000 (GMT) Received: from toto.the-meissners.org (unknown [9.65.255.130]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTPS; Fri, 13 May 2022 16:17:24 +0000 (GMT) Date: Fri, 13 May 2022 12:17:23 -0400 To: gcc-patches@gcc.gnu.org, Michael Meissner , Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt Subject: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109. Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt MIME-Version: 1.0 Content-Disposition: inline X-TM-AS-GCONF: 00 X-Proofpoint-GUID: ajaTINxECsvdJVRaIlT2vPkmwYw6IFLI X-Proofpoint-ORIG-GUID: 8JgbZn8PAngXl2VCfTSRtjQIor0YiCLH X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.858,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-13_08,2022-05-13_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 mlxscore=0 suspectscore=0 bulkscore=0 adultscore=0 spamscore=0 clxscore=1015 phishscore=0 mlxlogscore=998 impostorscore=0 malwarescore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2205130069 X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Michael Meissner via Gcc-patches From: Michael Meissner Reply-To: Michael Meissner Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Optimize multiply/add of DImode extended to TImode, PR target/103109. On power9 and power10 systems, we have instructions that support doing 64-bit integers converted to 128-bit integers and producing 128-bit results. This patch adds support to generate these instructions. Previously GCC had define_expands to handle conversion of the 64-bit extend to 128-bit and multiply. This patch changes these define_expands to define_insn_and_split and then it provides combiner patterns to generate thes multiply/add instructions. To support using this optimization on power9, this patch extends the sign extend DImode to TImode to also run on power9 (added for PR target/104698). This patch needs the previous patch to add unsigned DImode to TImode conversion so that the combiner can combine the extend, multiply, and add instructions. I have built this patch on little endian power10, little endian power9, and big endian power8 systems. There were no regressions when I ran it. Can I install this patch into the GCC 13 master branch? 2022-05-13 Michael Meissner gcc/ PR target/103109 * config/rs6000/rs6000.md (su_int32): New code attribute. (mul3): Convert from define_expand to define_insn_and_split. (maddld4): Add generator function. (mulditi3_adddi3): New insn. (mulditi3_add_const): New insn. (mulditi3_adddi3_upper): New insn. gcc/testsuite/ PR target/103109 * gcc.target/powerpc/pr103109.c: New test. --- gcc/config/rs6000/rs6000.md | 128 +++++++++++++++++++- gcc/testsuite/gcc.target/powerpc/pr103109.c | 62 ++++++++++ 2 files changed, 184 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103109.c diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 2aba70393d8..83eacec57ba 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -667,6 +667,9 @@ (define_code_attr uns [(fix "") (float "") (unsigned_float "uns")]) +(define_code_attr su_int32 [(sign_extend "s32bit_cint_operand") + (zero_extend "c32bit_cint_operand")]) + ; Various instructions that come in SI and DI forms. ; A generic w/d attribute, for things like cmpw/cmpd. (define_mode_attr wd [(QI "b") @@ -3190,13 +3193,16 @@ (define_insn "mulsi3_highpart_64" "mulhw %0,%1,%2" [(set_attr "type" "mul")]) -(define_expand "mul3" - [(set (match_operand: 0 "gpc_reg_operand") +(define_insn_and_split "mul3" + [(set (match_operand: 0 "gpc_reg_operand" "=&r") (mult: (any_extend: - (match_operand:GPR 1 "gpc_reg_operand")) + (match_operand:GPR 1 "gpc_reg_operand" "r")) (any_extend: - (match_operand:GPR 2 "gpc_reg_operand"))))] + (match_operand:GPR 2 "gpc_reg_operand" "r"))))] "!(mode == SImode && TARGET_POWERPC64)" + "#" + "&& 1" + [(pc)] { rtx l = gen_reg_rtx (mode); rtx h = gen_reg_rtx (mode); @@ -3205,9 +3211,10 @@ (define_expand "mul3" emit_move_insn (gen_lowpart (mode, operands[0]), l); emit_move_insn (gen_highpart (mode, operands[0]), h); DONE; -}) +} + [(set_attr "length" "8")]) -(define_insn "*maddld4" +(define_insn "maddld4" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (plus:GPR (mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") (match_operand:GPR 2 "gpc_reg_operand" "r")) @@ -3216,6 +3223,115 @@ (define_insn "*maddld4" "maddld %0,%1,%2,%3" [(set_attr "type" "mul")]) +(define_insn_and_split "*mulditi3_adddi3" + [(set (match_operand:TI 0 "gpc_reg_operand" "=&r") + (plus:TI + (mult:TI + (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) + (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"))))] + "TARGET_MADDLD && TARGET_POWERPC64" + "#" + "&& 1" + [(pc)] +{ + rtx dest = operands[0]; + rtx dest_hi = gen_highpart (DImode, dest); + rtx dest_lo = gen_lowpart (DImode, dest); + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op3 = operands[3]; + rtx tmp_hi, tmp_lo; + + if (can_create_pseudo_p ()) + { + tmp_hi = gen_reg_rtx (DImode); + tmp_lo = gen_reg_rtx (DImode); + } + else + { + tmp_hi = dest_hi; + tmp_lo = dest_lo; + } + + emit_insn (gen_mulditi3_adddi3_upper (tmp_hi, op1, op2, op3)); + emit_insn (gen_maddlddi4 (tmp_lo, op1, op2, op3)); + + if (can_create_pseudo_p ()) + { + emit_move_insn (dest_hi, tmp_hi); + emit_move_insn (dest_lo, tmp_lo); + } + DONE; +} + [(set_attr "length" "8")]) + +;; Optimize 128-bit multiply with zero/sign extend and adding a constant. We +;; force the constant into a register to generate li, maddhd, and maddld, +;; instead of mulld, mulhd, addic, and addze. We can't combine this pattern +;; with the pattern that handles registers, since constants don't have a sign +;; or zero extend around them. +(define_insn_and_split "*mulditi3_add_const" + [(set (match_operand:TI 0 "gpc_reg_operand" "=&r") + (plus:TI + (mult:TI + (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) + (match_operand 3 "" "r")))] + "TARGET_MADDLD && TARGET_POWERPC64 +" + "#" + "&& 1" + [(pc)] +{ + rtx dest = operands[0]; + rtx dest_hi = gen_highpart (DImode, dest); + rtx dest_lo = gen_lowpart (DImode, dest); + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op3 = force_reg (DImode, operands[3]); + rtx tmp_hi, tmp_lo; + + if (can_create_pseudo_p ()) + { + tmp_hi = gen_reg_rtx (DImode); + tmp_lo = gen_reg_rtx (DImode); + } + else + { + tmp_hi = dest_hi; + tmp_lo = dest_lo; + } + + emit_insn (gen_mulditi3_adddi3_upper (tmp_hi, op1, op2, op3)); + emit_insn (gen_maddlddi4 (tmp_lo, op1, op2, op3)); + + if (can_create_pseudo_p ()) + { + emit_move_insn (dest_hi, tmp_hi); + emit_move_insn (dest_lo, tmp_lo); + } + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "mul") + (set_attr "size" "64")]) + +(define_insn "mulditi3_adddi3_upper" + [(set (match_operand:DI 0 "gpc_reg_operand" "=r") + (truncate:DI + (lshiftrt:TI + (plus:TI + (mult:TI + (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) + (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"))) + (const_int 64))))] + "TARGET_MADDLD && TARGET_POWERPC64" + "maddhd %0,%1,%2,%3" + [(set_attr "type" "mul") + (set_attr "size" "64")]) + (define_insn "udiv3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") diff --git a/gcc/testsuite/gcc.target/powerpc/pr103109.c b/gcc/testsuite/gcc.target/powerpc/pr103109.c new file mode 100644 index 00000000000..ae2cfb9eda7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr103109.c @@ -0,0 +1,62 @@ +/* { dg-require-effective-target int128 } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* This test makes sure that GCC generates the maddhd, maddhdu, and maddld + power9 instructions when doing some forms of 64-bit integers converted to + 128-bit integers and used with multiply/add operations. */ + +__int128_t +s_mult_add (long long a, + long long b, + long long c) +{ + /* maddhd, maddld. */ + return ((__int128_t)a * (__int128_t)b) + (__int128_t)c; +} + +/* Test 32-bit constants that are loaded into GPRs instead of doing the + mulld/mulhd and then addic/addime or addc/addze. */ +__int128_t +s_mult_add_m10 (long long a, + long long b) +{ + /* maddhd, maddld. */ + return ((__int128_t)a * (__int128_t)b) - 10; +} + +__int128_t +s_mult_add_70000 (long long a, + long long b) +{ + /* maddhd, maddld. */ + return ((__int128_t)a * (__int128_t)b) + 70000; +} + +__uint128_t +u_mult_add (unsigned long long a, + unsigned long long b, + unsigned long long c) +{ + /* maddhd, maddld. */ + return ((__uint128_t)a * (__uint128_t)b) + (__uint128_t)c; +} + +__uint128_t +u_mult_add_0x80000000 (unsigned long long a, + unsigned long long b) +{ + /* maddhd, maddld. */ + return ((__uint128_t)a * (__uint128_t)b) + 0x80000000UL; +} + +/* { dg-final { scan-assembler-not {\maddc\M} } } */ +/* { dg-final { scan-assembler-not {\madde\M} } } */ +/* { dg-final { scan-assembler-not {\maddid\M} } } */ +/* { dg-final { scan-assembler-not {\maddme\M} } } */ +/* { dg-final { scan-assembler-not {\maddze\M} } } */ +/* { dg-final { scan-assembler-not {\mmulhd\M} } } */ +/* { dg-final { scan-assembler-not {\mmulld\M} } } */ +/* { dg-final { scan-assembler-times {\mmaddhd\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mmaddhdu\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mmaddld\M} 5 } } */