From patchwork Fri Nov 19 14:53:14 2021
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 47935
Date: Fri, 19 Nov 2021 09:53:14 -0500
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool,
 David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt
Subject: [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps

Add power10 zero cycle moves for switches and indirect jumps.

Power10 will fuse adjacent 'mtctr' and 'bctr' instructions to form zero cycle
moves.  This code exploits this fusion opportunity.

I have built bootstrapped compilers with this patch on little endian power9 and
power10 systems with no regressions.  Can I install this into the master
branch?

2021-11-19  Michael Meissner

	* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add support
	for -mpower10-fusion-zero-cycle.
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
	* config/rs6000/rs6000.md (indirect_jump): Support zero cycle moves.
	(indirect_jump_zero_cycle): New insn.
	(tablejump_normal): Support zero cycle moves.
	(tablejump_absolute): Likewise.
	(tablejump_insn_zero_cycle): New insn.
	* config/rs6000/rs6000.opt (-mpower10-fusion-zero-cycle): New debug
	switch.
---
 gcc/config/rs6000/rs6000-cpus.def |  4 ++-
 gcc/config/rs6000/rs6000.c        |  4 +++
 gcc/config/rs6000/rs6000.md       | 52 ++++++++++++++++++++++++++++---
 gcc/config/rs6000/rs6000.opt      |  4 +++
 4 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index f5812da0184..cc072ee94ea 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -91,7 +91,8 @@
                                  | OPTION_MASK_P10_FUSION_LOGADD        \
                                  | OPTION_MASK_P10_FUSION_ADDLOG        \
                                  | OPTION_MASK_P10_FUSION_2ADD          \
-                                 | OPTION_MASK_P10_FUSION_2STORE)
+                                 | OPTION_MASK_P10_FUSION_2STORE        \
+                                 | OPTION_MASK_P10_FUSION_ZERO_CYCLE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS   (OPTION_MASK_FLOAT128_HW                \
@@ -145,6 +146,7 @@
                                  | OPTION_MASK_P10_FUSION_ADDLOG        \
                                  | OPTION_MASK_P10_FUSION_2ADD          \
                                  | OPTION_MASK_P10_FUSION_2STORE        \
+                                 | OPTION_MASK_P10_FUSION_ZERO_CYCLE    \
                                  | OPTION_MASK_HTM                      \
                                  | OPTION_MASK_ISEL                     \
                                  | OPTION_MASK_MFCRF                    \
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e4843eb0f1c..6780304a5eb 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4497,6 +4497,10 @@ rs6000_option_override_internal (bool global_init_p)
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
 
+  if (TARGET_POWER10
+      && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_ZERO_CYCLE) == 0)
+    rs6000_isa_flags |= OPTION_MASK_P10_FUSION_ZERO_CYCLE;
+
   /* Turn off vector pair/mma options on non-power10 systems.  */
   else if (!TARGET_POWER10 && TARGET_MMA)
     {
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..ea41eb4ada3 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12988,15 +12988,34 @@ (define_expand "indirect_jump"
       emit_jump_insn (gen_indirect_jump_nospec (Pmode, operands[0], ccreg));
       DONE;
     }
+  if (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)
+    {
+      emit_jump_insn (gen_indirect_jump_zero_cycle (Pmode, operands[0]));
+      DONE;
+    }
 })
 
 (define_insn "*indirect_jump"
   [(set (pc)
         (match_operand:P 0 "register_operand" "c,*l"))]
-  "rs6000_speculate_indirect_jumps"
+  "rs6000_speculate_indirect_jumps
+   && !(TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)"
   "b%T0"
   [(set_attr "type" "jmpreg")])
 
+(define_insn "@indirect_jump_zero_cycle"
+  [(set (pc)
+        (match_operand:P 0 "register_operand" "r,r,!cl"))
+   (clobber (match_scratch:P 1 "=c,*l,X"))]
+  "rs6000_speculate_indirect_jumps && TARGET_P10_FUSION
+   && TARGET_P10_FUSION_ZERO_CYCLE"
+  "@
+   mt%T1 %0\;b%T1
+   mt%T1 %0\;b%T1
+   b%T0"
+  [(set_attr "type" "jmpreg")
+   (set_attr "length" "8,8,4")])
+
 (define_insn "@indirect_jump_nospec"
   [(set (pc) (match_operand:P 0 "register_operand" "c,*l"))
    (clobber (match_operand:CC 1 "cc_reg_operand" "=y,y"))]
@@ -13050,7 +13069,11 @@ (define_expand "@tablejump_normal"
   rtx addr = gen_reg_rtx (Pmode);
 
   emit_insn (gen_add3 (addr, off, lab));
-  emit_jump_insn (gen_tablejump_insn_normal (Pmode, addr, operands[1]));
+  rtx insn = (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE
+              ? gen_tablejump_insn_zero_cycle (Pmode, addr, operands[1])
+              : gen_tablejump_insn_normal (Pmode, addr, operands[1]));
+
+  emit_jump_insn (insn);
   DONE;
 })
 
@@ -13062,7 +13085,11 @@ (define_expand "@tablejump_absolute"
   rtx addr = gen_reg_rtx (Pmode);
   emit_move_insn (addr, operands[0]);
 
-  emit_jump_insn (gen_tablejump_insn_normal (Pmode, addr, operands[1]));
+  rtx insn = (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE
+              ? gen_tablejump_insn_zero_cycle (Pmode, addr, operands[1])
+              : gen_tablejump_insn_normal (Pmode, addr, operands[1]));
+
+  emit_jump_insn (insn);
   DONE;
 })
 
@@ -13107,10 +13134,27 @@ (define_insn "@tablejump_insn_normal"
   [(set (pc)
         (match_operand:P 0 "register_operand" "c,*l"))
    (use (label_ref (match_operand 1)))]
-  "rs6000_speculate_indirect_jumps"
+  "rs6000_speculate_indirect_jumps
+   && !(TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)"
   "b%T0"
   [(set_attr "type" "jmpreg")])
 
+;; Version of indirect jump that fuses the mtctr to bctr to achieve 0 cycle
+;; moves on Power10.
+(define_insn "@tablejump_insn_zero_cycle"
+  [(set (pc)
+        (match_operand:P 0 "register_operand" "r,r,!cl"))
+   (use (label_ref (match_operand 1)))
+   (clobber (match_scratch:P 2 "=c,*l,X"))]
+  "rs6000_speculate_indirect_jumps && TARGET_P10_FUSION
+   && TARGET_P10_FUSION_ZERO_CYCLE"
+  "@
+   mt%T2 %0\;b%T2
+   mt%T2 %0\;b%T2
+   b%T0"
+  [(set_attr "type" "jmpreg")
+   (set_attr "length" "8,8,4")])
+
 (define_insn "@tablejump_insn_nospec"
   [(set (pc)
         (match_operand:P 0 "register_operand" "c,*l"))
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..ba674947557 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -518,6 +518,10 @@ mpower10-fusion-2store
 Target Undocumented Mask(P10_FUSION_2STORE) Var(rs6000_isa_flags)
 Fuse certain store operations together for better performance on power10.
 
+mpower10-fusion-zero-cycle
+Target Undocumented Mask(P10_FUSION_ZERO_CYCLE) Var(rs6000_isa_flags)
+Fuse move to special register and jump for better performance on power10.
+
 mcrypto
 Target Mask(CRYPTO) Var(rs6000_isa_flags)
 Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.