From patchwork Thu Nov 10 02:44:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 60324 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C1ABB38582BD for ; Thu, 10 Nov 2022 02:45:16 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C1ABB38582BD DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1668048316; bh=cX70Wfqj3bXl8JCTTXFOHhOSdN4YMCELxAV8314Dqj8=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=C+0rPLW/gHtz9VS21gSwTfdL7OagQMcyW+7Fr90eBFvFn526CTXpZ3Na3mRXS24zb gZpL9Avg71rHfxu144skCtiMDJxZIS9JFWQKEETw3SAhVg9JqYkGdL031L0rXfn73c oRF6ykDirWRR8jCe7Z+KX3UuWgLOuS0cGEcVfA0Q= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id EAEBC3858D38 for ; Thu, 10 Nov 2022 02:44:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EAEBC3858D38 Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2AA18EZD027004; Thu, 10 Nov 2022 02:44:45 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krkguf16b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:44:45 +0000 Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AA2ijjP022693; Thu, 10 Nov 2022 02:44:45 GMT Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com [169.53.41.122]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krkguf163-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:44:45 +0000 Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1]) by ppma04dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AA2Z7d4011893; Thu, 10 Nov 2022 02:44:44 GMT Received: from b01cxnp22033.gho.pok.ibm.com (b01cxnp22033.gho.pok.ibm.com [9.57.198.23]) by ppma04dal.us.ibm.com with ESMTP id 3kngmtj5fh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:44:44 +0000 Received: from smtpav03.wdc07v.mail.ibm.com ([9.208.128.112]) by b01cxnp22033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AA2igaI1704456 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Nov 2022 02:44:43 GMT Received: from smtpav03.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 403695805A; Thu, 10 Nov 2022 02:44:42 +0000 (GMT) Received: from smtpav03.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6C2195805C; Thu, 10 Nov 2022 02:44:41 +0000 (GMT) Received: from toto.the-meissners.org (unknown [9.160.5.6]) by smtpav03.wdc07v.mail.ibm.com (Postfix) with ESMTPS; Thu, 10 Nov 2022 02:44:41 +0000 (GMT) Date: Wed, 9 Nov 2022 21:44:39 -0500 To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt Subject: [PATCH 1/6] PowerPC: Add -mcpu=future Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt References: Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: De6GoxbACVF2G05QGK1VVKFsS6qUH6EN X-Proofpoint-GUID: MmY-VYf-5aymXw6x13EorZEhF320MV29 X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-09_06,2022-11-09_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 mlxscore=0 malwarescore=0 mlxlogscore=999 spamscore=0 clxscore=1015 lowpriorityscore=0 adultscore=0 bulkscore=0 suspectscore=0 priorityscore=1501 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211100016 X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Michael Meissner via Gcc-patches From: Michael Meissner Reply-To: Michael Meissner Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch adds support for the -mcpu=future and -mtune=future options. Besides defining __ARCH_PWR_FUTURE__ this particular patch does not enable any new features. These patches implement support for potential future PowerPC cpus. At this time, features enabled with -mcpu=future may or may not be in actual PowerPCs that will be delivered in the future. At present, we do not have any specific differences in terms of cpu tuning for future machines, so we make -mtune=future act the same as -mtune=power10. It is anticipated that we may add support for changing the tuning characteristics for -mtune=future at a later time. The patches have been tested on the following platforms. I added the patches for PR target/107299 that I submitted on November 2nd before doing the builds so that GCC would build on systems using IEEE 128-bit long double. * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html There were no regressions with doing bootstrap builds and running the regression tests: 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). Can I check this patch into the GCC 13 master branch? 2022-11-09 Michael Meissner gcc/ * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define __ARCH_PWR_FUTURE__ if -mcpu=future. * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro. (POWERPC_MASKS): Add -mfuture. * config/rs6000/rs6000-opts.h (enum processor_type): Add PROCESSOR_FUTURE. * config/rs6000/rs6000-tables.opt: Regenerate. * config/rs6000/rs6000.cc (rs6000_option_override_internal): Add -mcpu=future support. Make -mtune=future act like -mtune=power10 for now. (rs6000_machine_from_flags): Likewise. (rs6000_reassociation_width): Likewise. (rs6000_adjust_cost): Likewise. (rs6000_issue_rate): Likewise. (rs6000_sched_reorder): Likewise. (rs6000_sched_reorder2): Likewise. (rs6000_register_move_cost): Likewise. (rs6000_opt_masks): Add -mfuture. * config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise. * config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch. * config/rs6000/rs6000.md (cpu attribute): Add -mcpu=future support. * doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document -mcpu=future. --- gcc/config/rs6000/rs6000-c.cc | 2 ++ gcc/config/rs6000/rs6000-cpus.def | 6 ++++++ gcc/config/rs6000/rs6000-opts.h | 4 +++- gcc/config/rs6000/rs6000-tables.opt | 3 +++ gcc/config/rs6000/rs6000.cc | 27 +++++++++++++++++++++++---- gcc/config/rs6000/rs6000.h | 1 + gcc/config/rs6000/rs6000.md | 2 +- gcc/config/rs6000/rs6000.opt | 4 ++++ gcc/doc/invoke.texi | 2 +- 9 files changed, 44 insertions(+), 7 deletions(-) diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc index 5c2f3bcee9f..0d7b43f8edb 100644 --- a/gcc/config/rs6000/rs6000-c.cc +++ b/gcc/config/rs6000/rs6000-c.cc @@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags) rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9"); if ((flags & OPTION_MASK_POWER10) != 0) rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10"); + if ((flags & OPTION_MASK_FUTURE) != 0) + rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE"); if ((flags & OPTION_MASK_SOFT_FLOAT) != 0) rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT"); if ((flags & OPTION_MASK_RECIP_PRECISION) != 0) diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index c3825bcccd8..5eac7d97e65 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -87,6 +87,10 @@ | OTHER_POWER10_MASKS \ | OPTION_MASK_P10_FUSION) +/* Flags for a potential future processor that may or may not be delivered. */ +#define ISA_FUTURE_MASKS (ISA_3_1_MASKS_SERVER \ + | OPTION_MASK_FUTURE) + /* Flags that need to be turned off if -mno-power9-vector. */ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \ | OPTION_MASK_P9_MINMAX) @@ -133,6 +137,7 @@ | OPTION_MASK_FPRND \ | OPTION_MASK_POWER10 \ | OPTION_MASK_P10_FUSION \ + | OPTION_MASK_FUTURE \ | OPTION_MASK_HTM \ | OPTION_MASK_ISEL \ | OPTION_MASK_MFCRF \ @@ -264,3 +269,4 @@ RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, OPTION_MASK_PPC_GFXOPT RS6000_CPU ("powerpc64le", PROCESSOR_POWER8, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER | OPTION_MASK_HTM) RS6000_CPU ("rs64", PROCESSOR_RS64A, OPTION_MASK_PPC_GFXOPT | MASK_POWERPC64) +RS6000_CPU ("future", PROCESSOR_FUTURE, MASK_POWERPC64 | ISA_FUTURE_MASKS) diff --git a/gcc/config/rs6000/rs6000-opts.h b/gcc/config/rs6000/rs6000-opts.h index 2333c2e9e66..682788043db 100644 --- a/gcc/config/rs6000/rs6000-opts.h +++ b/gcc/config/rs6000/rs6000-opts.h @@ -67,7 +67,9 @@ enum processor_type PROCESSOR_MPCCORE, PROCESSOR_CELL, PROCESSOR_PPCA2, - PROCESSOR_TITAN + PROCESSOR_TITAN, + + PROCESSOR_FUTURE }; diff --git a/gcc/config/rs6000/rs6000-tables.opt b/gcc/config/rs6000/rs6000-tables.opt index b0df52de679..a5e65b4478a 100644 --- a/gcc/config/rs6000/rs6000-tables.opt +++ b/gcc/config/rs6000/rs6000-tables.opt @@ -197,3 +197,6 @@ Enum(rs6000_cpu_opt_value) String(powerpc64le) Value(55) EnumValue Enum(rs6000_cpu_opt_value) String(rs64) Value(56) +EnumValue +Enum(rs6000_cpu_opt_value) String(future) Value(57) + diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 9a5907c7130..25929502849 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -3742,6 +3742,10 @@ rs6000_option_override_internal (bool global_init_p) gcc_assert (tune_index >= 0); rs6000_tune = processor_target_table[tune_index].processor; + /* For now, make -mtune=future the same as -mtune=power10. */ + if (rs6000_tune == PROCESSOR_FUTURE) + rs6000_tune = PROCESSOR_POWER10; + if (rs6000_cpu == PROCESSOR_PPCE300C2 || rs6000_cpu == PROCESSOR_PPCE300C3 || rs6000_cpu == PROCESSOR_PPCE500MC || rs6000_cpu == PROCESSOR_PPCE500MC64 || rs6000_cpu == PROCESSOR_PPCE5500) @@ -4405,6 +4409,7 @@ rs6000_option_override_internal (bool global_init_p) && rs6000_tune != PROCESSOR_POWER8 && rs6000_tune != PROCESSOR_POWER9 && rs6000_tune != PROCESSOR_POWER10 + && rs6000_tune != PROCESSOR_FUTURE && rs6000_tune != PROCESSOR_PPCA2 && rs6000_tune != PROCESSOR_CELL && rs6000_tune != PROCESSOR_PPC476); @@ -4419,6 +4424,7 @@ rs6000_option_override_internal (bool global_init_p) || rs6000_tune == PROCESSOR_POWER8 || rs6000_tune == PROCESSOR_POWER9 || rs6000_tune == PROCESSOR_POWER10 + || rs6000_tune == PROCESSOR_FUTURE || rs6000_tune == PROCESSOR_PPCE500MC || rs6000_tune == PROCESSOR_PPCE500MC64 || rs6000_tune == PROCESSOR_PPCE5500 @@ -4718,6 +4724,7 @@ rs6000_option_override_internal (bool global_init_p) break; case PROCESSOR_POWER10: + case PROCESSOR_FUTURE: rs6000_cost = &power10_cost; break; @@ -5849,6 +5856,10 @@ rs6000_machine_from_flags (void) if (rs6000_cpu == PROCESSOR_MPCCORE) return "\"821\""; + /* Some future processor. For now, just use power10. */ + if (rs6000_cpu == PROCESSOR_FUTURE) + return "future"; + #if 0 /* This (and ppc64 below) are disabled here (for now at least) because PROCESSOR_POWERPC, PROCESSOR_POWERPC64, and PROCESSOR_COMMON @@ -10088,6 +10099,7 @@ rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED, case PROCESSOR_POWER8: case PROCESSOR_POWER9: case PROCESSOR_POWER10: + case PROCESSOR_FUTURE: if (DECIMAL_FLOAT_MODE_P (mode)) return 1; if (VECTOR_MODE_P (mode)) @@ -17829,7 +17841,8 @@ rs6000_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost, /* Separate a load from a narrower, dependent store. */ if ((rs6000_sched_groups || rs6000_tune == PROCESSOR_POWER9 - || rs6000_tune == PROCESSOR_POWER10) + || rs6000_tune == PROCESSOR_POWER10 + || rs6000_tune == PROCESSOR_FUTURE) && GET_CODE (PATTERN (insn)) == SET && GET_CODE (PATTERN (dep_insn)) == SET && MEM_P (XEXP (PATTERN (insn), 1)) @@ -17868,6 +17881,7 @@ rs6000_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost, || rs6000_tune == PROCESSOR_POWER8 || rs6000_tune == PROCESSOR_POWER9 || rs6000_tune == PROCESSOR_POWER10 + || rs6000_tune == PROCESSOR_FUTURE || rs6000_tune == PROCESSOR_CELL) && recog_memoized (dep_insn) && (INSN_CODE (dep_insn) >= 0)) @@ -18442,6 +18456,7 @@ rs6000_issue_rate (void) case PROCESSOR_POWER9: return 6; case PROCESSOR_POWER10: + case PROCESSOR_FUTURE: return 8; default: return 1; @@ -19158,7 +19173,8 @@ rs6000_sched_reorder (FILE *dump ATTRIBUTE_UNUSED, int sched_verbose, load_store_pendulum = 0; /* Do Power10 dependent reordering. */ - if (rs6000_tune == PROCESSOR_POWER10 && last_scheduled_insn) + if ((rs6000_tune == PROCESSOR_POWER10 + || rs6000_tune == PROCESSOR_FUTURE) && last_scheduled_insn) power10_sched_reorder (ready, n_ready - 1); return rs6000_issue_rate (); @@ -19183,7 +19199,8 @@ rs6000_sched_reorder2 (FILE *dump, int sched_verbose, rtx_insn **ready, return power9_sched_reorder2 (ready, *pn_ready - 1); /* Do Power10 dependent reordering. */ - if (rs6000_tune == PROCESSOR_POWER10 && last_scheduled_insn) + if ((rs6000_tune == PROCESSOR_POWER10 + || rs6000_tune == PROCESSOR_FUTURE) && last_scheduled_insn) return power10_sched_reorder (ready, *pn_ready - 1); return cached_can_issue_more; @@ -22398,7 +22415,8 @@ rs6000_register_move_cost (machine_mode mode, allocation a move within the same class might turn out to be a nop. */ if (rs6000_tune == PROCESSOR_POWER9 - || rs6000_tune == PROCESSOR_POWER10) + || rs6000_tune == PROCESSOR_POWER10 + || rs6000_tune == PROCESSOR_FUTURE) ret = 3 * hard_regno_nregs (FIRST_GPR_REGNO, mode); else ret = 4 * hard_regno_nregs (FIRST_GPR_REGNO, mode); @@ -24061,6 +24079,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] = { "float128-hardware", OPTION_MASK_FLOAT128_HW, false, true }, { "fprnd", OPTION_MASK_FPRND, false, true }, { "power10", OPTION_MASK_POWER10, false, true }, + { "future", OPTION_MASK_FUTURE, false, true }, { "hard-dfp", OPTION_MASK_DFP, false, true }, { "htm", OPTION_MASK_HTM, false, true }, { "isel", OPTION_MASK_ISEL, false, true }, diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index b4df22b6030..e35a14da81a 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -163,6 +163,7 @@ mcpu=e5500: -me5500; \ mcpu=e6500: -me6500; \ mcpu=titan: -mtitan; \ + mcpu=future: -mfuture; \ !mcpu*: %{mpower9-vector: -mpower9; \ mpower8-vector|mcrypto|mdirect-move|mhtm: -mpower8; \ mvsx: -mpower7; \ diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 4880df5c51c..be4623aeb69 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -350,7 +350,7 @@ (define_attr "cpu" ppc403,ppc405,ppc440,ppc476, ppc8540,ppc8548,ppce300c2,ppce300c3,ppce500mc,ppce500mc64,ppce5500,ppce6500, power4,power5,power6,power7,power8,power9,power10, - rs64a,mpccore,cell,ppca2,titan" + rs64a,mpccore,cell,ppca2,titan,future" (const (symbol_ref "(enum attr_cpu) rs6000_tune"))) ;; The ISA we implement. diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index e2de03dda92..574df08bb5a 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -624,6 +624,10 @@ mieee128-constant Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save Generate (do not generate) code that uses the LXVKQ instruction. +mfuture +Target Undocumented Mask(FUTURE) Var(rs6000_isa_flags) +Generate (do not generate) future instructions. + ; Documented parameters -param=rs6000-vect-unroll-limit= diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 975ee64103f..2367d13d96f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -28971,7 +28971,7 @@ Supported values for @var{cpu_type} are @samp{401}, @samp{403}, @samp{titan}, @samp{power3}, @samp{power4}, @samp{power5}, @samp{power5+}, @samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8}, @samp{power9}, @samp{power10}, @samp{powerpc}, @samp{powerpc64}, -@samp{powerpc64le}, @samp{rs64}, and @samp{native}. +@samp{powerpc64le}, @samp{rs64}, @samp{future}, and @samp{native}. @option{-mcpu=powerpc}, @option{-mcpu=powerpc64}, and @option{-mcpu=powerpc64le} specify pure 32-bit PowerPC (either From patchwork Thu Nov 10 02:45:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 60325 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9C38E3858439 for ; Thu, 10 Nov 2022 02:46:15 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9C38E3858439 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1668048375; bh=PHdS3+lG593jkO1zQFEk0D6YFh88CxdGOx1uh8sAXKk=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=fn1tmWszN0sH+v/p8AFrIgu6dJftpRT7KZ3Yp9f81U+DBNUFZfqFG9hnW3MW7sjhB GFS9Q8gQrqnHTK8uE9WZSqXHEVAMakTdaaWRt+nDkp2H9XyHTXm9leDK5BtUpixQ97 AeX1vIfjywKhQRynN2MGj48NbUpSS2Lr2oRPRj6Q= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 49E573857C4F for ; Thu, 10 Nov 2022 02:45:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 49E573857C4F Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2AA1PIJj026736; Thu, 10 Nov 2022 02:45:45 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krkguf1tn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:45:45 +0000 Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AA2gckv016080; Thu, 10 Nov 2022 02:45:44 GMT Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krkguf1tc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:45:44 +0000 Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1]) by ppma04wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AA2aSdR022268; Thu, 10 Nov 2022 02:45:43 GMT Received: from b03cxnp07029.gho.boulder.ibm.com (b03cxnp07029.gho.boulder.ibm.com [9.17.130.16]) by ppma04wdc.us.ibm.com with ESMTP id 3kngpu1s4e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:45:43 +0000 Received: from smtpav03.dal12v.mail.ibm.com ([9.208.128.129]) by b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AA2jeYM25297244 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Nov 2022 02:45:41 GMT Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0B89258063; Thu, 10 Nov 2022 02:45:42 +0000 (GMT) Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7A33D58056; Thu, 10 Nov 2022 02:45:41 +0000 (GMT) Received: from toto.the-meissners.org (unknown [9.160.5.6]) by smtpav03.dal12v.mail.ibm.com (Postfix) with ESMTPS; Thu, 10 Nov 2022 02:45:41 +0000 (GMT) Date: Wed, 9 Nov 2022 21:45:39 -0500 To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt Subject: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair. Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt References: Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: pc_kT2-KwIrn2Th5hab6ZKAqlKV_Bil7 X-Proofpoint-GUID: g5D9GyexVBjTeKckM8xplElkuKqUnOev X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-09_06,2022-11-09_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 mlxscore=0 malwarescore=0 mlxlogscore=999 spamscore=0 clxscore=1015 lowpriorityscore=0 adultscore=0 bulkscore=0 suspectscore=0 priorityscore=1501 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211100016 X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Michael Meissner via Gcc-patches From: Michael Meissner Reply-To: Michael Meissner Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch enables generating load and store vector pair instructions when doing certain memory copy operations when -mcpu=future is used. In doing tests on power10, it was determined that using these instructions were problematical in a few cases, so we disabled generating them by default. This patch re-enabled generating these instructions if -mcpu=future is used. The patches have been tested on the following platforms. I added the patches for PR target/107299 that I submitted on November 2nd before doing the builds so that GCC would build on systems using IEEE 128-bit long double. * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html There were no regressions with doing bootstrap builds and running the regression tests: 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). Can I check this patch into the GCC 13 master branch? 2022-11-09 Michael Meissner gcc/ * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mblock-ops-vector-pair. (POWERPC_MASKS): Likewise. --- gcc/config/rs6000/rs6000-cpus.def | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 5eac7d97e65..e8df7927055 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -89,6 +89,7 @@ /* Flags for a potential future processor that may or may not be delivered. */ #define ISA_FUTURE_MASKS (ISA_3_1_MASKS_SERVER \ + | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \ | OPTION_MASK_FUTURE) /* Flags that need to be turned off if -mno-power9-vector. */ @@ -126,6 +127,7 @@ /* Mask of all options to set the default isa flags based on -mcpu=. */ #define POWERPC_MASKS (OPTION_MASK_ALTIVEC \ + | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \ | OPTION_MASK_CMPB \ | OPTION_MASK_CRYPTO \ | OPTION_MASK_DFP \ From patchwork Thu Nov 10 02:46:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 60326 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CD4FA3858039 for ; Thu, 10 Nov 2022 02:47:18 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CD4FA3858039 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1668048438; bh=WRdKXZI+L34hXHn8Om1ZBOG0bSzwo6y4wSpHiIkecLk=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=YEmNhDDjHohpzm2iZKNyJEY20o+TBGVgAWObpMDdtPvWSct7JYnAfa4zBHYNQiDkN Z+NsBhJZ6IPAAzB2uEqk3FBcBYtMBrmIUDYPuNO0UiBtPLXFBof+p6jhcR4+pVynMH g4z4ArnbdIiJ3RrHlltGb7Ct9YJpoC0wkx8jUwd4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 36F013858039 for ; Thu, 10 Nov 2022 02:46:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 36F013858039 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2AA1kvbJ013881; Thu, 10 Nov 2022 02:46:42 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krhqpa77v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:46:41 +0000 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AA2UN1v005459; Thu, 10 Nov 2022 02:46:41 GMT Received: from ppma01dal.us.ibm.com (83.d6.3fa9.ip4.static.sl-reverse.com [169.63.214.131]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krhqpa77n-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:46:41 +0000 Received: from pps.filterd (ppma01dal.us.ibm.com [127.0.0.1]) by ppma01dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AA2aBKI001204; Thu, 10 Nov 2022 02:46:40 GMT Received: from b03cxnp08025.gho.boulder.ibm.com (b03cxnp08025.gho.boulder.ibm.com [9.17.130.17]) by ppma01dal.us.ibm.com with ESMTP id 3kngsyj318-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:46:40 +0000 Received: from smtpav02.dal12v.mail.ibm.com ([9.208.128.128]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AA2kaP332964906 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Nov 2022 02:46:36 GMT Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5AE725805E; Thu, 10 Nov 2022 02:46:38 +0000 (GMT) Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B07945805C; Thu, 10 Nov 2022 02:46:37 +0000 (GMT) Received: from toto.the-meissners.org (unknown [9.160.5.6]) by smtpav02.dal12v.mail.ibm.com (Postfix) with ESMTPS; Thu, 10 Nov 2022 02:46:37 +0000 (GMT) Date: Wed, 9 Nov 2022 21:46:36 -0500 To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt Subject: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers. Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt References: Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: Ncqy5CA3ujlvtl1ptVYTJNCSjpxVczSm X-Proofpoint-GUID: IFAAoAjMkgWKtqKfeIrE2RYTjdBI3da- X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-09_06,2022-11-09_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 impostorscore=0 malwarescore=0 adultscore=0 mlxlogscore=999 bulkscore=0 spamscore=0 phishscore=0 priorityscore=1501 mlxscore=0 suspectscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211100016 X-Spam-Status: No, score=-10.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Michael Meissner via Gcc-patches From: Michael Meissner Reply-To: Michael Meissner Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The MMA system added the notion of accumulator registers. In power10, these accumulators overlapped with the FPR registers, but logically the accumulators were separate from the FPR registers. It is anticipated that in future systems, we may have a separate dense math unit and the accumulators will be mapped onto the new dense math registers (DMRs). This patch adds the support for dense math registers. These changes are preliminary. They are expected to change over time. This particular patch does not change the MMA support to use the accumulators within the dense math registers. This patch just adds the basic support for having separate DMRs. The next patch will switch the MMA support to use the accumulators if -mcpu=future is used. For testing purposes, I added an undocumented option '-mdense-math' to enable or disable the dense math support. This patch adds a new constraint (wD). If MMA is selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint will match accumulators that overlap with the FPRs. If both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint will only match DMRs. This patch modifies the existing %A output modifier. If MMA is selected but dense math is not selected, then %A convert the FPR register number to the accumulator number. If both MMA and dense math are selected, then %A will only work if the register is an accumulator mapped onto a DMR. The intention is that user code using extended asm can be modified to run on both MMA without dense math and MMA with dense math: 1) If possible, don't use extended asm, but instead use the MMA built-in functions; 2) If you do need to write extended asm, change the d constraints targetting accumulators should now use wD; 3) Only use the built-in zero, assemble and disassemble functions create move data between vector quad types and dense math accumulators. I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the extended asm code. The reason is these instructions assume there is a 1-to-1 correspondence between 4 adjacent FPR registers and an accumulator that overlaps with those instructions. With accumulators now being separate registers, there no longer is a 1-to-1 correspondence. It is possible that the mangling for DMRs and the GDB register numbers may change in the future. The patches have been tested on the following platforms. I added the patches for PR target/107299 that I submitted on November 2nd before doing the builds so that GCC would build on systems using IEEE 128-bit long double. * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html There were no regressions with doing bootstrap builds and running the regression tests: 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). Can I check this patch into the GCC 13 master branch? 2022-11-09 Michael Meissner gcc/ * config/rs6000/constraints.md (wD constraint): New constraint. * config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec. (movxo): Convert into define_expand. (movxo_fpr): Version of movxo where accumulators overlap with FPRs. (movxo_dm): Dense math version of movxo. (mma_assemble_acc): Add dense match support to define_expand. (mma_assemble_acc_fpr): Rename from mma_assemble_acc, and restrict it to non dense math. (mma_assemble_acc_dm): Dense math version of mma_assemble_acc. (mma_disassemble_acc): Add dense math support to define_expand. (mma_disassemble_acc_fpr): Rename from mma_disassemble_acc, and restrict it to non dense math. (mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc. * config/rs6000/predicates.md (dmr_operand): New predicate. (accumulator_operand): Likewise. * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math. (POWERPC_MASKS): Likewise. * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE. (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR. (LAST_RELOAD_REG_CLASS): Add support for DMR registers. (reload_reg_map): Likewise. (rs6000_reg_names): Likewise. (alt_reg_names): Likewise. (rs6000_hard_regno_nregs_internal): Likewise. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_debug_reg_global): Likewise. (rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Likewise. (rs6000_option_override_internal): Add checking for -mdense-math. (rs6000_secondary_reload_memory): Add support for DMR registers. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_preferred_reload_class): Likewise. (rs6000_secondary_reload_class): Likewise. (print_operand): Make %A handle both FPRs and DMRs. (rs6000_dmr_register_move_cost): New helper function. (rs6000_register_move_cost): Add support for DMR registers. (rs6000_memory_move_cost): Likewise. (rs6000_compute_pressure_classes): Likewise. (rs6000_debugger_regno): Likewise. (rs6000_opt_masks): Add -mdense-math. (rs6000_split_multireg_move): Add support for DMRs. * config/rs6000/rs6000.h (UNITS_PER_DMR_WORD): New macro. (FIRST_PSEUDO_REGISTER): Update for DMRs. (FIXED_REGISTERS): Add DMRs. (CALL_REALLY_USED_REGISTERS): Likewise. (REG_ALLOC_ORDER): Likewise. (enum reg_class): Add DM_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. * config/rs6000/rs6000.md (FIRST_DMR_REGNO): New constant. (LAST_DMR_REGNO): Likewise. (isa attribute): Add 'dm' and 'not_dm' attributes. (enabled attribute): Support 'dm' and 'not_dm' attributes. * config/rs6000/rs6000.opt (-mdense-math): New switch. * doc/md.texi (PowerPC constraints): Document wD constraint. --- gcc/config/rs6000/constraints.md | 3 + gcc/config/rs6000/mma.md | 114 +++++++++++++----- gcc/config/rs6000/predicates.md | 32 +++++ gcc/config/rs6000/rs6000-cpus.def | 2 + gcc/config/rs6000/rs6000.cc | 192 ++++++++++++++++++++++++++---- gcc/config/rs6000/rs6000.h | 38 +++++- gcc/config/rs6000/rs6000.md | 12 +- gcc/config/rs6000/rs6000.opt | 4 + gcc/doc/md.texi | 3 + 9 files changed, 339 insertions(+), 61 deletions(-) diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md index 54fef8d9996..82856b8e8c2 100644 --- a/gcc/config/rs6000/constraints.md +++ b/gcc/config/rs6000/constraints.md @@ -107,6 +107,9 @@ (define_constraint "wB" (match_test "TARGET_P8_VECTOR") (match_operand 0 "s5bit_cint_operand"))) +(define_register_constraint "wD" "rs6000_constraints[RS6000_CONSTRAINT_wD]" + "Accumulator register.") + (define_constraint "wE" "@internal Vector constant that can be loaded with the XXSPLTIB instruction." (match_test "xxspltib_constant_nosplit (op, mode)")) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 032f4263cb0..fe2e9c9e63e 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -91,6 +91,7 @@ (define_c_enum "unspec" UNSPEC_MMA_XVI8GER4SPP UNSPEC_MMA_XXMFACC UNSPEC_MMA_XXMTACC + UNSPEC_DM_ASSEMBLE_ACC ]) (define_c_enum "unspecv" @@ -311,7 +312,8 @@ (define_insn_and_split "*movoo" (set_attr "length" "*,*,8")]) -;; Vector quad support. XOmode can only live in FPRs. +;; Vector quad support. Under the original MMA, XOmode can only live in FPRs. +;; With dense math, XOmode can live in either VSX registers or DMR registers. (define_expand "movxo" [(set (match_operand:XO 0 "nonimmediate_operand") (match_operand:XO 1 "input_operand"))] @@ -333,10 +335,10 @@ (define_expand "movxo" gcc_unreachable (); }) -(define_insn_and_split "*movxo" +(define_insn_and_split "*movxo_fpr" [(set (match_operand:XO 0 "nonimmediate_operand" "=d,m,d") (match_operand:XO 1 "input_operand" "m,d,d"))] - "TARGET_MMA + "TARGET_MMA && !TARGET_DENSE_MATH && (gpc_reg_operand (operands[0], XOmode) || gpc_reg_operand (operands[1], XOmode))" "@ @@ -353,6 +355,31 @@ (define_insn_and_split "*movxo" (set_attr "length" "*,*,16") (set_attr "max_prefixed_insns" "2,2,*")]) +(define_insn_and_split "*movxo_dm" + [(set (match_operand:XO 0 "nonimmediate_operand" "=wa,m, wa,wD,wD,wa") + (match_operand:XO 1 "input_operand" "m,wa,wa,wa,wD,wD"))] + "TARGET_DENSE_MATH + && (gpc_reg_operand (operands[0], XOmode) + || gpc_reg_operand (operands[1], XOmode))" + "@ + # + # + # + dmxxinstdmr512 %0,%1,%Y1,0 + dmmr %0,%1 + dmxxextfdmr512 %0,%Y0,%1,0" + "&& reload_completed + && !dmr_operand (operands[0], XOmode) + && !dmr_operand (operands[1], XOmode)" + [(const_int 0)] +{ + rs6000_split_multireg_move (operands[0], operands[1]); + DONE; +} + [(set_attr "type" "vecload,vecstore,veclogical,mma,mma,mma") + (set_attr "length" "*,*,16,*,*,*") + (set_attr "max_prefixed_insns" "2,2,*,*,*,*")]) + (define_expand "vsx_assemble_pair" [(match_operand:OO 0 "vsx_register_operand") (match_operand:V16QI 1 "mma_assemble_input_operand") @@ -420,25 +447,38 @@ (define_insn_and_split "*vsx_disassemble_pair" }) (define_expand "mma_assemble_acc" - [(match_operand:XO 0 "fpr_reg_operand") + [(match_operand:XO 0 "register_operand") (match_operand:V16QI 1 "mma_assemble_input_operand") (match_operand:V16QI 2 "mma_assemble_input_operand") (match_operand:V16QI 3 "mma_assemble_input_operand") (match_operand:V16QI 4 "mma_assemble_input_operand")] "TARGET_MMA" { - rtx src = gen_rtx_UNSPEC_VOLATILE (XOmode, - gen_rtvec (4, operands[1], operands[2], - operands[3], operands[4]), - UNSPECV_MMA_ASSEMBLE); - emit_move_insn (operands[0], src); + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op3 = operands[3]; + rtx op4 = operands[4]; + + if (TARGET_DENSE_MATH) + { + rtx vpair1 = gen_reg_rtx (OOmode); + rtx vpair2 = gen_reg_rtx (OOmode); + emit_insn (gen_vsx_assemble_pair (vpair1, op1, op2)); + emit_insn (gen_vsx_assemble_pair (vpair2, op3, op4)); + emit_insn (gen_mma_assemble_acc_dm (op0, vpair1, vpair2)); + } + + else + emit_insn (gen_mma_assemble_acc_fpr (op0, op1, op2, op3, op4)); + DONE; }) ;; We cannot update the four output registers atomically, so mark the output -;; as an early clobber so we don't accidentally clobber the input operands. */ +;; as an early clobber so we don't accidentally clobber the input operands. -(define_insn_and_split "*mma_assemble_acc" +(define_insn_and_split "mma_assemble_acc_fpr" [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") (unspec_volatile:XO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa") @@ -446,7 +486,7 @@ (define_insn_and_split "*mma_assemble_acc" (match_operand:V16QI 3 "mma_assemble_input_operand" "mwa") (match_operand:V16QI 4 "mma_assemble_input_operand" "mwa")] UNSPECV_MMA_ASSEMBLE))] - "TARGET_MMA + "TARGET_MMA && !TARGET_DENSE_MATH && fpr_reg_operand (operands[0], XOmode)" "#" "&& reload_completed" @@ -460,28 +500,31 @@ (define_insn_and_split "*mma_assemble_acc" DONE; }) +;; On a system with dense math, we build the accumulators from two vector +;; pairs. + +(define_insn "mma_assemble_acc_dm" + [(set (match_operand:XO 0 "dmr_operand" "=wD") + (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa") + (match_operand:OO 2 "vsx_register_operand" "wa")] + UNSPEC_DM_ASSEMBLE_ACC))] + "TARGET_MMA && TARGET_DENSE_MATH" + "dmxxinstdmr512 %0,%1,%2,0" + [(set_attr "type" "mma")]) + (define_expand "mma_disassemble_acc" - [(match_operand:V16QI 0 "mma_disassemble_output_operand") - (match_operand:XO 1 "fpr_reg_operand") - (match_operand 2 "const_0_to_3_operand")] - "TARGET_MMA" -{ - rtx src; - int regoff = INTVAL (operands[2]); - src = gen_rtx_UNSPEC (V16QImode, - gen_rtvec (2, operands[1], GEN_INT (regoff)), - UNSPEC_MMA_EXTRACT); - emit_move_insn (operands[0], src); - DONE; -}) + [(set (match_operand:V16QI 0 "register_operand") + (unspec:V16QI [(match_operand:XO 1 "register_operand") + (match_operand 2 "const_0_to_3_operand")] + UNSPEC_MMA_EXTRACT))] + "TARGET_MMA") -(define_insn_and_split "*mma_disassemble_acc" +(define_insn_and_split "*mma_disassemble_acc_fpr" [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa") - (unspec:V16QI [(match_operand:XO 1 "fpr_reg_operand" "d") - (match_operand 2 "const_0_to_3_operand")] + (unspec:V16QI [(match_operand:XO 1 "fpr_reg_operand" "d") + (match_operand 2 "const_0_to_3_operand")] UNSPEC_MMA_EXTRACT))] - "TARGET_MMA - && fpr_reg_operand (operands[1], XOmode)" + "TARGET_MMA" "#" "&& reload_completed" [(const_int 0)] @@ -493,9 +536,14 @@ (define_insn_and_split "*mma_disassemble_acc" DONE; }) -;; MMA instructions that do not use their accumulators as an input, still -;; must not allow their vector operands to overlap the registers used by -;; the accumulator. We enforce this by marking the output as early clobber. +(define_insn "*mma_disassemble_acc_dm" + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") + (unspec:V16QI [(match_operand:XO 1 "dmr_operand" "wD") + (match_operand 2 "const_0_to_3_operand")] + UNSPEC_MMA_EXTRACT))] + "TARGET_DENSE_MATH" + "dmxxextfdmr256 %0,%1,2" + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index b1fcc69bb60..ede48189d3d 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -186,6 +186,38 @@ (define_predicate "vlogical_operand" return VLOGICAL_REGNO_P (REGNO (op)); }) +;; Return 1 if op is a DMR register +(define_predicate "dmr_operand" + (match_operand 0 "register_operand") +{ + if (!REG_P (op)) + return 0; + + if (!HARD_REGISTER_P (op)) + return 1; + + return DMR_REGNO_P (REGNO (op)); +}) + +;; Return 1 if op is an accumulator. On power10 systems, the accumulators +;; overlap with the FPRs, while on systems with dense math, the accumulators +;; are separate dense math registers and do not overlap with the FPR +;; registers.. +(define_predicate "accumulator_operand" + (match_operand 0 "register_operand") +{ + if (!REG_P (op)) + return 0; + + if (!HARD_REGISTER_P (op)) + return 1; + + int r = REGNO (op); + return (TARGET_DENSE_MATH + ? DMR_REGNO_P (r) + : FP_REGNO_P (r) && (r & 3) == 0); +}) + ;; Return 1 if op is the carry register. (define_predicate "ca_operand" (match_operand 0 "register_operand") diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index e8df7927055..7c00001cd96 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -90,6 +90,7 @@ /* Flags for a potential future processor that may or may not be delivered. */ #define ISA_FUTURE_MASKS (ISA_3_1_MASKS_SERVER \ | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \ + | OPTION_MASK_DENSE_MATH \ | OPTION_MASK_FUTURE) /* Flags that need to be turned off if -mno-power9-vector. */ @@ -133,6 +134,7 @@ | OPTION_MASK_DFP \ | OPTION_MASK_DIRECT_MOVE \ | OPTION_MASK_DLMZB \ + | OPTION_MASK_DENSE_MATH \ | OPTION_MASK_EFFICIENT_UNALIGNED_VSX \ | OPTION_MASK_FLOAT128_HW \ | OPTION_MASK_FLOAT128_KEYWORD \ diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 25929502849..d224ef64bb1 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -290,7 +290,8 @@ enum rs6000_reg_type { ALTIVEC_REG_TYPE, FPR_REG_TYPE, SPR_REG_TYPE, - CR_REG_TYPE + CR_REG_TYPE, + DMR_REG_TYPE }; /* Map register class to register type. */ @@ -304,22 +305,23 @@ static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES]; /* Register classes we care about in secondary reload or go if legitimate - address. We only need to worry about GPR, FPR, and Altivec registers here, - along an ANY field that is the OR of the 3 register classes. */ + address. We only need to worry about GPR, FPR, Altivec, and DMR registers + here, along an ANY field that is the OR of the 4 register classes. */ enum rs6000_reload_reg_type { RELOAD_REG_GPR, /* General purpose registers. */ RELOAD_REG_FPR, /* Traditional floating point regs. */ RELOAD_REG_VMX, /* Altivec (VMX) registers. */ - RELOAD_REG_ANY, /* OR of GPR, FPR, Altivec masks. */ + RELOAD_REG_DMR, /* DMR registers. */ + RELOAD_REG_ANY, /* OR of GPR/FPR/VMX/DMR masks. */ N_RELOAD_REG }; -/* For setting up register classes, loop through the 3 register classes mapping +/* For setting up register classes, loop through the 4 register classes mapping into real registers, and skip the ANY class, which is just an OR of the bits. */ #define FIRST_RELOAD_REG_CLASS RELOAD_REG_GPR -#define LAST_RELOAD_REG_CLASS RELOAD_REG_VMX +#define LAST_RELOAD_REG_CLASS RELOAD_REG_DMR /* Map reload register type to a register in the register class. */ struct reload_reg_map_type { @@ -331,6 +333,7 @@ static const struct reload_reg_map_type reload_reg_map[N_RELOAD_REG] = { { "Gpr", FIRST_GPR_REGNO }, /* RELOAD_REG_GPR. */ { "Fpr", FIRST_FPR_REGNO }, /* RELOAD_REG_FPR. */ { "VMX", FIRST_ALTIVEC_REGNO }, /* RELOAD_REG_VMX. */ + { "DMR", FIRST_DMR_REGNO }, /* RELOAD_REG_DMR. */ { "Any", -1 }, /* RELOAD_REG_ANY. */ }; @@ -1223,6 +1226,8 @@ char rs6000_reg_names[][8] = "0", "1", "2", "3", "4", "5", "6", "7", /* vrsave vscr sfp */ "vrsave", "vscr", "sfp", + /* DMRs */ + "0", "1", "2", "3", "4", "5", "6", "7", }; #ifdef TARGET_REGNAMES @@ -1249,6 +1254,8 @@ static const char alt_reg_names[][8] = "%cr0", "%cr1", "%cr2", "%cr3", "%cr4", "%cr5", "%cr6", "%cr7", /* vrsave vscr sfp */ "vrsave", "vscr", "sfp", + /* DMRs */ + "%dmr0", "%dmr1", "%dmr2", "%dmr3", "%dmr4", "%dmr5", "%dmr6", "%dmr7", }; #endif @@ -1820,6 +1827,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode) else if (ALTIVEC_REGNO_P (regno)) reg_size = UNITS_PER_ALTIVEC_WORD; + else if (DMR_REGNO_P (regno)) + reg_size = UNITS_PER_DMR_WORD; + else reg_size = UNITS_PER_WORD; @@ -1841,9 +1851,35 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode) if (mode == OOmode) return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0); - /* MMA accumulator modes need FPR registers divisible by 4. */ + /* On power10, MMA accumulator modes need FPR registers divisible by 4. + + If dense math is enabled, allow all VSX registers plus the DMR registers. + We need to make sure we don't cross between the boundary of FPRs and + traditional Altiviec registers. */ if (mode == XOmode) - return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0); + { + if (TARGET_MMA && !TARGET_DENSE_MATH) + return (FP_REGNO_P (regno) && (regno & 3) == 0); + + else if (TARGET_DENSE_MATH) + { + if (DMR_REGNO_P (regno)) + return 1; + + if (FP_REGNO_P (regno)) + return ((regno & 1) == 0 && regno <= LAST_FPR_REGNO - 3); + + if (ALTIVEC_REGNO_P (regno)) + return ((regno & 1) == 0 && regno <= LAST_ALTIVEC_REGNO - 3); + } + + else + return 0; + } + + /* No other types other than XOmode can go in DMRs. */ + if (DMR_REGNO_P (regno)) + return 0; /* PTImode can only go in GPRs. Quad word memory operations require even/odd register combinations, and use PTImode where we need to deal with quad @@ -2286,6 +2322,7 @@ rs6000_debug_reg_global (void) rs6000_debug_reg_print (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO, "vs"); + rs6000_debug_reg_print (FIRST_DMR_REGNO, LAST_DMR_REGNO, "dmr"); rs6000_debug_reg_print (LR_REGNO, LR_REGNO, "lr"); rs6000_debug_reg_print (CTR_REGNO, CTR_REGNO, "ctr"); rs6000_debug_reg_print (CR0_REGNO, CR7_REGNO, "cr"); @@ -2306,6 +2343,7 @@ rs6000_debug_reg_global (void) "wr reg_class = %s\n" "wx reg_class = %s\n" "wA reg_class = %s\n" + "wD reg_class = %s\n" "\n", reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]], @@ -2313,7 +2351,8 @@ rs6000_debug_reg_global (void) reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]], - reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wA]]); + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wA]], + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wD]]); nl = "\n"; for (m = 0; m < NUM_MACHINE_MODES; ++m) @@ -2610,6 +2649,21 @@ rs6000_setup_reg_addr_masks (void) addr_mask = 0; reg = reload_reg_map[rc].reg; + /* Special case DMR registers. */ + if (rc == RELOAD_REG_DMR) + { + if (TARGET_DENSE_MATH && m2 == XOmode) + { + addr_mask = RELOAD_REG_VALID; + reg_addr[m].addr_mask[rc] = addr_mask; + any_addr_mask |= addr_mask; + } + else + reg_addr[m].addr_mask[rc] = 0; + + continue; + } + /* Can mode values go in the GPR/FPR/Altivec registers? */ if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg]) { @@ -2705,8 +2759,8 @@ rs6000_setup_reg_addr_masks (void) /* Vector pairs can do both indexed and offset loads if the instructions are enabled, otherwise they can only do offset loads - since it will be broken into two vector moves. Vector quads can - only do offset loads. */ + since it will be broken into two vector moves. Vector quads and + 1,024 bit DMR values can only do offset loads. */ else if ((addr_mask != 0) && TARGET_MMA && (m2 == OOmode || m2 == XOmode)) { @@ -2760,6 +2814,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) for (r = CR1_REGNO; r <= CR7_REGNO; ++r) rs6000_regno_regclass[r] = CR_REGS; + for (r = FIRST_DMR_REGNO; r <= LAST_DMR_REGNO; ++r) + rs6000_regno_regclass[r] = DM_REGS; + rs6000_regno_regclass[LR_REGNO] = LINK_REGS; rs6000_regno_regclass[CTR_REGNO] = CTR_REGS; rs6000_regno_regclass[CA_REGNO] = NO_REGS; @@ -2784,6 +2841,7 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE; reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE; reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE; + reg_class_to_reg_type[(int)DM_REGS] = DMR_REG_TYPE; if (TARGET_VSX) { @@ -2970,6 +3028,13 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) if (TARGET_DIRECT_MOVE_128) rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS; + /* Support for the accumulator registers, either FPR registers (aka original + mma) or DMR registers (dense math). */ + if (TARGET_DENSE_MATH) + rs6000_constraints[RS6000_CONSTRAINT_wD] = DM_REGS; + else if (TARGET_MMA) + rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS; + /* Set up the reload helper and direct move functions. */ if (TARGET_VSX || TARGET_ALTIVEC) { @@ -4399,6 +4464,14 @@ rs6000_option_override_internal (bool global_init_p) if (!TARGET_PCREL && TARGET_PCREL_OPT) rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT; + /* Dense math requires MMA. */ + if (TARGET_DENSE_MATH && !TARGET_MMA) + { + if ((rs6000_isa_flags_explicit & OPTION_MASK_DENSE_MATH) != 0) + error ("%qs requires %qs", "-mdense-math", "-mmma"); + rs6000_isa_flags &= ~OPTION_MASK_DENSE_MATH; + } + if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET) rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags); @@ -12008,6 +12081,11 @@ rs6000_secondary_reload_memory (rtx addr, addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX] & ~RELOAD_REG_AND_M16); + /* DMR registers use VSX registers, and need to generate some extra + instructions. */ + else if (rclass == DM_REGS) + return 2; + /* If the register allocator hasn't made up its mind yet on the register class to use, settle on defaults to use. */ else if (rclass == NO_REGS) @@ -12336,6 +12414,13 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type, || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE))) return true; + /* We can transfer between VSX registers and DMR registers without needing + extra registers. */ + if (TARGET_DENSE_MATH && mode == XOmode + && ((to_type == DMR_REG_TYPE && from_type == VSX_REG_TYPE) + || (to_type == VSX_REG_TYPE && from_type == DMR_REG_TYPE))) + return true; + return false; } @@ -13030,6 +13115,10 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass) machine_mode mode = GET_MODE (x); bool is_constant = CONSTANT_P (x); + /* DMR registers can't be loaded or stored. */ + if (rclass == DM_REGS) + return NO_REGS; + /* If a mode can't go in FPR/ALTIVEC/VSX registers, don't return a preferred reload class for it. */ if ((rclass == ALTIVEC_REGS || rclass == VSX_REGS) @@ -13126,7 +13215,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass) return VSX_REGS; if (mode == XOmode) - return FLOAT_REGS; + return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS; if (GET_MODE_CLASS (mode) == MODE_INT) return GENERAL_REGS; @@ -13251,6 +13340,11 @@ rs6000_secondary_reload_class (enum reg_class rclass, machine_mode mode, else regno = -1; + /* DMR registers don't have loads or stores. We have to go through the VSX + registers to load XOmode (vector quad). */ + if (TARGET_DENSE_MATH && rclass == DM_REGS) + return VSX_REGS; + /* If we have VSX register moves, prefer moving scalar values between Altivec registers and GPR by going via an FPR (and then via memory) instead of reloading the secondary memory address for Altivec moves. */ @@ -13764,8 +13858,14 @@ print_operand (FILE *file, rtx x, int code) output_operand. */ case 'A': - /* Write the MMA accumulator number associated with VSX register X. */ - if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0) + /* Write the MMA accumulator number associated with VSX register X. On + dense math systems, only allow DMR accumulators, not accumulators + overlapping with the FPR registers. */ + if (!REG_P (x)) + output_operand_lossage ("invalid %%A value"); + else if (TARGET_DENSE_MATH && DMR_REGNO_P (REGNO (x))) + fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO); + else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0) output_operand_lossage ("invalid %%A value"); else fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4); @@ -22367,6 +22467,31 @@ rs6000_debug_address_cost (rtx x, machine_mode mode, } +/* Subroutine to determine the move cost of dense math registers. If we are + moving to/from VSX_REGISTER registers, the cost is either 1 move (for + 512-bit accumulators) or 2 moves (for 1,024 dmr registers). If we are + moving to anything else like GPR registers, make the cost very high. */ + +static int +rs6000_dmr_register_move_cost (machine_mode mode, reg_class_t rclass) +{ + const int reg_move_base = 2; + HARD_REG_SET vsx_set = (reg_class_contents[rclass] + & reg_class_contents[VSX_REGS]); + + if (TARGET_DENSE_MATH && !hard_reg_set_empty_p (vsx_set)) + { + /* __vector_quad (i.e. XOmode) is tranfered in 1 instruction. */ + if (mode == XOmode) + return reg_move_base; + + else + return reg_move_base * 2 * hard_regno_nregs (FIRST_DMR_REGNO, mode); + } + + return 1000 * 2 * hard_regno_nregs (FIRST_DMR_REGNO, mode); +} + /* A C expression returning the cost of moving data from a register of class CLASS1 to one of CLASS2. */ @@ -22380,17 +22505,28 @@ rs6000_register_move_cost (machine_mode mode, if (TARGET_DEBUG_COST) dbg_cost_ctrl++; + HARD_REG_SET to_vsx, from_vsx; + to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS]; + from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS]; + + /* Special case DMR registers, that can only move to/from VSX registers. */ + if (from == DM_REGS && to == DM_REGS) + ret = 2 * hard_regno_nregs (FIRST_DMR_REGNO, mode); + + else if (from == DM_REGS) + ret = rs6000_dmr_register_move_cost (mode, to); + + else if (to == DM_REGS) + ret = rs6000_dmr_register_move_cost (mode, from); + /* If we have VSX, we can easily move between FPR or Altivec registers, otherwise we can only easily move within classes. Do this first so we give best-case answers for union classes containing both gprs and vsx regs. */ - HARD_REG_SET to_vsx, from_vsx; - to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS]; - from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS]; - if (!hard_reg_set_empty_p (to_vsx) - && !hard_reg_set_empty_p (from_vsx) - && (TARGET_VSX - || hard_reg_set_intersect_p (to_vsx, from_vsx))) + else if (!hard_reg_set_empty_p (to_vsx) + && !hard_reg_set_empty_p (from_vsx) + && (TARGET_VSX + || hard_reg_set_intersect_p (to_vsx, from_vsx))) { int reg = FIRST_FPR_REGNO; if (TARGET_VSX @@ -22486,6 +22622,9 @@ rs6000_memory_move_cost (machine_mode mode, reg_class_t rclass, ret = 4 * hard_regno_nregs (32, mode); else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS)) ret = 4 * hard_regno_nregs (FIRST_ALTIVEC_REGNO, mode); + else if (reg_classes_intersect_p (rclass, DM_REGS)) + ret = (rs6000_dmr_register_move_cost (mode, VSX_REGS) + + rs6000_memory_move_cost (mode, VSX_REGS, false)); else ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS); @@ -23694,6 +23833,8 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes) if (TARGET_HARD_FLOAT) pressure_classes[n++] = FLOAT_REGS; } + if (TARGET_DENSE_MATH) + pressure_classes[n++] = DM_REGS; pressure_classes[n++] = CR_REGS; pressure_classes[n++] = SPECIAL_REGS; @@ -23858,6 +23999,10 @@ rs6000_debugger_regno (unsigned int regno, unsigned int format) return 67; if (regno == 64) return 64; + /* XXX: This is a guess. The GCC register number for FIRST_DMR_REGNO is 111, + but the frame pointer regnum uses that. */ + if (DMR_REGNO_P (regno)) + return regno - FIRST_DMR_REGNO + 112; gcc_unreachable (); } @@ -24073,6 +24218,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] = { "crypto", OPTION_MASK_CRYPTO, false, true }, { "direct-move", OPTION_MASK_DIRECT_MOVE, false, true }, { "dlmzb", OPTION_MASK_DLMZB, false, true }, + { "dense-math", OPTION_MASK_DENSE_MATH, false, true }, { "efficient-unaligned-vsx", OPTION_MASK_EFFICIENT_UNALIGNED_VSX, false, true }, { "float128", OPTION_MASK_FLOAT128_KEYWORD, false, true }, @@ -27159,7 +27305,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) || XINT (src, 1) == UNSPECV_MMA_ASSEMBLE); gcc_assert (REG_P (dst)); if (GET_MODE (src) == XOmode) - gcc_assert (FP_REGNO_P (REGNO (dst))); + gcc_assert ((TARGET_DENSE_MATH + ? VSX_REGNO_P (REGNO (dst)) + : FP_REGNO_P (REGNO (dst)))); if (GET_MODE (src) == OOmode) gcc_assert (VSX_REGNO_P (REGNO (dst))); diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index e35a14da81a..27f7067ef52 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -662,6 +662,7 @@ extern unsigned char rs6000_recip_bits[]; #define UNITS_PER_FP_WORD 8 #define UNITS_PER_ALTIVEC_WORD 16 #define UNITS_PER_VSX_WORD 16 +#define UNITS_PER_DMR_WORD 128 /* Type used for ptrdiff_t, as a string used in a declaration. */ #define PTRDIFF_TYPE "int" @@ -789,7 +790,7 @@ enum data_align { align_abi, align_opt, align_both }; Another pseudo (not included in DWARF_FRAME_REGISTERS) is soft frame pointer, which is eventually eliminated in favor of SP or FP. */ -#define FIRST_PSEUDO_REGISTER 111 +#define FIRST_PSEUDO_REGISTER 119 /* Use standard DWARF numbering for DWARF debugging information. */ #define DEBUGGER_REGNO(REGNO) rs6000_debugger_regno ((REGNO), 0) @@ -826,7 +827,9 @@ enum data_align { align_abi, align_opt, align_both }; /* cr0..cr7 */ \ 0, 0, 0, 0, 0, 0, 0, 0, \ /* vrsave vscr sfp */ \ - 1, 1, 1 \ + 1, 1, 1, \ + /* DMR registers. */ \ + 0, 0, 0, 0, 0, 0, 0, 0 \ } /* Like `CALL_USED_REGISTERS' except this macro doesn't require that @@ -850,7 +853,9 @@ enum data_align { align_abi, align_opt, align_both }; /* cr0..cr7 */ \ 1, 1, 0, 0, 0, 1, 1, 1, \ /* vrsave vscr sfp */ \ - 0, 0, 0 \ + 0, 0, 0, \ + /* DMR registers. */ \ + 0, 0, 0, 0, 0, 0, 0, 0 \ } #define TOTAL_ALTIVEC_REGS (LAST_ALTIVEC_REGNO - FIRST_ALTIVEC_REGNO + 1) @@ -887,6 +892,7 @@ enum data_align { align_abi, align_opt, align_both }; v2 (not saved; incoming vector arg reg; return value) v19 - v14 (not saved or used for anything) v31 - v20 (saved; order given to save least number) + dmr0 - dmr7 (not saved) vrsave, vscr (fixed) sfp (fixed) */ @@ -929,6 +935,9 @@ enum data_align { align_abi, align_opt, align_both }; 66, \ 83, 82, 81, 80, 79, 78, \ 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, \ + /* DMR registers. */ \ + 111, 112, 113, 114, 115, 116, 117, 118, \ + /* Vrsave, vscr, sfp. */ \ 108, 109, \ 110 \ } @@ -955,6 +964,9 @@ enum data_align { align_abi, align_opt, align_both }; /* True if register is a VSX register. */ #define VSX_REGNO_P(N) (FP_REGNO_P (N) || ALTIVEC_REGNO_P (N)) +/* True if register is a DMR register. */ +#define DMR_REGNO_P(N) ((N) >= FIRST_DMR_REGNO && (N) <= LAST_DMR_REGNO) + /* Alternate name for any vector register supporting floating point, no matter which instruction set(s) are available. */ #define VFLOAT_REGNO_P(N) \ @@ -1090,6 +1102,7 @@ enum reg_class FLOAT_REGS, ALTIVEC_REGS, VSX_REGS, + DM_REGS, VRSAVE_REGS, VSCR_REGS, GEN_OR_FLOAT_REGS, @@ -1119,6 +1132,7 @@ enum reg_class "FLOAT_REGS", \ "ALTIVEC_REGS", \ "VSX_REGS", \ + "DM_REGS", \ "VRSAVE_REGS", \ "VSCR_REGS", \ "GEN_OR_FLOAT_REGS", \ @@ -1153,6 +1167,8 @@ enum reg_class { 0x00000000, 0x00000000, 0xffffffff, 0x00000000 }, \ /* VSX_REGS. */ \ { 0x00000000, 0xffffffff, 0xffffffff, 0x00000000 }, \ + /* DM_REGS. */ \ + { 0x00000000, 0x00000000, 0x00000000, 0x007f8000 }, \ /* VRSAVE_REGS. */ \ { 0x00000000, 0x00000000, 0x00000000, 0x00001000 }, \ /* VSCR_REGS. */ \ @@ -1180,7 +1196,7 @@ enum reg_class /* CA_REGS. */ \ { 0x00000000, 0x00000000, 0x00000000, 0x00000004 }, \ /* ALL_REGS. */ \ - { 0xffffffff, 0xffffffff, 0xffffffff, 0x00007fff } \ + { 0xffffffff, 0xffffffff, 0xffffffff, 0x007fffff } \ } /* The same information, inverted: @@ -1204,6 +1220,7 @@ enum r6000_reg_class_enum { RS6000_CONSTRAINT_wr, /* GPR register if 64-bit */ RS6000_CONSTRAINT_wx, /* FPR register for STFIWX */ RS6000_CONSTRAINT_wA, /* BASE_REGS if 64-bit. */ + RS6000_CONSTRAINT_wD, /* Accumulator regs if MMA/Dense Math. */ RS6000_CONSTRAINT_MAX }; @@ -2077,7 +2094,16 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */ &rs6000_reg_names[108][0], /* vrsave */ \ &rs6000_reg_names[109][0], /* vscr */ \ \ - &rs6000_reg_names[110][0] /* sfp */ \ + &rs6000_reg_names[110][0], /* sfp */ \ + \ + &rs6000_reg_names[111][0], /* dmr0 */ \ + &rs6000_reg_names[112][0], /* dmr1 */ \ + &rs6000_reg_names[113][0], /* dmr2 */ \ + &rs6000_reg_names[114][0], /* dmr3 */ \ + &rs6000_reg_names[115][0], /* dmr4 */ \ + &rs6000_reg_names[116][0], /* dmr5 */ \ + &rs6000_reg_names[117][0], /* dmr6 */ \ + &rs6000_reg_names[118][0], /* dmr7 */ \ } /* Table of additional register names to use in user input. */ @@ -2131,6 +2157,8 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */ {"vs52", 84}, {"vs53", 85}, {"vs54", 86}, {"vs55", 87}, \ {"vs56", 88}, {"vs57", 89}, {"vs58", 90}, {"vs59", 91}, \ {"vs60", 92}, {"vs61", 93}, {"vs62", 94}, {"vs63", 95}, \ + {"dmr0", 111}, {"dmr1", 112}, {"dmr2", 113}, {"dmr3", 114}, \ + {"dmr4", 115}, {"dmr5", 116}, {"dmr6", 117}, {"dmr7", 118}, \ } /* This is how to output an element of a case-vector that is relative. */ diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index be4623aeb69..4a5007dc539 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -51,6 +51,8 @@ (define_constants (VRSAVE_REGNO 108) (VSCR_REGNO 109) (FRAME_POINTER_REGNUM 110) + (FIRST_DMR_REGNO 111) + (LAST_DMR_REGNO 118) ]) ;; @@ -354,7 +356,7 @@ (define_attr "cpu" (const (symbol_ref "(enum attr_cpu) rs6000_tune"))) ;; The ISA we implement. -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10" +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,dm,not_dm" (const_string "any")) ;; Is this alternative enabled for the current CPU/ISA/etc.? @@ -402,6 +404,14 @@ (define_attr "enabled" "" (and (eq_attr "isa" "p10") (match_test "TARGET_POWER10")) (const_int 1) + + (and (eq_attr "isa" "dm") + (match_test "TARGET_DENSE_MATH")) + (const_int 1) + + (and (eq_attr "isa" "not_dm") + (match_test "!TARGET_DENSE_MATH")) + (const_int 1) ] (const_int 0))) ;; If this instruction is microcoded on the CELL processor diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 574df08bb5a..6872d359952 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -628,6 +628,10 @@ mfuture Target Undocumented Mask(FUTURE) Var(rs6000_isa_flags) Generate (do not generate) future instructions. +mdense-math +Target Undocumented Mask(DENSE_MATH) Var(rs6000_isa_flags) +Generate (do not generate) dense math instructions. + ; Documented parameters -param=rs6000-vect-unroll-limit= diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index d0a71ecbb80..65ccb1de667 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -3264,6 +3264,9 @@ Like @code{d}, if @option{-mpowerpc-gfxopt} is used; otherwise, @code{NO_REGS}. @item wA Like @code{b}, if @option{-mpowerpc64} is used; otherwise, @code{NO_REGS}. +@item wD +Accumulator register for either @option{-mma} or @option{-mdense-math}. + @item wB Signed 5-bit constant integer that can be loaded into an Altivec register. From patchwork Thu Nov 10 02:50:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 60327 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B76F53858D28 for ; Thu, 10 Nov 2022 02:51:13 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B76F53858D28 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1668048673; bh=RWRfIQ/T4J3XxZrD50j729Dmd2/T+ikSQuGTpxliSWY=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=MwlGrlfU1WGHnp8FCuAirKwc2qwqSR3eH7nPMoyjf7UCTlyiClrZ+u/MyWAePl9q6 dwbrvmQlSU0XIkC1minC6PCXv+XQrvgzkVqBD0w6EhwqjrydzV6lZCrLAHRrO+XTyT mvAKD4jmF5FmO5fBDHDdi+bij/CTvjSvk7jm7tL4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 05B943858D28 for ; Thu, 10 Nov 2022 02:50:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 05B943858D28 Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2AA2U2p3022372; Thu, 10 Nov 2022 02:50:39 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krrjsgb17-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:50:38 +0000 Received: from m0098409.ppops.net (m0098409.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AA2U115021074; Thu, 10 Nov 2022 02:50:38 GMT Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.11]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krrjsgb0v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:50:38 +0000 Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1]) by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AA2Z1ZK017756; Thu, 10 Nov 2022 02:50:37 GMT Received: from b03cxnp08028.gho.boulder.ibm.com (b03cxnp08028.gho.boulder.ibm.com [9.17.130.20]) by ppma03dal.us.ibm.com with ESMTP id 3kngpjj4hd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:50:37 +0000 Received: from smtpav04.dal12v.mail.ibm.com ([9.208.128.131]) by b03cxnp08028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AA2obEx32702732 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Nov 2022 02:50:37 GMT Received: from smtpav04.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 750445805A; Thu, 10 Nov 2022 02:50:35 +0000 (GMT) Received: from smtpav04.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D542D5805E; Thu, 10 Nov 2022 02:50:34 +0000 (GMT) Received: from toto.the-meissners.org (unknown [9.160.5.6]) by smtpav04.dal12v.mail.ibm.com (Postfix) with ESMTPS; Thu, 10 Nov 2022 02:50:34 +0000 (GMT) Date: Wed, 9 Nov 2022 21:50:24 -0500 To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt Subject: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt References: Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: TT5rjbTocLKcIP9Ue4Vbs7JnF5qP5wbw X-Proofpoint-ORIG-GUID: Rw7HUzW6vkzJUbDKi-_V2z6AkiZWeuky X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-09_06,2022-11-09_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 suspectscore=0 adultscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 phishscore=0 malwarescore=0 spamscore=0 clxscore=1015 bulkscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211100016 X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Michael Meissner via Gcc-patches From: Michael Meissner Reply-To: Michael Meissner Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch changes the MMA instructions to use either FPR registers (-mcpu=power10) or DMRs (-mcpu=future). In this patch, the existing MMA instruction names are used. A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs. The patches have been tested on the following platforms. I added the patches for PR target/107299 that I submitted on November 2nd before doing the builds so that GCC would build on systems using IEEE 128-bit long double. * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html There were no regressions with doing bootstrap builds and running the regression tests: 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). Can I check this patch into the GCC 13 master branch? 2022-11-09 Michael Meissner gcc/ * config/rs6000/mma.md (mma_): New define_expand to handle mma_ for dense math and non dense math. (mma_ insn): Restrict to non dense math. (mma_xxsetaccz): Convert to define_expand to handle non dense math and dense math. (mma_xxsetaccz_p10): Rename from mma_xxsetaccz and restrict usage to non dense math. (mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz. (mma_): Add support for dense math. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define __PPC_DMR__ if we have dense math instructions. * config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if dense math and only FPRs if not dense math. (rs6000_split_multireg_move): Do not generate accumulator prime or de-prime instructions if dense math. --- gcc/config/rs6000/mma.md | 247 +++++++++++++++++++++------------- gcc/config/rs6000/rs6000-c.cc | 3 + gcc/config/rs6000/rs6000.cc | 35 ++--- 3 files changed, 176 insertions(+), 109 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index fe2e9c9e63e..835f34e8e00 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -545,190 +545,249 @@ (define_insn "*mma_disassemble_acc_dm" "dmxxextfdmr256 %0,%1,2" [(set_attr "type" "mma")]) -(define_insn "mma_" +;; MMA instructions that do not use their accumulators as an input, still must +;; not allow their vector operands to overlap the registers used by the +;; accumulator. We enforce this by marking the output as early clobber. If we +;; have dense math, we don't need the whole prime/de-prime action, so just make +;; thse instructions be NOPs. + +(define_expand "mma_" + [(set (match_operand:XO 0 "register_operand") + (unspec:XO [(match_operand:XO 1 "register_operand")] + MMA_ACC))] + "TARGET_MMA" +{ + if (TARGET_DENSE_MATH) + { + if (!rtx_equal_p (operands[0], operands[1])) + emit_move_insn (operands[0], operands[1]); + DONE; + } + + /* Generate the prime/de-prime code. */ +}) + +(define_insn "*mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")] MMA_ACC))] - "TARGET_MMA" + "TARGET_MMA && !TARGET_DENSE_MATH" " %A0" [(set_attr "type" "mma")]) ;; We can't have integer constants in XOmode so we wrap this in an -;; UNSPEC_VOLATILE. +;; UNSPEC_VOLATILE for the non-dense math case. For dense math, we don't need +;; to disable optimization and we can do a normal UNSPEC. -(define_insn "mma_xxsetaccz" - [(set (match_operand:XO 0 "fpr_reg_operand" "=d") +(define_expand "mma_xxsetaccz" + [(set (match_operand:XO 0 "register_operand") (unspec_volatile:XO [(const_int 0)] UNSPECV_MMA_XXSETACCZ))] "TARGET_MMA" +{ + if (TARGET_DENSE_MATH) + { + emit_insn (gen_mma_xxsetaccz_dm (operands[0])); + DONE; + } +}) + +(define_insn "*mma_xxsetaccz_p10" + [(set (match_operand:XO 0 "fpr_reg_operand" "=d") + (unspec_volatile:XO [(const_int 0)] + UNSPECV_MMA_XXSETACCZ))] + "TARGET_MMA && !TARGET_DENSE_MATH" "xxsetaccz %A0" [(set_attr "type" "mma")]) + +(define_insn "mma_xxsetaccz_dm" + [(set (match_operand:XO 0 "dmr_operand" "=wD") + (unspec:XO [(const_int 0)] + UNSPECV_MMA_XXSETACCZ))] + "TARGET_DENSE_MATH" + "dmsetaccz %0" + [(set_attr "type" "mma")]) + (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")] MMA_VV))] "TARGET_MMA" " %A0,%x1,%x2" - [(set_attr "type" "mma")]) + [(set_attr "type" "mma") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")] MMA_AVV))] "TARGET_MMA" " %A0,%x2,%x3" - [(set_attr "type" "mma")]) + [(set_attr "type" "mma") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")] MMA_PV))] "TARGET_MMA" " %A0,%x1,%x2" - [(set_attr "type" "mma")]) + [(set_attr "type" "mma") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:OO 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")] MMA_APV))] "TARGET_MMA" " %A0,%x2,%x3" - [(set_attr "type" "mma")]) + [(set_attr "type" "mma") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "u8bit_cint_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")] MMA_VVI4I4I8))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_15_operand" "n,n") - (match_operand:SI 6 "u8bit_cint_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 6 "u8bit_cint_operand" "n,n,n")] MMA_AVVI4I4I8))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_3_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")] MMA_VVI4I4I2))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_15_operand" "n,n") - (match_operand:SI 6 "const_0_to_3_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 6 "const_0_to_3_operand" "n,n,n")] MMA_AVVI4I4I2))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n") - (match_operand:SI 4 "const_0_to_15_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")] MMA_VVI4I4))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_15_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")] MMA_AVVI4I4))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n") - (match_operand:SI 4 "const_0_to_3_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 4 "const_0_to_3_operand" "n,n,n")] MMA_PVI4I2))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:OO 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_3_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")] MMA_APVI4I2))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_15_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")] MMA_VVI4I4I4))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_15_operand" "n,n") - (match_operand:SI 6 "const_0_to_15_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 6 "const_0_to_15_operand" "n,n,n")] MMA_AVVI4I4I4))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc index 0d7b43f8edb..6b08e4839e5 100644 --- a/gcc/config/rs6000/rs6000-c.cc +++ b/gcc/config/rs6000/rs6000-c.cc @@ -600,6 +600,9 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags) /* Tell the user if we support the MMA instructions. */ if ((flags & OPTION_MASK_MMA) != 0) rs6000_define_or_undefine_macro (define_p, "__MMA__"); + /* Tell the user if we support the dense math instructions. */ + if ((flags & OPTION_MASK_DENSE_MATH) != 0) + rs6000_define_or_undefine_macro (define_p, "__PPC_DMR__"); /* Whether pc-relative code is being generated. */ if ((flags & OPTION_MASK_PCREL) != 0) rs6000_define_or_undefine_macro (define_p, "__PCREL__"); diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index d224ef64bb1..361fd87aa8c 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -13863,8 +13863,13 @@ print_operand (FILE *file, rtx x, int code) overlapping with the FPR registers. */ if (!REG_P (x)) output_operand_lossage ("invalid %%A value"); - else if (TARGET_DENSE_MATH && DMR_REGNO_P (REGNO (x))) - fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO); + else if (TARGET_DENSE_MATH) + { + if (DMR_REGNO_P (REGNO (x))) + fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO); + else + output_operand_lossage ("%%A operand is not a DMR"); + } else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0) output_operand_lossage ("invalid %%A value"); else @@ -27257,7 +27262,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) /* If we are reading an accumulator register, we have to deprime it before we can access it. */ - if (TARGET_MMA + if (TARGET_MMA && !TARGET_DENSE_MATH && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) emit_insn (gen_mma_xxmfacc (src, src)); @@ -27289,9 +27294,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) emit_insn (gen_rtx_SET (dst2, src2)); } - /* If we are writing an accumulator register, we have to - prime it after we've written it. */ - if (TARGET_MMA + /* If we are writing an accumulator register that overlaps with the + FPR registers, we have to prime it after we've written it. */ + if (TARGET_MMA && !TARGET_DENSE_MATH && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) emit_insn (gen_mma_xxmtacc (dst, dst)); @@ -27360,9 +27365,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) emit_insn (gen_rtx_SET (dst_i, op)); } - /* We are writing an accumulator register, so we have to - prime it after we've written it. */ - if (GET_MODE (src) == XOmode) + /* On systems without dense math where accumulators overlap with the + vector registers, we have to prime it after we've written it. */ + if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH) emit_insn (gen_mma_xxmtacc (dst, dst)); return; @@ -27373,9 +27378,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst))) { - /* If we are reading an accumulator register, we have to - deprime it before we can access it. */ - if (TARGET_MMA + /* If we are reading an accumulator register and we don't have dense + math, we have to deprime it before we can access it. */ + if (TARGET_MMA && !TARGET_DENSE_MATH && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) emit_insn (gen_mma_xxmfacc (src, src)); @@ -27403,7 +27408,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) /* If we are writing an accumulator register, we have to prime it after we've written it. */ - if (TARGET_MMA + if (TARGET_MMA && !TARGET_DENSE_MATH && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) emit_insn (gen_mma_xxmtacc (dst, dst)); } @@ -27540,7 +27545,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) /* If we are reading an accumulator register, we have to deprime it before we can access it. */ - if (TARGET_MMA && REG_P (src) + if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src) && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) emit_insn (gen_mma_xxmfacc (src, src)); @@ -27572,7 +27577,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) /* If we are writing an accumulator register, we have to prime it after we've written it. */ - if (TARGET_MMA && REG_P (dst) + if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst) && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) emit_insn (gen_mma_xxmtacc (dst, dst)); From patchwork Thu Nov 10 02:51:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 60328 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 793963865C0E for ; Thu, 10 Nov 2022 02:52:25 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 793963865C0E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1668048745; bh=SJ8ZLhddnXoAcQ6lttnAoatkAWI+5HnvdA7xdBsGKcU=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=ng3uWUMaMGhJBenhba/OVXautJYxmlmTJy6pzrMDjNjmzh53EQ3ELhuWp8LYeN+3c iy6YE9c5EjTYnWJN385Up/lyXbi5Ts9q5d24qzf4mw/8EfKgRSiif3Xiy9+4mLktrY 4oRLhHRa5tEd0o2ULqkswCPvl5cok9BsndjuGwY4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id EAB573858D28 for ; Thu, 10 Nov 2022 02:51:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EAB573858D28 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2AA2kpae004451; Thu, 10 Nov 2022 02:51:53 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3krrtp82m7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:51:53 +0000 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AA2nNpp010934; Thu, 10 Nov 2022 02:51:52 GMT Received: from ppma03wdc.us.ibm.com (ba.79.3fa9.ip4.static.sl-reverse.com [169.63.121.186]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3krrtp82m1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:51:52 +0000 Received: from pps.filterd (ppma03wdc.us.ibm.com [127.0.0.1]) by ppma03wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AA2o4n0008464; Thu, 10 Nov 2022 02:51:52 GMT Received: from b03cxnp08026.gho.boulder.ibm.com (b03cxnp08026.gho.boulder.ibm.com [9.17.130.18]) by ppma03wdc.us.ibm.com with ESMTP id 3kngs7ht5h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:51:52 +0000 Received: from smtpav04.dal12v.mail.ibm.com ([9.208.128.131]) by b03cxnp08026.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AA2ptYd61276548 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Nov 2022 02:51:55 GMT Received: from smtpav04.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9E14F58063; Thu, 10 Nov 2022 02:51:50 +0000 (GMT) Received: from smtpav04.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 08DF75805A; Thu, 10 Nov 2022 02:51:50 +0000 (GMT) Received: from toto.the-meissners.org (unknown [9.160.5.6]) by smtpav04.dal12v.mail.ibm.com (Postfix) with ESMTPS; Thu, 10 Nov 2022 02:51:49 +0000 (GMT) Date: Wed, 9 Nov 2022 21:51:48 -0500 To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt Subject: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations. Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt References: Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: tr_0UNOQ16BNGL7aK7HR0i_cElWlpukZ X-Proofpoint-ORIG-GUID: 50r1JgESzdyuklhlefBN45TAXAp9-n6F X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-09_06,2022-11-09_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 spamscore=0 suspectscore=0 clxscore=1015 mlxscore=0 phishscore=0 mlxlogscore=999 adultscore=0 bulkscore=0 lowpriorityscore=0 impostorscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211100016 X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Michael Meissner via Gcc-patches From: Michael Meissner Reply-To: Michael Meissner Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch changes the assembler instruction names for MMA instructions from the original name used in power10 to the new name when used with the dense math system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the same bits for either spelling. The patches have been tested on the following platforms. I added the patches for PR target/107299 that I submitted on November 2nd before doing the builds so that GCC would build on systems using IEEE 128-bit long double. * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html There were no regressions with doing bootstrap builds and running the regression tests: 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). Can I check this patch into the GCC 13 master branch? 2022-11-09 Michael Meissner gcc/ * config/rs6000/mma.md (vvi4i4i8_dm): New int attribute. (avvi4i4i8_dm): Likewise. (vvi4i4i2_dm): Likewise. (avvi4i4i2_dm): Likewise. (vvi4i4_dm): Likewise. (avvi4i4_dm): Likewise. (pvi4i2_dm): Likewise. (apvi4i2_dm): Likewise. (vvi4i4i4_dm): Likewise. (avvi4i4i4_dm): Likewise. (mma_): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. gcc/testsuite/ * gcc.target/powerpc/dm-double-test.c: New test. * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New target test. --- gcc/config/rs6000/mma.md | 98 +++++++-- .../gcc.target/powerpc/dm-double-test.c | 194 ++++++++++++++++++ gcc/testsuite/lib/target-supports.exp | 19 ++ 3 files changed, 299 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 835f34e8e00..cca1fa71f75 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -227,13 +227,22 @@ (define_int_attr apv [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp") (define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) +(define_int_attr vvi4i4i8_dm [(UNSPEC_MMA_PMXVI4GER8 "pmdmxvi4ger8")]) + (define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "pmxvi4ger8pp")]) +(define_int_attr avvi4i4i8_dm [(UNSPEC_MMA_PMXVI4GER8PP "pmdmxvi4ger8pp")]) + (define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2 "pmxvi16ger2") (UNSPEC_MMA_PMXVI16GER2S "pmxvi16ger2s") (UNSPEC_MMA_PMXVF16GER2 "pmxvf16ger2") (UNSPEC_MMA_PMXVBF16GER2 "pmxvbf16ger2")]) +(define_int_attr vvi4i4i2_dm [(UNSPEC_MMA_PMXVI16GER2 "pmdmxvi16ger2") + (UNSPEC_MMA_PMXVI16GER2S "pmdmxvi16ger2s") + (UNSPEC_MMA_PMXVF16GER2 "pmdmxvf16ger2") + (UNSPEC_MMA_PMXVBF16GER2 "pmdmxvbf16ger2")]) + (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") (UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp") (UNSPEC_MMA_PMXVF16GER2PP "pmxvf16ger2pp") @@ -245,25 +254,54 @@ (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") (UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np") (UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")]) +(define_int_attr avvi4i4i2_dm [(UNSPEC_MMA_PMXVI16GER2PP "pmdmxvi16ger2pp") + (UNSPEC_MMA_PMXVI16GER2SPP "pmdmxvi16ger2spp") + (UNSPEC_MMA_PMXVF16GER2PP "pmdmxvf16ger2pp") + (UNSPEC_MMA_PMXVF16GER2PN "pmdmxvf16ger2pn") + (UNSPEC_MMA_PMXVF16GER2NP "pmdmxvf16ger2np") + (UNSPEC_MMA_PMXVF16GER2NN "pmdmxvf16ger2nn") + (UNSPEC_MMA_PMXVBF16GER2PP "pmdmxvbf16ger2pp") + (UNSPEC_MMA_PMXVBF16GER2PN "pmdmxvbf16ger2pn") + (UNSPEC_MMA_PMXVBF16GER2NP "pmdmxvbf16ger2np") + (UNSPEC_MMA_PMXVBF16GER2NN "pmdmxvbf16ger2nn")]) + (define_int_attr vvi4i4 [(UNSPEC_MMA_PMXVF32GER "pmxvf32ger")]) +(define_int_attr vvi4i4_dm [(UNSPEC_MMA_PMXVF32GER "pmdmxvf32ger")]) + (define_int_attr avvi4i4 [(UNSPEC_MMA_PMXVF32GERPP "pmxvf32gerpp") (UNSPEC_MMA_PMXVF32GERPN "pmxvf32gerpn") (UNSPEC_MMA_PMXVF32GERNP "pmxvf32gernp") (UNSPEC_MMA_PMXVF32GERNN "pmxvf32gernn")]) +(define_int_attr avvi4i4_dm [(UNSPEC_MMA_PMXVF32GERPP "pmdmxvf32gerpp") + (UNSPEC_MMA_PMXVF32GERPN "pmdmxvf32gerpn") + (UNSPEC_MMA_PMXVF32GERNP "pmdmxvf32gernp") + (UNSPEC_MMA_PMXVF32GERNN "pmdmxvf32gernn")]) + (define_int_attr pvi4i2 [(UNSPEC_MMA_PMXVF64GER "pmxvf64ger")]) +(define_int_attr pvi4i2_dm [(UNSPEC_MMA_PMXVF64GER "pmdmxvf64ger")]) + (define_int_attr apvi4i2 [(UNSPEC_MMA_PMXVF64GERPP "pmxvf64gerpp") (UNSPEC_MMA_PMXVF64GERPN "pmxvf64gerpn") (UNSPEC_MMA_PMXVF64GERNP "pmxvf64gernp") (UNSPEC_MMA_PMXVF64GERNN "pmxvf64gernn")]) +(define_int_attr apvi4i2_dm [(UNSPEC_MMA_PMXVF64GERPP "pmdmxvf64gerpp") + (UNSPEC_MMA_PMXVF64GERPN "pmdmxvf64gerpn") + (UNSPEC_MMA_PMXVF64GERNP "pmdmxvf64gernp") + (UNSPEC_MMA_PMXVF64GERNN "pmdmxvf64gernn")]) + (define_int_attr vvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4 "pmxvi8ger4")]) +(define_int_attr vvi4i4i4_dm [(UNSPEC_MMA_PMXVI8GER4 "pmdmxvi8ger4")]) + (define_int_attr avvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4PP "pmxvi8ger4pp") (UNSPEC_MMA_PMXVI8GER4SPP "pmxvi8ger4spp")]) +(define_int_attr avvi4i4i4_dm [(UNSPEC_MMA_PMXVI8GER4PP "pmdmxvi8ger4pp") + (UNSPEC_MMA_PMXVI8GER4SPP "pmdmxvi8ger4spp")]) ;; Vector pair support. OOmode can only live in VSRs. (define_expand "movoo" @@ -615,7 +653,10 @@ (define_insn "mma_" (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")] MMA_VV))] "TARGET_MMA" - " %A0,%x1,%x2" + "@ + dm %A0,%x1,%x2 + %A0,%x1,%x2 + %A0,%x1,%x2" [(set_attr "type" "mma") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -636,7 +677,10 @@ (define_insn "mma_" (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")] MMA_PV))] "TARGET_MMA" - " %A0,%x1,%x2" + "@ + dm %A0,%x1,%x2 + %A0,%x1,%x2 + %A0,%x1,%x2" [(set_attr "type" "mma") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -647,7 +691,10 @@ (define_insn "mma_" (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")] MMA_APV))] "TARGET_MMA" - " %A0,%x2,%x3" + "@ + dm %A0,%x2,%x3 + %A0,%x2,%x3 + %A0,%x2,%x3" [(set_attr "type" "mma") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -660,7 +707,10 @@ (define_insn "mma_" (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")] MMA_VVI4I4I8))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4,%5" + "@ + dm %A0,%x1,%x2,%3,%4,%5 + %A0,%x1,%x2,%3,%4,%5 + %A0,%x1,%x2,%3,%4,%5" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -689,7 +739,10 @@ (define_insn "mma_" (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")] MMA_VVI4I4I2))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4,%5" + "@ + %A0,%x1,%x2,%3,%4,%5 + %A0,%x1,%x2,%3,%4,%5 + %A0,%x1,%x2,%3,%4,%5" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -704,7 +757,10 @@ (define_insn "mma_" (match_operand:SI 6 "const_0_to_3_operand" "n,n,n")] MMA_AVVI4I4I2))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5,%6" + "@ + %A0,%x2,%x3,%4,%5,%6 + %A0,%x2,%x3,%4,%5,%6 + %A0,%x2,%x3,%4,%5,%6" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -717,7 +773,10 @@ (define_insn "mma_" (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")] MMA_VVI4I4))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4" + "@ + %A0,%x1,%x2,%3,%4 + %A0,%x1,%x2,%3,%4 + %A0,%x1,%x2,%3,%4" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -731,7 +790,10 @@ (define_insn "mma_" (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")] MMA_AVVI4I4))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5" + "@ + %A0,%x2,%x3,%4,%5 + %A0,%x2,%x3,%4,%5 + %A0,%x2,%x3,%4,%5" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -744,7 +806,10 @@ (define_insn "mma_" (match_operand:SI 4 "const_0_to_3_operand" "n,n,n")] MMA_PVI4I2))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4" + "@ + %A0,%x1,%x2,%3,%4 + %A0,%x1,%x2,%3,%4 + %A0,%x1,%x2,%3,%4" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -758,7 +823,10 @@ (define_insn "mma_" (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")] MMA_APVI4I2))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5" + "@ + %A0,%x2,%x3,%4,%5 + %A0,%x2,%x3,%4,%5 + %A0,%x2,%x3,%4,%5" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -772,7 +840,10 @@ (define_insn "mma_" (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")] MMA_VVI4I4I4))] "TARGET_MMA" - " %A0,%x1,%x2,%3,%4,%5" + "@ + %A0,%x1,%x2,%3,%4,%5 + %A0,%x1,%x2,%3,%4,%5 + %A0,%x1,%x2,%3,%4,%5" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) @@ -787,7 +858,10 @@ (define_insn "mma_" (match_operand:SI 6 "const_0_to_15_operand" "n,n,n")] MMA_AVVI4I4I4))] "TARGET_MMA" - " %A0,%x2,%x3,%4,%5,%6" + "@ + %A0,%x2,%x3,%4,%5,%6 + %A0,%x2,%x3,%4,%5,%6 + %A0,%x2,%x3,%4,%5,%6" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) diff --git a/gcc/testsuite/gcc.target/powerpc/dm-double-test.c b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c new file mode 100644 index 00000000000..eaa01426c78 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c @@ -0,0 +1,194 @@ +/* Test derived from mma-double-1.c, modified for dense math. */ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_dense_math_ok } */ +/* { dg-options "-mdejagnu-cpu=future -O2" } */ + +#include +#include +#include + +typedef unsigned char vec_t __attribute__ ((vector_size (16))); +typedef double v4sf_t __attribute__ ((vector_size (16))); +#define SAVE_ACC(ACC, ldc, J) \ + __builtin_mma_disassemble_acc (result, ACC); \ + rowC = (v4sf_t *) &CO[0*ldc+J]; \ + rowC[0] += result[0]; \ + rowC = (v4sf_t *) &CO[1*ldc+J]; \ + rowC[0] += result[1]; \ + rowC = (v4sf_t *) &CO[2*ldc+J]; \ + rowC[0] += result[2]; \ + rowC = (v4sf_t *) &CO[3*ldc+J]; \ + rowC[0] += result[3]; + +void +DM (int m, int n, int k, double *A, double *B, double *C) +{ + __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; + v4sf_t result[4]; + v4sf_t *rowC; + for (int l = 0; l < n; l += 4) + { + double *CO; + double *AO; + AO = A; + CO = C; + C += m * 4; + for (int j = 0; j < m; j += 16) + { + double *BO = B; + __builtin_mma_xxsetaccz (&acc0); + __builtin_mma_xxsetaccz (&acc1); + __builtin_mma_xxsetaccz (&acc2); + __builtin_mma_xxsetaccz (&acc3); + __builtin_mma_xxsetaccz (&acc4); + __builtin_mma_xxsetaccz (&acc5); + __builtin_mma_xxsetaccz (&acc6); + __builtin_mma_xxsetaccz (&acc7); + unsigned long i; + + for (i = 0; i < k; i++) + { + vec_t *rowA = (vec_t *) & AO[i * 16]; + __vector_pair rowB; + vec_t *rb = (vec_t *) & BO[i * 4]; + __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]); + __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]); + __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]); + __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]); + __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]); + __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]); + __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]); + __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]); + __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]); + } + SAVE_ACC (&acc0, m, 0); + SAVE_ACC (&acc2, m, 4); + SAVE_ACC (&acc1, m, 2); + SAVE_ACC (&acc3, m, 6); + SAVE_ACC (&acc4, m, 8); + SAVE_ACC (&acc6, m, 12); + SAVE_ACC (&acc5, m, 10); + SAVE_ACC (&acc7, m, 14); + AO += k * 16; + BO += k * 4; + CO += 16; + } + B += k * 4; + } +} + +void +init (double *matrix, int row, int column) +{ + for (int j = 0; j < column; j++) + { + for (int i = 0; i < row; i++) + { + matrix[j * row + i] = (i * 16 + 2 + j) / 0.123; + } + } +} + +void +init0 (double *matrix, double *matrix1, int row, int column) +{ + for (int j = 0; j < column; j++) + for (int i = 0; i < row; i++) + matrix[j * row + i] = matrix1[j * row + i] = 0; +} + + +void +print (const char *name, const double *matrix, int row, int column) +{ + printf ("Matrix %s has %d rows and %d columns:\n", name, row, column); + for (int i = 0; i < row; i++) + { + for (int j = 0; j < column; j++) + { + printf ("%f ", matrix[j * row + i]); + } + printf ("\n"); + } + printf ("\n"); +} + +int +main (int argc, char *argv[]) +{ + int rowsA, colsB, common; + int i, j, k; + int ret = 0; + + for (int t = 16; t <= 128; t += 16) + { + for (int t1 = 4; t1 <= 16; t1 += 4) + { + rowsA = t; + colsB = t1; + common = 1; + /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */ + double A[rowsA * common]; + double B[common * colsB]; + double C[rowsA * colsB]; + double D[rowsA * colsB]; + + + init (A, rowsA, common); + init (B, common, colsB); + init0 (C, D, rowsA, colsB); + DM (rowsA, colsB, common, A, B, C); + + for (i = 0; i < colsB; i++) + { + for (j = 0; j < rowsA; j++) + { + D[i * rowsA + j] = 0; + for (k = 0; k < common; k++) + { + D[i * rowsA + j] += + A[k * rowsA + j] * B[k + common * i]; + } + } + } + for (i = 0; i < colsB; i++) + { + for (j = 0; j < rowsA; j++) + { + for (k = 0; k < common; k++) + { + if (D[i * rowsA + j] != C[i * rowsA + j]) + { + printf ("Error %d,%d,%d\n",i,j,k); + ret++; + } + } + } + } + if (ret) + { + print ("A", A, rowsA, common); + print ("B", B, common, colsB); + print ("C", C, rowsA, colsB); + print ("D", D, rowsA, colsB); + } + } + } + +#ifdef VERBOSE + if (ret) + printf ("DM double test fail: %d errors\n",ret); + else + printf ("DM double test success: 0 DM errors\n"); +#else + if (ret) + abort(); +#endif + + return ret; +} + +/* { dg-final { scan-assembler {\mdmsetaccz\M} } } */ +/* { dg-final { scan-assembler {\mdmxvf64gerpp\M} } } */ +/* { dg-final { scan-assembler {\mdmxxextfdmr512\M} } } */ + diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index c7f583d6d14..b70ebf963f9 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -6534,6 +6534,25 @@ proc check_effective_target_power10_ok { } { } } +# Return 1 if this is a PowerPC target supporting -mcpu=future or -mdense-math +# which enables the dense math operations. +proc check_effective_target_powerpc_dense_math_ok { } { + return [check_no_compiler_messages_nocache powerpc_dense_math_ok assembly { + __vector_quad vq; + void test (void) + { + #ifndef __PPC_DMR__ + #error "target does not have dense math support." + #else + /* Make sure we have dense math support. */ + __vector_quad dmr; + __asm__ ("dmsetaccz %A0" : "=wD" (dmr)); + vq = dmr; + #endif + } + } "-mcpu=future"] +} + # Return 1 if this is a PowerPC target supporting -mfloat128 via either # software emulation on power7/power8 systems or hardware support on power9. From patchwork Thu Nov 10 02:52:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 60329 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 881E63858039 for ; Thu, 10 Nov 2022 02:53:31 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 881E63858039 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1668048811; bh=3WxOJU1BXdMLPXHAAe8+ZUdryfDTxzl80kR7lH2l9xw=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=H/IL9yGe7VSebrmpaDrbSJRboY1UhGandAuK2J1QpQ39kthDSZz7IVGashVwo3+wk ++hxnSd7eK8l6LV+TxwZrVB7mtSvrkKsLaAMSb+3+eeqH6mP3IjA+MXVaz3rzO3fyg HC2/4IJZ2vPOBUjrdFo6G8/dBtPRA0T17SOQUffw= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 051073858408 for ; Thu, 10 Nov 2022 02:52:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 051073858408 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2AA05F4Y001259; Thu, 10 Nov 2022 02:52:55 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krhqpaakf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:52:54 +0000 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AA2pKil008966; Thu, 10 Nov 2022 02:52:54 GMT Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.11]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3krhqpaaka-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:52:54 +0000 Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1]) by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AA2p1nT003940; Thu, 10 Nov 2022 02:52:53 GMT Received: from b03cxnp08025.gho.boulder.ibm.com (b03cxnp08025.gho.boulder.ibm.com [9.17.130.17]) by ppma03dal.us.ibm.com with ESMTP id 3kngpjj4u3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 10 Nov 2022 02:52:53 +0000 Received: from smtpav05.dal12v.mail.ibm.com ([9.208.128.132]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AA2qn9X3998210 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Nov 2022 02:52:49 GMT Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5343358068; Thu, 10 Nov 2022 02:52:51 +0000 (GMT) Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id ACA2B58052; Thu, 10 Nov 2022 02:52:50 +0000 (GMT) Received: from toto.the-meissners.org (unknown [9.160.5.6]) by smtpav05.dal12v.mail.ibm.com (Postfix) with ESMTPS; Thu, 10 Nov 2022 02:52:50 +0000 (GMT) Date: Wed, 9 Nov 2022 21:52:49 -0500 To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt Subject: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers. Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt References: Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 57cwnHfKLYDIHJOdobd8_SzZK7hEnd2b X-Proofpoint-GUID: 1vCRrm2UOgZTW-CeNAJf8zVb5ufCkPWs X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-09_06,2022-11-09_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 impostorscore=0 malwarescore=0 adultscore=0 mlxlogscore=999 bulkscore=0 spamscore=0 phishscore=0 priorityscore=1501 mlxscore=0 suspectscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211100016 X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Michael Meissner via Gcc-patches From: Michael Meissner Reply-To: Michael Meissner Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch is a prelimianry patch to add the full 1,024 bit dense math register (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the DMR register. This patch only adds the new 1,024 bit register support. It does not add support for any instructions that need 1,024 bit registers instead of 512 bit registers. I used the new mode 'TDOmode' to be the opaque mode used for 1,204 bit registers. The 'wD' constraint added in previous patches is used for these registers. I added support to do load and store of DMRs via the VSX registers, since there are no load/store dense math instructions. I added the new keyword '__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I don't have aliases for __dmr512 and __dmr1024 that we've discussed internally. At present, the tree constant propigation patch does not work with 1,024 bit DMRs. I believe this is due to the CCP pass not skipping opaque modes. I hope once this patch is committed, we can work on the machine independent changes to allow the CCP pass not to issue an internal error when a DMR is used. The patches have been tested on the following platforms. I added the patches for PR target/107299 that I submitted on November 2nd before doing the builds so that GCC would build on systems using IEEE 128-bit long double. * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html There were no regressions with doing bootstrap builds and running the regression tests: 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). Can I check this patch into the GCC 13 master branch? 2022-11-09 Michael Meissner gcc/ * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec. (UNSPEC_DM_INSERT512_LOWER): Likewise. (UNSPEC_DM_EXTRACT512): Likewise. (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise. (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise. (movtdo): New define_expand and define_insn_and_split to implement 1,024 bit DMR registers. (movtdo_insert512_upper): New insn. (movtdo_insert512_lower): Likewise. (movtdo_extract512): Likewise. (reload_dmr_from_memory): Likewise. (reload_dmr_to_memory): Likewise. * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR support. (rs6000_init_builtins): Add support for __dmr keyword. * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support for TDOmode. (rs6000_function_arg): Likewise. * config/rs6000/rs6000-modes.def (TDOmode): New mode. * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add support for TDOmode. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_hard_regno_mode_ok): Likewise. (rs6000_modes_tieable_p): Likewise. (rs6000_debug_reg_global): Likewise. (rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload hooks for DMR mode. (reg_offset_addressing_ok_p): Add support for TDOmode. (rs6000_emit_move): Likewise. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_secondary_reload_class): Likewise. (rs6000_mangle_type): Add mangling for __dmr type. (rs6000_dmr_register_move_cost): Add support for TDOmode. (rs6000_split_multireg_move): Likewise. (rs6000_invalid_conversion): Likewise. * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode. (enum rs6000_builtin_type_index): Add DMR type nodes. (dmr_type_node): Likewise. (ptr_dmr_type_node): Likewise. gcc/testsuite/ * gcc.target/powerpc/dm-1024bit.c: New test. --- gcc/config/rs6000/mma.md | 152 ++++++++++++++++++ gcc/config/rs6000/rs6000-builtin.cc | 13 ++ gcc/config/rs6000/rs6000-call.cc | 13 +- gcc/config/rs6000/rs6000-modes.def | 4 + gcc/config/rs6000/rs6000.cc | 125 ++++++++++---- gcc/config/rs6000/rs6000.h | 7 +- gcc/testsuite/gcc.target/powerpc/dm-1024bit.c | 68 ++++++++ 7 files changed, 350 insertions(+), 32 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index cca1fa71f75..2c08ad7619a 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -92,6 +92,11 @@ (define_c_enum "unspec" UNSPEC_MMA_XXMFACC UNSPEC_MMA_XXMTACC UNSPEC_DM_ASSEMBLE_ACC + UNSPEC_DM_INSERT512_UPPER + UNSPEC_DM_INSERT512_LOWER + UNSPEC_DM_EXTRACT512 + UNSPEC_DMR_RELOAD_FROM_MEMORY + UNSPEC_DMR_RELOAD_TO_MEMORY ]) (define_c_enum "unspecv" @@ -865,3 +870,150 @@ (define_insn "mma_" [(set_attr "type" "mma") (set_attr "prefixed" "yes") (set_attr "isa" "dm,not_dm,not_dm")]) + + +;; TDOmode (i.e. __dmr). +(define_expand "movtdo" + [(set (match_operand:TDO 0 "nonimmediate_operand") + (match_operand:TDO 1 "input_operand"))] + "TARGET_DENSE_MATH" +{ + rs6000_emit_move (operands[0], operands[1], TDOmode); + DONE; +}) + +(define_insn_and_split "*movtdo" + [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa") + (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))] + "TARGET_DENSE_MATH + && (gpc_reg_operand (operands[0], TDOmode) + || gpc_reg_operand (operands[1], TDOmode))" + "@ + # + # + # + # + dmmr %0,%1 + #" + "&& reload_completed + && (!dmr_operand (operands[0], TDOmode) || !dmr_operand (operands[1], TDOmode))" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + + if (REG_P (op0) && REG_P (op1)) + { + int regno0 = REGNO (op0); + int regno1 = REGNO (op1); + + if (DMR_REGNO_P (regno0) && VSX_REGNO_P (regno1)) + { + rtx op1_upper = gen_rtx_REG (XOmode, regno1); + rtx op1_lower = gen_rtx_REG (XOmode, regno1 + 4); + emit_insn (gen_movtdo_insert512_upper (op0, op1_upper)); + emit_insn (gen_movtdo_insert512_lower (op0, op0, op1_lower)); + DONE; + } + + else if (VSX_REGNO_P (regno0) && DMR_REGNO_P (regno1)) + { + rtx op0_upper = gen_rtx_REG (XOmode, regno0); + rtx op0_lower = gen_rtx_REG (XOmode, regno0 + 4); + emit_insn (gen_movtdo_extract512 (op0_upper, op1, const0_rtx)); + emit_insn (gen_movtdo_extract512 (op0_lower, op1, const1_rtx)); + DONE; + } + } + + rs6000_split_multireg_move (operands[0], operands[1]); + DONE; +} + [(set_attr "type" "vecload,vecstore,vecmove,vecmove,vecmove,vecmove") + (set_attr "length" "*,*,32,8,*,8") + (set_attr "max_prefixed_insns" "4,4,*,*,*,*")]) + +;; Move from VSX registers to DMR registers via two insert 512 bit +;; instructions. +(define_insn "movtdo_insert512_upper" + [(set (match_operand:TDO 0 "dmr_operand" "=wD") + (unspec:TDO [(match_operand:XO 1 "vsx_register_operand" "wa")] + UNSPEC_DM_INSERT512_UPPER))] + "TARGET_DENSE_MATH" + "dmxxinstdmr512 %0,%1,%Y1,0" + [(set_attr "type" "mma")]) + +(define_insn "movtdo_insert512_lower" + [(set (match_operand:TDO 0 "dmr_operand" "=wD") + (unspec:TDO [(match_operand:TDO 1 "dmr_operand" "0") + (match_operand:XO 2 "vsx_register_operand" "wa")] + UNSPEC_DM_INSERT512_LOWER))] + "TARGET_DENSE_MATH" + "dmxxinstdmr512 %0,%2,%Y2,1" + [(set_attr "type" "mma")]) + +;; Move from DMR registers to VSX registers via two extract 512 bit +;; instructions. +(define_insn "movtdo_extract512" + [(set (match_operand:XO 0 "vsx_register_operand" "=wa") + (unspec:XO [(match_operand:TDO 1 "dmr_operand" "wD") + (match_operand 2 "const_0_to_1_operand" "n")] + UNSPEC_DM_EXTRACT512))] + "TARGET_DENSE_MATH" + "dmxxextfdmr512 %0,%Y0,%1,%2" + [(set_attr "type" "mma")]) + +;; Reload DMR registers from memory +(define_insn_and_split "reload_dmr_from_memory" + [(set (match_operand:TDO 0 "dmr_operand" "=wD") + (unspec:TDO [(match_operand:TDO 1 "memory_operand" "m")] + UNSPEC_DMR_RELOAD_FROM_MEMORY)) + (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))] + "TARGET_DENSE_MATH" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx dest = operands[0]; + rtx src = operands[1]; + rtx tmp = operands[2]; + rtx mem_upper = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 0 : 32); + rtx mem_lower = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 32 : 0); + + emit_move_insn (tmp, mem_upper); + emit_insn (gen_movtdo_insert512_upper (dest, tmp)); + + emit_move_insn (tmp, mem_lower); + emit_insn (gen_movtdo_insert512_lower (dest, dest, tmp)); + DONE; +} + [(set_attr "length" "16") + (set_attr "max_prefixed_insns" "2") + (set_attr "type" "vecload")]) + +;; Reload dense math registers to memory +(define_insn_and_split "reload_dmr_to_memory" + [(set (match_operand:TDO 0 "memory_operand" "=m") + (unspec:TDO [(match_operand:TDO 1 "dmr_operand" "wD")] + UNSPEC_DMR_RELOAD_TO_MEMORY)) + (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))] + "TARGET_DENSE_MATH" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx dest = operands[0]; + rtx src = operands[1]; + rtx tmp = operands[2]; + rtx mem_upper = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 0 : 32); + rtx mem_lower = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 32 : 0); + + emit_insn (gen_movtdo_extract512 (tmp, src, const0_rtx)); + emit_move_insn (mem_upper, tmp); + + emit_insn (gen_movtdo_extract512 (tmp, src, const1_rtx)); + emit_move_insn (mem_lower, tmp); + DONE; +} + [(set_attr "length" "16") + (set_attr "max_prefixed_insns" "2")]) diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index e5298f45363..f4eba184db8 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -495,6 +495,8 @@ const char *rs6000_type_string (tree type_node) return "__vector_pair"; else if (type_node == vector_quad_type_node) return "__vector_quad"; + else if (type_node == dmr_type_node) + return "__dmr"; return "unknown"; } @@ -784,6 +786,17 @@ rs6000_init_builtins (void) t = build_qualified_type (vector_quad_type_node, TYPE_QUAL_CONST); ptr_vector_quad_type_node = build_pointer_type (t); + dmr_type_node = make_node (OPAQUE_TYPE); + SET_TYPE_MODE (dmr_type_node, TDOmode); + TYPE_SIZE (dmr_type_node) = bitsize_int (GET_MODE_BITSIZE (TDOmode)); + TYPE_PRECISION (dmr_type_node) = GET_MODE_BITSIZE (TDOmode); + TYPE_SIZE_UNIT (dmr_type_node) = size_int (GET_MODE_SIZE (TDOmode)); + SET_TYPE_ALIGN (dmr_type_node, 512); + TYPE_USER_ALIGN (dmr_type_node) = 0; + lang_hooks.types.register_builtin_type (dmr_type_node, "__dmr"); + t = build_qualified_type (dmr_type_node, TYPE_QUAL_CONST); + ptr_dmr_type_node = build_pointer_type (t); + tdecl = add_builtin_type ("__bool char", bool_char_type_node); TYPE_NAME (bool_char_type_node) = tdecl; diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc index 6da4de67137..13eacd3a84d 100644 --- a/gcc/config/rs6000/rs6000-call.cc +++ b/gcc/config/rs6000/rs6000-call.cc @@ -437,7 +437,8 @@ rs6000_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED) if (cfun && !cfun->machine->mma_return_type_error && TREE_TYPE (cfun->decl) == fntype - && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode)) + && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode + || TYPE_MODE (type) == TDOmode)) { /* Record we have now handled function CFUN, so the next time we are called, we do not re-report the same error. */ @@ -1641,6 +1642,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg) return NULL_RTX; } + if (mode == TDOmode) + { + if (TYPE_CANONICAL (type) != NULL_TREE) + type = TYPE_CANONICAL (type); + error ("invalid use of dense math operand of type %qs as a function " + "parameter", + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)))); + return NULL_RTX; + } + /* Return a marker to indicate whether CR1 needs to set or clear the bit that V.4 uses to say fp args were passed in registers. Assume that we don't need the marker for software floating point, diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def index 8ef910869c5..a1384d5dd91 100644 --- a/gcc/config/rs6000/rs6000-modes.def +++ b/gcc/config/rs6000/rs6000-modes.def @@ -86,3 +86,7 @@ PARTIAL_INT_MODE (TI, 128, PTI); /* Modes used by __vector_pair and __vector_quad. */ OPAQUE_MODE (OO, 32); OPAQUE_MODE (XO, 64); + +/* Modes used by __dmr. */ +OPAQUE_MODE (TDO, 128); + diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 361fd87aa8c..dba37df3c61 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -1820,7 +1820,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode) 128-bit floating point that can go in vector registers, which has VSX memory addressing. */ if (FP_REGNO_P (regno)) - reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode) + reg_size = (VECTOR_MEM_VSX_P (mode) + || VECTOR_ALIGNMENT_P (mode) + || mode == TDOmode ? UNITS_PER_VSX_WORD : UNITS_PER_FP_WORD); @@ -1853,9 +1855,9 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode) /* On power10, MMA accumulator modes need FPR registers divisible by 4. - If dense math is enabled, allow all VSX registers plus the DMR registers. - We need to make sure we don't cross between the boundary of FPRs and - traditional Altiviec registers. */ + If dense math is enabled, allow all VSX registers plus the dense math + registers. We need to make sure we don't cross between the boundary of + FPRs and traditional Altiviec registers. */ if (mode == XOmode) { if (TARGET_MMA && !TARGET_DENSE_MATH) @@ -1877,7 +1879,27 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode) return 0; } - /* No other types other than XOmode can go in DMRs. */ + /* Dense math register modes need DMR registers or VSX registers divisible by + 2. We need to make sure we don't cross between the boundary of FPRs and + traditional Altiviec registers. */ + if (mode == TDOmode) + { + if (!TARGET_DENSE_MATH) + return 0; + + if (DMR_REGNO_P (regno)) + return 1; + + if (FP_REGNO_P (regno)) + return ((regno & 1) == 0 && regno <= LAST_FPR_REGNO - 7); + + if (ALTIVEC_REGNO_P (regno)) + return ((regno & 1) == 0 && regno <= LAST_ALTIVEC_REGNO - 7); + + return 0; + } + + /* No other types other than XOmode or TDOmode can go in DMRs. */ if (DMR_REGNO_P (regno)) return 0; @@ -1985,9 +2007,11 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode) GPR registers, and TImode can go in any GPR as well as VSX registers (PR 57744). - Similarly, don't allow OOmode (vector pair, restricted to even VSX - registers) or XOmode (vector quad, restricted to FPR registers divisible - by 4) to tie with other modes. + Similarly, don't allow OOmode (vector pair), XOmode (vector quad), or + TDOmode (dmr register) to pair with anything else. Vector pairs are + restricted to even/odd VSX registers. Without dense math, vector quads are + limited to FPR registers divisible by 4. With dense math, vector quads are + limited to even VSX registers or DMR registers. Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE 128-bit floating point on VSX systems ties with other vectors. */ @@ -1996,7 +2020,8 @@ static bool rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2) { if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode - || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode) + || mode1 == TDOmode || mode2 == PTImode || mode2 == OOmode + || mode2 == XOmode || mode2 == TDOmode) return mode1 == mode2; if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1)) @@ -2287,6 +2312,7 @@ rs6000_debug_reg_global (void) V4DFmode, OOmode, XOmode, + TDOmode, CCmode, CCUNSmode, CCEQmode, @@ -2652,7 +2678,7 @@ rs6000_setup_reg_addr_masks (void) /* Special case DMR registers. */ if (rc == RELOAD_REG_DMR) { - if (TARGET_DENSE_MATH && m2 == XOmode) + if (TARGET_DENSE_MATH && (m2 == XOmode || m2 == TDOmode)) { addr_mask = RELOAD_REG_VALID; reg_addr[m].addr_mask[rc] = addr_mask; @@ -2762,7 +2788,7 @@ rs6000_setup_reg_addr_masks (void) since it will be broken into two vector moves. Vector quads and 1,024 bit DMR values can only do offset loads. */ else if ((addr_mask != 0) && TARGET_MMA - && (m2 == OOmode || m2 == XOmode)) + && (m2 == OOmode || m2 == XOmode || m2 == TDOmode)) { addr_mask |= RELOAD_REG_OFFSET; if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX) @@ -2990,6 +3016,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) rs6000_vector_align[XOmode] = 512; } + /* Add support for 1,024 bit DMR registers. */ + if (TARGET_DENSE_MATH) + { + rs6000_vector_unit[TDOmode] = VECTOR_NONE; + rs6000_vector_mem[TDOmode] = VECTOR_VSX; + rs6000_vector_align[TDOmode] = 512; + } + /* Register class constraints for the constraints that depend on compile switches. When the VSX code was added, different constraints were added based on the type (DFmode, V2DFmode, V4SFmode). For the vector types, all @@ -3203,6 +3237,12 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) } } + if (TARGET_DENSE_MATH) + { + reg_addr[TDOmode].reload_load = CODE_FOR_reload_dmr_from_memory; + reg_addr[TDOmode].reload_store = CODE_FOR_reload_dmr_to_memory; + } + /* Precalculate HARD_REGNO_NREGS. */ for (r = 0; HARD_REGISTER_NUM_P (r); ++r) for (m = 0; m < NUM_MACHINE_MODES; ++m) @@ -8678,12 +8718,15 @@ reg_offset_addressing_ok_p (machine_mode mode) return mode_supports_dq_form (mode); break; - /* The vector pair/quad types support offset addressing if the - underlying vectors support offset addressing. */ + /* The vector pair/quad types and the dense math types support offset + addressing if the underlying vectors support offset addressing. */ case E_OOmode: case E_XOmode: return TARGET_MMA; + case E_TDOmode: + return TARGET_DENSE_MATH; + case E_SDmode: /* If we can do direct load/stores of SDmode, restrict it to reg+reg addressing for the LFIWZX and STFIWX instructions. */ @@ -10962,6 +11005,12 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode) (mode == OOmode) ? "__vector_pair" : "__vector_quad"); break; + case E_TDOmode: + if (CONST_INT_P (operands[1])) + error ("%qs is an opaque type, and you cannot set it to constants", + "__dmr"); + break; + case E_SImode: case E_DImode: /* Use default pattern for address of ELF small data */ @@ -12416,7 +12465,7 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type, /* We can transfer between VSX registers and DMR registers without needing extra registers. */ - if (TARGET_DENSE_MATH && mode == XOmode + if (TARGET_DENSE_MATH && (mode == XOmode || mode == TDOmode) && ((to_type == DMR_REG_TYPE && from_type == VSX_REG_TYPE) || (to_type == VSX_REG_TYPE && from_type == DMR_REG_TYPE))) return true; @@ -13217,6 +13266,9 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass) if (mode == XOmode) return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS; + if (mode == TDOmode) + return VSX_REGS; + if (GET_MODE_CLASS (mode) == MODE_INT) return GENERAL_REGS; } @@ -13340,8 +13392,9 @@ rs6000_secondary_reload_class (enum reg_class rclass, machine_mode mode, else regno = -1; - /* DMR registers don't have loads or stores. We have to go through the VSX - registers to load XOmode (vector quad). */ + /* Dense math registers don't have loads or stores. We have to go through + the VSX registers to load XOmode (vector quad) and TDOmode (dmr 1024 + bit). */ if (TARGET_DENSE_MATH && rclass == DM_REGS) return VSX_REGS; @@ -20367,6 +20420,8 @@ rs6000_mangle_type (const_tree type) return "u13__vector_pair"; if (type == vector_quad_type_node) return "u13__vector_quad"; + if (type == dmr_type_node) + return "u5__dmr"; /* For all other types, use the default mangling. */ return NULL; @@ -22490,6 +22545,10 @@ rs6000_dmr_register_move_cost (machine_mode mode, reg_class_t rclass) if (mode == XOmode) return reg_move_base; + /* __dmr (i.e. TDOmode) is transferred in 2 instructions. */ + else if (mode == TDOmode) + return reg_move_base * 2; + else return reg_move_base * 2 * hard_regno_nregs (FIRST_DMR_REGNO, mode); } @@ -27189,9 +27248,10 @@ rs6000_split_multireg_move (rtx dst, rtx src) mode = GET_MODE (dst); nregs = hard_regno_nregs (reg, mode); - /* If we have a vector quad register for MMA, and this is a load or store, - see if we can use vector paired load/stores. */ - if (mode == XOmode && TARGET_MMA + /* If we have a vector quad register for MMA or DMR register for dense math, + and this is a load or store, see if we can use vector paired + load/stores. */ + if ((mode == XOmode || mode == TDOmode) && TARGET_MMA && (MEM_P (dst) || MEM_P (src))) { reg_mode = OOmode; @@ -27199,7 +27259,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) } /* If we have a vector pair/quad mode, split it into two/four separate vectors. */ - else if (mode == OOmode || mode == XOmode) + else if (mode == OOmode || mode == XOmode || mode == TDOmode) reg_mode = V1TImode; else if (FP_REGNO_P (reg)) reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode : @@ -27245,13 +27305,13 @@ rs6000_split_multireg_move (rtx dst, rtx src) return; } - /* The __vector_pair and __vector_quad modes are multi-register - modes, so if we have to load or store the registers, we have to be - careful to properly swap them if we're in little endian mode - below. This means the last register gets the first memory - location. We also need to be careful of using the right register - numbers if we are splitting XO to OO. */ - if (mode == OOmode || mode == XOmode) + /* The __vector_pair, __vector_quad, and __dmr modes are multi-register + modes, so if we have to load or store the registers, we have to be careful + to properly swap them if we're in little endian mode below. This means + the last register gets the first memory location. We also need to be + careful of using the right register numbers if we are splitting XO to + OO. */ + if (mode == OOmode || mode == XOmode || mode == TDOmode) { nregs = hard_regno_nregs (reg, mode); int reg_mode_nregs = hard_regno_nregs (reg, reg_mode); @@ -27388,7 +27448,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) overlap. */ int i; /* XO/OO are opaque so cannot use subregs. */ - if (mode == OOmode || mode == XOmode ) + if (mode == OOmode || mode == XOmode || mode == TDOmode) { for (i = nregs - 1; i >= 0; i--) { @@ -27562,7 +27622,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) continue; /* XO/OO are opaque so cannot use subregs. */ - if (mode == OOmode || mode == XOmode ) + if (mode == OOmode || mode == XOmode || mode == TDOmode) { rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j); rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j); @@ -28543,7 +28603,8 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype) if (frommode != tomode) { - /* Do not allow conversions to/from XOmode and OOmode types. */ + /* Do not allow conversions to/from XOmode, OOmode, and TDOmode + types. */ if (frommode == XOmode) return N_("invalid conversion from type %<__vector_quad%>"); if (tomode == XOmode) @@ -28552,6 +28613,10 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype) return N_("invalid conversion from type %<__vector_pair%>"); if (tomode == OOmode) return N_("invalid conversion to type %<__vector_pair%>"); + if (frommode == TDOmode) + return N_("invalid conversion from type %<__dmr%>"); + if (tomode == TDOmode) + return N_("invalid conversion to type %<__dmr%>"); } /* Conversion allowed. */ diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 27f7067ef52..f2b63c3cd71 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -1006,7 +1006,8 @@ enum data_align { align_abi, align_opt, align_both }; /* Modes that are not vectors, but require vector alignment. Treat these like vectors in terms of loads and stores. */ #define VECTOR_ALIGNMENT_P(MODE) \ - (FLOAT128_VECTOR_P (MODE) || (MODE) == OOmode || (MODE) == XOmode) + (FLOAT128_VECTOR_P (MODE) || (MODE) == OOmode || (MODE) == XOmode \ + || (MODE) == TDOmode) #define ALTIVEC_VECTOR_MODE(MODE) \ ((MODE) == V16QImode \ @@ -2292,6 +2293,7 @@ enum rs6000_builtin_type_index RS6000_BTI_const_str, /* pointer to const char * */ RS6000_BTI_vector_pair, /* unsigned 256-bit types (vector pair). */ RS6000_BTI_vector_quad, /* unsigned 512-bit types (vector quad). */ + RS6000_BTI_dmr, /* unsigned 1,024-bit types (dmr). */ RS6000_BTI_const_ptr_void, /* const pointer to void */ RS6000_BTI_ptr_V16QI, RS6000_BTI_ptr_V1TI, @@ -2330,6 +2332,7 @@ enum rs6000_builtin_type_index RS6000_BTI_ptr_dfloat128, RS6000_BTI_ptr_vector_pair, RS6000_BTI_ptr_vector_quad, + RS6000_BTI_ptr_dmr, RS6000_BTI_ptr_long_long, RS6000_BTI_ptr_long_long_unsigned, RS6000_BTI_MAX @@ -2387,6 +2390,7 @@ enum rs6000_builtin_type_index #define const_str_type_node (rs6000_builtin_types[RS6000_BTI_const_str]) #define vector_pair_type_node (rs6000_builtin_types[RS6000_BTI_vector_pair]) #define vector_quad_type_node (rs6000_builtin_types[RS6000_BTI_vector_quad]) +#define dmr_type_node (rs6000_builtin_types[RS6000_BTI_dmr]) #define pcvoid_type_node (rs6000_builtin_types[RS6000_BTI_const_ptr_void]) #define ptr_V16QI_type_node (rs6000_builtin_types[RS6000_BTI_ptr_V16QI]) #define ptr_V1TI_type_node (rs6000_builtin_types[RS6000_BTI_ptr_V1TI]) @@ -2425,6 +2429,7 @@ enum rs6000_builtin_type_index #define ptr_dfloat128_type_node (rs6000_builtin_types[RS6000_BTI_ptr_dfloat128]) #define ptr_vector_pair_type_node (rs6000_builtin_types[RS6000_BTI_ptr_vector_pair]) #define ptr_vector_quad_type_node (rs6000_builtin_types[RS6000_BTI_ptr_vector_quad]) +#define ptr_dmr_type_node (rs6000_builtin_types[RS6000_BTI_ptr_dmr]) #define ptr_long_long_integer_type_node (rs6000_builtin_types[RS6000_BTI_ptr_long_long]) #define ptr_long_long_unsigned_type_node (rs6000_builtin_types[RS6000_BTI_ptr_long_long_unsigned]) diff --git a/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c b/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c new file mode 100644 index 00000000000..4d57ce826bb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c @@ -0,0 +1,68 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_dense_math_ok } */ + +/* Note tree constant proprigation needs to be tweaked to allow skipping opaque + modes. At the moment just to verify that basic loads and stores are handled + of the new type, just disable CCP for now. By the time GCC 13 is shipped, + this needed to be fixed. */ +/* { dg-options "-mdejagnu-cpu=future -O2 -fno-tree-ccp" } */ + +/* Test basic load/store for __dmr type. */ + +#ifndef CONSTRAINT +#if defined(USE_D) +#define CONSTRAINT "d" + +#elif defined(USE_V) +#define CONSTRAINT "v" + +#elif defined(USE_WA) +#define CONSTRAINT "wa" + +#else +#define CONSTRAINT "wD" +#endif +#endif +const char constraint[] = CONSTRAINT; + +void foo_mem_asm (__dmr *p, __dmr *q) +{ + /* 2 LXVP instructions. */ + __dmr vq = *p; + + /* 2 DMXXINSTDMR512 instructions to transfer VSX to DMR. */ + __asm__ ("# foo (" CONSTRAINT ") %A0" : "+" CONSTRAINT (vq)); + /* 2 DMXXEXTFDMR512 instructions to transfer DMR to VSX. */ + + /* 2 STXVP instructions. */ + *q = vq; +} + +void foo_mem_asm2 (__dmr *p, __dmr *q) +{ + /* 2 LXVP instructions. */ + __dmr vq = *p; + __dmr vq2; + __dmr vq3; + + /* 2 DMXXINSTDMR512 instructions to transfer VSX to DMR. */ + __asm__ ("# foo1 (" CONSTRAINT ") %A0" : "+" CONSTRAINT (vq)); + /* 2 DMXXEXTFDMR512 instructions to transfer DMR to VSX. */ + + vq2 = vq; + __asm__ ("# foo2 (wa) %0" : "+wa" (vq2)); + + /* 2 STXVP instructions. */ + *q = vq2; +} + +void foo_mem (__dmr *p, __dmr *q) +{ + /* 2 LXVP, 2 STXVP instructions, no DMR transfer. */ + *q = *p; +} + +/* { dg-final { scan-assembler-times {\mdmxxextfdmr512\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mdmxxinstdmr512\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 12 } } */