From patchwork Fri Feb  3 21:16:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.ibm.com>
X-Patchwork-Id: 55461
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 806A23858C20
	for <patchwork@sourceware.org>; Fri,  3 Feb 2023 21:16:37 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 806A23858C20
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1675458997;
	bh=1qD+TYKnMwSFsbqUdXWdBGirnKD+GKNQpYtgIXSg/ZM=;
	h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=nU4T0Z8acKn3m4U94ilSUFxA6vC0KEDoaE22zCTJmDgBWJhsB03fO6Q3e/cSdbU9+
	 WR6PgyDSdomXZOwtfFErAiQqbft1imc6kwoB4gF4Uqa1a/BdHh0rHEYx9UPP6fnM4L
	 uBYidt7ZXE5N5bym2odKUnXYSSsNC+Cq+4eenzgs=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com
 [148.163.158.5])
 by sourceware.org (Postfix) with ESMTPS id 32C513858C52
 for <gcc-patches@gcc.gnu.org>; Fri,  3 Feb 2023 21:16:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 32C513858C52
Received: from pps.filterd (m0127361.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id
 313LCIrZ025408; Fri, 3 Feb 2023 21:16:07 GMT
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nh7hrjy0u-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 03 Feb 2023 21:16:06 +0000
Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1])
 by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 313LDv9w030516;
 Fri, 3 Feb 2023 21:16:06 GMT
Received: from ppma01wdc.us.ibm.com (fd.55.37a9.ip4.static.sl-reverse.com
 [169.55.85.253])
 by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nh7hrjy0n-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 03 Feb 2023 21:16:06 +0000
Received: from pps.filterd (ppma01wdc.us.ibm.com [127.0.0.1])
 by ppma01wdc.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 313JKSaf026752;
 Fri, 3 Feb 2023 21:16:05 GMT
Received: from smtprelay04.wdc07v.mail.ibm.com ([9.208.129.114])
 by ppma01wdc.us.ibm.com (PPS) with ESMTPS id 3ncvtnjs3a-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 03 Feb 2023 21:16:05 +0000
Received: from smtpav03.dal12v.mail.ibm.com (smtpav03.dal12v.mail.ibm.com
 [10.241.53.102])
 by smtprelay04.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 313LG32o16319076
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Fri, 3 Feb 2023 21:16:04 GMT
Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 9867758064;
 Fri,  3 Feb 2023 21:16:03 +0000 (GMT)
Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 0F8EE58060;
 Fri,  3 Feb 2023 21:16:03 +0000 (GMT)
Received: from toto.the-meissners.org (unknown [9.65.233.34])
 by smtpav03.dal12v.mail.ibm.com (Postfix) with ESMTPS;
 Fri,  3 Feb 2023 21:16:02 +0000 (GMT)
Date: Fri, 3 Feb 2023 16:16:01 -0500
To: gcc-patches@gcc.gnu.org, Michael Meissner <meissner@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
Subject: [PATCH 0/8] PowerPC future support for Dense Math
Message-ID: <Y915kdrgaQxmJl9A@toto.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>,
 gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>,
 David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
Content-Disposition: inline
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: Ys28LaK4GevpZ0Q6oKuEdDt0Qd9kuNVz
X-Proofpoint-ORIG-GUID: OkEduBJ10VC9jvk-1O2f7Ehj8Pd07O1n
X-Proofpoint-UnRewURL: 0 URL was un-rewritten
MIME-Version: 1.0
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.122.1
 definitions=2023-02-03_19,2023-02-03_01,2022-06-22_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 suspectscore=0 phishscore=0
 mlxscore=0 mlxlogscore=999 malwarescore=0 adultscore=0 bulkscore=0
 clxscore=1015 priorityscore=1501 spamscore=0 lowpriorityscore=0
 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2212070000 definitions=main-2302030189
X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, KAM_MANYTO, KAM_SHORT, RCVD_IN_MSPIKE_H2,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Michael Meissner via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Michael Meissner <meissner@linux.ibm.com>
Reply-To: Michael Meissner <meissner@linux.ibm.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

These patches were originally posted on November 10th.  Segher has asked that I
repost them.  These patches are somewhat changed since the original posting to
address some of the comments.

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

In the first patch (adding -mcpu=future), I have taken out the code of making
-mtune=future act as -mtune=power10.  Instead I went through all of the places
that look at the tuning (mostly in power10.md and rs6000.cc), and added future
as an option.  Obviously at a later time, we will provide a separate tuning
file for future (or whatever the new name will be if the instructions are added
officially).  But for now, it will suffice.

In patch #3, I fixed the opcode for clearing a dense math register that Peter
had noticed.  I was using the name based on the existing clear instruction,
instead of the new instruction.

In patch #6, I fixed the code, relying on the changes for setting the precision
field to 16 bits.  Since that patch will not be able to go into GCC 13 at
present, we might skip that support for now.  The important thing for existing
users of the MMA code is the support for accumulators being in the separate
dense math registers rather than overlapping does need to go in, and we can
probably delay the 1,024 bit register support, or implement in a different
fashion.

In the insn names, I tried to switch to using _vsx instead of _fpr for the
existing MMA support instructions.  I also tried to clear up the comments to
specify ISA 3.1 instead of power10 when talking about the existing MMA
support.

The following is from the original posting (slightly modified):

This patch is very preliminary support for a potential new feature to the
PowerPC that extends the current power10 MMA architecture.  This feature may or
may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator
registers.  These accumulators are each tied to sets of 4 FPR registers.  When
you issue a prime instruction, it makes sure the accumulator is a copy of the 4
FPR registers the accumulator is tied to.  When you issue a deprime
instruction, it makes sure that the accumulator data content is logically
copied to the matching FPR register.

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be done to support any dense math features other than
doing data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 8 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3) The third patch enables 512-bit accumulators store in DMRs.  This patch
enables the register allocation, but it does not move the existing MMA to use
these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, and to move a DMR
register to a VSX register.  [As I mentioned above, at the moment, this patch
is problematical as is]

7) The seventh patch is not DMR related.  It adds support for variants of the
load/store vector with length instruction that may be added in future PowerPC
processors.  These variants eliminate having to shift the byte length left by
56 bits.

8) The eighth patch is also not DMR related.  It adds support for a saturating
subtract operation that may be added to future PowerPC processors.

In terms of changes, we now use the wD constraint for accumulators.  If you
compile with -mcpu=power10, the wD constraint will match the equivalent VSX
register (0..31) that overlaps with the accumulator.  If you compile with
-mcpu=future, the wD constraint will match the DMR register and not the FPR
register.

This patch also modifies the print_operand %A output modifier to print out DMR
register numbers if -mcpu=future, and continue to print out the FPR register
number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, things work between the two
systems.  If you use extended asm, you will likely need to modify the code.
Going forward, hopefully if you modify your code to use the wD constraint and
%A output modifier, you can write code that switches more easily between the
two systems.

Again, these are preliminary patches for a potential future machine.  Things
will likely change in terms of implementation and usage over time.