From patchwork Fri Nov  5 04:02:28 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.ibm.com>
X-Patchwork-Id: 47082
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 1F0B23857C5F
	for <patchwork@sourceware.org>; Fri,  5 Nov 2021 04:03:05 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1F0B23857C5F
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1636084985;
	bh=S2DMxJuldSajfh53xjaxv4uFPFLeY8QEkMJKkKZ+cCI=;
	h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=IMWtUlMkflwb4pKB6sZb7pDLDkeBt0bybMNfaspAw0tpnKJvu9R0zBDrjBP1eCx7H
	 Oi43cENmPotrhCGbz3uOEl2hJa6ho33X36TYPu2ppgEmB5OCQljHVgDzQ4ckPmQIue
	 lhIgo0slyUl60t/D0ETLRQOc4XA8rHqac2noO+t8=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id 7C4363858D35
 for <gcc-patches@gcc.gnu.org>; Fri,  5 Nov 2021 04:02:34 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7C4363858D35
Received: from pps.filterd (m0187473.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id
 1A51qnjs014590;
 Fri, 5 Nov 2021 04:02:33 GMT
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com with ESMTP id 3c4ub29v4e-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 05 Nov 2021 04:02:33 +0000
Received: from m0187473.ppops.net (m0187473.ppops.net [127.0.0.1])
 by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 1A53iRV1014687;
 Fri, 5 Nov 2021 04:02:32 GMT
Received: from ppma03wdc.us.ibm.com (ba.79.3fa9.ip4.static.sl-reverse.com
 [169.63.121.186])
 by mx0a-001b2d01.pphosted.com with ESMTP id 3c4ub29v45-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 05 Nov 2021 04:02:32 +0000
Received: from pps.filterd (ppma03wdc.us.ibm.com [127.0.0.1])
 by ppma03wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 1A53rKQF024050;
 Fri, 5 Nov 2021 04:02:31 GMT
Received: from b03cxnp08028.gho.boulder.ibm.com
 (b03cxnp08028.gho.boulder.ibm.com [9.17.130.20])
 by ppma03wdc.us.ibm.com with ESMTP id 3c4t4eajy9-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 05 Nov 2021 04:02:31 +0000
Received: from b03ledav005.gho.boulder.ibm.com
 (b03ledav005.gho.boulder.ibm.com [9.17.130.236])
 by b03cxnp08028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 1A542UGR39649730
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Fri, 5 Nov 2021 04:02:30 GMT
Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 5D3A8BE059;
 Fri,  5 Nov 2021 04:02:30 +0000 (GMT)
Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id C908ABE05A;
 Fri,  5 Nov 2021 04:02:29 +0000 (GMT)
Received: from toto.the-meissners.org (unknown [9.65.76.254])
 by b03ledav005.gho.boulder.ibm.com (Postfix) with ESMTPS;
 Fri,  5 Nov 2021 04:02:29 +0000 (GMT)
Date: Fri, 5 Nov 2021 00:02:28 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner <meissner@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>,
 David Edelsohn <dje.gcc@gmail.com>, Bill Schmidt <wschmidt@linux.ibm.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
Subject: [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions
Message-ID: <YYSs1OZk3bR0VxED@toto.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>,
 gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>,
 David Edelsohn <dje.gcc@gmail.com>,
 Bill Schmidt <wschmidt@linux.ibm.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
MIME-Version: 1.0
Content-Disposition: inline
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: kRrXCG-qEXNDNEAYSV421jyad6sXWzbt
X-Proofpoint-ORIG-GUID: GM5rJcaovuk2fiVKyjUAWg84g3I_OYWY
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.205,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.0.607.475
 definitions=2021-11-05_01,2021-11-03_01,2020-04-07_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 phishscore=0
 priorityscore=1501 lowpriorityscore=0 impostorscore=0 mlxscore=0
 suspectscore=0 malwarescore=0 clxscore=1011 adultscore=0 mlxlogscore=999
 bulkscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2110150000 definitions=main-2111050021
X-Spam-Status: No, score=-3.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, KAM_MANYTO, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Michael Meissner via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Michael Meissner <meissner@linux.ibm.com>
Reply-To: Michael Meissner <meissner@linux.ibm.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

These patches refine the XXSPLTIDP patches posted on September 13th.  They
generate instructions that load certain constants directly into a VSX register
instead of using PLXV to load the constant.

On the Power10:

 * XXSPLTIDP is a prefixed instruction that takes a value encoded as an SFmode
   constant, converts it to DFmode, and splats that value to the two 64-bit
   parts of the register.

 * XXSPLTIW is a prefixed instruction that takes a 32-bit value and splats this
   value into the 4 32-bit parts of the vector register, i.e. it can be used to
   generate V4SImode and V4SFmode vector constants where all of the elements
   are the same.

 * XXSPLTI32DX is a prefixed instruction that takes a 32-bit value and splats
   this value into either the 2 even 32-bit parts of the vector register or 2
   odd 32-bit parts.  Thus 2 XXSPLTI32DX instructions can generate a 64-bit
   constant that cannot be generated by XXSPLTIDP.  Note, in the current set of
   patches, I do not add support for XXSPLTI32DX.  I have done so in previous
   patches, and I could add it if desired.  Because it is 2 back-to-back
   prefixed instructions that are serially dependent on each other, I don't
   think it is worthwhile to use XXSPLTI32DX.

 * LXVKQ is a non-prefixed instruction that loads certain 128-bit values that
   match particular IEEE 128-bit constants (-0.0f128, 1.0f128, 2.0f128, etc.).
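
To make the splat behavior described above concrete, here is a small model in
portable C.  It is only a sketch: the type vsx_model and the functions
model_xxspltiw and model_xxspltidp are hypothetical names that are not part of
GCC or of these patches.  It assumes float and double are IEEE binary32 and
binary64, and it ignores any special cases the hardware defines for unusual
inputs such as denormals.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit VSX register as two 64-bit halves,
   with dw[0] holding the most significant 64 bits.  */
typedef struct { uint64_t dw[2]; } vsx_model;

/* Model of XXSPLTIW: replicate a 32-bit immediate into all four
   32-bit parts of the register.  */
static vsx_model
model_xxspltiw (uint32_t imm)
{
  uint64_t pair = ((uint64_t) imm << 32) | imm;
  vsx_model r = { { pair, pair } };
  return r;
}

/* Model of XXSPLTIDP: interpret the immediate as single precision
   bits, widen the value to double precision, and replicate the
   result into both 64-bit parts of the register.  */
static vsx_model
model_xxspltidp (uint32_t imm)
{
  float f;
  double d;
  uint64_t bits;
  vsx_model r;

  memcpy (&f, &imm, sizeof f);		/* reinterpret the bits as float */
  d = (double) f;			/* widening is exact */
  memcpy (&bits, &d, sizeof bits);	/* recover the double's bits */
  r.dw[0] = bits;
  r.dw[1] = bits;
  return r;
}
```

In this model, passing the single precision encoding of 1.0 (0x3F800000) to
model_xxspltidp fills both halves with the double precision encoding of 1.0
(0x3FF0000000000000).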

There are 5 patches in this set.

One of the takeaways from the last review was that it would be desirable to
generate an instruction whenever it produces a value that matches the vector
constant, even if the vector type is not the instruction's native vector type.

For example, the following code:

	vector unsigned long long
	foo (void)
	{
	#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	  return (vector unsigned long long) { 0, 1ULL << 63 };
	#else
	  return (vector unsigned long long) { 1ULL << 63, 0 };
	#endif
	}

should generate:

	foo:
		lxvkq 34,16
		blr

To that end, I added support to create a data structure that takes a vector or
scalar constant and represents it as a series of bytes, half-words, words, and
double-words.  Then the recognizer functions use this data structure to decide
if a given instruction can be generated.
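
As a rough illustration of the idea, the following sketch stores one 128-bit
constant in all four views at once.  The names const_128bit_sketch and
const_128bit_from_bytes are hypothetical; the actual type in the patches is
vec_const_128bit_type, and its layout and fields may differ from what is shown
here.  The sketch assumes the input bytes are given most significant first.

```c
#include <stdint.h>

/* Hypothetical stand-in for the constant representation.  Element 0 of
   each array holds the most significant part of the 128-bit value.  */
typedef struct
{
  uint8_t  bytes[16];
  uint16_t half_words[8];
  uint32_t words[4];
  uint64_t double_words[2];
} const_128bit_sketch;

/* Fill every view from 16 bytes given most significant byte first.  */
static void
const_128bit_from_bytes (const uint8_t in[16], const_128bit_sketch *out)
{
  int i;

  for (i = 0; i < 16; i++)
    out->bytes[i] = in[i];

  for (i = 0; i < 8; i++)
    out->half_words[i] = (uint16_t) ((in[2 * i] << 8) | in[2 * i + 1]);

  for (i = 0; i < 4; i++)
    out->words[i] = (((uint32_t) out->half_words[2 * i] << 16)
		     | out->half_words[2 * i + 1]);

  for (i = 0; i < 2; i++)
    out->double_words[i] = (((uint64_t) out->words[2 * i] << 32)
			    | out->words[2 * i + 1]);
}
```

With this layout, a constant whose only set bit is the most significant one
(the bit pattern of -0.0f128) has bytes[0] == 0x80, words[0] == 0x80000000, and
double_words[0] == 1ULL << 63, no matter which vector type it started as.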

This way functions like easy_vector_constant can avoid repeatedly taking a
vector constant and converting it into internal format before trying to decide
if a given instruction can be generated.  For example, this is the part in
easy_vector_constant that determines if a vector constant can generate LXVKQ,
XXSPLTIDP, or XXSPLTIW:

      /* Constants that can be generated with ISA 3.1 instructions are
         easy.  */
      vec_const_128bit_type vsx_const;
      if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
	{
	  if (constant_generates_lxvkq (&vsx_const))
	    return true;

	  if (constant_generates_xxspltiw (&vsx_const))
	    return true;

	  if (constant_generates_xxspltidp (&vsx_const))
	    return true;
	}
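
To give a feel for what such recognizers look at, here is a simplified sketch
that works on the four 32-bit words of a constant, most significant word first.
The functions sketch_is_word_splat and sketch_matches_ieee128 are hypothetical
and are not the constant_generates_* functions from the patches.  The second
function only checks the three IEEE 128-bit values named earlier in this post
(-0.0f128, 1.0f128, and 2.0f128) and does not compute an instruction
immediate.  A check for XXSPLTIDP is left out because it would also need to
validate the range of the floating point value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: true if all four 32-bit words are identical, which
   is the shape of constant a word splat can produce.  */
static bool
sketch_is_word_splat (const uint32_t words[4])
{
  return (words[0] == words[1]
	  && words[1] == words[2]
	  && words[2] == words[3]);
}

/* Hypothetical check: true if the 128-bit value is the IEEE 128-bit
   encoding of -0.0, 1.0, or 2.0.  For these values only the most
   significant word is nonzero.  */
static bool
sketch_matches_ieee128 (const uint32_t words[4])
{
  if (words[1] != 0 || words[2] != 0 || words[3] != 0)
    return false;

  switch (words[0])
    {
    case 0x80000000u:		/* -0.0f128 */
    case 0x3FFF0000u:		/* 1.0f128 */
    case 0x40000000u:		/* 2.0f128 */
      return true;

    default:
      return false;
    }
}
```

Because both checks look only at the bits, the constant in foo above passes
sketch_matches_ieee128 even though its type is vector unsigned long long.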

In theory, a lot of the altivec constant functions could be converted to use
this data structure, but I haven't rewritten those functions.

The 5 patches are:

1) Add the data structure and the function that converts vector/scalar
   constants to that data structure.  Note, patch 1 does not itself use this
   function, but the remaining 4 patches depend on it.
   
2) Add support to recognize when we could generate the LXVKQ instruction.

3) Add support to recognize when we could generate the XXSPLTIW instruction.

4) Add support to recognize when we could generate the XXSPLTIDP instruction
   for vector constants.

5) Add support to recognize when we could generate the XXSPLTIDP instruction
   for SFmode and DFmode constants.

I have built these patches on power9 and power10 little endian systems with no
regressions in the current tests.  I am kicking off a build on a power8 big
endian system as I write this post.  Previous versions of the patches ran on
the big endian system without problems.  I would like to check these patches
into the GCC 12 trunk.

At the moment, I am not asking to be able to back-port the patches to GCC 11,
but we can do this if it is deemed desirable.