Date: Fri, 5 Nov 2021 00:02:28 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt
Subject: [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions
From: Michael Meissner

These patches are a refinement of the patches to add XXSPLTIDP support that I posted on September 13th. These patches generate instructions that load a VSX register with certain constants instead of using PLXV to load the constant from memory.

On the Power10:

* XXSPLTIDP is a prefixed instruction that takes a value encoded as an SFmode constant, converts it to DFmode, and splats that value to the two 64-bit parts of the register.

* XXSPLTIW is a prefixed instruction that takes a 32-bit value and splats it into the four 32-bit parts of the vector register, i.e. it can be used to generate V4SImode and V4SFmode vector constants where all of the elements are the same.

* XXSPLTI32DX is a prefixed instruction that takes a 32-bit value and splats it into either the two even 32-bit parts or the two odd 32-bit parts of the vector register. Thus two XXSPLTI32DX instructions can generate a 64-bit constant that cannot be generated by XXSPLTIDP.
  Note, in the current set of patches, I do not add support for XXSPLTI32DX. I have done so in previous patches, and I could add it if desired. Because it needs two back-to-back prefixed instructions that are serially dependent on each other, I don't think it is worthwhile to use XXSPLTI32DX.

* LXVKQ is a non-prefixed instruction that loads certain 128-bit values that match particular IEEE 128-bit constants (-0.0f128, 1.0f128, 2.0f128, etc.).

There are 5 patches in this set.

One of the takeaways from the last review was that it would be desirable to generate an instruction whenever it produces a value that matches the vector constant, even if the vector type is not the native vector type for the instruction. For example, the following code:

    vector unsigned long long
    foo (void)
    {
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
      return (vector unsigned long long) { 0, 1ULL << 63 };
    #else
      return (vector unsigned long long) { 1ULL << 63, 0 };
    #endif
    }

should generate:

    foo:
            lxvkq 34,16
            blr

since the bit pattern of this constant (only the sign bit set) is the same as -0.0f128.

To that end, I added support to create a data structure that takes a vector or scalar constant and represents it as a series of bytes, half-words, words, and double-words. The recognizer functions then use this data structure to decide whether a given instruction can be generated. This way, functions like easy_vector_constant can avoid repeatedly converting a vector constant into the internal format before trying to decide whether a given instruction can be generated.

For example, this is the part of easy_vector_constant that determines whether a vector constant can be generated with LXVKQ, XXSPLTIW, or XXSPLTIDP:

      /* Constants that can be generated with ISA 3.1 instructions are easy.  */
      vec_const_128bit_type vsx_const;
      if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
        {
          if (constant_generates_lxvkq (&vsx_const))
            return true;

          if (constant_generates_xxspltiw (&vsx_const))
            return true;

          if (constant_generates_xxspltidp (&vsx_const))
            return true;
        }

In theory, many of the Altivec constant functions could be converted to use this data structure, but I haven't rewritten those functions.

The 5 patches are:

1) Add the data structure and the function that converts vector/scalar constants to that data structure. Note, this function is not used in this patch, but the remaining 4 patches depend on it.

2) Add support to recognize when we could generate the LXVKQ instruction.

3) Add support to recognize when we could generate the XXSPLTIW instruction.

4) Add support to recognize when we could generate the XXSPLTIDP instruction for vector constants.

5) Add support to recognize when we could generate the XXSPLTIDP instruction for SFmode and DFmode constants.

I have built these patches on power9 and power10 little endian systems with no regressions in the current tests. I am kicking off a build on a power8 big endian system as I write this post. I have run previous versions of the patch on the big endian system without problems.

I would like to check these patches into the GCC 12 trunk. At the moment, I am not asking to back-port the patches to GCC 11, but we can do this if it is deemed desirable.
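The XXSPLTIDP condition described above, a DFmode constant that survives a round trip through SFmode, can be sketched in portable C. This is a minimal illustration and not the patch's constant_generates_xxspltidp: the function name is hypothetical, it works on a host double rather than on RTL, and rejecting single-precision denormals is a conservative assumption of the sketch rather than something stated in this message.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return true if the DFmode value D could be
   loaded with XXSPLTIDP, i.e. D is exactly representable as an SFmode
   (single precision) value.  On success, *IMM receives the 32-bit
   SFmode encoding that would serve as the immediate.  */
static bool
sketch_generates_xxspltidp (double d, uint32_t *imm)
{
  /* NaNs never compare equal, so reject them up front, along with
     finite values outside the SFmode range (converting those to float
     is not well defined in ISO C).  Infinities are kept.  */
  if (isnan (d))
    return false;
  if (!isinf (d) && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  /* The value must survive a round trip through single precision.  */
  float f = (float) d;
  if ((double) f != d)
    return false;

  /* Assumption of this sketch: conservatively reject values that are
     single-precision denormals.  */
  if (f != 0.0f && f > -FLT_MIN && f < FLT_MIN)
    return false;

  memcpy (imm, &f, sizeof *imm);
  return true;
}
```

For example, 1.0 or -0.5 qualifies, while 0.1 does not, because 0.1 has no exact single-precision representation.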
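The XXSPLTI32DX description above can be modeled the same way. The sketch below is only a model (the struct, the function names, and the big-endian word numbering are assumptions, not GCC code), but it shows why a 64-bit constant such as the bit pattern of the double 0.1 (0x3FB999999999999A), which XXSPLTIDP cannot produce, needs two back-to-back XXSPLTI32DX instructions, the second of which depends on the register written by the first.

```c
#include <stdint.h>

/* Model of a VSX register as four 32-bit words in big-endian numbering:
   words 0 and 1 form doubleword 0, words 2 and 3 form doubleword 1.  */
typedef struct
{
  uint32_t w[4];
} vsx_reg_model;

/* Model of XXSPLTI32DX: IX = 0 writes word 0 of each doubleword (the
   even words 0 and 2), IX = 1 writes word 1 of each doubleword (the odd
   words 1 and 3).  The other two words keep their previous contents.  */
static void
model_xxsplti32dx (vsx_reg_model *reg, int ix, uint32_t imm)
{
  reg->w[ix] = imm;
  reg->w[ix + 2] = imm;
}

/* Build a register with VALUE in both doublewords using two
   XXSPLTI32DX operations; the second one reads the register produced
   by the first, so the two are serially dependent.  */
static vsx_reg_model
model_splat_64bit (uint64_t value)
{
  vsx_reg_model reg = { { 0, 0, 0, 0 } };
  model_xxsplti32dx (&reg, 0, (uint32_t) (value >> 32));  /* upper half */
  model_xxsplti32dx (&reg, 1, (uint32_t) value);          /* lower half */
  return reg;
}
```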
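A sketch of the LXVKQ test follows. The upper 32 bits of an IEEE 128-bit value hold the sign bit, the 15-bit exponent (bias 16383, i.e. 0x3FFF), and the top 16 fraction bits, so these small constants are distinguished by that word alone; for example 3.0 = 1.5 * 2^1 is 0x40008000 followed by 96 zero bits. The function name and the table are hypothetical, and apart from 16 for -0.0 (which matches the lxvkq 34,16 example above) the UIM numbering is an assumption to check against the Power ISA 3.1 description of LXVKQ.

```c
#include <stdint.h>

/* Hypothetical LXVKQ recognizer.  WORDS holds the 128-bit constant as
   four 32-bit words, most significant word first.  Only values whose
   low 96 bits are zero are candidates, so just the upper word is
   matched.  Returns the assumed UIM operand, or 0 if there is none.  */
static unsigned
sketch_generates_lxvkq (const uint32_t words[4])
{
  if (words[1] != 0 || words[2] != 0 || words[3] != 0)
    return 0;

  switch (words[0])
    {
    case 0x3FFF0000u: return 1;   /* +1.0 */
    case 0x40000000u: return 2;   /* +2.0 */
    case 0x40008000u: return 3;   /* +3.0 */
    case 0x40010000u: return 4;   /* +4.0 */
    case 0x40014000u: return 5;   /* +5.0 */
    case 0x40018000u: return 6;   /* +6.0 */
    case 0x4001C000u: return 7;   /* +7.0 */
    case 0x7FFF0000u: return 8;   /* +Infinity */
    case 0x7FFF8000u: return 9;   /* quiet NaN */
    case 0x80000000u: return 16;  /* -0.0 */
    case 0xBFFF0000u: return 17;  /* -1.0 */
    case 0xC0000000u: return 18;  /* -2.0 */
    case 0xC0008000u: return 19;  /* -3.0 */
    case 0xC0010000u: return 20;  /* -4.0 */
    case 0xC0014000u: return 21;  /* -5.0 */
    case 0xC0018000u: return 22;  /* -6.0 */
    case 0xC001C000u: return 23;  /* -7.0 */
    case 0xFFFF0000u: return 24;  /* -Infinity */
    default: return 0;            /* not loadable by this sketch */
    }
}
```

With the foo example on a little endian system, element 1 (1ULL << 63) is the most significant doubleword of the 128-bit value, so the words passed in would be { 0x80000000, 0, 0, 0 } and the sketch returns 16.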
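The data structure idea described above can be sketched as follows. This is not the patch's vec_const_128bit_type: the struct name, its fields, and both functions are hypothetical, and the sketch starts from 16 bytes already in most-significant-first order, whereas the real vec_const_128bit_to_bytes presumably also has to deal with the mode of the constant and with element order on little endian systems. The point is that the constant is converted once, and each recognizer (here an XXSPLTIW test) just inspects the precomputed views.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's data structure: one 128-bit
   constant viewed as bytes, half-words, words and double-words, all
   in most-significant-first order, plus precomputed "all same" flags.  */
typedef struct
{
  uint8_t bytes[16];
  uint16_t half_words[8];
  uint32_t words[4];
  uint64_t double_words[2];
  bool all_words_same;
  bool all_double_words_same;
} const_128bit_sketch;

/* Fill every view from BYTES (most significant byte first).  */
static void
const_128bit_from_bytes (const uint8_t bytes[16], const_128bit_sketch *c)
{
  for (int i = 0; i < 16; i++)
    c->bytes[i] = bytes[i];

  for (int i = 0; i < 8; i++)
    c->half_words[i] = (uint16_t) ((bytes[2 * i] << 8) | bytes[2 * i + 1]);

  for (int i = 0; i < 4; i++)
    c->words[i] = ((uint32_t) c->half_words[2 * i] << 16)
                  | c->half_words[2 * i + 1];

  for (int i = 0; i < 2; i++)
    c->double_words[i] = ((uint64_t) c->words[2 * i] << 32)
                         | c->words[2 * i + 1];

  c->all_words_same = (c->words[0] == c->words[1]
                       && c->words[0] == c->words[2]
                       && c->words[0] == c->words[3]);
  c->all_double_words_same = (c->double_words[0] == c->double_words[1]);
}

/* Example recognizer reusing the views: XXSPLTIW can generate the
   constant if all four 32-bit words are equal; *IMM receives that word.  */
static bool
sketch_generates_xxspltiw (const const_128bit_sketch *c, uint32_t *imm)
{
  if (!c->all_words_same)
    return false;
  *imm = c->words[0];
  return true;
}
```

For a V4SFmode constant of four 1.0f elements, every word is 0x3F800000, so the XXSPLTIW test succeeds with that immediate.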