From patchwork Tue Jun 7 00:31:17 2022
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 54857
Date: Mon, 6 Jun 2022 20:31:17 -0400
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner, Will Schmidt
Subject: [PATCH, V3] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293

Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.

This is version 3 of the patch.  The original patch was:

| Date: Mon, 28 Mar 2022 12:26:02 -0400
| Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
| https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html

Version 2 of the patch was:

| Date: Fri, 13 May 2022 10:49:26 -0400
| Subject: [PATCH] Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293
| https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594797.html

The differences between version 2 and version 3 are a cleaned-up
description of what the patch does and a clearer example test case.

In PR target/99293, it was pointed out that doing:

        vector long long dest0, dest1, src;
        /* ... */
        dest0 = vec_splats (vec_extract (src, 0));
        dest1 = vec_splats (vec_extract (src, 1));

generates slower code than necessary: two instructions per splat where a
single xxpermdi would do.  It generates the following code on power8:

        ;; vec_splats (vec_extract (src, 0))
        xxpermdi 0,34,34,3
        xxpermdi 34,0,0,0

        ;; vec_splats (vec_extract (src, 1))
        xxlor 0,34,34
        xxpermdi 34,0,0,0

However on power9 and power10 it generates:

        ;; vec_splats (vec_extract (src, 0))
        mfvsrld 9,34
        mtvsrdd 34,9,9

        ;; vec_splats (vec_extract (src, 1))
        mfvsrd 9,34
        mtvsrdd 34,9,9

This is because power9 adds the mfvsrld instruction, which together with
mfvsrd allows either 64-bit element to be moved to a GPR.  While there are
alternatives for both vector registers and GPRs, the register allocator
prefers to put DImode values into GPRs.

In this case, it is better to have a single combiner pattern that can
generate a single xxpermdi, instead of 2 insns (the extract and then the
concat).  This is true when the two operations are a move from a vector
register and a move to a vector register.

As Segher pointed out in a previous version of the patch, the combiner
already tries to create a (vec_duplicate (vec_select ...)) pattern, but we
didn't provide one.

This patch reworks vsx_xxspltd_ for V2DImode and V2DFmode so that it now
uses VEC_DUPLICATE, which the combiner checks for.

I have built SPEC 2017 with this patch installed, and the cam4_r benchmark
is the only benchmark that generated different code (3 mfvsrld/mtvsrdd
pairs of instructions were replaced with xxpermdi).

I have built bootstrap versions on the following systems and I have run the
regression tests.  There were no regressions in the runs:

        Power9 little endian, --with-cpu=power9
        Power10 little endian, --with-cpu=power10
        Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests)

Can I install this into the trunk?  After a burn-in period, can I backport
and install this into the GCC 11 and GCC 10 branches?

2022-06-06   Michael Meissner

gcc/
        PR target/99293
        * config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
        UNSPEC_VSX_XXSPLTD case.
        * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
        (vsx_xxspltd_): Rewrite to use VEC_DUPLICATE.

gcc/testsuite:
        PR target/99293
        * gcc.target/powerpc/builtins-1.c: Update insn count.
        * gcc.target/powerpc/pr99293.c: New test.
---
 gcc/config/rs6000/rs6000-p8swap.cc            |  1 -
 gcc/config/rs6000/vsx.md                      | 19 +++----
 gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr99293.c    | 51 +++++++++++++++++++
 4 files changed, 62 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c

diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
index 275702fee1b..3160fcbdeca 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -807,7 +807,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
           case UNSPEC_VUPKLU_V4SF:
             return 0;
           case UNSPEC_VSPLT_DIRECT:
-          case UNSPEC_VSX_XXSPLTD:
             *special = SH_SPLAT;
             return 1;
           case UNSPEC_REDUC_PLUS:
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 1b75538f42f..a1a1ce95195 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -296,7 +296,6 @@ (define_c_enum "unspec"
    UNSPEC_VSX_XXPERM
    UNSPEC_VSX_XXSPLTW
-   UNSPEC_VSX_XXSPLTD
    UNSPEC_VSX_DIVSD
    UNSPEC_VSX_DIVUD
    UNSPEC_VSX_DIVSQ
@@ -4673,16 +4672,18 @@ (define_insn "vsx_vsplt_di"
 ;; V2DF/V2DI splat for use by vec_splat builtin
 (define_insn "vsx_xxspltd_"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
-        (unspec:VSX_D [(match_operand:VSX_D 1 "vsx_register_operand" "wa")
-                       (match_operand:QI 2 "u5bit_cint_operand" "i")]
-                      UNSPEC_VSX_XXSPLTD))]
+        (vec_duplicate:VSX_D
+         (vec_select:
+          (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+          (parallel [(match_operand:QI 2 "const_0_to_1_operand" "i")]))))]
   "VECTOR_MEM_VSX_P (mode)"
 {
-  if ((BYTES_BIG_ENDIAN && INTVAL (operands[2]) == 0)
-      || (!BYTES_BIG_ENDIAN && INTVAL (operands[2]) == 1))
-    return "xxpermdi %x0,%x1,%x1,0";
-  else
-    return "xxpermdi %x0,%x1,%x1,3";
+  HOST_WIDE_INT dword = INTVAL (operands[2]);
+  if (!BYTES_BIG_ENDIAN)
+    dword = !dword;
+
+  operands[3] = GEN_INT (3*dword);
+  return "xxpermdi %x0,%x1,%x1,%3";
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.c b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
index 28cd1aa6b1a..98783668bce 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
@@ -1035,4 +1035,4 @@ foo156 (vector unsigned short usa)
 /* { dg-final { scan-assembler-times {\mvmrglb\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mvmrgew\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mvsplth|xxsplth\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mxxpermdi\M} 44 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 42 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c
new file mode 100644
index 00000000000..7aaa95d06ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test for PR 99293, which wants to do:
+        __builtin_vec_splats (__builtin_vec_extract (v, n))
+
+   where v is a V2DF or V2DI vector and n is either 0 or 1.
+
+   Previously for:
+        __builtin_vec_splats (__builtin_vec_extract (v, 0));
+
+   GCC would generate the following code on power8:
+
+        xxpermdi 34,34,34,3
+        xxpermdi 34,34,34,0
+
+   and the following code on power9 and power10:
+
+        mfvsrld 9,34
+        mtvsrdd 34,9,9
+
+   Now it generates the following code on power8, power9, and power10:
+
+        xxpermdi 34,34,34,3.
+ */
+
+vector long long
+splat_dup_ll_0 (vector long long v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
+}
+
+vector long long
+splat_dup_ll_1 (vector long long v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 1));
+}
+
+vector double
+splat_dup_d_0 (vector double v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
+}
+
+vector double
+splat_dup_d_1 (vector double v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 1));
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 4 } } */