From patchwork Mon Mar 28 16:26:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 52418 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5C8D73858C74 for ; Mon, 28 Mar 2022 16:26:38 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 5C8D73858C74 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1648484798; bh=codQLX+/C4E2L8T8N3Wl1Ne/PcXDOEDVeESIsNA0USs=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=dOx36SLggXcPZ2G/NtVhAZXGjJpw8EpRzk+nfehveXw+Ndgdyd18eGfEW76BNzCWv zm9hjpIp7PVnWoLKa1GnVIhhEy6NehCx6Use00i3/chGF1ScryEu+a5KTGmiLKlSUL GE08RA0eFyFURbjHRTBPLStdaipGvBukPrzk34S4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 994063858C51 for ; Mon, 28 Mar 2022 16:26:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 994063858C51 Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 22SG1Kle002927; Mon, 28 Mar 2022 16:26:07 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3f3g638hs5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Mar 2022 16:26:06 +0000 Received: from m0098394.ppops.net (m0098394.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 22SGHuUd011830; Mon, 28 Mar 2022 16:26:06 GMT Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com [169.53.41.122]) by mx0a-001b2d01.pphosted.com with ESMTP id 3f3g638hrw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Mar 2022 16:26:06 +0000 Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1]) by ppma04dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 22SGHOAY022077; Mon, 28 Mar 2022 16:26:05 GMT Received: from b01cxnp22036.gho.pok.ibm.com (b01cxnp22036.gho.pok.ibm.com [9.57.198.26]) by ppma04dal.us.ibm.com with ESMTP id 3f1tf9j1kp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Mar 2022 16:26:05 +0000 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp22036.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 22SGQ4HC2884600 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 28 Mar 2022 16:26:04 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4E3FFAE067; Mon, 28 Mar 2022 16:26:04 +0000 (GMT) Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0F185AE05C; Mon, 28 Mar 2022 16:26:04 +0000 (GMT) Received: from toto.the-meissners.org (unknown [9.65.244.27]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTPS; Mon, 28 Mar 2022 16:26:03 +0000 (GMT) Date: Mon, 28 Mar 2022 12:26:02 -0400 To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , David Edelsohn , Peter Bergner , Will Schmidt Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293. Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , David Edelsohn , Peter Bergner , Will Schmidt References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: tTqHp3lkvh76-L2AAok8Q8w2n_b5FtAF X-Proofpoint-GUID: AkQXpiLSPgZyhjgncIrxmxX1WuVWj3Ow X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.850,Hydra:6.0.425,FMLib:17.11.64.514 definitions=2022-03-28_07,2022-03-28_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 suspectscore=0 malwarescore=0 bulkscore=0 priorityscore=1501 phishscore=0 mlxlogscore=999 adultscore=0 impostorscore=0 spamscore=0 lowpriorityscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2203280090 X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Michael Meissner via Gcc-patches From: Michael Meissner Reply-To: Michael Meissner Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293. In PR target/99293, it was pointed out that doing: vector long long dest0, dest1, src; /* ... */ dest0 = vec_splats (vec_extract (src, 0)); dest1 = vec_splats (vec_extract (src, 1)); would generate slower code. It generates the following code on power8: ;; vec_splats (vec_extract (src, 0)) xxpermdi 0,34,34,3 xxpermdi 34,0,0,0 ;; vec_splats (vec_extract (src, 1)) xxlor 0,34,34 xxpermdi 34,0,0,0 However on power9 and power10 it generates: ;; vec_splats (vec_extract (src, 0)) mfvsld 3,34 mtvsrdd 34,9,9 ;; vec_splats (vec_extract (src, 1)) mfvsrd 9,34 mtvsrdd 34,9,9 This is due to the power9 having the mfvsrld instruction which can extract either 64-bit element into a GPR. While there are alternatives for both vector registers and GPR registers, the register allocator prefers to put DImode into GPR registers. However in this case, it is better to have a single combiner pattern that can generate a single xxpermdi, instead of doing 2 insnsns (the extract and then the concat). This is particularly true if the two operations are move from vector register and move to vector register. I have built Spec 2017 with this patch installed, and the cam4_r benchmark is the only benchmark that generated different code. On a power9, I did not notice any significant changes in the runtime of cam4_r. I have built bootstrap versions on the following systems. There were no regressions in the runs: Power9 little endian, --with-cpu=power9 Power10 little endian, --with-cpu=power10 Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests) Can I install this into the trunk? After a burn-in period, can I backport and install this into GCC 11 and GCC 10 branches? 2022-03-28 Michael Meissner gcc/ PR target/99293 * config/rs6000/vsx.md (vsx_splat_const_extract_): New combiner insn. gcc/testsuite: PR target/99293 * gcc.target/powerpc/pr99293.c: New test. --- gcc/config/rs6000/vsx.md | 19 ++++++++++++ gcc/testsuite/gcc.target/powerpc/pr99293.c | 36 ++++++++++++++++++++++ 2 files changed, 55 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 15bd86dfdfb..ddafbe471dd 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4616,6 +4616,25 @@ (define_insn "vsx_splat_v4si_di" [(set_attr "type" "vecperm") (set_attr "isa" "p8v,*")]) +;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element +;; PR target/99293 +(define_insn "*vsx_splat_const_extract_" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_duplicate:VSX_D + (vec_select: + (match_operand:VSX_D 1 "vsx_register_operand" "wa") + (parallel [(match_operand 2 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (mode)" +{ + int which_word = INTVAL (operands[2]); + if (!BYTES_BIG_ENDIAN) + which_word = 1 - which_word; + + operands[3] = GEN_INT (which_word ? 3 : 0); + return "xxpermdi %x0,%x1,%x1,%3"; +} + [(set_attr "type" "vecperm")]) + ;; V4SF splat (ISA 3.0) (define_insn_and_split "vsx_splat_v4sf" [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,wa") diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c new file mode 100644 index 00000000000..13b5ed5b0b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c @@ -0,0 +1,36 @@ +/* { dg-do compile { target powerpc*-*-* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +/* Test for PR 99263, which wants to do: + __builtin_vec_splats (__builtin_vec_extract (v, n)) + + where v is a V2DF or V2DI vector and n is either 0 or 1. Previously the + compiler would do a direct move to the GPR registers to select the item and + a direct move from the GPR registers to do the splat. */ + +vector long long +splat_dup_ll_0 (vector long long v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 0)); +} + +vector long long +splat_dup_ll_1 (vector long long v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 1)); +} + +vector double +splat_dup_d_0 (vector double v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 0)); +} + +vector double +splat_dup_d_1 (vector double v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 1)); +} + +/* { dg-final { scan-assembler-times "xxpermdi" 4 } } */