From patchwork Mon Apr 22 02:35:03 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 88824
From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
 bergner@linux.ibm.com, guojiufu@linux.ibm.com
Subject: [PATCH] add rlwinm pattern for DImode for constant building
Date: Mon, 22 Apr 2024 10:35:03 +0800
Message-Id: <20240422023503.179552-1-guojiufu@linux.ibm.com>
X-Mailer: git-send-email 2.25.1

Hi,

The 'rlwinm' pattern is already widely used for SImode.
Because this instruction affects the whole 64-bit register, some 64-bit
(DImode) constants can be built via 'lis/li+rlwinm'.

To achieve this, a new pattern for 'rlwinm' is added, and
'rs6000_emit_set_long_const' is updated to check whether a constant can
be built by 'lis/li; rlwinm'.

Bootstrap and regtest pass on ppc64{,le}.
Is this patch ok for trunk (when stage1 is open)?

Jeff (Jiufu Guo).

gcc/ChangeLog:

	* config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): Add new
	parameter.
	* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rlwinm): New
	function.
	(rs6000_emit_set_long_const): Generate 'lis/li+rlwinm'.
	(can_be_rotated_to_lowbits): Add new parameter.
	* config/rs6000/rs6000.md (rlwinm_di_mask): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr93012.c: Update to match 'rlwinm'.
	* gcc.target/powerpc/rlwinm4di-1.c: New test.
	* gcc.target/powerpc/rlwinm4di-2.c: New test.
	* gcc.target/powerpc/rlwinm4di.c: New test.
	* gcc.target/powerpc/rlwinm4di.h: New test.

---
 gcc/config/rs6000/rs6000-protos.h            |  2 +-
 gcc/config/rs6000/rs6000.cc                  | 65 ++++++++++++++++++-
 gcc/config/rs6000/rs6000.md                  | 18 +++++
 gcc/testsuite/gcc.target/powerpc/pr93012.c   |  2 +-
 .../gcc.target/powerpc/rlwinm4di-1.c         | 25 +++++++
 .../gcc.target/powerpc/rlwinm4di-2.c         | 19 ++++++
 gcc/testsuite/gcc.target/powerpc/rlwinm4di.c |  6 ++
 gcc/testsuite/gcc.target/powerpc/rlwinm4di.h | 25 +++++++
 8 files changed, 158 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di.h

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 09a57a806fa..10505a8061a 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -36,7 +36,7 @@ extern bool vspltisw_vupkhsw_constant_p (rtx, machine_mode, int * = nullptr);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
-extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *);
+extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *, bool = false);
 extern bool can_be_rotated_to_positive_16bits (HOST_WIDE_INT);
 extern bool can_be_rotated_to_negative_15bits (HOST_WIDE_INT);
 extern int num_insns_constant (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6ba9df4f02e..853eaede673 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10454,6 +10454,51 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   return false;
 }
 
+/* Check if value C can be generated by 2 instructions, one instruction
+   is li/lis, another instruction is rlwinm.  */
+
+static bool
+can_be_built_by_li_lis_and_rlwinm (HOST_WIDE_INT c, HOST_WIDE_INT *val,
+                                   int *shift, HOST_WIDE_INT *mask)
+{
+  unsigned HOST_WIDE_INT low = c & 0xFFFFFFFFULL;
+  unsigned HOST_WIDE_INT high = (c >> 32) & 0xFFFFFFFFULL;
+  unsigned HOST_WIDE_INT v;
+
+  /* diff of high and low (high ^ low) should be the mask position.  */
+  unsigned HOST_WIDE_INT m = low ^ high;
+  int tz = ctz_hwi (m);
+  int lz = clz_hwi (m);
+  if (m != 0)
+    m = ((HOST_WIDE_INT_M1U >> (lz + tz)) << tz);
+  if (high != 0)
+    m = ~m;
+  v = high != 0 ? high : ((low | ~m) & 0xFFFFFFFF);
+
+  if ((high != 0) && ((v & m) != low || lz < 33 || tz < 1))
+    return false;
+
+  /* rotl32 on positive/negative value of 'li' 15/16bits.  */
+  int n;
+  if (!can_be_rotated_to_lowbits (v, 15, &n, true)
+      && !can_be_rotated_to_lowbits ((~v) & 0xFFFFFFFFULL, 15, &n, true))
+    {
+      /* rotate32 from a negative value of 'lis'.  */
+      if (!can_be_rotated_to_lowbits (v & 0xFFFFFFFFULL, 16, &n, true))
+        return false;
+      n += 16;
+    }
+  n = 32 - (n % 32);
+  n %= 32;
+  v = ((v >> n) | (v << (32 - n))) & 0xFFFFFFFF;
+  if (v & 0x80000000ULL)
+    v |= HOST_WIDE_INT_M1U << 32;
+  *mask = m;
+  *val = v;
+  *shift = n;
+  return true;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
    Output insns to set DEST equal to the constant C as a series of
    lis, ori and shl instructions.  If NUM_INSNS is not NULL, then
@@ -10553,6 +10598,18 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c, int *num_insns)
       return;
     }
 
+  HOST_WIDE_INT val;
+  if (can_be_built_by_li_lis_and_rlwinm (c, &val, &shift, &mask))
+    {
+      /* li/lis; rlwinm */
+      count_or_emit_insn (temp, GEN_INT (val));
+      rtx low = temp ? gen_lowpart (SImode, temp) : nullptr;
+      rtx m = GEN_INT (mask);
+      rtx n = GEN_INT (shift);
+      count_or_emit_insn (gen_rlwinm_di_mask (dest, low, n, m));
+      return;
+    }
+
   if (ud3 == 0 && ud4 == 0)
     {
       gcc_assert ((ud2 & 0x8000) && ud1 != 0);
@@ -15220,7 +15277,8 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
    Return false otherwise.  */
 
 bool
-can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT c, int lowbits, int *rot)
+can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT c, int lowbits, int *rot,
+                           bool rotl32)
 {
   int clz = HOST_BITS_PER_WIDE_INT - lowbits;
 
@@ -15244,7 +15302,10 @@ can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT c, int lowbits, int *rot)
      ^bit -> Vbit, , then zeros are at head or tail.
        00...00xxx100, 'clz - 1' >= 'bits of xxxx'.  */
   const int rot_bits = lowbits + 1;
-  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
+  unsigned HOST_WIDE_INT rc;
+  rc = rotl32 ? ((((c & 0xFFFFFFFFULL) >> rot_bits)
+                  | ((c << (32 - rot_bits)) & 0xFFFFFFFFULL)))
+              : (c >> rot_bits) | (c << (clz - 1));
   tz = ctz_hwi (rc);
   if (clz_hwi (rc) + tz >= clz)
     {
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..8a82ba3e26c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4213,6 +4213,24 @@ (define_insn_and_split "*rotl3_mask_dot2"
    (set_attr "dot" "yes")
    (set_attr "length" "4,8")])
 
+; define an insn about rlwinm for DI mode (with high part content)
+(define_insn "rlwinm_di_mask"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+        (and:DI (plus:DI
+                  (ashift:DI (subreg:DI
+                               (rotate:SI (match_operand:SI 1 "gpc_reg_operand" "r")
+                                          (match_operand:SI 2 "const_int_operand" "n")) 0)
+                             (const_int 32))
+                  (zero_extend:DI (rotate:SI (match_dup 1) (match_dup 2))))
+                (match_operand:DI 3 "const_int_operand" "n")))]
+  "rs6000_is_valid_and_mask (operands[3], SImode)"
+{
+  return UINTVAL (operands[3]) == -1ULL ?
+    "rlwinm %0,%1,%h2,1,0" : "rlwinm %0,%1,%h2,%3";
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "yes")])
+
 ; Special case for less-than-0.  We can do it with just one machine
 ; instruction, but the generic optimizers do not realise it is cheap.
 (define_insn "*lt0_di"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c
index 4f764d0576f..70ddfaa21da 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr93012.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
@@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0xffff9234ffff9234ULL; }
 unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; }
 unsigned long long mskse() { return 0xffff1234ffff1234ULL; }
 
-/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
+/* { dg-final { scan-assembler-times {\mrlwinm\M|\mrldimi\M} 7 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm4di-1.c b/gcc/testsuite/gcc.target/powerpc/rlwinm4di-1.c
new file mode 100644
index 00000000000..8959578143b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rlwinm4di-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "rlwinm4di.h"
+
+long long arr1[] = {
+  0x0000400100000001ULL, 0x0000000200000002ULL, 0xffff8000bfff8000ULL,
+  0xffff8001ffff8001ULL, 0x0000800100000001ULL, 0x0000800100008001ULL,
+  0x0000800200000002ULL, 0x0000800000008000ULL, 0x0000000080008000ULL,
+  0xffff0001bfff0001ULL, 0xffff0001ffff0001ULL, 0x0001000200000002ULL,
+  0x8001000080010000ULL, 0x0004000100000001ULL, 0x0004000100040001ULL,
+  0x00000000bfffe001ULL, 0x0003fffe0001fffeULL, 0x0003fffe0003fffeULL,
+  0x0002000100000001ULL, 0x0002000100020001ULL,
+};
+
+int
+main ()
+{
+  long long a[sizeof (arr1) / sizeof (arr1[0])];
+
+  foo (a);
+  if (__builtin_memcmp (a, arr1, sizeof (arr1)) != 0)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm4di-2.c b/gcc/testsuite/gcc.target/powerpc/rlwinm4di-2.c
new file mode 100644
index 00000000000..9494d0327b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rlwinm4di-2.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O2 -mno-prefixed" } */
+/* { dg-do compile { target has_arch_ppc64 } } */
+
+#define N 5
+#define MASK 0xffffffffe0000003ULL
+
+typedef unsigned long long int64;
+
+int64
+foo (int64 v)
+{
+  unsigned int v1 = v;
+  unsigned int v2 = ((v1 << N) | (v1 >> (32 - N)));
+  return ((int64) v2 | ((int64) v2 << 32)) & MASK;
+}
+
+/* { dg-final { scan-assembler-not {\mor\M} } } */
+/* { dg-final { scan-assembler-not {\mrldicl\M} } } */
+/* { dg-final { scan-assembler-times {\mrlwinm\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm4di.c b/gcc/testsuite/gcc.target/powerpc/rlwinm4di.c
new file mode 100644
index 00000000000..fcbc8f8d742
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rlwinm4di.c
@@ -0,0 +1,6 @@
+/* { dg-options "-O2 -mno-prefixed" } */
+/* { dg-do compile { target has_arch_ppc64 } } */
+#include "rlwinm4di.h"
+
+/* { dg-final { scan-assembler-times {\mrlwinm\M} 20 } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm4di.h b/gcc/testsuite/gcc.target/powerpc/rlwinm4di.h
new file mode 100644
index 00000000000..59fe739ca85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rlwinm4di.h
@@ -0,0 +1,25 @@
+/* using 2 instructions(rlwinm) to build constants. */
+void __attribute__ ((__noinline__, __noclone__))
+foo (long long *arg)
+{
+  *arg++ = 0x0000400100000001ULL;
+  *arg++ = 0x0000000200000002ULL;
+  *arg++ = 0xffff8000bfff8000ULL;
+  *arg++ = 0xffff8001ffff8001ULL;
+  *arg++ = 0x0000800100000001ULL;
+  *arg++ = 0x0000800100008001ULL;
+  *arg++ = 0x0000800200000002ULL;
+  *arg++ = 0x0000800000008000ULL;
+  *arg++ = 0x0000000080008000ULL;
+  *arg++ = 0xffff0001bfff0001ULL;
+  *arg++ = 0xffff0001ffff0001ULL;
+  *arg++ = 0x0001000200000002ULL;
+  *arg++ = 0x8001000080010000ULL;
+  *arg++ = 0x0004000100000001ULL;
+  *arg++ = 0x0004000100040001ULL;
+  *arg++ = 0x00000000bfffe001ULL;
+  *arg++ = 0x0003fffe0001fffeULL;
+  *arg++ = 0x0003fffe0003fffeULL;
+  *arg++ = 0x0002000100000001ULL;
+  *arg++ = 0x0002000100020001ULL;
+}
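
Note on the 64-bit behaviour of 'rlwinm' that the commit message relies on:
the following is a minimal sketch in plain C, assuming the usual Power ISA
description in which the low word is rotated, replicated into both halves of
the register, and then ANDed with a mask that may wrap around. The helper
names (rotl32, mask64, rlwinm64, rlwinm64_selfcheck) are hypothetical and are
not part of the patch, and the instruction operands in the comments were
derived by hand rather than taken from compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32).  */
static uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask with 1s in big-endian bit positions mb..me of a 64-bit register
   (bit 0 is the most significant), wrapping around when mb > me.
   Both arguments must be in the range 0..63.  */
static uint64_t
mask64 (unsigned mb, unsigned me)
{
  uint64_t from_mb = ~UINT64_C (0) >> mb;          /* bits mb..63 */
  uint64_t up_to_me = ~UINT64_C (0) << (63 - me);  /* bits 0..me */
  return mb <= me ? (from_mb & up_to_me) : (from_mb | up_to_me);
}

/* Model of 'rlwinm RA,RS,SH,MB,ME' on a 64-bit implementation: the low
   word of RS is rotated, copied into both halves, then masked.  */
static uint64_t
rlwinm64 (uint64_t rs, unsigned sh, unsigned mb, unsigned me)
{
  uint32_t r = rotl32 ((uint32_t) rs, sh);
  uint64_t both = ((uint64_t) r << 32) | r;
  return both & mask64 (mb + 32, me + 32);
}

/* Hand-derived checks against constants listed in rlwinm4di.h.  */
static void
rlwinm64_selfcheck (void)
{
  /* li 1; rotate by 15; wrap-around mask MB=1,ME=0 keeps every bit.  */
  assert (rlwinm64 (1, 15, 1, 0) == UINT64_C (0x0000800000008000));
  /* lis 0x8001 (sign-extended); rotate by 16; keep every bit.  */
  assert (rlwinm64 (UINT64_C (0xFFFFFFFF80010000), 16, 1, 0)
          == UINT64_C (0x0000800100008001));
  /* li 0x4001; no rotation; MB=18,ME=16 clears one bit of the low word.  */
  assert (rlwinm64 (0x4001, 0, 18, 16) == UINT64_C (0x0000400100000001));
  /* A non-wrapping mask (MB <= ME) leaves the high word zero.  */
  assert (rlwinm64 (1, 15, 0, 31) == UINT64_C (0x8000));
}
```

The wrap-around mask (MB > ME) is what lets the rotated value survive in the
high word; with MB <= ME the mask lies entirely in the low word, so the high
word of the result is zero.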