From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [PATCH] nvptx: Add support for 64-bit mul.hi (and other) instructions.
Date: Fri, 14 Jan 2022 09:54:40 -0000
Message-ID: <040001d8092c$c090b670$41b22350$@nextmovesoftware.com>
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 50014

Now that the middle-end MULT_HIGHPART_EXPR pieces are in place, this
patch adds support for nvptx's mul.hi.s64 and mul.hi.u64 instructions,
as previously reviewed (provisionally pre-approved) back in August 2020:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551373.html
Since then a few things have changed, so this patch uses the new
SMUL_HIGHPART and UMUL_HIGHPART RTX expressions, but the test cases
remain the same.
Like the x86_64 backend, this patch retains the "trunc" forms of these
instructions (while the RTL optimizers/combine may still generate them).

Given that we're rapidly approaching stage 4, I also took the liberty
of including support in nvptx.md for a few other instructions.  With
the new 64-bit highpart multiplication instructions added above, we
can now provide a define_expand for efficient 64-bit (to 128-bit)
widening multiplications.  This patch also adds support for nvptx's
testp.infinite instruction (for implementing __builtin_isinf) and the
not.pred instruction.

As an example of the code generation improvements, the function

int foo(double x) { return __builtin_isinf(x); }

previously generated with -O2:

        mov.f64 %r26, %ar0;
        abs.f64 %r28, %r26;
        setp.leu.f64 %r31, %r28, 0d7fefffffffffffff;
        selp.u32 %r30, 1, 0, %r31;
        mov.u32 %r29, %r30;
        cvt.u16.u8 %r35, %r29;
        mov.u16 %r33, %r35;
        xor.b16 %r32, %r33, 1;
        cvt.u32.u16 %r34, %r32;
        cvt.u32.u8 %value, %r34;

and with this patch now generates:

        mov.f64 %r23, %ar0;
        testp.infinite.f64 %r24, %r23;
        selp.u32 %value, 1, 0, %r24;

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
Ok for mainline?


2022-01-14  Roger Sayle

gcc/ChangeLog
        * config/nvptx/nvptx.md (UNSPEC_ISINF): New UNSPEC.
        (one_cmplbi2): New define_insn for not.pred.
        (mulditi3): New define_expand for signed widening multiply.
        (umulditi3): New define_expand for unsigned widening multiply.
        (smul<mode>3_highpart): New define_insn for signed highpart mult.
        (umul<mode>3_highpart): New define_insn for unsigned highpart mult.
        (*smulhi3_highpart_2): Renamed from smulhi3_highpart.
        (*smulsi3_highpart_2): Renamed from smulsi3_highpart.
        (*umulhi3_highpart_2): Renamed from umulhi3_highpart.
        (*umulsi3_highpart_2): Renamed from umulsi3_highpart.
        (*setcc<mode>_from_not_bi): New define_insn.
        (*setcc_isinf<mode>): New define_insn for testp.infinite.
        (isinf<mode>2): New define_expand.

gcc/testsuite/ChangeLog
        * gcc.target/nvptx/mul-hi64.c: New test case.
        * gcc.target/nvptx/umul-hi64.c: New test case.
        * gcc.target/nvptx/mul-wide64.c: New test case.
        * gcc.target/nvptx/umul-wide64.c: New test case.
        * gcc.target/nvptx/isinf.c: New test case.


Thanks in advance,
Roger

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ce74672..a36ea59 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -27,6 +27,7 @@
    UNSPEC_SIN
    UNSPEC_COS
    UNSPEC_TANH
+   UNSPEC_ISINF
 
    UNSPEC_FPINT_FLOOR
    UNSPEC_FPINT_BTRUNC
@@ -592,6 +593,12 @@
   ""
   "%.\\tnot.b%T0\\t%0, %1;")
 
+(define_insn "one_cmplbi2"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+        (not:BI (match_operand:BI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tnot.pred\\t%0, %1;")
+
 (define_insn "*cnot<mode>2"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
         (eq:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
@@ -667,7 +674,57 @@
   ""
   "%.\\tmul.wide.u32\\t%0, %1, %2;")
 
-(define_insn "smulhi3_highpart"
+(define_expand "mulditi3"
+  [(set (match_operand:TI 0 "nvptx_register_operand")
+        (mult:TI (sign_extend:TI
+                  (match_operand:DI 1 "nvptx_register_operand"))
+                 (sign_extend:DI
+                  (match_operand:DI 2 "nvptx_nonmemory_operand"))))]
+  ""
+{
+  rtx hi = gen_reg_rtx (DImode);
+  rtx lo = gen_reg_rtx (DImode);
+  emit_insn (gen_smuldi3_highpart (hi, operands[1], operands[2]));
+  emit_insn (gen_muldi3 (lo, operands[1], operands[2]));
+  emit_move_insn (gen_highpart (DImode, operands[0]), hi);
+  emit_move_insn (gen_lowpart (DImode, operands[0]), lo);
+  DONE;
+})
+
+(define_expand "umulditi3"
+  [(set (match_operand:TI 0 "nvptx_register_operand")
+        (mult:TI (zero_extend:TI
+                  (match_operand:DI 1 "nvptx_register_operand"))
+                 (zero_extend:DI
+                  (match_operand:DI 2 "nvptx_nonmemory_operand"))))]
+  ""
+{
+  rtx hi = gen_reg_rtx (DImode);
+  rtx lo = gen_reg_rtx (DImode);
+  emit_insn (gen_umuldi3_highpart (hi, operands[1], operands[2]));
+  emit_insn (gen_muldi3 (lo, operands[1], operands[2]));
+  emit_move_insn (gen_highpart (DImode, operands[0]), hi);
+  emit_move_insn (gen_lowpart (DImode, operands[0]), lo);
+  DONE;
+})
+
+(define_insn "smul<mode>3_highpart"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+        (smul_highpart:HSDIM
+          (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+          (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmul.hi.s%T0\\t%0, %1, %2;")
+
+(define_insn "umul<mode>3_highpart"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+        (umul_highpart:HSDIM
+          (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+          (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmul.hi.u%T0\\t%0, %1, %2;")
+
+(define_insn "*smulhi3_highpart_2"
   [(set (match_operand:HI 0 "nvptx_register_operand" "=R")
         (truncate:HI
          (lshiftrt:SI
@@ -679,7 +736,7 @@
   ""
   "%.\\tmul.hi.s16\\t%0, %1, %2;")
 
-(define_insn "smulsi3_highpart"
+(define_insn "*smulsi3_highpart_2"
   [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
         (truncate:SI
          (lshiftrt:DI
@@ -691,7 +748,7 @@
   ""
   "%.\\tmul.hi.s32\\t%0, %1, %2;")
 
-(define_insn "umulhi3_highpart"
+(define_insn "*umulhi3_highpart_2"
   [(set (match_operand:HI 0 "nvptx_register_operand" "=R")
         (truncate:HI
          (lshiftrt:SI
@@ -703,7 +760,7 @@
   ""
   "%.\\tmul.hi.u16\\t%0, %1, %2;")
 
-(define_insn "umulsi3_highpart"
+(define_insn "*umulsi3_highpart_2"
   [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
         (truncate:SI
          (lshiftrt:DI
@@ -873,6 +930,13 @@
   ""
   "%.\\tselp%t0\\t%0, 1, 0, %1;")
 
+(define_insn "*setcc<mode>_from_not_bi"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+        (eq:HSDIM (match_operand:BI 1 "nvptx_register_operand" "R")
+                   (const_int 0)))]
+  ""
+  "%.\\tselp%t0\\t%0, 0, 1, %1;")
+
 (define_insn "extendbi<mode>2"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
         (sign_extend:HSDIM
@@ -1133,6 +1197,25 @@
   "flag_unsafe_math_optimizations"
   "%.\\tex2.approx%t0\\t%0, %1;")
 
+(define_insn "setcc_isinf<mode>"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+        (unspec:BI [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
+                   UNSPEC_ISINF))]
+  ""
+  "%.\\ttestp.infinite%t1\\t%0, %1;")
+
+(define_expand "isinf<mode>2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (unspec:SI [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
+                   UNSPEC_ISINF))]
+  ""
+{
+  rtx pred = gen_reg_rtx (BImode);
+  emit_insn (gen_setcc_isinf<mode> (pred, operands[1]));
+  emit_insn (gen_setccsi_from_bi (operands[0], pred));
+  DONE;
+})
+
 ;; HFmode floating point arithmetic.
 
 (define_insn "addhf3"
diff --git a/gcc/testsuite/gcc.target/nvptx/isinf.c b/gcc/testsuite/gcc.target/nvptx/isinf.c
new file mode 100644
index 0000000..c433612
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/isinf.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(double x)
+{
+  return __builtin_isinf(x);
+}
+
+/* { dg-final { scan-assembler-times "testp.infinite.f64" 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/mul-hi64.c b/gcc/testsuite/gcc.target/nvptx/mul-hi64.c
new file mode 100644
index 0000000..a45a0e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mul-hi64.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wno-long-long" } */
+
+typedef unsigned int __attribute ((mode(TI))) uti_t;
+typedef int __attribute ((mode(TI))) ti_t;
+
+long test1(long x, long y)
+{
+  return ((ti_t)x * (ti_t)y) >> 64;
+}
+
+long test2(long x)
+{
+  return ((ti_t)x * 19065) >> 64;
+}
+
+long test3(long x, long y)
+{
+  return (uti_t)((ti_t)x * (ti_t)y) >> 64;
+}
+
+long test4(long x)
+{
+  return (uti_t)((ti_t)x * 19065) >> 64;
+}
+
+ti_t test5(long x, long y)
+{
+  return ((ti_t)x * (ti_t)y) >> 64;
+}
+
+ti_t test6(long x)
+{
+  return ((ti_t)x * 19065) >> 64;
+}
+
+uti_t test7(long x, long y)
+{
+  return (uti_t)((ti_t)x * (ti_t)y) >> 64;
+}
+
+uti_t test8(long x)
+{
+  return (uti_t)((ti_t)x * 19065) >> 64;
+}
+
+/* { dg-final { scan-assembler-times "mul.hi.s64" 8 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/mul-wide64.c b/gcc/testsuite/gcc.target/nvptx/mul-wide64.c
new file mode 100644
index 0000000..a273c66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mul-wide64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int __attribute ((mode(TI))) ti_t;
+
+ti_t foo(long x, long y)
+{
+  return (ti_t)x * (ti_t)y;
+}
+
+/* { dg-final { scan-assembler-times "mul.lo.u64" 1 } } */
+/* { dg-final { scan-assembler-times "mul.hi.s64" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/nvptx/umul-hi64.c b/gcc/testsuite/gcc.target/nvptx/umul-hi64.c
new file mode 100644
index 0000000..6e09b74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/umul-hi64.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wno-long-long" } */
+
+typedef unsigned int __attribute ((mode(TI))) uti_t;
+typedef int __attribute ((mode(TI))) ti_t;
+
+unsigned long test1(unsigned long x, unsigned long y)
+{
+  return ((uti_t)x * (uti_t)y) >> 64;
+}
+
+unsigned long test2(unsigned long x)
+{
+  return ((uti_t)x * 19065) >> 64;
+}
+
+unsigned long test3(unsigned long x, unsigned long y)
+{
+  return (ti_t)((uti_t)x * (uti_t)y) >> 64;
+}
+
+unsigned long test4(unsigned long x)
+{
+  return (ti_t)((uti_t)x * 19065) >> 64;
+}
+
+uti_t test5(unsigned long x, unsigned long y)
+{
+  return ((uti_t)x * (uti_t)y) >> 64;
+}
+
+uti_t test6(unsigned long x)
+{
+  return ((uti_t)x * 19065) >> 64;
+}
+
+ti_t test7(unsigned long x, unsigned long y)
+{
+  return (ti_t)((uti_t)x * (uti_t)y) >> 64;
+}
+
+ti_t test8(unsigned long x)
+{
+  return (ti_t)((uti_t)x * 19065) >> 64;
+}
+
+/* { dg-final { scan-assembler-times "mul.hi.u64" 8 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/umul-wide64.c b/gcc/testsuite/gcc.target/nvptx/umul-wide64.c
new file mode 100644
index 0000000..4cbdf0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/umul-wide64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int __attribute ((mode(TI))) uti_t;
+
+uti_t foo(unsigned long x, unsigned long y)
+{
+  return (uti_t)x * (uti_t)y;
+}
+
+/* { dg-final { scan-assembler-times "mul.lo.u64" 1 } } */
+/* { dg-final { scan-assembler-times "mul.hi.u64" 1 } } */
+
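
For readers following the mulditi3/umulditi3 expanders above, here is a
rough host-side model in C of how a 64-bit (to 128-bit) widening multiply
can be assembled from one highpart multiply and one lowpart multiply.
It is only a sketch and not part of the patch: the helper names
(umul_hi64, smul_hi64, mul_lo64, umul_wide64, smul_wide64, self_check)
are made up for illustration, and it assumes a compiler that provides
__int128 (e.g. GCC or Clang on a 64-bit host), with arithmetic right
shift of negative values and wrapping unsigned-to-signed conversion as
GCC documents.

```c
#include <assert.h>
#include <stdint.h>

/* Model of mul.hi.u64: high 64 bits of the unsigned 128-bit product.  */
static uint64_t umul_hi64(uint64_t x, uint64_t y)
{
  return (uint64_t)(((unsigned __int128)x * y) >> 64);
}

/* Model of mul.hi.s64: high 64 bits of the signed 128-bit product.
   Assumes >> on a negative __int128 is an arithmetic shift (GCC).  */
static int64_t smul_hi64(int64_t x, int64_t y)
{
  return (int64_t)(((__int128)x * y) >> 64);
}

/* Model of mul.lo.u64: the low 64 bits are the same for signed and
   unsigned operands, so one helper serves both expanders.  */
static uint64_t mul_lo64(uint64_t x, uint64_t y)
{
  return x * y;
}

/* Mirrors the shape of umulditi3: compute hi and lo separately, then
   place them in the two halves of the 128-bit result.  */
static unsigned __int128 umul_wide64(uint64_t x, uint64_t y)
{
  uint64_t hi = umul_hi64(x, y);
  uint64_t lo = mul_lo64(x, y);
  return ((unsigned __int128)hi << 64) | lo;
}

/* Mirrors the shape of mulditi3: only the highpart multiply is signed.  */
static __int128 smul_wide64(int64_t x, int64_t y)
{
  uint64_t hi = (uint64_t)smul_hi64(x, y);
  uint64_t lo = mul_lo64((uint64_t)x, (uint64_t)y);
  return (__int128)(((unsigned __int128)hi << 64) | lo);
}

/* Compare the hi/lo composition against a direct 128-bit multiply.  */
static void self_check(void)
{
  assert(umul_wide64(UINT64_MAX, UINT64_MAX)
         == (unsigned __int128)UINT64_MAX * UINT64_MAX);
  assert(smul_wide64(-3, 5) == (__int128)-15);
  assert(smul_wide64(INT64_MIN, INT64_MIN)
         == (__int128)INT64_MIN * INT64_MIN);
}
```

Calling self_check() on the host exercises the composition against a
direct 128-bit multiply; it says nothing about the PTX that nvptx
actually emits, which is what the scan-assembler tests in the patch
check.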