From patchwork Sat Jan 8 12:21:49 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 49752
From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [PATCH] nvptx: Improved support for HFMode including neghf2 and abshf2.
Date: Sat, 8 Jan 2022 12:21:49 -0000
Message-ID: <02c201d8048a$50c6bf70$f2543e50$@nextmovesoftware.com>

This patch adds more support for _Float16 (HFmode) to the nvptx backend.
Currently negation, absolute value and floating point comparisons are
implemented by promoting to float (SFmode).  This patch adds suitable
define_insns to nvptx.md, most conditional on TARGET_SM53 (-misa=sm_53).
This patch also adds support for HFmode fused multiply-add.

One subtlety is that neghf2 and abshf2 are implemented by (HImode) bit
manipulation operations to update the sign bit.  The NVIDIA PTX ISA
documentation for neg.f16 and abs.f16 contains the caution "Future
implementations may comply with the IEEE 754 standard by preserving
the (NaN) payload and modifying only the sign bit".  Given the
availability of suitable replacements, I thought it best to provide
IEEE 754 compliant implementations.  If anyone observes a performance
penalty from this choice I'm happy to provide a -ffast-math variant
(or revisit this decision).

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
Ok for mainline?


2022-01-08  Roger Sayle

gcc/ChangeLog
        * config/nvptx/nvptx.md (*cmphf): New define_insn.
        (cstorehf4): New define_expand.
        (fmahf4): New define_insn.
        (neghf2): New define_insn.
        (abshf2): New define_insn.

gcc/testsuite/ChangeLog
        * gcc.target/nvptx/float16-3.c: New test case for neghf2.
        * gcc.target/nvptx/float16-4.c: New test case for abshf2.
        * gcc.target/nvptx/float16-5.c: New test case for fmahf4.
        * gcc.target/nvptx/float16-6.c: New test case.

Thanks in advance,
Roger

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ce74672..a6046d7 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -779,6 +779,14 @@
   ""
   "%.\\tsetp%c1\\t%0, %2, %3;")
 
+(define_insn "*cmphf"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+        (match_operator:BI 1 "nvptx_float_comparison_operator"
+          [(match_operand:HF 2 "nvptx_register_operand" "R")
+           (match_operand:HF 3 "nvptx_nonmemory_operand" "RF")]))]
+  "TARGET_SM53"
+  "%.\\tsetp%c1\\t%0, %2, %3;")
+
 (define_insn "jump"
   [(set (pc)
         (label_ref (match_operand 0 "" "")))]
@@ -969,6 +977,21 @@
   DONE;
 })
 
+(define_expand "cstorehf4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+        (match_operator:SI 1 "nvptx_float_comparison_operator"
+          [(match_operand:HF 2 "nvptx_register_operand")
+           (match_operand:HF 3 "nvptx_nonmemory_operand")]))]
+  "TARGET_SM53"
+{
+  rtx reg = gen_reg_rtx (BImode);
+  rtx cmp = gen_rtx_fmt_ee (GET_CODE (operands[1]), BImode,
+                            operands[2], operands[3]);
+  emit_move_insn (reg, cmp);
+  emit_insn (gen_setccsi_from_bi (operands[0], reg));
+  DONE;
+})
+
 ;; Calls
 
 (define_insn "call_insn_"
@@ -1156,6 +1179,26 @@
   "TARGET_SM53"
   "%.\\tmul.f16\\t%0, %1, %2;")
 
+(define_insn "fmahf4"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (fma:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+                (match_operand:HF 2 "nvptx_nonmemory_operand" "RF")
+                (match_operand:HF 3 "nvptx_nonmemory_operand" "RF")))]
+  "TARGET_SM53"
+  "%.\\tfma%#.f16\\t%0, %1, %2, %3;")
+
+(define_insn "neghf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (neg:HF (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\txor.b16\\t%0, %1, -32768;")
+
+(define_insn "abshf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (abs:HF (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tand.b16\\t%0, %1, 32767;")
+
 (define_insn "exp2hf2"
   [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
         (unspec:HF [(match_operand:HF 1 "nvptx_register_operand" "R")]
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-3.c b/gcc/testsuite/gcc.target/nvptx/float16-3.c
new file mode 100644
index 0000000..914282a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+
+_Float16 var;
+
+void neg()
+{
+  var = -var;
+}
+
+/* { dg-final { scan-assembler "xor.b16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-4.c b/gcc/testsuite/gcc.target/nvptx/float16-4.c
new file mode 100644
index 0000000..b11f17a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+
+_Float16 var;
+
+void foo()
+{
+  var = (var < (_Float16)0.0) ? -var : var;
+}
+
+/* { dg-final { scan-assembler "and.b16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-5.c b/gcc/testsuite/gcc.target/nvptx/float16-5.c
new file mode 100644
index 0000000..5fe15ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+
+_Float16 a;
+_Float16 b;
+_Float16 c;
+_Float16 d;
+
+void foo()
+{
+  a = (_Float16)(b*c) + d;
+}
+
+/* { dg-final { scan-assembler "fma.rn.f16" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-6.c b/gcc/testsuite/gcc.target/nvptx/float16-6.c
new file mode 100644
index 0000000..8fe4fa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/float16-6.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+
+_Float16 x;
+_Float16 y;
+
+_Bool eq()
+{
+  return x == y;
+}
+
+_Bool ne()
+{
+  return x != y;
+}
+
+_Bool lt()
+{
+  return x < y;
+}
+
+_Bool le()
+{
+  return x <= y;
+}
+
+_Bool gt()
+{
+  return x > y;
+}
+
+_Bool ge()
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler-times "setp\.\[a-z\]*\.f16" 6 } } */
+/* { dg-final { scan-assembler-not "cvt.f32.f16" } } */
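
For readers unfamiliar with the sign-bit trick used by neghf2 and abshf2,
the following host-side C sketch illustrates the idea on raw IEEE 754
binary16 bit patterns.  It is not part of the patch: the function names
(f16_neg_bits, f16_abs_bits, f16_selfcheck) are hypothetical and chosen
only for this illustration.  The masks correspond to the immediates in
the new patterns, since -32768 as a 16-bit value is 0x8000 and 32767 is
0x7FFF.

```c
#include <assert.h>
#include <stdint.h>

/* IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits, 10 mantissa bits.
   These helpers are illustrative only; they operate on the raw 16-bit
   encoding rather than on a _Float16 value.  */

/* Negation flips only the sign bit, mirroring "xor.b16 %0, %1, -32768".  */
static uint16_t
f16_neg_bits (uint16_t bits)
{
  return (uint16_t) (bits ^ 0x8000u);
}

/* Absolute value clears only the sign bit, mirroring
   "and.b16 %0, %1, 32767".  */
static uint16_t
f16_abs_bits (uint16_t bits)
{
  return (uint16_t) (bits & 0x7FFFu);
}

/* A few spot checks on well-known encodings.  */
void
f16_selfcheck (void)
{
  assert (f16_neg_bits (0x3C00u) == 0xBC00u);  /* 1.0 becomes -1.0.  */
  assert (f16_abs_bits (0xBC00u) == 0x3C00u);  /* |-1.0| is 1.0.  */
  assert (f16_neg_bits (0x0000u) == 0x8000u);  /* +0 becomes -0.  */
  assert (f16_neg_bits (0x7E01u) == 0xFE01u);  /* NaN payload untouched.  */
  assert (f16_abs_bits (0xFE01u) == 0x7E01u);  /* NaN payload untouched.  */
}
```

Because only bit 15 changes, the exponent and mantissa (including any NaN
payload) pass through unmodified, which is the IEEE 754 behaviour the
cover letter is aiming for.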