From patchwork Fri Sep 17 15:41:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Roger Sayle <roger@nextmovesoftware.com>
X-Patchwork-Id: 45134
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 04E77385AC19
	for <patchwork@sourceware.org>; Fri, 17 Sep 2021 15:43:01 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com
 [162.254.253.69])
 by sourceware.org (Postfix) with ESMTPS id 78AAC385AC38
 for <gcc-patches@gcc.gnu.org>; Fri, 17 Sep 2021 15:41:27 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 78AAC385AC38
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=nextmovesoftware.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=nextmovesoftware.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID:
 Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
 Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
 :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:
 List-Subscribe:List-Post:List-Owner:List-Archive;
 bh=DDO1fW7XPWhzP0IWiJ6vojWBZq5NZmI2qV1Jr6isYls=; b=g/SdwN/7FPhfn8wuXf/2Oqk8g+
 0/maZ20ezFtB2lZC2JHISQFQuEpCXte1XiIwCDz6ojlNyNthtWeTI85Y50aT1mBy8hxSS1i59wGq2
 mNQ6r4XErIVZ+ReRLMPvfCU8s8skcgqqslP4bOCN5psTkSKgSHZfpl2PIG8HWPKVbCFfYD8VT5LhX
 sCP+HNpOKBDAqplohl5iIl5TMmWKypPWFFCwo87W3Tp2JZwxNy7zRlNaOruKQnt3IuJx8GPxLZO7h
 +jEFCAyDZCeo8CwHQt9s6Ak2uR3bQzrq2R68vN0opVgw0GeRPPGbr0YnsSpbWqQjZl/Ly1uw0dw95
 bBqvXFYw==;
Received: from [185.62.158.67] (port=52103 helo=Dell)
 by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2)
 (envelope-from <roger@nextmovesoftware.com>)
 id 1mRFzS-0008VK-Qg; Fri, 17 Sep 2021 11:41:27 -0400
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: [PATCH] nvptx: Adds uses of -misa=sm_75 and -misa=sm_80
Date: Fri, 17 Sep 2021 16:41:25 +0100
Message-ID: <001a01d7abda$79e43c60$6dacb520$@nextmovesoftware.com>
MIME-Version: 1.0
X-Mailer: Microsoft Outlook 16.0
Thread-Index: Ader2U1M3oLGuge+SfiiDtkzQfAe6g==
Content-Language: en-gb
X-AntiAbuse: This header was added to track abuse,
 please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id:
 roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com:
 roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT,
 KAM_SHORT, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Cc: 'Tobias Burnus' <tobias@codesourcery.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This patch adds upon my previous patch to prototype HFmode support on
nvptx, which includes adding new target macros TARGET_SM75 and TARGET_SM80.
Tobias Burnus has questioned "whether it makes sense to add those
flags if no use is made of those flags".  I had hoped that it might
be possible to split these patch submissions into smaller parts to
assist the review process, but failing that, here's part 2, that
adds support for __builtin_tanhf, HFmode exp2/tanh and also
for HFmode min/max, controlled by TARGET_SM75 and TARGET_SM80 respectively.

The following has been tested on nvptx-none, hosted on x86_64-pc-linux-gnu
(on top of my previous patch) with a "make" and "make -k check" with no
new failures.  Please ignore the hunks in the git diff that were described
in the previous patch (hopefully I'll be able to resume submitting
patches sequentially in future).  Are both parts Ok for mainline?


2020-09-17  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/nvptx/nvptx.md (define_c_enum "unspec"): New UNSPEC_TANH.
	(define_mode_iterator HSFM): New iterator for HFmode and SFmode.
	(exp2hf2): New define_insn controlled by TARGET_SM75.
	(tanh<mode>2): New define_insn controlled by TARGET_SM75.
	(sminhf3, smaxhf3): New define_isnns controlled by TARGET_SM80.

gcc/testsuite/ChangeLog
	* gcc.target/nvptx/float16-2.c: New test case.
	* gcc.target/nvptx/tanh-1.c: New test case.

Roger
---


-----Original Message-----
From: Tobias Burnus <tobias@codesourcery.com> 
Sent: 17 September 2021 09:25
To: Roger Sayle <roger@nextmovesoftware.com>; 'GCC Patches'
<gcc-patches@gcc.gnu.org>; Tom de Vries <tdevries@suse.de>
Subject: Re: [PATCH] nvptx: Add (experimental) support for HFmode with
-misa=sm_53

Hi Roger,

some more generic remarks not specific to using new ISA features.

On 17.09.21 00:53, Roger Sayle wrote:

> Whilst there I also added -misa=sm_75 and -misa=sm_80 which are points 
> where other useful instructions were added to the ISA.

First, my impression was that already sm_70 added lots of useful stuff, but
granted sm_75 adds some more. In any case, the question is whether it makes
sense to add those flags if no use is made of those flags.

In particular, sm_80 is according to the following webpage only supported
with PTX ISA 7.0 of CUDA 11.0. But GCC currently only supports
-mptx=3.6 (default) and -mptx=6.3 (= CUDA 10).
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-no
tes

Note that you missed to update gcc/config/nvptx/t-omp-device for the new
sm_*  and likewise the "-misa=@var{ISA-string}" section in
gcc/gcc/doc/invoke.texi.

Additionally, I wonder whether the preprocessor macros __nvptx__,
__nvptx_softstack__, __nvptx_unisimt__ and __PTX_SM__  should be documented
somewhere as well. As all but one are related to command-line options, I
wonder whether the respective section in invoke.texi would be a good place
for them.

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
Registergericht München, HRB 106955

/* { dg-do compile } */
/* { dg-options "-O2 -misa=sm_80 -mptx=6.3 -ffast-math" } */

_Float16 x;
_Float16 y;
_Float16 t;

void foo()
{
  t = x < y ? x : y;
}

void bar()
{
  t = x > y ? x : y;
}

/* { dg-final { scan-assembler "min.f16" } } */
/* { dg-final { scan-assembler "max.f16" } } */
/* { dg-do compile } */
/* { dg-options "-O2 -ffast-math -misa=sm_75" } */

float foo(float x)
{
  return __builtin_tanhf(x);
}

/* { dg-final { scan-assembler "tanh.approx.f32" } } */

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 108de1c..1d0a197 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -26,6 +26,7 @@
    UNSPEC_EXP2
    UNSPEC_SIN
    UNSPEC_COS
+   UNSPEC_TANH
 
    UNSPEC_FPINT_FLOOR
    UNSPEC_FPINT_BTRUNC
@@ -196,6 +197,7 @@
 (define_mode_iterator QHIM [QI HI])
 (define_mode_iterator QHSIM [QI HI SI])
 (define_mode_iterator SDFM [SF DF])
+(define_mode_iterator HSFM [HF SF])
 (define_mode_iterator SDCM [SC DC])
 (define_mode_iterator BITS [SI SF])
 (define_mode_iterator BITD [DI DF])
@@ -273,6 +275,48 @@
 }
   [(set_attr "subregs_ok" "true")])
 
+(define_insn "*movhf_insn"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=R,R,m")
+	(match_operand:HF 1 "nonimmediate_operand" "R,m,R"))]
+  "!MEM_P (operands[0]) || REG_P (operands[1])"
+  "@
+   %.\\tmov.b16\\t%0, %1;
+   %.\\tld.b16\\t%0, %1;
+   %.\\tst.b16\\t%0, %1;")
+
+(define_expand "movhf"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "")
+	(match_operand:HF 1 "nonimmediate_operand" ""))]
+  ""
+{
+  /* Load HFmode constants as SFmode with an explicit FLOAT_TRUNCATE.  */
+  if (CONST_DOUBLE_P (operands[1]))
+    {
+      rtx tmp1 = gen_reg_rtx (SFmode);
+      REAL_VALUE_TYPE d = *CONST_DOUBLE_REAL_VALUE (operands[1]);
+      real_convert (&d, SFmode, &d);
+      emit_move_insn (tmp1, const_double_from_real_value (d, SFmode));
+
+      if (!REG_P (operands[0]))
+	{
+	  rtx tmp2 = gen_reg_rtx (HFmode);
+	  emit_insn (gen_truncsfhf2 (tmp2, tmp1));
+	  emit_move_insn (operands[0], tmp2);
+	}
+      else
+        emit_insn (gen_truncsfhf2 (operands[0], tmp1));
+      DONE;
+    }
+     
+  if (MEM_P (operands[0]) && !REG_P (operands[1]))
+    {
+      rtx tmp = gen_reg_rtx (HFmode);
+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[0], tmp);
+      DONE;
+    }
+})
+
 (define_insn "load_arg_reg<mode>"
   [(set (match_operand:QHIM 0 "nvptx_register_operand" "=R")
 	(unspec:QHIM [(match_operand 1 "const_int_operand" "n")]
@@ -1052,6 +1096,57 @@
   "flag_unsafe_math_optimizations"
   "%.\\tex2.approx%t0\\t%0, %1;")
 
+(define_insn "exp2hf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(unspec:HF [(match_operand:HF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_EXP2))]
+  "TARGET_SM75 && flag_unsafe_math_optimizations"
+  "%.\\tex2.approx.f16\\t%0, %1;")
+
+(define_insn "tanh<mode>2"
+  [(set (match_operand:HSFM 0 "nvptx_register_operand" "=R")
+	(unspec:HSFM [(match_operand:HSFM 1 "nvptx_register_operand" "R")]
+		     UNSPEC_TANH))]
+  "TARGET_SM75 && flag_unsafe_math_optimizations"
+  "%.\\ttanh.approx%t0\\t%0, %1;")
+
+;; HFmode floating point arithmetic.
+
+(define_insn "addhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(plus:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tadd.f16\\t%0, %1, %2;")
+
+(define_insn "subhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(minus:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		  (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tsub.f16\\t%0, %1, %2;")
+
+(define_insn "mulhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(mult:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tmul.f16\\t%0, %1, %2;")
+
+(define_insn "sminhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(smin:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM80"
+  "%.\\tmin.f16\\t%0, %1, %2;")
+
+(define_insn "smaxhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(smax:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM80"
+  "%.\\tmax.f16\\t%0, %1, %2;")
+
 ;; Conversions involving floating point
 
 (define_insn "extendsfdf2"
@@ -1145,6 +1240,18 @@
   ""
   "%.\\tcvt<FPINT2:fpint2_roundingmode>.s%T0%t1\\t%0, %1;")
 
+(define_insn "extendhf<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(float_extend:SDFM (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tcvt%t0%t1\\t%0, %1;")
+
+(define_insn "trunc<mode>hf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(float_truncate:HF (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tcvt%#%t0%t1\\t%0, %1;")
+
 ;; Vector operations
 
 (define_insn "*vec_set<mode>_0"