nvptx: Adds uses of -misa=sm_75 and -misa=sm_80

Message ID 001a01d7abda$79e43c60$6dacb520$@nextmovesoftware.com
State New

Commit Message

Roger Sayle Sept. 17, 2021, 3:41 p.m. UTC
  This patch builds upon my previous patch to prototype HFmode support on
nvptx, which includes adding new target macros TARGET_SM75 and TARGET_SM80.
Tobias Burnus has questioned "whether it makes sense to add those
flags if no use is made of those flags".  I had hoped that it might
be possible to split these patch submissions into smaller parts to
assist the review process, but failing that, here's part 2, that
adds support for __builtin_tanhf, HFmode exp2/tanh and also
for HFmode min/max, controlled by TARGET_SM75 and TARGET_SM80 respectively.
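
As a rough illustration of the gating described above (not part of the
patch), the C sketch below records which ISA level each HFmode pattern
needs.  The enum and function names are hypothetical and do not exist in
GCC; the levels are read off the insn conditions in the diff, where the
exp2/tanh patterns additionally require flag_unsafe_math_optimizations.

```c
/* Hypothetical summary of the ISA gating; these names are illustrative
   only and are not part of GCC.  */
enum hf_op { HF_ADD, HF_SUB, HF_MUL, HF_EXP2, HF_TANH, HF_MIN, HF_MAX };

/* Minimum -misa level at which the corresponding insn pattern is enabled,
   or -1 for an unknown operation.  */
static int
hf_op_min_sm (enum hf_op op)
{
  switch (op)
    {
    case HF_ADD:
    case HF_SUB:
    case HF_MUL:
      return 53;	/* TARGET_SM53.  */
    case HF_EXP2:
    case HF_TANH:
      return 75;	/* TARGET_SM75, plus unsafe math optimizations.  */
    case HF_MIN:
    case HF_MAX:
      return 80;	/* TARGET_SM80.  */
    }
  return -1;
}

/* Nonzero if OP is available when targeting ISA level SM (e.g. 75).  */
static int
hf_op_enabled (enum hf_op op, int sm)
{
  int need = hf_op_min_sm (op);
  return need >= 0 && sm >= need;
}
```

For example, hf_op_enabled (HF_MIN, 75) is 0, while
hf_op_enabled (HF_TANH, 75) is 1.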

The following has been tested on nvptx-none, hosted on x86_64-pc-linux-gnu
(on top of my previous patch) with a "make" and "make -k check" with no
new failures.  Please ignore the hunks in the git diff that were described
in the previous patch (hopefully I'll be able to resume submitting
patches sequentially in future).  Are both parts Ok for mainline?


2021-09-17  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/nvptx/nvptx.md (define_c_enum "unspec"): New UNSPEC_TANH.
	(define_mode_iterator HSFM): New iterator for HFmode and SFmode.
	(exp2hf2): New define_insn controlled by TARGET_SM75.
	(tanh<mode>2): New define_insn controlled by TARGET_SM75.
	(sminhf3, smaxhf3): New define_insns controlled by TARGET_SM80.

gcc/testsuite/ChangeLog
	* gcc.target/nvptx/float16-2.c: New test case.
	* gcc.target/nvptx/tanh-1.c: New test case.

Roger
--


-----Original Message-----
From: Tobias Burnus <tobias@codesourcery.com> 
Sent: 17 September 2021 09:25
To: Roger Sayle <roger@nextmovesoftware.com>; 'GCC Patches'
<gcc-patches@gcc.gnu.org>; Tom de Vries <tdevries@suse.de>
Subject: Re: [PATCH] nvptx: Add (experimental) support for HFmode with
-misa=sm_53

Hi Roger,

Some more generic remarks, not specific to using new ISA features.

On 17.09.21 00:53, Roger Sayle wrote:

> Whilst there I also added -misa=sm_75 and -misa=sm_80 which are points 
> where other useful instructions were added to the ISA.

First, my impression was that sm_70 already added lots of useful stuff, but
granted, sm_75 adds some more. In any case, the question is whether it makes
sense to add those flags if no use is made of those flags.

In particular, sm_80 is according to the following webpage only supported
with PTX ISA 7.0 of CUDA 11.0. But GCC currently only supports
-mptx=3.6 (default) and -mptx=6.3 (= CUDA 10).
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

Note that you have not updated gcc/config/nvptx/t-omp-device for the new
sm_* values, nor the "-misa=@var{ISA-string}" section in
gcc/gcc/doc/invoke.texi.

Additionally, I wonder whether the preprocessor macros __nvptx__,
__nvptx_softstack__, __nvptx_unisimt__ and __PTX_SM__  should be documented
somewhere as well. As all but one are related to command-line options, I
wonder whether the respective section in invoke.texi would be a good place
for them.
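
A minimal sketch of how user code might test these macros, assuming that
__PTX_SM__ expands to the -misa level times ten (for example 800 for
sm_80); that encoding is an assumption here and should be checked against
the compiler in use.  PTX_SM_LEVEL and have_native_f16_minmax are
hypothetical names used only for illustration.

```c
/* Assumption: when compiling for nvptx, __nvptx__ is defined and
   __PTX_SM__ holds the ISA level times ten (e.g. 800 for sm_80).  */
#if defined (__nvptx__) && defined (__PTX_SM__)
# define PTX_SM_LEVEL __PTX_SM__
#else
# define PTX_SM_LEVEL 0		/* Not compiling for nvptx.  */
#endif

/* Hypothetical helper: nonzero when the target ISA level is high enough
   for native f16 min/max instructions (sm_80 or later).  */
static int
have_native_f16_minmax (void)
{
  return PTX_SM_LEVEL >= 800;
}
```

When compiled for a non-nvptx host, neither macro is defined, so
PTX_SM_LEVEL is 0 and the helper returns 0.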

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201,
80634 München; Limited liability company; Managing directors:
Thomas Heurung, Frank Thürauf; Registered office: München;
Registration court: München, HRB 106955

/* { dg-do compile } */
/* { dg-options "-O2 -misa=sm_80 -mptx=7.0 -ffast-math" } */

_Float16 x;
_Float16 y;
_Float16 t;

void foo()
{
  t = x < y ? x : y;
}

void bar()
{
  t = x > y ? x : y;
}

/* { dg-final { scan-assembler "min.f16" } } */
/* { dg-final { scan-assembler "max.f16" } } */

/* { dg-do compile } */
/* { dg-options "-O2 -ffast-math -misa=sm_75 -mptx=7.0" } */

float foo(float x)
{
  return __builtin_tanhf(x);
}

/* { dg-final { scan-assembler "tanh.approx.f32" } } */
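
A note on why the min/max test is compiled with -ffast-math, as far as I
understand it (the thread does not spell this out): under strict IEEE
semantics the select idiom "x < y ? x : y" is not a symmetric minimum once
a NaN is involved, so the compiler presumably cannot treat it as a plain
min operation without the fast-math assumptions.  The sketch below shows
the asymmetry in host C using float rather than _Float16; the function
names are illustrative only.

```c
#include <math.h>	/* NAN */

/* The source-level idiom used by the test: compare, then select.  */
static float
select_min (float x, float y)
{
  return x < y ? x : y;
}

/* A hand-written NaN-aware minimum that prefers the non-NaN operand.  */
static float
nan_aware_min (float x, float y)
{
  if (x != x)		/* x is NaN.  */
    return y;
  if (y != y)		/* y is NaN.  */
    return x;
  return x < y ? x : y;
}
```

With these definitions, select_min (NAN, 1.0f) yields 1.0f but
select_min (1.0f, NAN) yields NaN, whereas nan_aware_min returns 1.0f in
both cases.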
  

Comments

Tom de Vries Dec. 15, 2021, 2:08 p.m. UTC | #1
On 9/17/21 5:41 PM, Roger Sayle wrote:
> 
> This patch builds upon my previous patch to prototype HFmode support on
> nvptx, which includes adding new target macros TARGET_SM75 and TARGET_SM80.

I've moved those parts into this patch.

> Tobias Burnus has questioned "whether it makes sense to add those
> flags if no use is made of those flags".  I had hoped that it might
> be possible to split these patch submissions into smaller parts to
> assist the review process, but failing that, here's part 2, that
> adds support for __builtin_tanhf, HFmode exp2/tanh and also
> for HFmode min/max, controlled by TARGET_SM75 and TARGET_SM80 respectively.
> 
> The following has been tested on nvptx-none, hosted on x86_64-pc-linux-gnu
> (on top of my previous patch) with a "make" and "make -k check" with no
> new failures.  Please ignore the hunks in the git diff that were described
> in the previous patch (hopefully I'll be able to resume submitting
> patches sequentially in future).  Are both parts Ok for mainline?
> 
> 

Committed.

I've used -mptx=7.0 in the test cases, since that's required.

That doesn't become apparent, though, unless dg-do assemble is used
instead of dg-do compile, but I've left that as is for now.  Dealing
with this properly will require adding some target-support checks for
what ptxas supports, and then choosing between dg-do assemble and
dg-do compile based on the result, and that all looks involved enough
to treat as a separate issue.

Thanks,
- Tom

  

Patch

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 108de1c..1d0a197 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -26,6 +26,7 @@ 
    UNSPEC_EXP2
    UNSPEC_SIN
    UNSPEC_COS
+   UNSPEC_TANH
 
    UNSPEC_FPINT_FLOOR
    UNSPEC_FPINT_BTRUNC
@@ -196,6 +197,7 @@ 
 (define_mode_iterator QHIM [QI HI])
 (define_mode_iterator QHSIM [QI HI SI])
 (define_mode_iterator SDFM [SF DF])
+(define_mode_iterator HSFM [HF SF])
 (define_mode_iterator SDCM [SC DC])
 (define_mode_iterator BITS [SI SF])
 (define_mode_iterator BITD [DI DF])
@@ -273,6 +275,48 @@ 
 }
   [(set_attr "subregs_ok" "true")])
 
+(define_insn "*movhf_insn"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=R,R,m")
+	(match_operand:HF 1 "nonimmediate_operand" "R,m,R"))]
+  "!MEM_P (operands[0]) || REG_P (operands[1])"
+  "@
+   %.\\tmov.b16\\t%0, %1;
+   %.\\tld.b16\\t%0, %1;
+   %.\\tst.b16\\t%0, %1;")
+
+(define_expand "movhf"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "")
+	(match_operand:HF 1 "nonimmediate_operand" ""))]
+  ""
+{
+  /* Load HFmode constants as SFmode with an explicit FLOAT_TRUNCATE.  */
+  if (CONST_DOUBLE_P (operands[1]))
+    {
+      rtx tmp1 = gen_reg_rtx (SFmode);
+      REAL_VALUE_TYPE d = *CONST_DOUBLE_REAL_VALUE (operands[1]);
+      real_convert (&d, SFmode, &d);
+      emit_move_insn (tmp1, const_double_from_real_value (d, SFmode));
+
+      if (!REG_P (operands[0]))
+	{
+	  rtx tmp2 = gen_reg_rtx (HFmode);
+	  emit_insn (gen_truncsfhf2 (tmp2, tmp1));
+	  emit_move_insn (operands[0], tmp2);
+	}
+      else
+        emit_insn (gen_truncsfhf2 (operands[0], tmp1));
+      DONE;
+    }
+     
+  if (MEM_P (operands[0]) && !REG_P (operands[1]))
+    {
+      rtx tmp = gen_reg_rtx (HFmode);
+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[0], tmp);
+      DONE;
+    }
+})
+
 (define_insn "load_arg_reg<mode>"
   [(set (match_operand:QHIM 0 "nvptx_register_operand" "=R")
 	(unspec:QHIM [(match_operand 1 "const_int_operand" "n")]
@@ -1052,6 +1096,57 @@ 
   "flag_unsafe_math_optimizations"
   "%.\\tex2.approx%t0\\t%0, %1;")
 
+(define_insn "exp2hf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(unspec:HF [(match_operand:HF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_EXP2))]
+  "TARGET_SM75 && flag_unsafe_math_optimizations"
+  "%.\\tex2.approx.f16\\t%0, %1;")
+
+(define_insn "tanh<mode>2"
+  [(set (match_operand:HSFM 0 "nvptx_register_operand" "=R")
+	(unspec:HSFM [(match_operand:HSFM 1 "nvptx_register_operand" "R")]
+		     UNSPEC_TANH))]
+  "TARGET_SM75 && flag_unsafe_math_optimizations"
+  "%.\\ttanh.approx%t0\\t%0, %1;")
+
+;; HFmode floating point arithmetic.
+
+(define_insn "addhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(plus:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tadd.f16\\t%0, %1, %2;")
+
+(define_insn "subhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(minus:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		  (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tsub.f16\\t%0, %1, %2;")
+
+(define_insn "mulhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(mult:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tmul.f16\\t%0, %1, %2;")
+
+(define_insn "sminhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(smin:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM80"
+  "%.\\tmin.f16\\t%0, %1, %2;")
+
+(define_insn "smaxhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(smax:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+		 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM80"
+  "%.\\tmax.f16\\t%0, %1, %2;")
+
 ;; Conversions involving floating point
 
 (define_insn "extendsfdf2"
@@ -1145,6 +1240,18 @@ 
   ""
   "%.\\tcvt<FPINT2:fpint2_roundingmode>.s%T0%t1\\t%0, %1;")
 
+(define_insn "extendhf<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(float_extend:SDFM (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tcvt%t0%t1\\t%0, %1;")
+
+(define_insn "trunc<mode>hf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+	(float_truncate:HF (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tcvt%#%t0%t1\\t%0, %1;")
+
 ;; Vector operations
 
 (define_insn "*vec_set<mode>_0"