From patchwork Fri Sep 17 15:41:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 45134
From: "Roger Sayle"
To: "'GCC Patches'"
Cc: 'Tobias Burnus'
Subject: [PATCH] nvptx: Adds uses of -misa=sm_75 and -misa=sm_80
Date: Fri, 17 Sep 2021 16:41:25 +0100
Message-ID: <001a01d7abda$79e43c60$6dacb520$@nextmovesoftware.com>

This patch builds upon my previous patch to prototype HFmode support on
nvptx, which included adding the new target macros TARGET_SM75 and
TARGET_SM80.  Tobias Burnus has questioned "whether it makes sense to
add those flags if no use is made of those flags".

I had hoped that it might be possible to split these patch submissions
into smaller parts to assist the review process, but failing that,
here's part 2, which adds support for __builtin_tanhf, HFmode exp2/tanh
and also for HFmode min/max, controlled by TARGET_SM75 and TARGET_SM80
respectively.

The following has been tested on nvptx-none, hosted on
x86_64-pc-linux-gnu (on top of my previous patch) with a "make" and
"make -k check" with no new failures.  Please ignore the hunks in the
git diff that were described in the previous patch (hopefully I'll be
able to resume submitting patches sequentially in future).

Are both parts Ok for mainline?


2021-09-17  Roger Sayle

gcc/ChangeLog
	* config/nvptx/nvptx.md (define_c_enum "unspec"): New UNSPEC_TANH.
	(define_mode_iterator HSFM): New iterator for HFmode and SFmode.
	(exp2hf2): New define_insn controlled by TARGET_SM75.
	(tanh<mode>2): New define_insn controlled by TARGET_SM75.
	(sminhf3, smaxhf3): New define_insns controlled by TARGET_SM80.

gcc/testsuite/ChangeLog
	* gcc.target/nvptx/float16-2.c: New test case.
	* gcc.target/nvptx/tanh-1.c: New test case.

Roger
---

-----Original Message-----
From: Tobias Burnus
Sent: 17 September 2021 09:25
To: Roger Sayle ; 'GCC Patches' ; Tom de Vries
Subject: Re: [PATCH] nvptx: Add (experimental) support for HFmode with -misa=sm_53

Hi Roger,

some more generic remarks not specific to using new ISA features.

On 17.09.21 00:53, Roger Sayle wrote:
> Whilst there I also added -misa=sm_75 and -misa=sm_80 which are points
> where other useful instructions were added to the ISA.

First, my impression was that sm_70 already added lots of useful stuff,
but granted sm_75 adds some more.

In any case, the question is whether it makes sense to add those flags
if no use is made of those flags.

In particular, according to the following webpage, sm_80 is only
supported with PTX ISA 7.0 of CUDA 11.0.  But GCC currently only
supports -mptx=3.6 (default) and -mptx=6.3 (= CUDA 10).
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

Note that you forgot to update gcc/config/nvptx/t-omp-device for the
new sm_* and likewise the "-misa=@var{ISA-string}" section in
gcc/gcc/doc/invoke.texi.

Additionally, I wonder whether the preprocessor macros __nvptx__,
__nvptx_softstack__, __nvptx_unisimt__ and __PTX_SM__ should be
documented somewhere as well.  As all but one are related to
command-line options, I wonder whether the respective section in
invoke.texi would be a good place for them.

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201,
80634 München; limited liability company; Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Commercial
register: München, HRB 106955

/* { dg-do compile } */
/* { dg-options "-O2 -misa=sm_80 -mptx=6.3 -ffast-math" } */

_Float16 x;
_Float16 y;
_Float16 t;

void foo()
{
  t = x < y ? x : y;
}

void bar()
{
  t = x > y ? x : y;
}

/* { dg-final { scan-assembler "min.f16" } } */
/* { dg-final { scan-assembler "max.f16" } } */

/* { dg-do compile } */
/* { dg-options "-O2 -ffast-math -misa=sm_75" } */

float foo(float x)
{
  return __builtin_tanhf(x);
}

/* { dg-final { scan-assembler "tanh.approx.f32" } } */

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 108de1c..1d0a197 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -26,6 +26,7 @@
    UNSPEC_EXP2
    UNSPEC_SIN
    UNSPEC_COS
+   UNSPEC_TANH
 
    UNSPEC_FPINT_FLOOR
    UNSPEC_FPINT_BTRUNC
@@ -196,6 +197,7 @@
 (define_mode_iterator QHIM [QI HI])
 (define_mode_iterator QHSIM [QI HI SI])
 (define_mode_iterator SDFM [SF DF])
+(define_mode_iterator HSFM [HF SF])
 (define_mode_iterator SDCM [SC DC])
 (define_mode_iterator BITS [SI SF])
 (define_mode_iterator BITD [DI DF])
@@ -273,6 +275,48 @@
 }
   [(set_attr "subregs_ok" "true")])
 
+(define_insn "*movhf_insn"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=R,R,m")
+        (match_operand:HF 1 "nonimmediate_operand" "R,m,R"))]
+  "!MEM_P (operands[0]) || REG_P (operands[1])"
+  "@
+   %.\\tmov.b16\\t%0, %1;
+   %.\\tld.b16\\t%0, %1;
+   %.\\tst.b16\\t%0, %1;")
+
+(define_expand "movhf"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "")
+        (match_operand:HF 1 "nonimmediate_operand" ""))]
+  ""
+{
+  /* Load HFmode constants as SFmode with an explicit FLOAT_TRUNCATE. */
+  if (CONST_DOUBLE_P (operands[1]))
+    {
+      rtx tmp1 = gen_reg_rtx (SFmode);
+      REAL_VALUE_TYPE d = *CONST_DOUBLE_REAL_VALUE (operands[1]);
+      real_convert (&d, SFmode, &d);
+      emit_move_insn (tmp1, const_double_from_real_value (d, SFmode));
+
+      if (!REG_P (operands[0]))
+        {
+          rtx tmp2 = gen_reg_rtx (HFmode);
+          emit_insn (gen_truncsfhf2 (tmp2, tmp1));
+          emit_move_insn (operands[0], tmp2);
+        }
+      else
+        emit_insn (gen_truncsfhf2 (operands[0], tmp1));
+      DONE;
+    }
+
+  if (MEM_P (operands[0]) && !REG_P (operands[1]))
+    {
+      rtx tmp = gen_reg_rtx (HFmode);
+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[0], tmp);
+      DONE;
+    }
+})
+
 (define_insn "load_arg_reg"
   [(set (match_operand:QHIM 0 "nvptx_register_operand" "=R")
         (unspec:QHIM [(match_operand 1 "const_int_operand" "n")]
@@ -1052,6 +1096,57 @@
   "flag_unsafe_math_optimizations"
   "%.\\tex2.approx%t0\\t%0, %1;")
 
+(define_insn "exp2hf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (unspec:HF [(match_operand:HF 1 "nvptx_register_operand" "R")]
+                   UNSPEC_EXP2))]
+  "TARGET_SM75 && flag_unsafe_math_optimizations"
+  "%.\\tex2.approx.f16\\t%0, %1;")
+
+(define_insn "tanh<mode>2"
+  [(set (match_operand:HSFM 0 "nvptx_register_operand" "=R")
+        (unspec:HSFM [(match_operand:HSFM 1 "nvptx_register_operand" "R")]
+                     UNSPEC_TANH))]
+  "TARGET_SM75 && flag_unsafe_math_optimizations"
+  "%.\\ttanh.approx%t0\\t%0, %1;")
+
+;; HFmode floating point arithmetic.
+
+(define_insn "addhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (plus:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+                 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tadd.f16\\t%0, %1, %2;")
+
+(define_insn "subhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (minus:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+                  (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tsub.f16\\t%0, %1, %2;")
+
+(define_insn "mulhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (mult:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+                 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tmul.f16\\t%0, %1, %2;")
+
+(define_insn "sminhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (smin:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+                 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM80"
+  "%.\\tmin.f16\\t%0, %1, %2;")
+
+(define_insn "smaxhf3"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (smax:HF (match_operand:HF 1 "nvptx_register_operand" "R")
+                 (match_operand:HF 2 "nvptx_register_operand" "R")))]
+  "TARGET_SM80"
+  "%.\\tmax.f16\\t%0, %1, %2;")
+
 ;; Conversions involving floating point
 
 (define_insn "extendsfdf2"
@@ -1145,6 +1240,18 @@
   ""
   "%.\\tcvt.s%T0%t1\\t%0, %1;")
 
+(define_insn "extendhf<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (float_extend:SDFM (match_operand:HF 1 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tcvt%t0%t1\\t%0, %1;")
+
+(define_insn "trunc<mode>hf2"
+  [(set (match_operand:HF 0 "nvptx_register_operand" "=R")
+        (float_truncate:HF (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  "TARGET_SM53"
+  "%.\\tcvt%#%t0%t1\\t%0, %1;")
+
 ;; Vector operations
 
 (define_insn "*vec_set<mode>_0"