From patchwork Tue Oct 5 17:48:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 45905
From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [PATCH] Transition nvptx backend to STORE_FLAG_VALUE = 1
Date: Tue, 5 Oct 2021 18:48:01 +0100
Message-ID: <03f801d7ba11$24ea8a80$6ebf9f80$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

This patch to the nvptx backend changes the backend's STORE_FLAG_VALUE
from -1 to 1, by using BImode predicates and selp instructions, instead
of set instructions (almost always followed by integer negation).

Historically, it was reasonable (though rare) for backends to use -1
for representing true during the RTL passes.
However, with tree-ssa, GCC now emits lots of code that reads and
writes _Bool values, requiring STORE_FLAG_VALUE=-1 targets to frequently
convert 0/-1 pseudos to 0/1 pseudos using integer negation.
Unfortunately, this process prevents or complicates many optimizations
(negate isn't associative with logical AND, OR and XOR, and interferes
with range/vrp/nonzerobits bounds etc.).

The impact of this is that for a relatively simple logical expression
like "return (x==21) && (y==69);", the nvptx backend currently generates:

        mov.u32 %r26, %ar0;
        mov.u32 %r27, %ar1;
        set.u32.eq.u32 %r30, %r26, 21;
        neg.s32 %r31, %r30;
        mov.u32 %r29, %r31;
        set.u32.eq.u32 %r33, %r27, 69;
        neg.s32 %r34, %r33;
        mov.u32 %r32, %r34;
        cvt.u16.u8 %r39, %r29;
        mov.u16 %r36, %r39;
        cvt.u16.u8 %r39, %r32;
        mov.u16 %r37, %r39;
        and.b16 %r35, %r36, %r37;
        cvt.u32.u16 %r38, %r35;
        cvt.u32.u8 %value, %r38;

This patch tweaks nvptx to generate 0/1 values instead, requiring the
same number of instructions, using (BImode) predicate registers and selp
instructions so as to now generate the almost identical:

        mov.u32 %r26, %ar0;
        mov.u32 %r27, %ar1;
        setp.eq.u32 %r31, %r26, 21;
        selp.u32 %r30, 1, 0, %r31;
        mov.u32 %r29, %r30;
        setp.eq.u32 %r34, %r27, 69;
        selp.u32 %r33, 1, 0, %r34;
        mov.u32 %r32, %r33;
        cvt.u16.u8 %r39, %r29;
        mov.u16 %r36, %r39;
        cvt.u16.u8 %r39, %r32;
        mov.u16 %r37, %r39;
        and.b16 %r35, %r36, %r37;
        cvt.u32.u16 %r38, %r35;
        cvt.u32.u8 %value, %r38;

The hidden benefit is that this sequence can (in theory) be optimized by
the RTL passes to eventually generate a much shorter sequence using an
and.pred instruction (just like Nvidia's nvcc compiler).

This patch has been tested on nvptx-none with a "make" and
"make -k check" (including newlib) hosted on x86_64-pc-linux-gnu with
no new failures.  Ok for mainline?


2021-10-05  Roger Sayle

gcc/ChangeLog
	* config/nvptx/nvptx.h (STORE_FLAG_VALUE): Change to 1.
	* config/nvptx/nvptx.md (movbi): Use P1 constraint for true.
	(setcc_from_bi): Remove SImode specific pattern.
	(setcc<mode>_from_bi): Provide more general HSDIM pattern.
	(extendbi<mode>2, zero_extendbi<mode>2): Provide instructions
	for sign- and zero-extending BImode predicates to integers.
	(setcc_int<mode>): Remove previous (-1-based) instructions.
	(cstorebi4): Remove BImode to SImode specific expander.
	(cstore<mode>4): Fix indentation.  Expand using setccsi_from_bi.
	(cstore<mode>4): For both integer and floating point modes.

Thanks in advance,
Roger

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index d367174..fcebcf9 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -315,7 +315,7 @@ struct GTY(()) machine_function
 #define NO_DOT_IN_LABEL
 #define ASM_COMMENT_START "//"
 
-#define STORE_FLAG_VALUE -1
+#define STORE_FLAG_VALUE 1
 #define FLOAT_STORE_FLAG_VALUE(MODE) REAL_VALUE_ATOF("1.0", (MODE))
 
 #define CASE_VECTOR_MODE SImode
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 108de1c..b3275b1 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -213,7 +213,7 @@
 ;; get variables in this mode and pseudos are never spilled.
 (define_insn "movbi"
   [(set (match_operand:BI 0 "nvptx_register_operand" "=R,R,R")
-        (match_operand:BI 1 "nvptx_nonmemory_operand" "R,P0,Pn"))]
+        (match_operand:BI 1 "nvptx_nonmemory_operand" "R,P0,P1"))]
   ""
   "@
    %.\\tmov%t0\\t%0, %1;
@@ -789,12 +789,26 @@
 
 ;; Conditional stores
 
-(define_insn "setcc_from_bi"
-  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
-        (ne:SI (match_operand:BI 1 "nvptx_register_operand" "R")
-               (const_int 0)))]
+(define_insn "setcc<mode>_from_bi"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+        (ne:HSDIM (match_operand:BI 1 "nvptx_register_operand" "R")
+                  (const_int 0)))]
+  ""
+  "%.\\tselp%t0\\t%0, 1, 0, %1;")
+
+(define_insn "extendbi<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+        (sign_extend:HSDIM
+         (match_operand:BI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tselp%t0\\t%0, -1, 0, %1;")
+
+(define_insn "zero_extendbi<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+        (zero_extend:HSDIM
+         (match_operand:BI 1 "nvptx_register_operand" "R")))]
   ""
-  "%.\\tselp%t0 %0,-1,0,%1;")
+  "%.\\tselp%t0\\t%0, 1, 0, %1;")
 
 (define_insn "sel_true<mode>"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
@@ -832,22 +846,6 @@
   ""
   "%.\\tselp%t0\\t%0, %3, %2, %1;")
 
-(define_insn "setcc_int<mode>"
-  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
-        (match_operator:SI 1 "nvptx_comparison_operator"
-          [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
-           (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
-  ""
-  "%.\\tset%t0%c1\\t%0, %2, %3;")
-
-(define_insn "setcc_int<mode>"
-  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
-        (match_operator:SI 1 "nvptx_float_comparison_operator"
-          [(match_operand:SDFM 2 "nvptx_register_operand" "R")
-           (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
-  ""
-  "%.\\tset%t0%c1\\t%0, %2, %3;")
-
 (define_insn "setcc_float<mode>"
   [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
         (match_operator:SF 1 "nvptx_comparison_operator"
@@ -864,29 +862,35 @@
   ""
   "%.\\tset%t0%c1\\t%0, %2, %3;")
 
-(define_expand "cstorebi4"
-  [(set (match_operand:SI 0 "nvptx_register_operand")
-        (match_operator:SI 1 "ne_operator"
-         [(match_operand:BI 2 "nvptx_register_operand")
-          (match_operand:BI 3 "const0_operand")]))]
-  ""
-  "")
-
 (define_expand "cstore<mode>4"
   [(set (match_operand:SI 0 "nvptx_register_operand")
         (match_operator:SI 1 "nvptx_comparison_operator"
-         [(match_operand:HSDIM 2 "nvptx_register_operand")
-          (match_operand:HSDIM 3 "nvptx_nonmemory_operand")]))]
+          [(match_operand:HSDIM 2 "nvptx_register_operand")
+           (match_operand:HSDIM 3 "nvptx_nonmemory_operand")]))]
   ""
-  "")
+{
+  rtx reg = gen_reg_rtx (BImode);
+  rtx cmp = gen_rtx_fmt_ee (GET_CODE (operands[1]), BImode,
+                            operands[2], operands[3]);
+  emit_move_insn (reg, cmp);
+  emit_insn (gen_setccsi_from_bi (operands[0], reg));
+  DONE;
+})
 
 (define_expand "cstore<mode>4"
   [(set (match_operand:SI 0 "nvptx_register_operand")
         (match_operator:SI 1 "nvptx_float_comparison_operator"
-         [(match_operand:SDFM 2 "nvptx_register_operand")
-          (match_operand:SDFM 3 "nvptx_nonmemory_operand")]))]
+          [(match_operand:SDFM 2 "nvptx_register_operand")
+           (match_operand:SDFM 3 "nvptx_nonmemory_operand")]))]
   ""
-  "")
+{
+  rtx reg = gen_reg_rtx (BImode);
+  rtx cmp = gen_rtx_fmt_ee (GET_CODE (operands[1]), BImode,
+                            operands[2], operands[3]);
+  emit_move_insn (reg, cmp);
+  emit_insn (gen_setccsi_from_bi (operands[0], reg));
+  DONE;
+})
 
 ;; Calls
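
For illustration only (not part of the patch), the logical expression
quoted in the description can be modelled in plain C.  This is a minimal
sketch under stated assumptions: every function name below is
hypothetical, and the helpers merely imitate what the PTX set, neg, setp
and selp instructions compute; they are not GCC or nvptx APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* The expression from the description: "return (x==21) && (y==69);".  */
bool both_match (int x, int y)
{
  return (x == 21) && (y == 69);
}

/* Hypothetical model of a STORE_FLAG_VALUE = -1 comparison:
   like set.u32.eq.u32, it yields 0 or -1.  */
int set_eq_minus_one (int a, int b)
{
  return a == b ? -1 : 0;
}

/* Hypothetical model of the setp + selp pair: the comparison
   produces a predicate, and selp picks 1 or 0 from it.  */
int selp_eq_one (int a, int b)
{
  bool pred = (a == b);   /* setp.eq.u32 */
  return pred ? 1 : 0;    /* selp.u32 %r, 1, 0, %pred */
}

/* Old convention: each 0/-1 result needs a negation (neg.s32)
   before it can be used as a 0/1 _Bool value.  */
int old_style (int x, int y)
{
  int bx = -set_eq_minus_one (x, 21);
  int by = -set_eq_minus_one (y, 69);
  return bx & by;
}

/* New convention: the 0/1 results are combined directly.  */
int new_style (int x, int y)
{
  return selp_eq_one (x, 21) & selp_eq_one (y, 69);
}

/* Both conventions agree with the source-level expression.  */
void check_conventions (void)
{
  assert (old_style (21, 69) == (int) both_match (21, 69));
  assert (new_style (21, 69) == (int) both_match (21, 69));
  assert (old_style (21, 70) == new_style (21, 70));
}
```

Both variants compute the same value; the difference is only that the
0/1 form needs no negation step between the comparison and the logical
AND, which is the property the description relies on.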