From patchwork Mon Jan 10 10:58:04 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 49795
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: [PATCH] nvptx: Expand QI mode operations using SI mode instructions.
Date: Mon, 10 Jan 2022 10:58:04 -0000
Message-ID: <02e601d80610$f219a540$d64cefc0$@nextmovesoftware.com>

One of the unusual target features of the Nvidia PTX ISA is that it
doesn't provide QI mode (byte sized) operations or registers. Somewhat
conventionally, 8-bit quantities are read from/written to memory using
special instructions, but stored internally using SImode (32-bit)
registers.


GCC's middle-end accommodates targets without QImode optabs by widening
operations until suitable support is found, and with the current nvptx
backend this means 16-bit HImode operations. The inconvenience is that
nvptx is also a TARGET_TRULY_NOOP_TRUNCATION=false target, meaning that
additional instructions are required to convert between the SImode
registers used to hold QImode values, and the HImode registers used to
operate on them (and back again). This results in a large amount of
shuffling and type conversion in code dealing with bytes, i.e. using
char or Boolean types.

This patch improves the situation by providing expanders in the nvptx
machine description to perform QImode operations natively in SImode
instead of HImode. An alternate implementation might be to provide some
form of target hook to specify which fallback modes to use during RTL
expansion, but I think this requirement is unusual, and a solution
entirely in the nvptx backend doesn't disturb/affect other targets.

The improvements can be quite dramatic, as shown in the example below:

int foo(int x, int y) { return (x==21) && (y==69); }

previously with -O2 required 15 instructions:

        mov.u32         %r26, %ar0;
        mov.u32         %r27, %ar1;
        setp.eq.u32     %r31, %r26, 21;
        selp.u32        %r30, 1, 0, %r31;
        mov.u32         %r29, %r30;
        setp.eq.u32     %r34, %r27, 69;
        selp.u32        %r33, 1, 0, %r34;
        mov.u32         %r32, %r33;
        cvt.u16.u8      %r39, %r29;
        mov.u16         %r36, %r39;
        cvt.u16.u8      %r39, %r32;
        mov.u16         %r37, %r39;
        and.b16         %r35, %r36, %r37;
        cvt.u32.u16     %r38, %r35;
        cvt.u32.u8      %value, %r38;

with this patch, now requires only 7 instructions:

        mov.u32         %r26, %ar0;
        mov.u32         %r27, %ar1;
        setp.eq.u32     %r31, %r26, 21;
        setp.eq.u32     %r34, %r27, 69;
        selp.u32        %r37, 1, 0, %r31;
        selp.u32        %r38, 1, 0, %r34;
        and.b32         %value, %r37, %r38;

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
Ok for mainline?


2022-01-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/nvptx/nvptx.md (cmp<mode>): Renamed from *cmp<mode>.
	(setcc<mode>_from_bi): Additionally support QImode.
	(extendbi<mode>2): Additionally support QImode.
	(zero_extendbi<mode>2): Additionally support QImode.
	(any_sbinary, any_ubinary, any_sunary, any_uunary): New code
	iterators for signed and unsigned, binary and unary operations.
	(<sbinary>qi3, <ubinary>qi3, <sunary>qi2, <uunary>qi2): New
	expanders to perform QImode operations using SImode instructions.
	(cstoreqi4): New define_expand.
	(*ext_truncsi2_qi): New define_insn.
	(*zext_truncsi2_qi): New define_insn.

gcc/testsuite/ChangeLog
	* gcc.target/nvptx/bool-1.c: New test case.


Thanks in advance,
Roger

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ce74672..cc9cff7 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -763,7 +763,7 @@
 
 ;; Comparisons and branches
 
-(define_insn "*cmp<mode>"
+(define_insn "cmp<mode>"
   [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
         (match_operator:BI 1 "nvptx_comparison_operator"
            [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
@@ -867,22 +867,22 @@
 ;; Conditional stores
 
 (define_insn "setcc<mode>_from_bi"
-  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
-        (ne:HSDIM (match_operand:BI 1 "nvptx_register_operand" "R")
+  [(set (match_operand:QHSDIM 0 "nvptx_register_operand" "=R")
+        (ne:QHSDIM (match_operand:BI 1 "nvptx_register_operand" "R")
                    (const_int 0)))]
   ""
   "%.\\tselp%t0\\t%0, 1, 0, %1;")
 
 (define_insn "extendbi<mode>2"
-  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
-        (sign_extend:HSDIM
+  [(set (match_operand:QHSDIM 0 "nvptx_register_operand" "=R")
+        (sign_extend:QHSDIM
          (match_operand:BI 1 "nvptx_register_operand" "R")))]
   ""
   "%.\\tselp%t0\\t%0, -1, 0, %1;")
 
 (define_insn "zero_extendbi<mode>2"
-  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
-        (zero_extend:HSDIM
+  [(set (match_operand:QHSDIM 0 "nvptx_register_operand" "=R")
+        (zero_extend:QHSDIM
          (match_operand:BI 1 "nvptx_register_operand" "R")))]
   ""
   "%.\\tselp%t0\\t%0, 1, 0, %1;")
@@ -1947,3 +1947,104 @@
     return nvptx_output_red_partition (operands[0], operands[1]);
   }
   [(set_attr "predicable" "false")])
+
+;; Expand QI mode operations using SI mode instructions.
+(define_code_iterator any_sbinary [plus minus smin smax])
+(define_code_attr sbinary [(plus "add") (minus "sub") (smin "smin") (smax "smax")])
+
+(define_code_iterator any_ubinary [and ior xor umin umax])
+(define_code_attr ubinary [(and "and") (ior "ior") (xor "xor") (umin "umin")
+                           (umax "umax")])
+
+(define_code_iterator any_sunary [neg abs])
+(define_code_attr sunary [(neg "neg") (abs "abs")])
+
+(define_code_iterator any_uunary [not])
+(define_code_attr uunary [(not "one_cmpl")])
+
+(define_expand "<sbinary>qi3"
+  [(set (match_operand:QI 0 "nvptx_register_operand")
+        (any_sbinary:QI (match_operand:QI 1 "nvptx_nonmemory_operand")
+                        (match_operand:QI 2 "nvptx_nonmemory_operand")))]
+  ""
+{
+  rtx reg = gen_reg_rtx (SImode);
+  rtx op0 = convert_modes (SImode, QImode, operands[1], 0);
+  rtx op1 = convert_modes (SImode, QImode, operands[2], 0);
+  if (<CODE> == MINUS)
+    op0 = force_reg (SImode, op0);
+  emit_insn (gen_<sbinary>si3 (reg, op0, op1));
+  emit_insn (gen_truncsiqi2 (operands[0], reg));
+  DONE;
+})
+
+(define_expand "<ubinary>qi3"
+  [(set (match_operand:QI 0 "nvptx_register_operand")
+        (any_ubinary:QI (match_operand:QI 1 "nvptx_nonmemory_operand")
+                        (match_operand:QI 2 "nvptx_nonmemory_operand")))]
+  ""
+{
+  rtx reg = gen_reg_rtx (SImode);
+  rtx op0 = convert_modes (SImode, QImode, operands[1], 1);
+  rtx op1 = convert_modes (SImode, QImode, operands[2], 1);
+  emit_insn (gen_<ubinary>si3 (reg, op0, op1));
+  emit_insn (gen_truncsiqi2 (operands[0], reg));
+  DONE;
+})
+
+(define_expand "<sunary>qi2"
+  [(set (match_operand:QI 0 "nvptx_register_operand")
+        (any_sunary:QI (match_operand:QI 1 "nvptx_nonmemory_operand")))]
+  ""
+{
+  rtx reg = gen_reg_rtx (SImode);
+  rtx op0 = convert_modes (SImode, QImode, operands[1], 0);
+  emit_insn (gen_<sunary>si2 (reg, op0));
+  emit_insn (gen_truncsiqi2 (operands[0], reg));
+  DONE;
+})
+
+(define_expand "<uunary>qi2"
+  [(set (match_operand:QI 0 "nvptx_register_operand")
+        (any_uunary:QI (match_operand:QI 1 "nvptx_nonmemory_operand")))]
+  ""
+{
+  rtx reg = gen_reg_rtx (SImode);
+  rtx op0 = convert_modes (SImode, QImode, operands[1], 1);
+  emit_insn (gen_<uunary>si2 (reg, op0));
+  emit_insn (gen_truncsiqi2 (operands[0], reg));
+  DONE;
+})
+
+(define_expand "cstoreqi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+        (match_operator:SI 1 "nvptx_comparison_operator"
+          [(match_operand:QI 2 "nvptx_nonmemory_operand")
+           (match_operand:QI 3 "nvptx_nonmemory_operand")]))]
+  ""
+{
+  rtx reg = gen_reg_rtx (BImode);
+  enum rtx_code code = GET_CODE (operands[1]);
+  int unsignedp = unsigned_condition_p (code);
+  rtx op2 = convert_modes (SImode, QImode, operands[2], unsignedp);
+  rtx op3 = convert_modes (SImode, QImode, operands[3], unsignedp);
+  rtx cmp = gen_rtx_fmt_ee (code, SImode, op2, op3);
+  emit_insn (gen_cmpsi (reg, cmp, op2, op3));
+  emit_insn (gen_setccsi_from_bi (operands[0], reg));
+  DONE;
+})
+
+(define_insn "*ext_truncsi2_qi"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (sign_extend:SI
+         (truncate:QI (match_operand:SI 1 "nvptx_register_operand" "R"))))]
+  ""
+  "%.\\tcvt.s32.s8\\t%0, %1;")
+
+(define_insn "*zext_truncsi2_qi"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (zero_extend:SI
+         (truncate:QI (match_operand:SI 1 "nvptx_register_operand" "R"))))]
+  ""
+  "%.\\tcvt.u32.u8\\t%0, %1;")
+
diff --git a/gcc/testsuite/gcc.target/nvptx/bool-1.c b/gcc/testsuite/gcc.target/nvptx/bool-1.c
new file mode 100644
index 0000000..1fc5cef7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/bool-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x, int y)
+{
+  return (x==21) && (y==69);
+}
+
+/* { dg-final { scan-assembler-not "cvt.u16.u8" } } */
+/* { dg-final { scan-assembler-not "cvt.u32.u16" } } */
+/* { dg-final { scan-assembler-not "cvt.u32.u8" } } */
+