From patchwork Sat Jun 4 11:03:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 54808
From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [x86 PATCH] Recognize vpcmov in combine with -mxop.
Date: Sat, 4 Jun 2022 12:03:26 +0100
Message-ID: <011d01d87802$b7f7fce0$27e7f6a0$@nextmovesoftware.com>

By way of an apology for causing PR target/105791, where I'd overlooked
the need to support V1TImode in TARGET_XOP's vpcmov instruction, this
patch further improves support for TARGET_XOP's vpcmov instruction, by
recognizing it in combine.

Currently, the test case:

typedef int v4si __attribute__ ((vector_size (16)));
v4si foo(v4si c, v4si t, v4si f)
{
  return (c&t)|(~c&f);
}

on x86_64 with -O2 -mxop generates:
        vpxor   %xmm2, %xmm1, %xmm1
        vpand   %xmm0, %xmm1, %xmm1
        vpxor   %xmm2, %xmm1, %xmm0
        ret

but with this patch now generates:
        vpcmov  %xmm0, %xmm2, %xmm1, %xmm0
        ret

On its own, the new combine splitter works fine on TARGET_64BIT, but
alas with -m32 combine incorrectly thinks the replacement instruction
is more expensive, as IF_THEN_ELSE isn't currently/correctly handled
in ix86_rtx_costs.  So to avoid the need for a target selector in the
new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov
has a latency of two cycles [it's now an obsolete instruction set
extension and there's unlikely to ever be a processor where this
instruction has a different timing], and while there I also added
rtx_costs for x86_64's integer conditional move instructions (which
have single cycle latency).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-06-04  Roger Sayle

gcc/ChangeLog
        * config/i386/i386.cc (ix86_rtx_costs): Add a new case for
        IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and
        TARGET_CMOVE's (scalar integer) conditional moves.
        * config/i386/sse.md (define_split): Recognize XOP's vpcmov
        from its equivalent (canonical) pxor;pand;pxor sequence.

gcc/testsuite/ChangeLog
        * gcc.target/i386/xop-pcmov3.c: New test case.


Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 11f4ddf..0823161 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21009,6 +21009,37 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
         }
       return false;
 
+    case IF_THEN_ELSE:
+      if (TARGET_XOP
+          && VECTOR_MODE_P (mode)
+          && (GET_MODE_SIZE (mode) == 16 || GET_MODE_SIZE (mode) == 32))
+        {
+          /* vpcmov.  */
+          *total = speed ? COSTS_N_INSNS (2) : COSTS_N_BYTES (6);
+          if (!REG_P (XEXP (x, 0)))
+            *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed);
+          if (!REG_P (XEXP (x, 1)))
+            *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed);
+          if (!REG_P (XEXP (x, 2)))
+            *total += rtx_cost (XEXP (x, 2), mode, code, 2, speed);
+          return true;
+        }
+      else if (TARGET_CMOVE
+               && SCALAR_INT_MODE_P (mode)
+               && GET_MODE_SIZE (mode) <= UNITS_PER_WORD)
+        {
+          /* cmov.  */
+          *total = COSTS_N_INSNS (1);
+          if (!REG_P (XEXP (x, 0)))
+            *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed);
+          if (!REG_P (XEXP (x, 1)))
+            *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed);
+          if (!REG_P (XEXP (x, 2)))
+            *total += rtx_cost (XEXP (x, 2), mode, code, 2, speed);
+          return true;
+        }
+      return false;
+
     default:
       return false;
     }
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8b3163f..828c699 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -23858,6 +23858,26 @@
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
 
+;; Recognize XOP's vpcmov from canonical (xor (and (xor t f) c) f)
+(define_split
+  [(set (match_operand:V_128_256 0 "register_operand")
+        (xor:V_128_256
+          (and:V_128_256
+            (xor:V_128_256 (match_operand:V_128_256 1 "register_operand")
+                           (match_operand:V_128_256 2 "register_operand"))
+            (match_operand:V_128_256 3 "nonimmediate_operand"))
+          (match_operand:V_128_256 4 "register_operand")))]
+  "TARGET_XOP
+   && (REGNO (operands[4]) == REGNO (operands[1])
+       || REGNO (operands[4]) == REGNO (operands[2]))"
+  [(set (match_dup 0) (if_then_else:V_128_256 (match_dup 3)
+                                               (match_dup 5)
+                                               (match_dup 4)))]
+{
+  operands[5] = REGNO (operands[4]) == REGNO (operands[1]) ? operands[2]
+                                                            : operands[1];
+})
+
 ;; XOP horizontal add/subtract instructions
 (define_insn "xop_phaddbw"
   [(set (match_operand:V8HI 0 "register_operand" "=x")
diff --git a/gcc/testsuite/gcc.target/i386/xop-pcmov3.c b/gcc/testsuite/gcc.target/i386/xop-pcmov3.c
new file mode 100644
index 0000000..6c40f33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/xop-pcmov3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mxop" } */
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si foo(v4si c, v4si t, v4si f)
+{
+  return (c&t)|(~c&f);
+}
+
+/* { dg-final { scan-assembler "vpcmov" } } */
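
The splitter relies on the test case's (c&t)|(~c&f) being the same
bitwise select as the (xor (and (xor t f) c) f) form that combine sees.
The following is only a minimal sketch in plain scalar C, not part of
the patch, that checks this identity.  The function names are made up
for illustration, and it uses uint32_t rather than vector types on the
assumption that the per-bit behaviour carries over, since every
operation involved is bitwise.

```c
#include <stdint.h>

/* The form written in the test case: take t where c is set, else f.  */
uint32_t select_and_or (uint32_t c, uint32_t t, uint32_t f)
{
  return (c & t) | (~c & f);
}

/* The xor/and/xor form matched by the new define_split.  */
uint32_t select_xor_and_xor (uint32_t c, uint32_t t, uint32_t f)
{
  return ((t ^ f) & c) ^ f;
}

/* Return 1 if the two forms agree for every combination of 8-bit
   inputs, 0 otherwise.  Each result bit depends only on the matching
   bit of c, t and f, so 8-bit inputs already exercise all eight
   possible per-bit input combinations.  */
int select_forms_agree (void)
{
  for (uint32_t c = 0; c < 256; c++)
    for (uint32_t t = 0; t < 256; t++)
      for (uint32_t f = 0; f < 256; f++)
        if (select_and_or (c, t, f) != select_xor_and_xor (c, t, f))
          return 0;
  return 1;
}
```

Calling select_forms_agree () from a small driver should return 1.  In
the second form f appears twice, once inside the inner xor and once as
the outer xor operand, which is why the splitter's condition requires
operands[4] to be the same register as operands[1] or operands[2].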