From patchwork Tue Aug 9 08:16:03 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 56612
X-Original-To: gcc-patches@gcc.gnu.org
From: "Roger Sayle"
Subject: [x86_64 PATCH] Use PTEST to perform AND in TImode STV of (A & B) != 0.
Date: Tue, 9 Aug 2022 09:16:03 +0100
Message-ID: <004d01d8abc8$45845920$d08d0b60$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This x86_64 backend patch allows TImode STV to take advantage of the
fact that the PTEST instruction performs an AND operation.  Previously
PTEST was (mostly) used for comparison against zero, by using the same
operands.
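To illustrate the point about PTEST performing an AND, here is a minimal sketch in portable C that models only the ZF result of the instruction. The u128 type and the helper names (ptest_zf, pand, test_old, test_new, self_check) are hypothetical and exist purely for illustration; this is not code from the patch, and it does not execute a real PTEST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves (illustration only).  */
typedef struct { uint64_t lo, hi; } u128;

/* Model of the ZF flag after "ptest src, dst": ZF is set when
   (dst & src) is all zeros.  The CF result is not modelled.  */
static bool
ptest_zf (u128 dst, u128 src)
{
  return ((dst.lo & src.lo) | (dst.hi & src.hi)) == 0;
}

/* Model of a 128-bit PAND.  */
static u128
pand (u128 x, u128 y)
{
  u128 r = { x.lo & y.lo, x.hi & y.hi };
  return r;
}

/* Old sequence: pand, then ptest of the result against itself.  */
static int
test_old (u128 a, u128 b)
{
  u128 t = pand (a, b);
  return !ptest_zf (t, t);
}

/* New sequence: a single ptest of the two operands.  */
static int
test_new (u128 a, u128 b)
{
  return !ptest_zf (a, b);
}

/* Check that both sequences agree on a few sample operands.  */
static void
self_check (void)
{
  static const u128 v[] = {
    { 0, 0 }, { 1, 0 }, { 0, 1 }, { 2, 0 },
    { UINT64_MAX, UINT64_MAX }, { 0, UINT64_C (1) << 36 }
  };
  size_t n = sizeof v / sizeof v[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      assert (test_old (v[i], v[j]) == test_new (v[i], v[j]));
}
```

Calling self_check () from a main exercises both sequences on the sample operands. The model only shows why the separate AND is redundant for the zero test; it says nothing about instruction selection or cost.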

The benefits are demonstrated by the new test case:

__int128 a,b;
int foo()
{
  return (a & b) != 0;
}

Currently with -O2 -msse4 we generate:

        movdqa  a(%rip), %xmm0
        pand    b(%rip), %xmm0
        xorl    %eax, %eax
        ptest   %xmm0, %xmm0
        setne   %al
        ret

with this patch we now generate:

        movdqa  a(%rip), %xmm0
        xorl    %eax, %eax
        ptest   b(%rip), %xmm0
        setne   %al
        ret

Technically, the magic happens using new define_insn_and_split patterns.
Using two patterns allows this transformation to be performed
independently of whether TImode STV is run before or after combine.
The one tricky case is that immediate constant operands of the AND
behave slightly differently between TImode and V1TImode: all V1TImode
immediate operands become loads, but for TImode only values that are
not hilo_operands need to be loaded.  Hence the new *testti_doubleword
accepts any general_operand, but internally during split calls
force_reg whenever the second operand is not
x86_64_hilo_general_operand.  This required (and benefits from) some
tweaks to TImode STV to support CONST_WIDE_INT in more places, using
CONST_SCALAR_INT_P instead of just CONST_INT_P.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-09  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-features.cc (scalar_chain::convert_compare):
        Create new pseudos only when/if needed.  Add support for TEST
        (i.e. (COMPARE (AND x y) (const_int 0))), using UNSPEC_PTEST.
        When broadcasting V2DImode and V4SImode use new pseudo register.
        (timode_scalar_chain::convert_op): Do nothing if operand is
        already V1TImode.  Avoid generating useless SUBREG conversions,
        i.e. (SUBREG:V1TImode (REG:V1TImode) 0).  Handle CONST_WIDE_INT
        in addition to CONST_INT by using CONST_SCALAR_INT_P.
        (convertible_comparison_p): Use CONST_SCALAR_INT_P to match both
        CONST_WIDE_INT and CONST_INT.  Recognize new *testti_doubleword
        pattern as an STV candidate.
        (timode_scalar_to_vector_candidate_p): Allow CONST_SCALAR_INT_P
        operands in binary logic operations.
        * config/i386/i386.cc (ix86_rtx_costs): Add costs for UNSPEC_PTEST;
        a PTEST that performs an AND has the same cost as regular PTEST,
        i.e. cost->sse_op.
        * config/i386/i386.md (*testti_doubleword): New pre-reload
        define_insn_and_split that recognizes comparison of TI mode AND
        against zero.
        * config/i386/sse.md (*ptest_and): New pre-reload
        define_insn_and_split that recognizes UNSPEC_PTEST of identical
        AND operands.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse4_1-stv-8.c: New test case.


Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 5e3a7ff..effc2f2 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -919,8 +919,7 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 rtx
 scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
 {
-  rtx tmp = gen_reg_rtx (vmode);
-  rtx src;
+  rtx src, tmp;
   /* Comparison against anything other than zero, requires an XOR.  */
   if (op2 != const0_rtx)
     {
@@ -929,6 +928,7 @@ scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
       /* If both operands are MEMs, explicitly load the OP1 into TMP.  */
       if (MEM_P (op1) && MEM_P (op2))
         {
+          tmp = gen_reg_rtx (vmode);
           emit_insn_before (gen_rtx_SET (tmp, op1), insn);
           src = tmp;
         }
@@ -943,34 +943,56 @@ scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
       rtx op12 = XEXP (op1, 1);
       convert_op (&op11, insn);
       convert_op (&op12, insn);
-      if (MEM_P (op11))
+      if (!REG_P (op11))
         {
+          tmp = gen_reg_rtx (vmode);
           emit_insn_before (gen_rtx_SET (tmp, op11), insn);
           op11 = tmp;
         }
       src = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, op11), op12);
     }
+  else if (GET_CODE (op1) == AND)
+    {
+      rtx op11 = XEXP (op1, 0);
+      rtx op12 = XEXP (op1, 1);
+      convert_op (&op11, insn);
+      convert_op (&op12, insn);
+      if (!REG_P (op11))
+        {
+          tmp = gen_reg_rtx (vmode);
+          emit_insn_before (gen_rtx_SET (tmp, op11), insn);
+          op11 = tmp;
+        }
+      return gen_rtx_UNSPEC (CCmode, gen_rtvec (2, op11, op12),
+                             UNSPEC_PTEST);
+    }
   else
     {
       convert_op (&op1, insn);
       src = op1;
     }
-  emit_insn_before (gen_rtx_SET (tmp, src), insn);
+
+  if (!REG_P (src))
+    {
+      tmp = gen_reg_rtx (vmode);
+      emit_insn_before (gen_rtx_SET (tmp, src), insn);
+      src = tmp;
+    }
 
   if (vmode == V2DImode)
-    emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (tmp),
-                                                  copy_rtx_if_shared (tmp),
-                                                  copy_rtx_if_shared (tmp)),
-                      insn);
+    {
+      tmp = gen_reg_rtx (vmode);
+      emit_insn_before (gen_vec_interleave_lowv2di (tmp, src, src), insn);
+      src = tmp;
+    }
   else if (vmode == V4SImode)
-    emit_insn_before (gen_sse2_pshufd (copy_rtx_if_shared (tmp),
-                                       copy_rtx_if_shared (tmp),
-                                       const0_rtx),
-                      insn);
-
-  return gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (tmp),
-                                               copy_rtx_if_shared (tmp)),
-                         UNSPEC_PTEST);
+    {
+      tmp = gen_reg_rtx (vmode);
+      emit_insn_before (gen_sse2_pshufd (tmp, src, const0_rtx), insn);
+      src = tmp;
+    }
+
+  return gen_rtx_UNSPEC (CCmode, gen_rtvec (2, src, src), UNSPEC_PTEST);
 }
 
 /* Helper function for converting INSN to vector mode.  */
@@ -1289,6 +1311,9 @@ timode_scalar_chain::fix_debug_reg_uses (rtx reg)
 void
 timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 {
+  if (GET_MODE (*op) == V1TImode)
+    return;
+
   *op = copy_rtx_if_shared (*op);
 
   if (REG_P (*op))
@@ -1296,19 +1321,19 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
   else if (MEM_P (*op))
     {
       rtx tmp = gen_reg_rtx (V1TImode);
-      emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (V1TImode, tmp, 0),
+      emit_insn_before (gen_rtx_SET (tmp,
                                      gen_gpr_to_xmm_move_src (V1TImode, *op)),
                         insn);
-      *op = gen_rtx_SUBREG (V1TImode, tmp, 0);
+      *op = tmp;
 
       if (dump_file)
         fprintf (dump_file, " Preloading operand for insn %d into r%d\n",
                  INSN_UID (insn), REGNO (tmp));
     }
-  else if (CONST_INT_P (*op))
+  else if (CONST_SCALAR_INT_P (*op))
     {
       rtx vec_cst;
-      rtx tmp = gen_rtx_SUBREG (V1TImode, gen_reg_rtx (TImode), 0);
+      rtx tmp = gen_reg_rtx (V1TImode);
 
       /* Prefer all ones vector in case of -1.  */
       if (constm1_operand (*op, TImode))
@@ -1329,7 +1354,7 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
           emit_insn_before (seq, insn);
         }
 
-      emit_insn_before (gen_move_insn (copy_rtx (tmp), vec_cst), insn);
+      emit_insn_before (gen_move_insn (tmp, vec_cst), insn);
       *op = tmp;
     }
   else
@@ -1609,14 +1634,26 @@ convertible_comparison_p (rtx_insn *insn, enum machine_mode mode)
   rtx op2 = XEXP (src, 1);
 
   /* *cmp_doubleword.  */
-  if ((CONST_INT_P (op1)
+  if ((CONST_SCALAR_INT_P (op1)
        || ((REG_P (op1) || MEM_P (op1))
            && GET_MODE (op1) == mode))
-      && (CONST_INT_P (op2)
+      && (CONST_SCALAR_INT_P (op2)
           || ((REG_P (op2) || MEM_P (op2))
               && GET_MODE (op2) == mode)))
     return true;
 
+  /* *testti_doubleword.  */
+  if (op2 == const0_rtx
+      && GET_CODE (op1) == AND
+      && REG_P (XEXP (op1, 0)))
+    {
+      rtx op12 = XEXP (op1, 1);
+      return GET_MODE (XEXP (op1, 0)) == TImode
+             && (CONST_SCALAR_INT_P (op12)
+                 || ((REG_P (op12) || MEM_P (op12))
+                     && GET_MODE (op12) == TImode));
+    }
+
   /* *test_not_doubleword.  */
   if (op2 == const0_rtx
       && GET_CODE (op1) == AND
@@ -1803,15 +1840,21 @@ timode_scalar_to_vector_candidate_p (rtx_insn *insn)
       if (!MEM_P (dst)
           && GET_CODE (XEXP (src, 0)) == NOT
           && REG_P (XEXP (XEXP (src, 0), 0))
-          && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1))))
+          && (REG_P (XEXP (src, 1))
+              || CONST_SCALAR_INT_P (XEXP (src, 1))
+              || timode_mem_p (XEXP (src, 1))))
         return true;
       return REG_P (XEXP (src, 0))
-             && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1)));
+             && (REG_P (XEXP (src, 1))
+                 || CONST_SCALAR_INT_P (XEXP (src, 1))
+                 || timode_mem_p (XEXP (src, 1)));
 
     case IOR:
     case XOR:
       return REG_P (XEXP (src, 0))
-             && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1)));
+             && (REG_P (XEXP (src, 1))
+                 || CONST_SCALAR_INT_P (XEXP (src, 1))
+                 || timode_mem_p (XEXP (src, 1)));
 
     case NOT:
       return REG_P (XEXP (src, 0)) || timode_mem_p (XEXP (src, 0));
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5e30dc8..5be76e1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21063,11 +21063,25 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
     case UNSPEC:
       if (XINT (x, 1) == UNSPEC_TP)
         *total = 0;
-      else if (XINT(x, 1) == UNSPEC_VTERNLOG)
+      else if (XINT (x, 1) == UNSPEC_VTERNLOG)
         {
           *total = cost->sse_op;
           return true;
         }
+      else if (XINT (x, 1) == UNSPEC_PTEST)
+        {
+          *total = cost->sse_op;
+          if (XVECLEN (x, 0) == 2
+              && GET_CODE (XVECEXP (x, 0, 0)) == AND)
+            {
+              rtx andop = XVECEXP (x, 0, 0);
+              *total += rtx_cost (XEXP (andop, 0), GET_MODE (andop),
+                                  AND, opno, speed)
+                        + rtx_cost (XEXP (andop, 1), GET_MODE (andop),
+                                    AND, opno, speed);
+              return true;
+            }
+        }
       return false;
 
     case VEC_SELECT:
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index fd30c57..2fde8cd 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9756,6 +9756,27 @@
   [(set_attr "type" "test")
    (set_attr "mode" "QI")])
 
+;; Provide a *testti instruction that STV can implement using ptest.
+;; This pattern splits into *andti3_doubleword and *cmpti_doubleword.
+(define_insn_and_split "*testti_doubleword"
+  [(set (reg:CCZ FLAGS_REG)
+        (compare:CCZ
+          (and:TI (match_operand:TI 0 "register_operand")
+                  (match_operand:TI 1 "general_operand"))
+          (const_int 0)))]
+  "TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2) (and:TI (match_dup 0) (match_dup 1)))
+              (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG) (compare:CCZ (match_dup 2) (const_int 0)))]
+{
+  operands[2] = gen_reg_rtx (TImode);
+  if (!x86_64_hilo_general_operand (operands[1], TImode))
+    operands[1] = force_reg (TImode, operands[1]);
+})
+
 ;; Combine likes to form bit extractions for some tests.  Humor it.
 (define_insn_and_split "*testqi_ext_3"
   [(set (match_operand 0 "flags_reg_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14d12d1..ccd9d00 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -23021,6 +23021,19 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
+(define_insn_and_split "*ptest_and"
+  [(set (reg:CC FLAGS_REG)
+        (unspec:CC [(and:V_AVX (match_operand:V_AVX 0 "register_operand")
+                               (match_operand:V_AVX 1 "vector_operand"))
+                    (and:V_AVX (match_dup 0) (match_dup 1))]
+                   UNSPEC_PTEST))]
+  "TARGET_SSE4_1
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CC FLAGS_REG)
+        (unspec:CC [(match_dup 0) (match_dup 1)] UNSPEC_PTEST))])
+
 (define_expand "nearbyint2"
   [(set (match_operand:VFH 0 "register_operand")
         (unspec:VFH
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-8.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-8.c
new file mode 100644
index 0000000..5c5d803
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-8.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a,b;
+int foo()
+{
+  return (a & b) != 0;
+}
+
+/* { dg-final { scan-assembler-not "pand" } } */
+/* { dg-final { scan-assembler "ptest" } } */