From: liuhongt
Date: Tue, 9 Nov 2021 10:09:40 +0800
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] [pass_if_conversion] Extend is_cond_scalar_reduction to handle bit_and/bit_xor/bit_ior.
Message-Id: <20211109020940.108983-1-hongtao.liu@intel.com>

This will enable transformations like

-  # sum1_50 = PHI
-  # sum2_52 = PHI
+  # sum1_50 = PHI <_87(13), 0(4)>
+  # sum2_52 = PHI <_89(13), 0(4)>
   # ivtmp_62 = PHI
   i.2_7 = (long unsigned int) i_49;
   _8 = i.2_7 * 8;
   ...
   vec1_i_38 = vec1_29 >> _10;
   vec2_i_39 = vec2_31 >> _10;
   _11 = vec1_i_38 & 1;
-  _63 = tmp_37 ^ sum1_50;
-  prephitmp_64 = _11 == 0 ? sum1_50 : _63;
+  _ifc__86 = _11 != 0 ? tmp_37 : 0;
+  _87 = sum1_50 ^ _ifc__86;
   _12 = vec2_i_39 & 1;
   :

so that the vectorizer won't fail due to

  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used outside of the inner loop we cannot handle uses of the reduction
     value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
    {
      if (dump_enabled_p ())
	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
			 "reduction used in loop.\n");
      return NULL;
    }

Before the transformation, sum1_50 is used twice inside the loop (in _63
and in the COND_EXPR that defines prephitmp_64), so the check above rejects
the reduction.  After it, the only in-loop use is the unconditional XOR
that defines _87.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	PR tree-optimization/103126
	* tree-if-conv.c (is_cond_scalar_reduction): Handle
	BIT_XOR_EXPR/BIT_IOR_EXPR/BIT_AND_EXPR.
	(convert_scalar_cond_reduction): Ditto.
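The GIMPLE rewrite in the commit message can be checked at the C level.  The sketch below is illustrative and not part of the patch.  It models one inner-loop step of sum1 in xor_bit_arr_nolcd: xor_step_cond mirrors the "_11 == 0 ? sum1_50 : _63" form, and xor_step_ifcvt mirrors the "_ifc__86 = _11 != 0 ? tmp_37 : 0; _87 = sum1_50 ^ _ifc__86" form.  The second form is correct because 0 is the identity element of XOR.  The helper names and the shortened n-element loop are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the original form: the accumulator SUM is read
   twice, once by the XOR and once by the select.  */
static uint64_t
xor_step_cond (uint64_t sum, uint64_t tmp, uint64_t bit)
{
  uint64_t updated = tmp ^ sum;       /* _63 = tmp_37 ^ sum1_50 */
  return bit == 0 ? sum : updated;    /* prephitmp_64 = _11 == 0 ? sum1_50 : _63 */
}

/* Hypothetical model of the if-converted form: select TMP or the XOR
   identity (0), then XOR unconditionally, so SUM is read only once.  */
static uint64_t
xor_step_ifcvt (uint64_t sum, uint64_t tmp, uint64_t bit)
{
  uint64_t ifc = bit != 0 ? tmp : 0;  /* _ifc__86 = _11 != 0 ? tmp_37 : 0 */
  return sum ^ ifc;                   /* _87 = sum1_50 ^ _ifc__86 */
}

/* Run N steps (N <= 64 so the shift stays defined) in either form.  */
static uint64_t
xor_reduce (const uint64_t *mat, int n, uint64_t vec1, int use_ifcvt)
{
  uint64_t sum = 0;
  for (int i = 0; i < n; i++)
    {
      uint64_t bit = (vec1 >> i) & 1; /* _11 = vec1_i_38 & 1 */
      sum = use_ifcvt ? xor_step_ifcvt (sum, mat[i], bit)
                      : xor_step_cond (sum, mat[i], bit);
    }
  return sum;
}
```

For any mask, both forms produce the same accumulator.  Only the second form has the single-use shape that the vectorizer's reduction check accepts.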
gcc/testsuite/ChangeLog:

	* gcc.target/i386/ifcvt-reduction-logic-op.c: New test.
---
 .../i386/ifcvt-reduction-logic-op.c           | 80 +++++++++++++++++++
 gcc/tree-if-conv.c                            | 19 +++--
 2 files changed, 92 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/ifcvt-reduction-logic-op.c

diff --git a/gcc/testsuite/gcc.target/i386/ifcvt-reduction-logic-op.c b/gcc/testsuite/gcc.target/i386/ifcvt-reduction-logic-op.c
new file mode 100644
index 00000000000..eeb822d5d43
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ifcvt-reduction-logic-op.c
@@ -0,0 +1,80 @@
+/* PR tree-optimization/103126.  */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 3 "vect" } } */
+#include <stdint.h>
+
+void xor_bit_arr_nolcd (uint64_t *__restrict mat, uint64_t* a,uint64_t* b, uint64_t *__restrict ans,
+			int64_t n)
+{
+  int64_t i;
+  uint64_t vec1, sum1;
+  uint64_t vec2, sum2;
+
+  while (n > 0) {
+    sum1 = 0;
+    vec1 = a[n];
+    sum2 = 0;
+    vec2 = b[n];
+
+    for (i = 0; i < 64; i++) {
+      uint64_t tmp = mat[i];
+      uint64_t vec1_i = (vec1 >> i);
+      uint64_t vec2_i = (vec2 >> i);
+      sum1 ^= (vec1_i & 1) ? tmp : 0;
+      if (vec2_i&1) sum2 ^= tmp;
+    }
+    *ans++ ^= sum1; n--;
+    *ans++ ^= sum2; n--;
+  }
+}
+
+void ior_bit_arr_nolcd (uint64_t *__restrict mat, uint64_t* a,uint64_t* b, uint64_t *__restrict ans,
+			int64_t n)
+{
+  int64_t i;
+  uint64_t vec1, sum1;
+  uint64_t vec2, sum2;
+
+  while (n > 0) {
+    sum1 = 0;
+    vec1 = a[n];
+    sum2 = 0;
+    vec2 = b[n];
+
+    for (i = 0; i < 64; i++) {
+      uint64_t tmp = mat[i];
+      uint64_t vec1_i = (vec1 >> i);
+      uint64_t vec2_i = (vec2 >> i);
+      sum1 |= (vec1_i & 1) ? tmp : 0;
+      if (vec2_i&1) sum2 |= tmp;
+    }
+    *ans++ |= sum1; n--;
+    *ans++ |= sum2; n--;
+  }
+}
+
+void and_bit_arr_nolcd (uint64_t *__restrict mat, uint64_t* a,uint64_t* b, uint64_t *__restrict ans,
+			int64_t n)
+{
+  int64_t i;
+  uint64_t vec1, sum1;
+  uint64_t vec2, sum2;
+
+  while (n > 0) {
+    sum1 = -1;
+    vec1 = a[n];
+    sum2 = 0;
+    vec2 = b[n];
+
+    for (i = 0; i < 64; i++) {
+      uint64_t tmp = mat[i];
+      uint64_t vec1_i = (vec1 >> i);
+      uint64_t vec2_i = (vec2 >> i);
+      sum1 &= (vec1_i & 1) ? tmp : -1;
+      if (vec2_i&1) sum2 &= tmp;
+    }
+    *ans++ &= sum1; n--;
+    *ans++ &= sum2; n--;
+  }
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index b165dc0c17f..7df1103ff89 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1732,7 +1732,9 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
       reduction_op = gimple_assign_rhs_code (stmt);
     }
 
-  if (reduction_op != PLUS_EXPR && reduction_op != MINUS_EXPR)
+  if (reduction_op != PLUS_EXPR && reduction_op != MINUS_EXPR
+      && reduction_op != BIT_IOR_EXPR && reduction_op != BIT_XOR_EXPR
+      && reduction_op != BIT_AND_EXPR)
     return false;
   r_op1 = gimple_assign_rhs1 (stmt);
   r_op2 = gimple_assign_rhs2 (stmt);
@@ -1742,7 +1744,7 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
 
   /* Make R_OP1 to hold reduction variable.  */
   if (r_nop2 == PHI_RESULT (header_phi)
-      && reduction_op == PLUS_EXPR)
+      && commutative_tree_code (reduction_op))
     {
       std::swap (r_op1, r_op2);
       std::swap (r_nop1, r_nop2);
@@ -1811,7 +1813,10 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
   tree rhs1 = gimple_assign_rhs1 (reduc);
   tree tmp = make_temp_ssa_name (TREE_TYPE (rhs1), NULL, "_ifc_");
   tree c;
-  tree zero = build_zero_cst (TREE_TYPE (rhs1));
+  enum tree_code reduction_op = gimple_assign_rhs_code (reduc);
+  tree op_nochange = reduction_op != BIT_AND_EXPR
+    ? build_zero_cst (TREE_TYPE (rhs1))
+    : build_minus_one_cst (TREE_TYPE (rhs1));
   gimple_seq stmts = NULL;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -1824,14 +1829,14 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
      of reduction rhs.  */
   c = fold_build_cond_expr (TREE_TYPE (rhs1),
 			    unshare_expr (cond),
-			    swap ? zero : op1,
-			    swap ? op1 : zero);
+			    swap ? op_nochange : op1,
+			    swap ? op1 : op_nochange);
 
   /* Create assignment stmt and insert it at GSI.  */
   new_assign = gimple_build_assign (tmp, c);
   gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
-  /* Build rhs for unconditional increment/decrement.  */
-  rhs = gimple_build (&stmts, gimple_assign_rhs_code (reduc),
+  /* Build rhs for unconditional increment/decrement/logic_operation.  */
+  rhs = gimple_build (&stmts, reduction_op,
 		      TREE_TYPE (rhs1), op0, tmp);
 
   if (has_nop)
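The tree-if-conv.c hunks generalize two things:

* The constant selected when the condition is false (op_nochange). It is 0 for PLUS/MINUS/IOR/XOR and all-ones for AND, because "x & 0" would clear the accumulator instead of leaving it unchanged.
* The operand swap that moves the reduction variable into R_OP1. This is now allowed for any code accepted by commutative_tree_code, not only PLUS_EXPR.

The C sketch below models both outside of GCC.  It is illustrative only.  The enum reduc_code, apply, is_commutative and the step_* helpers are hypothetical stand-ins, not GCC APIs, and the cond-arm "swap" flag of convert_scalar_cond_reduction is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the five tree codes the patch accepts.  */
enum reduc_code { R_PLUS, R_MINUS, R_IOR, R_XOR, R_AND };

/* Models op_nochange: the value that leaves the accumulator unchanged.  */
static uint64_t
op_nochange (enum reduc_code code)
{
  return code == R_AND ? ~(uint64_t) 0 : 0;
}

/* Apply the reduction operation: A is the accumulator, B the operand.  */
static uint64_t
apply (enum reduc_code code, uint64_t a, uint64_t b)
{
  switch (code)
    {
    case R_PLUS:  return a + b;
    case R_MINUS: return a - b;
    case R_IOR:   return a | b;
    case R_XOR:   return a ^ b;
    case R_AND:   return a & b;
    }
  return 0;
}

/* Rough analogue of commutative_tree_code for these five codes.  Only
   commutative codes may have "val OP sum" rewritten as "sum OP val".  */
static bool
is_commutative (enum reduc_code code)
{
  return code != R_MINUS;
}

/* Original conditional form: "if (cond) sum = sum OP val;".  */
static uint64_t
step_cond (enum reduc_code code, uint64_t sum, bool cond, uint64_t val)
{
  return cond ? apply (code, sum, val) : sum;
}

/* If-converted form: select VAL or op_nochange, then apply unconditionally.  */
static uint64_t
step_ifcvt (enum reduc_code code, uint64_t sum, bool cond, uint64_t val)
{
  uint64_t ifc = cond ? val : op_nochange (code);
  return apply (code, sum, ifc);
}
```

Using 0 as the "no change" value for AND would give apply (R_AND, sum, 0) == 0.  That is why the patch builds build_minus_one_cst for BIT_AND_EXPR.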