From patchwork Tue Sep 20 02:21:33 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 57782
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix incorrect handle in vectorizable_induction for mixed induction type.
Date: Tue, 20 Sep 2022 10:21:33 +0800
Message-Id: <20220920022133.64778-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.18.1
List-Id: Gcc-patches mailing list
From: liuhongt
Reply-To: liuhongt

The SLP code in vectorizable_induction assumes that every phi_info in
the slp_node has the same induction type (vect_step_op_add). Now that
nonlinear induction is supported, that assumption no longer holds, and
a node with mixed induction types can be handled incorrectly. This
patch makes vectorizable_induction return false when any induction in
the slp_node is not vect_step_op_add, which covers the mixed case.

Note that code elsewhere will still vectorize the induction, using
separate IV updates and a vec_perm. The slp_node path in
vectorizable_induction remains preferable when all induction types are
the same, because it updates the IVs with one operation instead of
separate IV updates plus a permutation.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	PR tree-optimization/103144
	* tree-vect-loop.cc (vectorizable_induction): Return false for
	slp_node with mixed induction type.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr103144-mix-1.c: New test.
	* gcc.target/i386/pr103144-mix-2.c: New test.
---
 .../gcc.target/i386/pr103144-mix-1.c          | 17 +++++++++
 .../gcc.target/i386/pr103144-mix-2.c          | 35 +++++++++++++++++++
 gcc/tree-vect-loop.cc                         | 34 ++++++++++++++----
 3 files changed, 79 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mix-2.c

diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c b/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
new file mode 100644
index 00000000000..b292d66ef71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "optimized" } } */
+/* For induction variable with differernt induction type(vect_step_op_add, vect_step_op_neg),
+   It should't be handled in vectorizable_induction with just 1 single iv update(addition.),
+   separate iv update and vec_perm are needed. */
+int
+__attribute__((noipa))
+foo (int* p, int c, int n)
+{
+  for (int i = 0; i != n; i++)
+    {
+      p[2* i]= i;
+      p[2 * i+1] = c;
+      c = -c;
+    }
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c b/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
new file mode 100644
index 00000000000..b7043d59aec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx2 } */
+
+#include "avx2-check.h"
+#include
+#include "pr103144-mix-1.c"
+
+typedef int v8si __attribute__((vector_size(32)));
+
+#define N 34
+void
+avx2_test (void)
+{
+  int* epi32_exp = (int*) malloc (N * sizeof (int));
+  int* epi32_dst = (int*) malloc (N * sizeof (int));
+
+  __builtin_memset (epi32_exp, 0, N * sizeof (int));
+  int b = 8;
+  v8si init1 = __extension__(v8si) { 0, b, 1, -b, 2, b, 3, -b };
+  v8si init2 = __extension__(v8si) { 4, b, 5, -b, 6, b, 7, -b };
+  v8si init3 = __extension__(v8si) { 8, b, 9, -b, 10, b, 11, -b };
+  v8si init4 = __extension__(v8si) { 12, b, 13, -b, 14, b, 15, -b };
+  memcpy (epi32_exp, &init1, 32);
+  memcpy (epi32_exp + 8, &init2, 32);
+  memcpy (epi32_exp + 16, &init3, 32);
+  memcpy (epi32_exp + 24, &init4, 32);
+  epi32_exp[32] = 16;
+  epi32_exp[33] = b;
+  foo (epi32_dst, b, N / 2);
+  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
+    __builtin_abort ();
+
+  return;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9c434b66c5b..c7050a47c1c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9007,14 +9007,34 @@ vectorizable_induction (loop_vec_info loop_vinfo,
     iv_loop = loop;
   gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
 
-  if (slp_node && !nunits.is_constant ())
+  if (slp_node)
     {
-      /* The current SLP code creates the step value element-by-element. */
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "SLP induction not supported for variable-length"
-                         " vectors.\n");
-      return false;
+      if (!nunits.is_constant ())
+        {
+          /* The current SLP code creates the step value element-by-element. */
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                             "SLP induction not supported for variable-length"
+                             " vectors.\n");
+          return false;
+        }
+
+      stmt_vec_info phi_info;
+      FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, phi_info)
+        {
+          if (STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info) != vect_step_op_add)
+            {
+              /* The below SLP code assume all induction type to be the same.
+                 But slp in other place will still vectorize the loop via updating
+                 iv update separately + vec_perm, but not from below codes. */
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                 "SLP induction not supported for mixed induction type"
+                                 " vectors.\n");
+              return false;
+            }
+        }
+
     }
 
   if (FLOAT_TYPE_P (vectype) && !param_vect_induction_float)
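
For readers who want to check the test's expected values without an AVX2 target, here is a minimal standalone sketch in plain C. It is not part of the patch. The enum `step_op` and the helpers `all_lanes_additive` and `foo_ref` are hypothetical names introduced only for illustration; they model the shape of the new guard (reject the single-update SLP path unless every lane is an additive induction) and the scalar behaviour of the loop in pr103144-mix-1.c. They do not use any GCC internals.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the vectorizer's induction kinds.  Only the
   ideas behind vect_step_op_add and vect_step_op_neg come from the patch;
   these names and this enum are assumptions made for the sketch.  */
enum step_op { STEP_OP_ADD, STEP_OP_NEG };

/* Models the shape of the new guard: return 1 only when every lane of the
   (hypothetical) SLP group is an additive induction, 0 otherwise.  */
static int
all_lanes_additive (const enum step_op *lanes, int n)
{
  for (int i = 0; i < n; i++)
    if (lanes[i] != STEP_OP_ADD)
      return 0;
  return 1;
}

/* Scalar reference for the loop in pr103144-mix-1.c, written as a void
   function.  Even slots hold an additive induction (i, step +1); odd
   slots hold a negating induction (c, -c, c, ...).  */
static void
foo_ref (int *p, int c, int n)
{
  for (int i = 0; i != n; i++)
    {
      p[2 * i] = i;
      p[2 * i + 1] = c;
      c = -c;
    }
}
```

Calling `foo_ref (out, 8, 4)` on an 8-element buffer should reproduce the first expected vector in pr103144-mix-2.c, `{ 0, b, 1, -b, 2, b, 3, -b }` with `b = 8`, and `all_lanes_additive` should reject the `{ add, neg }` lane pair that this loop produces.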