From patchwork Fri Mar 4 07:27:14 2022
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 51558
From: liuhongt
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] [i386] Prevent vectorization for load from parm_decl at O2 to avoid STF issue.
Date: Fri, 4 Mar 2022 15:27:14 +0800
Message-Id: <20220304072714.55713-1-hongtao.liu@intel.com>

For parameters passed on the stack, a vectorized load from a parm_decl in
the callee may trigger a serious store-to-load forwarding (STF) stall.
This is why GCC 12 regresses by 50% on cray at -O2 compared to GCC 11.

The patch adds an extremely large number to stmt_cost to prevent
vectorization of loads from parm_decl under the very-cheap cost model.
This at least prevents the -O2 regression caused by the STF issue. It
may lose some performance where there is no such issue (1 vector_load
vs. n scalar_loads + CTOR).

No impact on SPEC2017 for either plain -O2 or -O2 -march=native on ICX.

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	PR target/101908
	* config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
	(ix86_vector_costs::add_stmt_cost): Add extra cost for
	vector_load/unaligned_load which may have a store forwarding
	stall issue.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr101908-1.c: New test.
	* gcc.target/i386/pr101908-2.c: New test.
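The message describes a trade-off: 1 vector_load versus n scalar_loads plus a CTOR. The minimal C sketch below shows why a flat penalty of 2000 acts as a veto rather than as a tuning weight. The helper names and unit costs are hypothetical placeholders, not GCC's x86 cost tables. The sketch also assumes, as a simplification, that the cheaper of the two ways of filling the vector wins. GCC's real decision is made over a whole SLP subgraph.

```c
#include <assert.h>
#include <stdbool.h>

/* Extra cost the patch adds to a possibly stalling vector load.  */
enum { STF_PENALTY = 2000 };

/* Illustrative unit costs (hypothetical placeholders, NOT GCC's tables).  */
enum { SCALAR_LOAD_COST = 1, VECTOR_LOAD_COST = 1, VEC_CONSTRUCT_COST = 1 };

/* Cost of filling the vector with one vector load.  The penalty applies
   only under the very-cheap cost model and only if the load may hit a
   store-forwarding stall.  */
static int
vector_load_path_cost (bool very_cheap_model, bool maybe_stf)
{
  int cost = VECTOR_LOAD_COST;
  if (very_cheap_model && maybe_stf)
    cost += STF_PENALTY;
  return cost;
}

/* Cost of filling the same vector with N_LANES scalar loads plus a CTOR.  */
static int
scalar_loads_path_cost (int n_lanes)
{
  assert (n_lanes > 0);
  return n_lanes * SCALAR_LOAD_COST + VEC_CONSTRUCT_COST;
}

/* True if the single vector load is the cheaper option (ties go to it).  */
bool
prefer_vector_load_p (int n_lanes, bool very_cheap_model, bool maybe_stf)
{
  return vector_load_path_cost (very_cheap_model, maybe_stf)
         <= scalar_loads_path_cost (n_lanes);
}
```

With these placeholder costs and two lanes, the vector load costs 1 against 3 for the scalar path, so it wins. With the penalty applied it costs 2001 and loses for any lane count below 2000. It still wins when the cost model is not very-cheap.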
---
 gcc/config/i386/i386.cc                    | 31 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 +++++++++
 gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 +++++++++
 3 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b2bf90576d5..3bbaaf65ea8 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22976,6 +22976,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
   return default_noce_conversion_profitable_p (seq, if_info);
 }
 
+/* Return true if a load from REF may suffer an STF stall, otherwise false.  */
+static bool
+ix86_load_maybe_stfs_p (tree ref)
+{
+  tree addr = get_base_address (ref);
+
+  if (TREE_CODE (addr) != PARM_DECL
+      || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (addr)))
+      || tree_to_uhwi (TYPE_SIZE (TREE_TYPE (addr))) <= MAX_BITS_PER_WORD)
+    return false;
+  return true;
+}
+
 /* x86-specific vector costs.  */
 class ix86_vector_costs : public vector_costs
 {
@@ -23203,6 +23216,24 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
         if (TREE_CODE (op) == SSA_NAME)
           TREE_VISITED (op) = 0;
     }
+
+  /* Prevent vectorization of loads from parm_decl at O2 to avoid STF issue.
+     Performance may be lost when there's no STF issue (1 vector_load vs. n
+     scalar_loads + CTOR).
+     TODO: both the extra cost (2000) and ix86_load_maybe_stfs_p need to be
+     fine-tuned.  */
+  if ((kind == vector_load || kind == unaligned_load)
+      && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
+      && stmt_info
+      && stmt_info->slp_type == pure_slp
+      && stmt_info->stmt
+      && gimple_assign_load_p (stmt_info->stmt)
+      && ix86_load_maybe_stfs_p (gimple_assign_rhs1 (stmt_info->stmt)))
+    {
+      stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
+      stmt_cost += 2000;
+    }
+
   if (stmt_cost == -1)
     stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
 
diff --git a/gcc/testsuite/gcc.target/i386/pr101908-1.c b/gcc/testsuite/gcc.target/i386/pr101908-1.c
new file mode 100644
index 00000000000..f8e0f2e26bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101908-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slp-details" } */
+/* { dg-final { scan-tree-dump {(?n)add new stmt:.*MEM \} "slp2" } } */
+
+struct X { double x[2]; };
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df __attribute__((noipa))
+foo (struct X* x, struct X* y)
+{
+  return (v2df) {x->x[1], x->x[0] } + (v2df) { y->x[1], y->x[0] };
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr101908-2.c b/gcc/testsuite/gcc.target/i386/pr101908-2.c
new file mode 100644
index 00000000000..7f2f00cebab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101908-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slp-details" } */
+/* { dg-final { scan-tree-dump-not {(?n)add new stmt:.*MEM \} "slp2" } } */
+
+struct X { double x[2]; };
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df __attribute__((noipa))
+foo (struct X x, struct X y)
+{
+  return (v2df) {x.x[1], x.x[0] } + (v2df) { y.x[1], y.x[0] };
+}
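As a reading aid for ix86_load_maybe_stfs_p and the two new tests, here is a standalone C model of the predicate's decision. The enum and struct are hypothetical stand-ins for the GCC tree queries named in the comments. MAX_BITS_PER_WORD is assumed to be 64, as on x86. This is not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for TREE_CODE (get_base_address (ref)).  */
enum sketch_base_kind { SKETCH_PARM_DECL, SKETCH_OTHER_BASE };

/* Hypothetical summary of what ix86_load_maybe_stfs_p inspects.  */
struct sketch_load_base
{
  enum sketch_base_kind kind;   /* PARM_DECL or any other base */
  bool size_known;              /* tree_fits_uhwi_p (TYPE_SIZE (type)) */
  unsigned long long size_bits; /* tree_to_uhwi (TYPE_SIZE (type)) */
};

/* Assumed value of MAX_BITS_PER_WORD on x86.  */
enum { SKETCH_MAX_BITS_PER_WORD = 64 };

/* Mirrors the predicate: true only for a PARM_DECL base whose type has a
   known size wider than a machine word.  */
bool
sketch_load_maybe_stfs_p (const struct sketch_load_base *base)
{
  assert (base);
  if (base->kind != SKETCH_PARM_DECL
      || !base->size_known
      || base->size_bits <= SKETCH_MAX_BITS_PER_WORD)
    return false;
  return true;
}
```

The two tests map onto this model as follows:

- **pr101908-2.c** passes the 128-bit struct X by value. Its loads have a PARM_DECL base and are penalized.
- **pr101908-1.c** loads through pointers. Its base is not a PARM_DECL, so vectorization proceeds.

That split is what the scan-tree-dump and scan-tree-dump-not directives check. A word-sized scalar parameter is never penalized.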