From patchwork Wed Nov 27 03:25:30 2024
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 101940
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com
Subject: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.
Date: Tue, 26 Nov 2024 19:25:30 -0800
Message-Id: <20241127032530.2968290-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.34.1

When a loop requires any kind of versioning, which could increase register
pressure too much, and it sits in a deeply nested big loop, don't vectorize
it.

I tested the patch with both -Ofast and -O2 on SPEC2017; apart from
548.exchange_r, all other benchmarks produce the same binary.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Any comments?

gcc/ChangeLog:

	PR target/117088
	* config/i386/i386.cc
	(ix86_vector_costs::ix86_vect_in_deep_nested_loop_p): New function.
	(ix86_vector_costs::finish_cost): Prevent loop vectorization
	if it's in a deeply nested loop and requires versioning.
	* config/i386/i386.opt (--param=vect-max-loop-depth=): New
	param.
---
 gcc/config/i386/i386.cc  | 89 ++++++++++++++++++++++++++++++++++++++++
 gcc/config/i386/i386.opt |  4 ++
 2 files changed, 93 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 526c9df7618..608f40413d2 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25019,6 +25019,8 @@ private:
 
   /* Estimate register pressure of the vectorized code.  */
   void ix86_vect_estimate_reg_pressure ();
+  /* Check if vect_loop is in a deeply-nested loop.  */
+  bool ix86_vect_in_deep_nested_loop_p (class loop *vect_loop);
   /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used for
      estimation of register pressure.
      ??? Currently it's only used by vec_construct/scalar_to_vec
@@ -25324,6 +25326,84 @@ ix86_vector_costs::ix86_vect_estimate_reg_pressure ()
     }
 }
 
+/* Return true if vect_loop is in a deeply-nested loop.
+   i.e. vect_loop_n in the loop structure below.
+loop1
+{
+ loop2
+ {
+  loop3
+  {
+   vect_loop_1;
+   loop4
+   {
+    vect_loop_2;
+    loop5
+    {
+     vect_loop_3;
+     loop6
+     {
+      vect_loop_4;
+      loop7
+      {
+       vect_loop_5;
+       loop8
+       {
+       loop9
+       }
+       vect_loop_6;
+      }
+     vect_loop_7;
+     }
+    }
+   }
+ }
+ It's a big hammer to fix O2 regression for 548.exchange_r after vectorization
+ is enhanced by (r15-4225-g70c3db511ba14f)  */
+bool
+ix86_vector_costs::ix86_vect_in_deep_nested_loop_p (class loop *vect_loop)
+{
+  if (loop_depth (vect_loop) > (unsigned) ix86_vect_max_loop_depth)
+    return true;
+
+  if (loop_depth (vect_loop) < 2)
+    return false;
+
+  class loop* outer_loop = loop_outer (vect_loop);
+
+  auto_vec<class loop*> m_loop_stack;
+  auto_sbitmap m_visited_loops (number_of_loops (cfun));
+
+  /* Get all sibling loops for vect_loop.  */
+  class loop* next_loop = outer_loop->inner;
+  for (; next_loop; next_loop = next_loop->next)
+    {
+      m_loop_stack.safe_push (next_loop);
+      bitmap_set_bit (m_visited_loops, next_loop->num);
+    }
+
+  /* DFS the max depth of all sibling loops.  */
+  while (!m_loop_stack.is_empty ())
+    {
+      next_loop = m_loop_stack.pop ();
+      if (loop_depth (next_loop) > (unsigned) ix86_vect_max_loop_depth)
+        return true;
+
+      class loop* inner_loop = next_loop->inner;
+      while (inner_loop)
+        {
+          if (!bitmap_bit_p (m_visited_loops, inner_loop->num))
+            {
+              m_loop_stack.safe_push (inner_loop);
+              bitmap_set_bit (m_visited_loops, inner_loop->num);
+            }
+          inner_loop = inner_loop->next;
+        }
+    }
+
+  return false;
+}
+
 void
 ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
 {
@@ -25344,6 +25424,15 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
           && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ())
               > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo))))
         m_costs[vect_body] = INT_MAX;
+
+      /* Prohibit vectorization when the loop requires versioning
+         and loop_depth exceeds threshold.  */
+      if ((LOOP_REQUIRES_VERSIONING (loop_vinfo)
+           || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+           || LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+           || vect_apply_runtime_profitability_check_p (loop_vinfo))
+          && ix86_vect_in_deep_nested_loop_p (LOOP_VINFO_LOOP (loop_vinfo)))
+        m_costs[vect_body] = INT_MAX;
     }
 
   ix86_vect_estimate_reg_pressure ();
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 99e86f545e8..c5abf83473d 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1414,3 +1414,7 @@ Support MOVRS built-in functions and code generation.
 mamx-movrs
 Target Mask(ISA2_AMX_MOVRS) Var(ix86_isa_flags2) Save
 Support AMX-MOVRS built-in functions and code generation.
+
+-param=vect-max-loop-depth=
+Target Joined UInteger Var(ix86_vect_max_loop_depth) Init(8) Param
+Prevent loop vectorization when it's in a deeply nested loop and requires versioning, since it may increase register pressure too much.
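
For anyone who wants to experiment with the sibling walk outside of GCC, here is a minimal standalone sketch of the same idea. It is not the patch itself: ToyLoop, ToyLoopTree and in_deep_nest are hypothetical names made up for illustration, and the toy tree only mimics the outer/inner/next links and the depth that the patch reads through loop_outer and loop_depth. It assumes the pseudo-root has depth 0, so outermost real loops have depth 1.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for GCC's `class loop`: only the links the walk uses.
struct ToyLoop {
  ToyLoop *outer = nullptr;  // enclosing loop
  ToyLoop *inner = nullptr;  // first child loop
  ToyLoop *next = nullptr;   // next sibling loop
  unsigned depth = 0;        // assumed: pseudo-root is 0, outermost loops are 1
};

// Hypothetical owner of all nodes; add() appends a child to `parent`.
struct ToyLoopTree {
  std::vector<std::unique_ptr<ToyLoop>> nodes;

  ToyLoop *add(ToyLoop *parent) {
    nodes.push_back(std::make_unique<ToyLoop>());
    ToyLoop *l = nodes.back().get();
    l->outer = parent;
    if (parent) {
      l->depth = parent->depth + 1;
      ToyLoop **slot = &parent->inner;  // append at the end of the sibling list
      while (*slot)
        slot = &(*slot)->next;
      *slot = l;
    }
    return l;
  }
};

// Return true if vect_loop, or any loop nested under one of its siblings
// (vect_loop included), is deeper than max_depth.
bool in_deep_nest(const ToyLoop *vect_loop, unsigned max_depth) {
  assert(vect_loop != nullptr);
  if (vect_loop->depth > max_depth)
    return true;
  if (vect_loop->depth < 2)  // top-level loops are not checked further
    return false;

  // Seed the stack with every sibling of vect_loop.
  std::vector<const ToyLoop *> stack;
  for (const ToyLoop *l = vect_loop->outer->inner; l; l = l->next)
    stack.push_back(l);

  // Depth-first walk. The toy structure is a tree, so each node is reached
  // once and no visited set is kept here (the patch keeps an sbitmap).
  while (!stack.empty()) {
    const ToyLoop *l = stack.back();
    stack.pop_back();
    if (l->depth > max_depth)
      return true;
    for (const ToyLoop *c = l->inner; c; c = c->next)
      stack.push_back(c);
  }
  return false;
}
```

In this sketch a shallow candidate loop is still rejected when one of its siblings contains a nest deeper than the threshold, which is the situation the comment in the patch draws with vect_loop_1 through vect_loop_7.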