From patchwork Fri Sep 17 15:31:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Andre Vieira (lists)"
X-Patchwork-Id: 45131
Subject: [PATCH 1/3][vect] Add main vectorized loop unrolling
To: "gcc-patches@gcc.gnu.org"
References: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com>
Message-ID: <27777876-4201-5e86-bf9a-063143d38641@arm.com>
Date: Fri, 17 Sep 2021 16:31:24 +0100
In-Reply-To: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com>
Content-Language: en-US
List-Id: Gcc-patches mailing list
From: "Andre Vieira (lists)"
Cc: Richard Sandiford, Richard Biener

Hi all,

This patch adds the ability to define a target hook to unroll the main
vectorized loop. It also introduces the --param options vect-unroll and
vect-unroll-reductions to control this from the command line. I found these
useful for experimenting and believe they can help when tuning, so I decided
to leave them in. We only unroll the main loop and have disabled unrolling of
epilogues for now. We also do not support unrolling any loop that has a
negative step, nor any loop with a reduction other than a
TREE_CODE_REDUCTION.

Bootstrapped and regression tested on aarch64-linux-gnu as part of the series.

gcc/ChangeLog:

        * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR
        and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
        * doc/tm.texi.in: Add entries for target hooks above.
        * params.opt: Add vect-unroll and vect-unroll-reductions parameters.
        * target.def: Define hooks TARGET_VECTORIZE_UNROLL_FACTOR
        and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
        * targhooks.c (default_add_stmt_cost_for_unroll): New.
        (default_unroll_factor): Likewise.
        * targhooks.h (default_add_stmt_cost_for_unroll): Likewise.
        (default_unroll_factor): Likewise.
        * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
        par_unrolling_factor.
        (vect_update_vf_for_slp): Use unrolling factor to update vectorization
        factor.
        (vect_determine_partial_vectors_and_peeling): Account for unrolling.
        (vect_determine_unroll_factor): Determine how much to unroll the
        vectorized main loop.
        (vect_analyze_loop_2): Call vect_determine_unroll_factor.
        (vect_analyze_loop): Allow for epilogue vectorization when unrolling
        and rewalk vector_mode array for the epilogues.
        (vectorizable_reduction): Disable single_defuse_cycle when unrolling.
        * tree-vectorizer.h (_loop_vec_info): Add par_unrolling_factor member.
        (vect_unroll_value): New.

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f68f42638a112bed8396fd634bd3fd3c44ce848a..3bc9694d2162055d3db165ef888f35deb676548b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6283,6 +6283,19 @@ allocated by TARGET_VECTORIZE_INIT_COST. The default releases the
 accumulator.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL (class vec_info *@var{vinfo}, class _stmt_vec_info *@var{stmt_info}, void *@var{data})
+This hook should update the target-specific @var{data}
+relative to the statement represented by @var{stmt_info} to be used
+later to determine the unrolling factor for this loop using the current
+vectorization factor.
+@end deftypefn
+
+@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_UNROLL_FACTOR (class vec_info *@var{vinfo}, void *@var{data})
+This hook should return the desired vector unrolling factor for a loop with
+@var{vinfo} based on the target-specific @var{data}. The default returns one,
+which means no unrolling will be performed.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_GATHER (const_tree @var{mem_vectype}, const_tree @var{index_type}, int @var{scale})
 Target builtin that implements vector gather operation. @var{mem_vectype}
 is the vector type of the load and @var{index_type} is scalar type of
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index fdf16b901c537e6a02f630a80a2213d2dcb6d5d6..40f4cb02c34f575439f35070301855ddaf82a21a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4195,6 +4195,10 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_DESTROY_COST_DATA
 
+@hook TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL
+
+@hook TARGET_VECTORIZE_UNROLL_FACTOR
+
 @hook TARGET_VECTORIZE_BUILTIN_GATHER
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
diff --git a/gcc/params.opt b/gcc/params.opt
index f414dc1a61cfa9d5b9ded75e96560fc1f73041a5..00f92d4484797df0dbbad052f45205469cbb2c49 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1117,4 +1117,12 @@ Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only
 Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 10000) Param Optimization
 The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
 
+-param=vect-unroll=
+Common Joined UInteger Var(param_vect_unroll) Init(0) IntegerRange(0, 32) Param Optimization
+Controls how many times the vectorizer tries to unroll loops. Also see vect-unroll-reductions.
+
+-param=vect-unroll-reductions=
+Common Joined UInteger Var(param_vect_unroll_reductions) Init(0) IntegerRange(0, 32) Param Optimization
+Controls how many times the vectorizer tries to unroll loops that contain associative reductions. 0 means that such loops should be unrolled vect-unroll times.
+
 ; This comment is to ensure we retain the blank line above.
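[Reading aid, not part of the patch.] The interaction between the two params and the target hook can be sketched standalone. The sketch below mirrors the precedence used by vect_unroll_value and the halving loop in vect_determine_unroll_factor from this patch, but the function names pick_unroll and clamp_unroll are hypothetical, and plain unsigned integers stand in for GCC's params, poly_uint64 vectorization factors and the hook result. It is an illustration under those assumptions, not the GCC implementation.

```cpp
#include <cassert>

// Illustrative only: plain unsigned values stand in for GCC's params,
// poly_uint64 vectorization factors and the target hook's return value.

// Pick the requested unroll factor.  A non-zero vect-unroll-reductions wins
// for loops that contain a reduction, then a non-zero vect-unroll, then the
// target hook's answer (the default hook returns 1, meaning no unrolling).
unsigned
pick_unroll (bool seen_reduction, unsigned vect_unroll,
             unsigned vect_unroll_reductions, unsigned target_factor)
{
  if (seen_reduction && vect_unroll_reductions >= 1)
    return vect_unroll_reductions;
  if (vect_unroll >= 1)
    return vect_unroll;
  return target_factor;
}

// Halve the requested factor until VF * factor no longer exceeds MAX_VF.
// Returns the factor actually used; 1 means the loop is left as it is.
unsigned
clamp_unroll (unsigned vf, unsigned requested, unsigned max_vf)
{
  unsigned factor = requested;
  while (factor > 1)
    {
      if (vf * factor <= max_vf)
        break;
      factor /= 2;
    }
  return factor;
}
```

For instance, with vect-unroll=4 and vect-unroll-reductions=0, a loop with a reduction falls back to a requested factor of 4; with a VF of 4 and a maximum VF of 8 that request is halved to 2, giving an unrolled VF of 8. This is why the param descriptions say the vectorizer "tries" to unroll that many times.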
diff --git a/gcc/target.def b/gcc/target.def
index 28a34f1d51b5abb41c537b9cd327ca59f1f9260f..0eac529f17bd981b6494fe613117f28803a02390 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2095,6 +2095,28 @@ accumulator.",
  (void *data),
  default_destroy_cost_data)
 
+/* Target function to record cost approximation to be used by
+   TARGET_VECTORIZE_UNROLL_FACTOR.  */
+DEFHOOK
+(add_stmt_cost_for_unroll,
+ "This hook should update the target-specific @var{data}\n\
+relative to the statement represented by @var{stmt_info} to be used\n\
+later to determine the unrolling factor for this loop using the current\n\
+vectorization factor.",
+ void,
+ (class vec_info *vinfo, class _stmt_vec_info *stmt_info, void *data),
+ default_add_stmt_cost_for_unroll)
+
+/* Function to determine unroll factor for vectorization.  */
+DEFHOOK
+(unroll_factor,
+ "This hook should return the desired vector unrolling factor for a loop with\n\
+@var{vinfo} based on the target-specific @var{data}. The default returns one,\n\
+which means no unrolling will be performed.",
+ unsigned,
+ (class vec_info *vinfo, void *data),
+ default_unroll_factor)
+
 HOOK_VECTOR_END (vectorize)
 
 #undef HOOK_PREFIX
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 92d51992e625c2497aa8496b1e2e3d916e5706fd..d285c24d6d398cfabb58c291fd2dcbfa6e1bd8f6 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -125,6 +125,9 @@ extern unsigned default_add_stmt_cost (class vec_info *, void *, int,
                                        enum vect_cost_model_location);
 extern void default_finish_cost (void *, unsigned *, unsigned *, unsigned *);
 extern void default_destroy_cost_data (void *);
+extern void default_add_stmt_cost_for_unroll (class vec_info *,
+                                              class _stmt_vec_info *, void *);
+extern unsigned default_unroll_factor (class vec_info *, void *);
 
 /* OpenACC hooks.
*/
 extern bool default_goacc_validate_dims (tree, int [], int, unsigned);
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index c9b5208853dbc15706a65d1eb335e28e0564325e..9bc7e80e5a67129633dab99a871b6babff65de97 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1535,6 +1535,26 @@ default_destroy_cost_data (void *data)
   free (data);
 }
 
+/* By default, we do not perform unrolling so this function does not need
+   to do anything.  */
+void
+default_add_stmt_cost_for_unroll (class vec_info *vinfo ATTRIBUTE_UNUSED,
+                                  class _stmt_vec_info *stmt_info
+                                  ATTRIBUTE_UNUSED,
+                                  void *data ATTRIBUTE_UNUSED)
+{
+}
+
+
+/* By default, return a vector unroll factor of one, meaning no unrolling will
+   be performed.  */
+unsigned
+default_unroll_factor (class vec_info *vinfo ATTRIBUTE_UNUSED,
+                       void *data ATTRIBUTE_UNUSED)
+{
+  return 1;
+}
+
 /* Determine whether or not a pointer mode is valid.  Assume defaults
    of ptr_mode or Pmode - can be overridden.  */
 bool
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 0c8d992624b59ddd056aff594738305d6be5afa8..14f8150d7c262b9422784e0e997ca4387664a20a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -828,6 +828,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     skip_main_loop_edge (nullptr),
     skip_this_loop_edge (nullptr),
     reusable_accumulators (),
+    par_unrolling_factor (1),
     max_vectorization_factor (0),
     mask_skip_niters (NULL_TREE),
     rgroup_compare_type (NULL_TREE),
@@ -1594,6 +1595,7 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
         dump_printf_loc (MSG_NOTE, vect_location,
                          "Loop contains only SLP stmts\n");
       vectorization_factor = LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo);
+      vectorization_factor *= loop_vinfo->par_unrolling_factor;
     }
   else
     {
@@ -2131,7 +2133,8 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
      ???
We could then end up failing to use partial vectors if we
      decide to peel iterations into a prologue, and if the main loop
      then ends up processing fewer than VF iterations.  */
-  if (param_vect_partial_vector_usage == 1
+  if ((param_vect_partial_vector_usage == 1
+       || loop_vinfo->par_unrolling_factor > 1)
       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
       && !vect_known_niters_smaller_than_vf (loop_vinfo))
     LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
@@ -2192,6 +2195,101 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
   return opt_result::success ();
 }
 
+
+static poly_uint64
+vect_determine_unroll_factor (loop_vec_info loop_vinfo)
+{
+  stmt_vec_info stmt_info;
+  unsigned i;
+  bool seen_reduction_p = false;
+  bool can_unroll_p = !LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+  poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  if (!can_unroll_p)
+    return vectorization_factor;
+
+  DUMP_VECT_SCOPE ("vect_determine_unroll_factor");
+
+  void *target_cost_data = init_cost (loop_vinfo->loop, true);
+
+  FOR_EACH_VEC_ELT (loop_vinfo->stmt_vec_infos, i, stmt_info)
+    {
+      if (STMT_VINFO_IN_PATTERN_P (stmt_info)
+          || !STMT_VINFO_RELEVANT_P (stmt_info)
+          || stmt_info->vectype == NULL_TREE)
+        continue;
+      /* Do not unroll loops with negative steps as it is unlikely that
+         vectorization will succeed due to the way we deal with negative steps
+         in loads and stores in 'get_load_store_type'.
*/
+      if (stmt_info->dr_aux.dr
+          && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+        {
+          dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
+          tree step = vect_dr_behavior (loop_vinfo, dr_info)->step;
+          if (TREE_CODE (step) == INTEGER_CST
+              && tree_int_cst_compare (step, size_zero_node) < 0)
+            {
+              can_unroll_p = false;
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_NOTE, vect_location,
+                                 "could not unroll due to negative step\n");
+              break;
+            }
+        }
+
+      if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
+        {
+          auto red_info = info_for_reduction (loop_vinfo, stmt_info);
+          if (STMT_VINFO_REDUC_TYPE (red_info) == TREE_CODE_REDUCTION)
+            seen_reduction_p = true;
+          else
+            {
+              can_unroll_p = false;
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_NOTE, vect_location,
+                                 "could not unroll due to unsupported "
+                                 "reduction\n");
+              break;
+            }
+        }
+
+      targetm.vectorize.add_stmt_cost_for_unroll (loop_vinfo, stmt_info,
+                                                  target_cost_data);
+    }
+
+  if (!can_unroll_p)
+    {
+      destroy_cost_data (target_cost_data);
+      return vectorization_factor;
+    }
+
+  unsigned int unrolling_factor = 1;
+  if (maybe_gt (vectorization_factor, 1U))
+    unrolling_factor = vect_unroll_value (loop_vinfo, seen_reduction_p,
+                                          target_cost_data);
+
+  destroy_cost_data (target_cost_data);
+
+  while (unrolling_factor > 1)
+    {
+      poly_uint64 candidate_factor = vectorization_factor * unrolling_factor;
+      if (estimated_poly_value (candidate_factor, POLY_VALUE_MAX)
+          <= (HOST_WIDE_INT) LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo))
+        {
+          vectorization_factor = candidate_factor;
+          break;
+        }
+      unrolling_factor /= 2;
+    }
+  loop_vinfo->par_unrolling_factor = unrolling_factor;
+  LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "unrolling factor = %u\n",
+                     unrolling_factor);
+
+  return vectorization_factor;
+}
+
 /* Function vect_analyze_loop_2.
    Apply a set of analyses on LOOP, and create a loop_vec_info struct
@@ -2320,6 +2418,8 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal, unsigned *n_stmts)
                          "can't determine vectorization factor.\n");
       return ok;
     }
+
+  vect_determine_unroll_factor (loop_vinfo);
   if (max_vf != MAX_VECTORIZATION_FACTOR
       && maybe_lt (max_vf, LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
     return opt_result::failure_at (vect_location, "bad data dependence.\n");
@@ -3062,7 +3162,14 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
               gcc_assert (vect_epilogues);
               delete vinfos.pop ();
             }
+          /* Check if we may want to replace the current first_loop_vinfo
+             with the new loop, but only if they have different vector
+             modes.  If they have the same vector mode this means the main
+             loop is an unrolled loop and we are trying to vectorize the
+             epilogue using the same vector mode but with a lower
+             vectorization factor.  */
           if (vinfos.is_empty ()
+              && loop_vinfo->vector_mode != first_loop_vinfo->vector_mode
               && vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo))
             {
               loop_vec_info main_loop_vinfo
@@ -3156,10 +3263,26 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
          The retry should be in the same mode as original.  */
       if (vect_epilogues
           && loop_vinfo
-          && LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo))
+          && (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo)
+              || loop_vinfo->par_unrolling_factor > 1))
         {
-          gcc_assert (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+          gcc_assert ((LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+                       || loop_vinfo->par_unrolling_factor > 1)
                       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo));
+          /* If we are unrolling, try all VECTOR_MODES for the epilogue.
*/
+          if (loop_vinfo->par_unrolling_factor > 1)
+            {
+              next_vector_mode = vector_modes[0];
+              mode_i = 1;
+
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_NOTE, vect_location,
+                                 "***** Re-trying analysis with vector mode"
+                                 " %s for epilogue with partial vectors.\n",
+                                 GET_MODE_NAME (next_vector_mode));
+              continue;
+            }
+
           if (dump_enabled_p ())
             dump_printf_loc (MSG_NOTE, vect_location,
                              "***** Re-trying analysis with same vector mode"
@@ -7212,7 +7335,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
      participating.  */
   if (ncopies > 1
       && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
-      && reduc_chain_length == 1)
+      && reduc_chain_length == 1
+      && loop_vinfo->par_unrolling_factor == 1)
     single_defuse_cycle = true;
 
   if (single_defuse_cycle || lane_reduc_code_p)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9c2c29d61fae5e651a112b103482131e3d646fb6..b51e82a0663a391a096480bff03a2191bc11dcf4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -620,6 +620,11 @@ public:
      about the reductions that generated them.  */
   hash_map reusable_accumulators;
 
+  /* The number of times that we've unrolled the vector loop in order
+     to promote more ILP.  This value is folded into vectorization_factor
+     (and therefore exactly divides vectorization_factor).  */
+  unsigned int par_unrolling_factor;
+
   /* Maximum runtime vectorization factor, or MAX_VECTORIZATION_FACTOR
      if there is no particular limit.  */
   unsigned HOST_WIDE_INT max_vectorization_factor;
@@ -1810,6 +1815,20 @@ vect_apply_runtime_profitability_check_p (loop_vec_info loop_vinfo)
           && th >= vect_vf_for_cost (loop_vinfo));
 }
 
+/* Return the number of times that we should unroll the vectorized loop,
+   preferring the --param overrides over the target hook.
*/
+
+inline unsigned int
+vect_unroll_value (loop_vec_info loop_vinfo, bool seen_reduction_p, void *data)
+{
+  if (seen_reduction_p && param_vect_unroll_reductions >= 1)
+    return param_vect_unroll_reductions;
+  if (param_vect_unroll >= 1)
+    return param_vect_unroll;
+  else
+    return targetm.vectorize.unroll_factor (loop_vinfo, data);
+}
+
 /* Source location + hotness information.  */
 extern dump_user_location_t vect_location;
 

From patchwork Fri Sep 17 15:32:48 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Andre Vieira (lists)"
X-Patchwork-Id: 45132
Received: from [10.57.71.131] (unknown [10.57.71.131]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA
 id DA2EC3F59C; Fri, 17 Sep 2021 08:32:40 -0700 (PDT)
Subject: [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops
To: "gcc-patches@gcc.gnu.org"
References: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com>
Message-ID: <4b403865-bb56-29a4-56d0-b18536925db6@arm.com>
Date: Fri, 17 Sep 2021 16:32:48 +0100
In-Reply-To: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com>
Content-Language: en-US
List-Id: Gcc-patches mailing list
From: "Andre Vieira (lists)"
Cc: Richard Sandiford, Richard Biener

Hi,

This patch changes the order in which we check outside and inside costs for
epilogue loops. This ensures that a predicated epilogue is more likely to be
picked over an unpredicated one, since it saves having to enter a scalar
epilogue loop.

gcc/ChangeLog:

        * tree-vect-loop.c (vect_better_loop_vinfo_p): Change how epilogue loop
        costs are compared.
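[Reading aid, not part of the patch.] The comparison introduced below for the case where both the main loop's VF and its iteration count are constants can be sketched standalone. The struct epilogue_candidate and the functions epilogue_cost and better_epilogue_p are hypothetical names chosen for illustration; they use plain integers rather than loop_vec_info and poly_uint64, and they model only the known-iteration-count branch of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one epilogue candidate.  The field names are
// illustrative and do not correspond to loop_vec_info members.
struct epilogue_candidate
{
  uint64_t likely_vf;      // estimated vectorization factor of the epilogue
  uint64_t inside_cost;    // cost of one vector iteration
  uint64_t outside_cost;   // one-off cost outside the loop body
  bool partial_vectors;    // true if the epilogue is predicated
};

// Expected cost of running CAND over the NITERS scalar iterations that the
// main loop leaves behind.
uint64_t
epilogue_cost (const epilogue_candidate &cand, uint64_t niters)
{
  uint64_t factor = niters / cand.likely_vf;
  // A predicated epilogue also executes the final, partial iteration.
  if (cand.partial_vectors && niters % cand.likely_vf != 0)
    factor++;
  return cand.inside_cost * factor + cand.outside_cost;
}

// True if NEW_CAND is expected to be cheaper than OLD_CAND as the epilogue
// of a main loop with MAIN_NITERS iterations and vectorization factor MAIN_VF.
bool
better_epilogue_p (const epilogue_candidate &new_cand,
                   const epilogue_candidate &old_cand,
                   uint64_t main_niters, uint64_t main_vf)
{
  uint64_t niters = main_niters % main_vf;
  return epilogue_cost (new_cand, niters) < epilogue_cost (old_cand, niters);
}
```

As a worked example with made-up numbers: for 1003 iterations and a main-loop VF of 16, 11 iterations remain. An unpredicated VF-4 candidate with inside cost 10 and outside cost 8 is costed at 2 * 10 + 8 = 28, while a predicated VF-8 candidate with inside cost 12 and outside cost 2 is costed at 2 * 12 + 2 = 26, so the predicated candidate would be preferred.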
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 14f8150d7c262b9422784e0e997ca4387664a20a..038af13a91d43c9f09186d042cf415020ea73a38 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2881,17 +2881,75 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
         return new_simdlen_p;
     }
 
+  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
+  if (main_loop)
+    {
+      poly_uint64 main_poly_vf = LOOP_VINFO_VECT_FACTOR (main_loop);
+      unsigned HOST_WIDE_INT main_vf;
+      unsigned HOST_WIDE_INT old_factor, new_factor, old_cost, new_cost;
+      /* If we can determine how many iterations are left for the epilogue
+         loop, that is if both the main loop's vectorization factor and number
+         of iterations are constant, then we use them to calculate the cost of
+         the epilogue loop together with a 'likely value' for the epilogue's
+         vectorization factor.  Otherwise we use the main loop's vectorization
+         factor and the maximum poly value for the epilogue's.  If the target
+         has not provided a sensible upper bound, poly vectorization
+         factors are likely to be favored over constant ones.  */
+      if (main_poly_vf.is_constant (&main_vf)
+          && LOOP_VINFO_NITERS_KNOWN_P (main_loop))
+        {
+          unsigned HOST_WIDE_INT niters
+            = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
+          HOST_WIDE_INT old_likely_vf
+            = estimated_poly_value (old_vf, POLY_VALUE_LIKELY);
+          HOST_WIDE_INT new_likely_vf
+            = estimated_poly_value (new_vf, POLY_VALUE_LIKELY);
+
+          /* If the epilogue is using partial vectors we account for the
+             partial iteration here too.
*/
+          old_factor = niters / old_likely_vf;
+          if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo)
+              && niters % old_likely_vf != 0)
+            old_factor++;
+
+          new_factor = niters / new_likely_vf;
+          if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo)
+              && niters % new_likely_vf != 0)
+            new_factor++;
+        }
+      else
+        {
+          unsigned HOST_WIDE_INT main_vf_max
+            = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX);
+
+          old_factor = main_vf_max / estimated_poly_value (old_vf,
+                                                           POLY_VALUE_MAX);
+          new_factor = main_vf_max / estimated_poly_value (new_vf,
+                                                           POLY_VALUE_MAX);
+
+          /* If the loop is not using partial vectors then it will iterate one
+             time less than one that does.  It is safe to subtract one here,
+             because the main loop's vf is always at least 2x bigger than that
+             of an epilogue.  */
+          if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo))
+            old_factor -= 1;
+          if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo))
+            new_factor -= 1;
+        }
+
+      /* Compute the costs by multiplying the inside costs with the factor and
+         add the outside costs for a more complete picture.  The factor is the
+         number of times we expect to iterate this epilogue.  */
+      old_cost = old_loop_vinfo->vec_inside_cost * old_factor;
+      new_cost = new_loop_vinfo->vec_inside_cost * new_factor;
+      old_cost += old_loop_vinfo->vec_outside_cost;
+      new_cost += new_loop_vinfo->vec_outside_cost;
+      return new_cost < old_cost;
+    }
+
   /* Limit the VFs to what is likely to be the maximum number of iterations,
      to handle cases in which at least one loop_vinfo is fully-masked.
*/
-  HOST_WIDE_INT estimated_max_niter;
-  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
-  unsigned HOST_WIDE_INT main_vf;
-  if (main_loop
-      && LOOP_VINFO_NITERS_KNOWN_P (main_loop)
-      && LOOP_VINFO_VECT_FACTOR (main_loop).is_constant (&main_vf))
-    estimated_max_niter = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
-  else
-    estimated_max_niter = likely_max_stmt_executions_int (loop);
+  HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
   if (estimated_max_niter != -1)
     {
       if (known_le (estimated_max_niter, new_vf))