From patchwork Fri Sep 17 15:32:48 2021
From: "Andre Vieira (lists)"
To: "gcc-patches@gcc.gnu.org"
Cc: Richard Sandiford, Richard Biener
Subject: [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops
Date: Fri, 17 Sep 2021 16:32:48 +0100
Message-ID: <4b403865-bb56-29a4-56d0-b18536925db6@arm.com>
In-Reply-To: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com>

Hi,

This patch changes the order in which we check outside and inside costs
for epilogue loops. For an epilogue, vect_better_loop_vinfo_p now
estimates each candidate's total cost as its inside cost multiplied by
the expected number of epilogue iterations, plus its outside cost, and
compares those totals directly. This makes a predicated epilogue more
likely to be picked over an unpredicated one, since a predicated
epilogue avoids having to enter a scalar epilogue loop.

gcc/ChangeLog:

	* tree-vect-loop.c (vect_better_loop_vinfo_p): Change how epilogue
	loop costs are compared.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 14f8150d7c262b9422784e0e997ca4387664a20a..038af13a91d43c9f09186d042cf415020ea73a38 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2881,17 +2881,75 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
         return new_simdlen_p;
     }
 
+  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
+  if (main_loop)
+    {
+      poly_uint64 main_poly_vf = LOOP_VINFO_VECT_FACTOR (main_loop);
+      unsigned HOST_WIDE_INT main_vf;
+      unsigned HOST_WIDE_INT old_factor, new_factor, old_cost, new_cost;
+      /* If we can determine how many iterations are left for the epilogue
+         loop, that is if both the main loop's vectorization factor and number
+         of iterations are constant, then we use them to calculate the cost of
+         the epilogue loop together with a 'likely value' for the epilogue's
+         vectorization factor.  Otherwise we use the main loop's vectorization
+         factor and the maximum poly value for the epilogue's.  If the target
+         has not provided a sensible upper bound, poly vectorization factors
+         are likely to be favored over constant ones.  */
+      if (main_poly_vf.is_constant (&main_vf)
+          && LOOP_VINFO_NITERS_KNOWN_P (main_loop))
+        {
+          unsigned HOST_WIDE_INT niters
+            = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
+          HOST_WIDE_INT old_likely_vf
+            = estimated_poly_value (old_vf, POLY_VALUE_LIKELY);
+          HOST_WIDE_INT new_likely_vf
+            = estimated_poly_value (new_vf, POLY_VALUE_LIKELY);
+
+          /* If the epilogue is using partial vectors, we account for the
+             partial iteration here too.  */
+          old_factor = niters / old_likely_vf;
+          if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo)
+              && niters % old_likely_vf != 0)
+            old_factor++;
+
+          new_factor = niters / new_likely_vf;
+          if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo)
+              && niters % new_likely_vf != 0)
+            new_factor++;
+        }
+      else
+        {
+          unsigned HOST_WIDE_INT main_vf_max
+            = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX);
+
+          old_factor = main_vf_max / estimated_poly_value (old_vf,
+                                                           POLY_VALUE_MAX);
+          new_factor = main_vf_max / estimated_poly_value (new_vf,
+                                                           POLY_VALUE_MAX);
+
+          /* If the loop is not using partial vectors, then it will iterate
+             one fewer time than one that does.  It is safe to subtract one
+             here, because the main loop's VF is always at least twice that
+             of an epilogue.  */
+          if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo))
+            old_factor -= 1;
+          if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo))
+            new_factor -= 1;
+        }
+
+      /* Compute the costs by multiplying the inside costs by the factor and
+         adding the outside costs for a more complete picture.  The factor is
+         the number of times we expect to iterate this epilogue.  */
+      old_cost = old_loop_vinfo->vec_inside_cost * old_factor;
+      new_cost = new_loop_vinfo->vec_inside_cost * new_factor;
+      old_cost += old_loop_vinfo->vec_outside_cost;
+      new_cost += new_loop_vinfo->vec_outside_cost;
+      return new_cost < old_cost;
+    }
+
   /* Limit the VFs to what is likely to be the maximum number of iterations,
      to handle cases in which at least one loop_vinfo is fully-masked.  */
-  HOST_WIDE_INT estimated_max_niter;
-  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
-  unsigned HOST_WIDE_INT main_vf;
-  if (main_loop
-      && LOOP_VINFO_NITERS_KNOWN_P (main_loop)
-      && LOOP_VINFO_VECT_FACTOR (main_loop).is_constant (&main_vf))
-    estimated_max_niter = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
-  else
-    estimated_max_niter = likely_max_stmt_executions_int (loop);
+  HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
   if (estimated_max_niter != -1)
     {
       if (known_le (estimated_max_niter, new_vf))