From patchwork Tue Dec 7 11:27:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Andre Vieira (lists)" X-Patchwork-Id: 48575 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 82A753858031 for ; Tue, 7 Dec 2021 11:27:53 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 82A753858031 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1638876473; bh=Fw0h0kXirtZfFBc1F8NWpKzFbsnhwkwlkm2pXt5IGFU=; h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=D8qKY3fboXtxhhFQtbHFmIxDHR0Wc0WZCPg8N2a81m0CJN28SUjVcHvQqwU5cuwSZ kmU4DPmbWxSjyXa+vLAjFuH/NuCIzmxvUCfWjUE3aFW0LYeQ482spxBPgF8vHJoMh1 BlooBzqXBY8HOxxY1OI9cL3AxDLwgsnqyDbCjO1E= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 7D5DA3858401 for ; Tue, 7 Dec 2021 11:27:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7D5DA3858401 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0A87B11FB; Tue, 7 Dec 2021 03:27:23 -0800 (PST) Received: from [10.57.3.27] (unknown [10.57.3.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6F6F03F73B; Tue, 7 Dec 2021 03:27:22 -0800 (PST) Message-ID: Date: Tue, 7 Dec 2021 11:27:22 +0000 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.3.2 Subject: [vect] Re-analyze all modes for epilogues Content-Language: en-US To: Richard Biener References: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com> <4272814n-8538-p793-157q-5n6q16r48n51@fhfr.qr> <623fbfd9-b97c-8c6e-0348-07d6c4496592@arm.com> <5c887c48-7f7e-c02b-2998-7a7c41b11af8@arm.com> <33cb143e-bb2e-e214-cd5f-66fd2d1bd20b@arm.com> <5op15ns-4sq8-2sn3-41qs-49q44417sp6@fhfr.qr> <99qs2o2p-pn87-n164-q8n9-9p814r6n75r1@fhfr.qr> <475fae98-9541-5dca-2e60-eaff172ff787@arm.com> <8p72o15s-5894-4or0-409r-oo4p74o238r1@fhfr.qr> <21e3500d-6cf5-ed46-6f95-1f554c5dbc50@arm.com> <5477e0cb-6dc9-e828-7c20-a99de3c6840c@arm.com> In-Reply-To: X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, BODY_8BITS, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Andre Vieira \(lists\) via Gcc-patches" From: "Andre Vieira (lists)" Reply-To: "Andre Vieira \(lists\)" Cc: richard.sandiford@arm.com, "gcc-patches@gcc.gnu.org" Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi, I've split this particular part off, since it's not only relevant to unrolling. The new test shows how this is useful for existing (non-unrolling) cases. I also had to fix the costing function, the main_vf / epilogue_vf calculations for old and new didn't take into consideration that the main_vf could be lower, nor did it take into consideration that they were not necessarily always a multiple of each other.  So using CEIL here is the correct approach. Bootstrapped and regression tested on aarch64-none-linux-gnu. OK for trunk? gcc/ChangeLog:         * tree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up for epilogue costing.         (vect_analyze_loop): Re-analyze all modes for epilogues. gcc/testsuite/ChangeLog:         * gcc.target/aarch64/masked_epilogue.c: New test. On 30/11/2021 13:56, Richard Biener wrote: > On Tue, 30 Nov 2021, Andre Vieira (lists) wrote: > >> On 25/11/2021 12:46, Richard Biener wrote: >>> Oops, my fault, yes, it does. I would suggest to refactor things so >>> that the mode_i = first_loop_i case is there only once. I also wonder >>> if all the argument about starting at 0 doesn't apply to the >>> not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P as well? So >>> what's the reason to differ here? So in the end I'd just change >>> the existing >>> >>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)) >>> { >>> >>> to >>> >>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo) >>> || first_loop_vinfo->suggested_unroll_factor > 1) >>> { >>> >>> and maybe revisit this when we have an actual testcase showing that >>> doing sth else has a positive effect? >>> >>> Thanks, >>> Richard. >> So I had a quick chat with Richard Sandiford and he is suggesting resetting >> mode_i to 0 for all cases. >> >> He pointed out that for some tunings the SVE mode might come after the NEON >> mode, which means that even for not-unrolled loop_vinfos we could end up with >> a suboptimal choice of mode for the epilogue. I.e. it could be that we pick >> V16QI for main vectorization, but that's VNx16QI + 1 in the array, so we'd not >> try VNx16QI for the epilogue. >> >> This would simplify the mode selecting cases, by just simply restarting at >> mode_i in all epilogue cases. Is that something you'd be OK? > Works for me with an updated comment. Even better with showing a > testcase exercising such tuning. > > Richard. diff --git a/gcc/testsuite/gcc.target/aarch64/masked_epilogue.c b/gcc/testsuite/gcc.target/aarch64/masked_epilogue.c new file mode 100644 index 0000000000000000000000000000000000000000..286a7be236f337fee4c4650f42da72000855c5e6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/masked_epilogue.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -march=armv8-a+sve -msve-vector-bits=scalable" } */ + +void f(unsigned char y[restrict], + unsigned char x[restrict], int n) { + for (int i = 0; i < n; ++i) + y[i] = (y[i] + x[i] + 1) >> 1; +} + +/* { dg-final { scan-tree-dump {LOOP EPILOGUE VECTORIZED \(MODE=VNx} "vect" } } */ diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index a28bb6321d76b8222bc8cfdade151ca9b4dca406..17b090170d4a5dad22097a727bc25a63e230e278 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -2824,11 +2824,13 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo, { unsigned HOST_WIDE_INT main_vf_max = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX); + unsigned HOST_WIDE_INT old_vf_max + = estimated_poly_value (old_vf, POLY_VALUE_MAX); + unsigned HOST_WIDE_INT new_vf_max + = estimated_poly_value (new_vf, POLY_VALUE_MAX); - old_factor = main_vf_max / estimated_poly_value (old_vf, - POLY_VALUE_MAX); - new_factor = main_vf_max / estimated_poly_value (new_vf, - POLY_VALUE_MAX); + old_factor = CEIL (main_vf_max, old_vf_max); + new_factor = CEIL (main_vf_max, new_vf_max); /* If the loop is not using partial vectors then it will iterate one time less than one that does. It is safe to subtract one here, @@ -3069,8 +3071,6 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) machine_mode autodetected_vector_mode = VOIDmode; opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL); unsigned int mode_i = 0; - unsigned int first_loop_i = 0; - unsigned int first_loop_next_i = 0; unsigned HOST_WIDE_INT simdlen = loop->simdlen; /* First determine the main loop vectorization mode, either the first @@ -3079,7 +3079,6 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) lowest cost if pick_lowest_cost_p. */ while (1) { - unsigned int loop_vinfo_i = mode_i; bool fatal; opt_loop_vec_info loop_vinfo = vect_analyze_loop_1 (loop, shared, &loop_form_info, @@ -3108,11 +3107,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) first_loop_vinfo = opt_loop_vec_info::success (NULL); } if (first_loop_vinfo == NULL) - { - first_loop_vinfo = loop_vinfo; - first_loop_i = loop_vinfo_i; - first_loop_next_i = mode_i; - } + first_loop_vinfo = loop_vinfo; else { delete loop_vinfo; @@ -3158,26 +3153,18 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) /* Now analyze first_loop_vinfo for epilogue vectorization. */ poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo); - /* Handle the case that the original loop can use partial - vectorization, but want to only adopt it for the epilogue. - The retry should be in the same mode as original. */ - if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)) - { - gcc_assert (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (first_loop_vinfo) - && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (first_loop_vinfo)); - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "***** Re-trying analysis with same vector mode" - " %s for epilogue with partial vectors.\n", - GET_MODE_NAME (first_loop_vinfo->vector_mode)); - mode_i = first_loop_i; - } - else - { - mode_i = first_loop_next_i; - if (mode_i == vector_modes.length ()) - return first_loop_vinfo; - } + /* For epilogues start the analysis from the first mode. The motivation + behind starting from the beginning comes from cases where the VECTOR_MODES + array may contain length agnostic and length fixed modes. Their ordering + is not guaranteed, so we could end up picking a mode for the main loop + that is after the epilogue's optimal mode. */ + mode_i = 0; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "***** Re-trying analysis with first vector mode" + " %s for epilogue.\n", + GET_MODE_NAME (vector_modes[mode_i])); /* ??? If first_loop_vinfo was using VOIDmode then we probably want to instead search for the corresponding mode in vector_modes[]. */