From patchwork Mon Mar 21 15:10:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 52177
Return-Path:
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
 by sourceware.org (Postfix) with ESMTP id 6ECDC389942C
 for ; Mon, 21 Mar 2022 15:15:52 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6ECDC389942C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
 s=default; t=1647875752;
 bh=bRax2V302PFMyd4g2oxmrk5GQQ0qOgOJXXzHB4ecZOQ=;
 h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
 List-Help:List-Subscribe:From:Reply-To:Cc:From;
 b=FaIqvf7MdacTtn/JUZvRNG20avA8TLxXBGKXUkWYB6CPlLAO7XnwYuaSBGbhzqjIE
 0C/HOVHh5rVM/NNSXuEwJxqR5fqG/4hAfcl7BxXaMWcggYjkrU8ySj4ydNi0zHrhUT
 30mzCTaZghkE7vMkiynoU8NWBI9mWmxVTxP9ATcE=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29])
 by sourceware.org (Postfix) with ESMTPS id 023DB3846418
 for ; Mon, 21 Mar 2022 15:10:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 023DB3846418
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
 (No client certificate requested)
 by smtp-out2.suse.de (Postfix) with ESMTPS id B76C11F37C;
 Mon, 21 Mar 2022 15:10:31 +0000 (UTC)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
 (No client certificate requested)
 by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9B0CB133B6;
 Mon, 21 Mar 2022 15:10:31 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
 by imap2.suse-dmz.suse.de with ESMTPSA
 id em3JJGeVOGJdDAAAMHmgww
 (envelope-from ); Mon, 21 Mar 2022 15:10:31 +0000
Date: Mon, 21 Mar 2022 16:10:31 +0100 (CET)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/104912 - ensure cost model is checked first
MIME-Version: 1.0
Message-Id: <20220321151031.9B0CB133B6@imap2.suse-dmz.suse.de>
X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE,
 SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no
 version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-Patchwork-Original-From: Richard Biener via Gcc-patches
From: Richard Biener
Reply-To: Richard Biener
Cc: Jakub Jelinek , richard.sandiford@arm.com
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"

The following makes sure that when we build the versioning condition
for vectorization including the cost model check, we check for the
cost model and branch over other versioning checks.  That is what
the cost modeling assumes, since the cost model check is the only
one accounted for in the scalar outside cost.  Currently we emit
all checks as straight-line code combined with bitwise ops which
can result in surprising ordering of checks in the final assembly.

Since loop_version accepts only a single versioning condition
the splitting is done after the fact.

The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
with -Ofast and tuning for generic or skylake.  That's not enough
to recover from the slowdown when vectorizing but it now cuts off
the expensive alias versioning test.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?
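
To illustrate the ordering issue described above outside of GCC, here is a
minimal standalone sketch.  It is not the vectorizer code: check_stats,
cost_model_ok, no_alias, use_vector_combined and use_vector_split are all
hypothetical names made up for this example, and the "alias check" is just
a range-overlap test standing in for the expensive runtime check.  The
combined form evaluates every check and merges the results with a bitwise
op; the split form tests the cost model first and branches over the rest.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical counters, only used to observe which checks were executed.
struct check_stats
{
  int cost_checks = 0;
  int alias_checks = 0;
};

// Stand-in for the cheap cost model check: enough iterations to pay off?
static bool
cost_model_ok (std::size_t niters, std::size_t threshold, check_stats &s)
{
  ++s.cost_checks;
  return niters >= threshold;
}

// Stand-in for the expensive alias check: do the two ranges not overlap?
static bool
no_alias (const int *a, const int *b, std::size_t n, check_stats &s)
{
  ++s.alias_checks;
  std::less_equal<const int *> le;
  return le (a + n, b) || le (b + n, a);
}

// Straight-line form: all checks are evaluated, then combined with '&'.
static bool
use_vector_combined (const int *a, const int *b, std::size_t niters,
                     std::size_t threshold, check_stats &s)
{
  bool cost = cost_model_ok (niters, threshold, s);
  bool alias = no_alias (a, b, niters, s);
  return cost & alias;
}

// Split form: the cost model check comes first and branches over the
// remaining checks when it fails.
static bool
use_vector_split (const int *a, const int *b, std::size_t niters,
                  std::size_t threshold, check_stats &s)
{
  if (!cost_model_ok (niters, threshold, s))
    return false;
  return no_alias (a, b, niters, s);
}
```

With a trip count below the threshold both forms pick the scalar path, but
only the combined form pays for the alias check on the way there.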

For the rest of the regression my plan is to somehow factor in
the evolution of the number of iterations in the outer loop
(which is {1, +, 1}) to somehow bump the static profitability
estimate and together with the "cheap" cost model check never
execute the vectorized version (well, it is actually never
executed, but only because the alias check fails).

Thanks,
Richard.

2022-03-21  Richard Biener

	PR tree-optimization/104912
	* tree-vect-loop-manip.cc (vect_loop_versioning): Split
	the cost model check to a separate BB to make sure it is
	checked first and not combined with other version checks.
---
 gcc/tree-vect-loop-manip.cc | 53 ++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index a7bbc916bbc..8ef333eb31b 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
       cond_expr = expr;
     }
 
+  tree cost_name = NULL_TREE;
+  if (cond_expr
+      && !integer_truep (cond_expr)
+      && (version_niter
+          || version_align
+          || version_alias
+          || version_simd_if_cond))
+    cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
+                                                    &cond_expr_stmt_list,
+                                                    is_gimple_val, NULL_TREE);
+
   if (version_niter)
     vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
 
   if (cond_expr)
-    cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
-                                        &cond_expr_stmt_list,
-                                        is_gimple_condexpr, NULL_TREE);
+    {
+      gimple_seq tem = NULL;
+      cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
+                                          &tem,
+                                          is_gimple_condexpr, NULL_TREE);
+      gimple_seq_add_seq (&cond_expr_stmt_list, tem);
+    }
 
   if (version_align)
     vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
@@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
       update_ssa (TODO_update_ssa);
     }
 
+  /* Split the cost model check off to a separate BB.  Costing assumes
+     this is the only thing we perform when we enter the scalar loop.  */
+  if (cost_name)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (cost_name);
+      /* All uses of the cost check are 'true' after the check we
+         are going to insert.  */
+      replace_uses_by (cost_name, boolean_true_node);
+      /* And we're going to build the new single use of it.  */
+      gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node,
+                                       NULL_TREE, NULL_TREE);
+      edge e = split_block (gimple_bb (def), def);
+      gimple_stmt_iterator gsi = gsi_for_stmt (def);
+      gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
+      edge true_e, false_e;
+      extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
+      e->flags &= ~EDGE_FALLTHRU;
+      e->flags |= EDGE_TRUE_VALUE;
+      edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
+      e->probability = prob;
+      e2->probability = prob.invert ();
+      set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
+      auto_vec<basic_block> adj;
+      for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);
+           son;
+           son = next_dom_son (CDI_DOMINATORS, son))
+        if (EDGE_COUNT (son->preds) > 1)
+          adj.safe_push (son);
+      for (auto son : adj)
+        set_immediate_dominator (CDI_DOMINATORS, son, e->src);
+    }
+
   if (version_niter)
     {
       /* The versioned loop could be infinite, we need to clear existing