From patchwork Fri Nov 12 18:04:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 47561 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D47CF3858410 for ; Fri, 12 Nov 2021 18:05:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D47CF3858410 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1636740342; bh=2XDWZkJT09NFCk3vv1PKt9nJw7jpsXhUzOkj9MFIHuw=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=p/sOABx4cXjPVgXwuR5xZHVeOvfpiF8C2rIb8LbHMADwdo64wHNnHCgcJPt6w3pko KHRwW/2jk+lK9xNxMLsrYwt48Kr3pkO/e3LFQuq2uThDGxucXHASMfmOG6ax3eUM3g IqnCawJDsMNshPd0tJCYoDNm2cc0Y/ssqK6Zb0ZM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 93EBA3858022 for ; Fri, 12 Nov 2021 18:04:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 93EBA3858022 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3B6B0D6E for ; Fri, 12 Nov 2021 10:04:15 -0800 (PST) Received: from localhost (unknown [10.32.98.88]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D1D4A3F718 for ; Fri, 12 Nov 2021 10:04:14 -0800 (PST) To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: [PATCH 4/5] if-conv: Apply VN to hoisted conversions Date: Fri, 12 Nov 2021 18:04:13 +0000 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Richard Sandiford via Gcc-patches From: Richard Sandiford Reply-To: Richard Sandiford Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch is a prerequisite for a later one. At the moment, if-conversion converts predicated POINTER_PLUS_EXPRs into non-wrapping forms, which for: … = base + offset becomes: tmp = (unsigned long) base … = tmp + offset It then hoists these conversions out of the loop where possible. However, because “base” is a valid gimple operand, there can be multiple POINTER_PLUS_EXPRs with the same base, which can in turn lead to multiple instances of the same conversion. The later VN pass is (and I think needs to be) restricted to the new if-converted code, whereas here we're deliberately inserting the conversions before the .LOOP_VECTORIZED condition: /* If we versioned loop then make sure to insert invariant stmts before the .LOOP_VECTORIZED check since the vectorizer will re-use that for things like runtime alias versioning whose condition can end up using those invariants. */ We can therefore enter the vectoriser with redundant conversions. The easiest fix seemed to be to defer the hoisting until after VN. This catches other hoisting opportunities too. Hoisting the code from the (artificial) loop in pr99102.c means that it's no longer worth vectorising. The patch forces vectorisation instead of relying on the cost model. The patch also reverts pr87007-4.c and pr87007-5.c back to their original forms, undoing changes in 783dc66f9ccb0019c3dad. The code at the time the tests were added was: testl %edi, %edi je .L10 vxorps %xmm1, %xmm1, %xmm1 vsqrtsd d3(%rip), %xmm1, %xmm0 vsqrtsd d2(%rip), %xmm1, %xmm1 ... .L10: ret with the operations being hoisted, and the vxorps was specifically wanted (compared to the previous code). This patch restores the code to that form, with the hoisted operations and the vxorps. Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Richard gcc/ * tree-if-conv.c: Include tree-eh.h. (predicate_statements): Remove pe argument. Don't hoist statements here. (combine_blocks): Remove pe argument. (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions. (ifcvt_hoist_invariants): Likewise. (tree_if_conversion): Update call to combine_blocks. Call ifcvt_hoist_invariants after VN. gcc/testsuite/ * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model. Revert: 2020-09-09 Richard Biener * gcc.target/i386/pr87007-4.c: Adjust. * gcc.target/i386/pr87007-5.c: Likewise. --- gcc/testsuite/gcc.dg/vect/pr99102.c | 2 +- gcc/testsuite/gcc.target/i386/pr87007-4.c | 2 +- gcc/testsuite/gcc.target/i386/pr87007-5.c | 2 +- gcc/tree-if-conv.c | 122 ++++++++++++++++++++-- 4 files changed, 114 insertions(+), 14 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c index 6c1a13f0783..0d030d15c86 100644 --- a/gcc/testsuite/gcc.dg/vect/pr99102.c +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c @@ -1,4 +1,4 @@ -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */ /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */ long a[44]; short d, e = -7; diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c index 9c4b8005af3..e91bdcbac44 100644 --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c @@ -15,4 +15,4 @@ foo (int n, int k) d1 = ceil (d3); } -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c index e4d956a5d7f..20d13cf650b 100644 --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c @@ -15,4 +15,4 @@ foo (int n, int k) d1 = sqrt (d3); } -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index e88ddc9f788..0ad557a2f4d 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-cfgcleanup.h" #include "tree-ssa-dse.h" #include "tree-vectorizer.h" +#include "tree-eh.h" /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. */ @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond, */ static void -predicate_statements (loop_p loop, edge pe) +predicate_statements (loop_p loop) { unsigned int i, orig_loop_num_nodes = loop->num_nodes; auto_vec vect_sizes; @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe) { gassign *stmt2 = as_a (gsi_stmt (gsi2)); gsi_remove (&gsi2, false); - /* Make sure to move invariant conversions out of the - loop. */ - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2)) - && expr_invariant_in_loop_p (loop, - gimple_assign_rhs1 (stmt2))) - gsi_insert_on_edge_immediate (pe, stmt2); - else if (first) + if (first) { gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT); first = false; @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop) blocks. Replace PHI nodes with conditional modify expressions. */ static void -combine_blocks (class loop *loop, edge pe) +combine_blocks (class loop *loop) { basic_block bb, exit_bb, merge_target_bb; unsigned int orig_loop_num_nodes = loop->num_nodes; @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe) predicate_all_scalar_phis (loop); if (need_to_predicate || need_to_rewrite_undefined) - predicate_statements (loop, pe); + predicate_statements (loop); /* Merge basic blocks. */ exit_bb = NULL; @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop) } } +/* Return true if STMT can be hoisted from if-converted loop LOOP. */ + +static bool +ifcvt_can_hoist (class loop *loop, gimple *stmt) +{ + if (auto *call = dyn_cast (stmt)) + { + if (gimple_call_internal_p (call) + && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0) + return false; + } + else if (auto *assign = dyn_cast (stmt)) + { + if (gimple_assign_rhs_code (assign) == COND_EXPR) + return false; + } + else + return false; + + if (gimple_has_side_effects (stmt) + || gimple_could_trap_p (stmt) + || stmt_could_throw_p (cfun, stmt) + || gimple_vdef (stmt) + || gimple_vuse (stmt)) + return false; + + int num_args = gimple_num_args (stmt); + for (int i = 0; i < num_args; ++i) + if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i))) + return false; + + return true; +} + +/* PE is the preferred hoisting edge selected by tree_if_conversion, which + s known to be different from (and to dominate) the preheader edge of the + if-converted loop. We already know that STMT can be inserted on the loop + preheader edge. Return true if we prefer to insert it on PE instead. */ + +static bool +ifcvt_can_hoist_further (edge pe, gimple *stmt) +{ + /* As explained in tree_if_conversion, we want to hoist invariant + conversions further so that they can be reused by alias analysis. */ + auto *assign = dyn_cast (stmt); + if (assign + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign))) + { + tree rhs = gimple_assign_rhs1 (assign); + if (is_gimple_min_invariant (rhs)) + return true; + + if (TREE_CODE (rhs) == SSA_NAME) + { + basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs)); + if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb)) + return true; + } + } + return false; +} + +/* Hoist invariant statements from LOOP. PE is the preferred edge for + hoisting conversions, as selected by tree_if_conversion; see there + for details. */ + +static void +ifcvt_hoist_invariants (class loop *loop, edge pe) +{ + gimple_stmt_iterator hoist_gsi = {}; + gimple_stmt_iterator hoist_gsi_pe = {}; + unsigned int num_blocks = loop->num_nodes; + basic_block *body = get_loop_body (loop); + for (unsigned int i = 0; i < num_blocks; ++i) + for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);) + { + gimple *stmt = gsi_stmt (gsi); + if (ifcvt_can_hoist (loop, stmt)) + { + /* Once we've hoisted one statement, insert other statements + after it. */ + edge e = loop_preheader_edge (loop); + gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi; + if (e != pe && ifcvt_can_hoist_further (pe, stmt)) + { + e = pe; + hoist_gsi_ptr = &hoist_gsi_pe; + } + gsi_remove (&gsi, false); + if (hoist_gsi_ptr->ptr) + gsi_insert_after (hoist_gsi_ptr, stmt, GSI_NEW_STMT); + else + { + gsi_insert_on_edge_immediate (e, stmt); + *hoist_gsi_ptr = gsi_for_stmt (stmt); + } + continue; + } + gsi_next (&gsi); + } + free (body); +} + /* If-convert LOOP when it is legal. For the moment this pass has no profitability analysis. Returns non-zero todo flags when something changed. */ @@ -3275,7 +3373,7 @@ tree_if_conversion (class loop *loop, vec *preds) /* Now all statements are if-convertible. Combine all the basic blocks into one huge basic block doing the if-conversion on-the-fly. */ - combine_blocks (loop, pe); + combine_blocks (loop); /* Perform local CSE, this esp. helps the vectorizer analysis if loads and stores are involved. CSE only the loop body, not the entry @@ -3297,6 +3395,8 @@ tree_if_conversion (class loop *loop, vec *preds) ifcvt_local_dce (loop); BITMAP_FREE (exit_bbs); + ifcvt_hoist_invariants (loop, pe); + todo |= TODO_cleanup_cfg; cleanup: