From patchwork Wed Feb 23 13:51:06 2022
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 51334
Date: Wed, 23 Feb 2022 14:51:06 +0100 (CET)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/104658 - avoid mixing mask & non-mask vector defs
Message-Id: <20220223135106.65F7613D72@imap2.suse-dmz.suse.de>
From: Richard Biener
Cc: richard.sandiford@arm.com

When pattern recognition fails to sanitize all defs of a mask-producing
operation and the respective def is external or constant, we end up
trying to produce a VECTOR_BOOLEAN_TYPE_P constructor.  That in turn
exposes stmts like

  _135 = _49 ? -1 : 0;

which are not handled well by follow-up SLP and end up generating awful
code.  We rely heavily on pattern recognition to sanitize mask vs. data
uses of bools, but that fails here, which means we should fail
vectorization as well.  That avoids ICEing on such stmts, and it also
avoids generating weird code that makes the vectorization unprofitable.

The following patch simply disallows external VECTOR_BOOLEAN_TYPE_P
defs and arranges for the promote-to-external code to instead promote
the mask uses to external (that's just a short-cut here).
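For illustration, here is a plain-C approximation of the GIMPLE testcase gcc.target/i386/pr104658.c from this patch.  It is not part of the patch, and the names lane and bar_c are made up.  Every output combines an incoming _Bool parameter (an external def from SLP's point of view) with a mask computed from data, then uses the result to select data: the kind of mask vs. data mix that pattern recognition is expected to sanitize.  Whether a given GCC version vectorizes this C form the same way as the -fgimple input is not claimed.

```c
#include <stdbool.h>

/* One lane of the testcase (hypothetical helper): the external _Bool d
   is ANDed with a mask computed from data, and the result selects data.  */
static inline int lane (int x, bool d)
{
  bool nz = x != 0;   /* like _74 = _1 != 0;       */
  bool m = d & nz;    /* like _75 = _73 & _74;     */
  return m ? x : 0;   /* like _76 = _75 ? _1 : 0;  */
}

/* Hypothetical C rendering of bar () from pr104658.c.  The GIMPLE uses
   byte offsets, so "a_81 + 32" is a[8], "a_81 + 36" is a[9], etc.  */
void bar_c (const int *restrict a, int *restrict e,
            bool d0, bool d1, bool d2, bool d3,
            bool d4, bool d5, bool d6, bool d7)
{
  int v0 = lane (a[8], d0);
  for (int i = 0; i <= 8; i++)   /* e[0] ... e[8] all receive _76 */
    e[i] = v0;
  e[9]  = lane (a[9],  d1);
  e[10] = lane (a[0],  d2);
  e[11] = lane (a[1],  d3);
  e[12] = lane (a[12], d4);
  e[13] = lane (a[13], d5);
  e[14] = lane (a[4],  d6);   /* GIMPLE computes _39 & _41, same value */
  e[15] = lane (a[15], d7);
}
```

The d0 ... d7 parameters are what makes the bools external: nothing inside the function defines them, so a vector mask for them could only be built from the scalar values.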
I've also looked at aarch64 with SVE and a fixed vector length for the
gcc.target/i386/pr101636.c testcase.  I see similar vectorization
(using ) there, but it's hard to decide whether the old, the new or no
vectorization is better for this.  The code generated with traditional
integer masks isn't as awkward, but we still get the != 0 promotion
done for each scalar element, which doesn't look as intended; this
operation should be visible upfront.

That also means some cases will now become missed optimizations that
need to be fixed by bool pattern recognition, which I plan to look at
in more detail during stage1.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

2022-02-22  Richard Biener

	PR tree-optimization/104658
	* tree-vect-slp.cc (vect_slp_convert_to_external): Do not
	create VECTOR_BOOLEAN_TYPE_P extern defs.  Reset the vector
	type on nodes we promote.
	(vectorizable_bb_reduc_epilogue): Deal with externalized
	root.
	* tree-vect-stmts.cc (vect_maybe_update_slp_op_vectype): Do
	not allow VECTOR_BOOLEAN_TYPE_P extern defs.

	* gcc.target/i386/pr104658.c: New testcase.
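The rule added to vect_maybe_update_slp_op_vectype can be modelled outside of GCC.  This is a minimal sketch under stated assumptions, not GCC code: enum def_kind, struct vec_type, struct slp_op and maybe_update_op_vectype are hypothetical stand-ins for the SLP node's def type, its vector type, VECTOR_BOOLEAN_TYPE_P and types_compatible_p.  The leading internal-def early exit is assumed, since the hunk only shows its "return true;".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; GCC's real code works on slp_tree and tree.  */
enum def_kind { INTERNAL_DEF, CONSTANT_DEF, EXTERNAL_DEF };

struct vec_type
{
  int id;        /* equal ids stand in for types_compatible_p */
  bool is_mask;  /* stands in for VECTOR_BOOLEAN_TYPE_P */
};

struct slp_op
{
  enum def_kind def;
  const struct vec_type *vectype;  /* NULL until a use assigns one */
};

/* Sketch of the check order after the patch (simplified model).  */
static bool
maybe_update_op_vectype (struct slp_op *op, const struct vec_type *vt)
{
  /* Assumed early exit: internal defs keep their own vector type.  */
  if (op->def == INTERNAL_DEF)
    return true;
  /* An already-typed invariant only has to be compatible.  */
  if (op->vectype)
    return op->vectype->id == vt->id;
  /* New: refuse a mask vector type for external defs; constants are
     still allowed for now.  */
  if (vt->is_mask && op->def == EXTERNAL_DEF)
    return false;
  /* Invariants take their vector type from the use.  */
  op->vectype = vt;
  return true;
}
```

With this ordering, constants may still receive a mask vector type (the "Allow vect_constant_def for now" case), while an untyped external def asked for a mask type is refused, so vectorizing that use fails instead of materializing a mask constructor from scalar _Bools.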
---
 gcc/testsuite/gcc.target/i386/pr104658.c | 113 +++++++++++++++++++++++
 gcc/tree-vect-slp.cc                     |   9 +-
 gcc/tree-vect-stmts.cc                   |   5 +
 3 files changed, 125 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104658.c

diff --git a/gcc/testsuite/gcc.target/i386/pr104658.c b/gcc/testsuite/gcc.target/i386/pr104658.c
new file mode 100644
index 00000000000..2b8d02aacab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104658.c
@@ -0,0 +1,113 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fgimple -ftree-slp-vectorize -mavx512f -fdump-tree-slp2" } */
+
+void __GIMPLE (ssa,guessed_local(118111600))
+bar (int * restrict a, int * restrict e,
+     _Bool d0, _Bool d1, _Bool d2, _Bool d3, _Bool d4, _Bool d5, _Bool d6, _Bool d7)
+{
+  int _1;
+  int _4;
+  int _6;
+  int _8;
+  int _10;
+  int _12;
+  int _14;
+  int _16;
+  int _27;
+  _Bool _37;
+  _Bool _39;
+  _Bool _41;
+  int _43;
+  _Bool _45;
+  _Bool _47;
+  _Bool _49;
+  _Bool _53;
+  _Bool _54;
+  _Bool _55;
+  int _56;
+  _Bool _57;
+  _Bool _58;
+  _Bool _59;
+  int _60;
+  _Bool _61;
+  _Bool _62;
+  _Bool _63;
+  int _64;
+  _Bool _65;
+  _Bool _66;
+  _Bool _67;
+  int _68;
+  _Bool _69;
+  _Bool _70;
+  _Bool _71;
+  int _72;
+  _Bool _73;
+  _Bool _74;
+  _Bool _75;
+  int _76;
+
+  __BB(2,guessed_local(118111600)):
+  _73 = d0_2(D);
+  _69 = d1_5(D);
+  _65 = d2_7(D);
+  _61 = d3_9(D);
+  _57 = d4_11(D);
+  _53 = d5_13(D);
+  _41 = d6_15(D);
+  _49 = d7_17(D);
+  a_81 = a_22(D);
+  e_82 = e_23(D);
+  _1 = __MEM (a_81 + _Literal (int * restrict) 32);
+  _4 = __MEM (a_81 + _Literal (int * restrict) 36);
+  _6 = __MEM (a_81);
+  _8 = __MEM (a_81 + _Literal (int * restrict) 4);
+  _10 = __MEM (a_81 + _Literal (int * restrict) 48);
+  _12 = __MEM (a_81 + _Literal (int * restrict) 52);
+  _14 = __MEM (a_81 + _Literal (int * restrict) 16);
+  _16 = __MEM (a_81 + _Literal (int * restrict) 60);
+  _74 = _1 != 0;
+  _75 = _73 & _74;
+  _76 = _75 ? _1 : 0;
+  __MEM (e_82) = _76;
+  __MEM (e_82 + _Literal (int * restrict) 4) = _76;
+  __MEM (e_82 + _Literal (int * restrict) 8) = _76;
+  __MEM (e_82 + _Literal (int * restrict) 12) = _76;
+  __MEM (e_82 + _Literal (int * restrict) 16) = _76;
+  __MEM (e_82 + _Literal (int * restrict) 20) = _76;
+  __MEM (e_82 + _Literal (int * restrict) 24) = _76;
+  __MEM (e_82 + _Literal (int * restrict) 28) = _76;
+  __MEM (e_82 + _Literal (int * restrict) 32) = _76;
+  _70 = _4 != 0;
+  _71 = _69 & _70;
+  _72 = _71 ? _4 : 0;
+  __MEM (e_82 + _Literal (int * restrict) 36) = _72;
+  _66 = _6 != 0;
+  _67 = _65 & _66;
+  _68 = _67 ? _6 : 0;
+  __MEM (e_82 + _Literal (int * restrict) 40) = _68;
+  _62 = _8 != 0;
+  _63 = _61 & _62;
+  _64 = _63 ? _8 : 0;
+  __MEM (e_82 + _Literal (int * restrict) 44) = _64;
+  _58 = _10 != 0;
+  _59 = _57 & _58;
+  _60 = _59 ? _10 : 0;
+  __MEM (e_82 + _Literal (int * restrict) 48) = _60;
+  _54 = _12 != 0;
+  _55 = _53 & _54;
+  _56 = _55 ? _12 : 0;
+  __MEM (e_82 + _Literal (int * restrict) 52) = _56;
+  _39 = _14 != 0;
+  _37 = _39 & _41;
+  _27 = _37 ? _14 : 0;
+  __MEM (e_82 + _Literal (int * restrict) 56) = _27;
+  _47 = _16 != 0;
+  _45 = _47 & _49;
+  _43 = _45 ? _16 : 0;
+  __MEM (e_82 + _Literal (int * restrict) 60) = _43;
+  return;
+
+}
+
+/* We do not want an AVX512 mask CTOR built from converted _Bool.  */
+/* { dg-final { scan-tree-dump-not " = \\(\\) " "slp2" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f9bbc872a99..25b8b1e57b2 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4515,7 +4515,9 @@ vect_slp_convert_to_external (vec_info *vinfo, slp_tree node,
   if (!is_a (vinfo)
       || node == SLP_INSTANCE_TREE (node_instance)
       || !SLP_TREE_SCALAR_STMTS (node).exists ()
-      || vect_contains_pattern_stmt_p (SLP_TREE_SCALAR_STMTS (node)))
+      || vect_contains_pattern_stmt_p (SLP_TREE_SCALAR_STMTS (node))
+      /* Force the mask use to be built from scalars instead.  */
+      || VECTOR_BOOLEAN_TYPE_P (SLP_TREE_VECTYPE (node)))
     return false;
 
   if (dump_enabled_p ())
@@ -4527,6 +4529,8 @@ vect_slp_convert_to_external (vec_info *vinfo, slp_tree node,
      (need to) ignore child nodes of anything that isn't vect_internal_def.  */
   unsigned int group_size = SLP_TREE_LANES (node);
   SLP_TREE_DEF_TYPE (node) = vect_external_def;
+  /* Invariants get their vector type from the uses.  */
+  SLP_TREE_VECTYPE (node) = NULL_TREE;
   SLP_TREE_SCALAR_OPS (node).safe_grow (group_size, true);
   SLP_TREE_LOAD_PERMUTATION (node).release ();
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
@@ -4861,7 +4865,8 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
     reduc_code = PLUS_EXPR;
   internal_fn reduc_fn;
   tree vectype = SLP_TREE_VECTYPE (SLP_INSTANCE_TREE (instance));
-  if (!reduction_fn_for_scalar_code (reduc_code, &reduc_fn)
+  if (!vectype
+      || !reduction_fn_for_scalar_code (reduc_code, &reduc_fn)
       || reduc_fn == IFN_LAST
       || !direct_internal_fn_supported_p (reduc_fn, vectype, OPTIMIZE_FOR_BOTH)
       || !useless_type_conversion_p (TREE_TYPE (gimple_assign_lhs (stmt)),
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5c9e8cfefa5..8262f7d3fbc 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11841,6 +11841,11 @@ vect_maybe_update_slp_op_vectype (slp_tree op, tree vectype)
     return true;
   if (SLP_TREE_VECTYPE (op))
     return types_compatible_p (SLP_TREE_VECTYPE (op), vectype);
+  /* For external defs refuse to produce VECTOR_BOOLEAN_TYPE_P, those
+     should be handled by patterns.  Allow vect_constant_def for now.  */
+  if (VECTOR_BOOLEAN_TYPE_P (vectype)
+      && SLP_TREE_DEF_TYPE (op) == vect_external_def)
+    return false;
   SLP_TREE_VECTYPE (op) = vectype;
   return true;
 }