From patchwork Wed Oct 23 12:28:56 2024
Date: Wed, 23 Oct 2024 14:28:56 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/2] Relax vect_check_scalar_mask check
Message-Id: <20241023122930.B89083858C50@sourceware.org>

When the mask is not a constant or external def there's no need to
check the scalar type.  In particular, with SLP and the mask being a
VEC_PERM_EXPR, there isn't a scalar operand ready to check (not one
vect_is_simple_use will get you).  We later check the vector type and
reject non-mask types there.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

	* tree-vect-stmts.cc (vect_check_scalar_mask): Only check the
	scalar type for constant or extern defs.
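For illustration only (not a reproducer from the patch), this is the general shape of code where a masked load's mask is defined inside the loop rather than being a constant or external def; in that case only the vector type of the mask is meaningful to check:

```cpp
// Illustrative sketch: the condition c[i] > 0 is computed in-loop, so the
// mask feeding the resulting IFN_MASK_LOAD is an internal def.  Under SLP
// such a mask may even be a VEC_PERM_EXPR with no scalar operand for
// vect_is_simple_use to hand back; the vector-type check still applies.
void masked_copy (int *__restrict out, const int *__restrict in,
                  const int *__restrict c, int n)
{
  for (int i = 0; i < n; i++)
    if (c[i] > 0)   // mask defined in-loop, not a constant/external def
      out[i] = in[i];
}
```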
---
 gcc/tree-vect-stmts.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cca6fd6fa97..55f263620c5 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2520,7 +2520,8 @@ vect_check_scalar_mask (vec_info *vinfo, stmt_vec_info stmt_info,
       return false;
     }
 
-  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (*mask)))
+  if ((mask_dt == vect_constant_def || mask_dt == vect_external_def)
+      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (*mask)))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

From patchwork Wed Oct 23 12:32:41 2024
Date: Wed, 23 Oct 2024 14:32:41 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Cc: RISC-V CI, richard.sandiford@arm.com
Subject: [PATCH 2/2] tree-optimization/116575 - SLP masked load-lanes discovery
Message-Id: <20241023123335.F0FC03858C60@sourceware.org>

The following implements masked load-lane discovery for SLP.  The
challenge here is that a masked load has a full-width mask with
group-size number of elements; when this becomes a masked load-lanes
instruction, one mask element gates all group members.
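To make the challenge concrete, the following sketch (modeled loosely on the aarch64-sve.exp mask_struct_load*.c testcases this patch targets; names are invented) has a two-member load group gated by a single scalar condition, which is exactly the uniform-mask load-lanes situation:

```cpp
// Illustrative sketch: src[2*i] and src[2*i+1] form a two-member load
// group.  As separate masked loads each needs a full-width mask; as a
// masked load-lanes (e.g. SVE LD2), the one scalar condition cond[i]
// gates both group members at once.
void masked_sum_pairs (int *__restrict dst, const int *__restrict src,
                       const int *__restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      dst[i] = src[2 * i] + src[2 * i + 1];
}
```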
We already have some discovery hints in place, namely
STMT_VINFO_SLP_VECT_ONLY to guard non-uniform masks, but we need to
choose a way for SLP discovery to handle possible masked load-lanes
SLP trees.

I have this time chosen to handle load-lanes discovery where we have
performed permute optimization already and conveniently got the graph
with predecessor edges built.  This is because, unlike non-masked
loads, masked loads with a load_permutation are never produced by SLP
discovery (because load permutation handling doesn't handle
un-permuting the mask) and thus the load-permutation lowering which
handles non-masked load-lanes discovery doesn't trigger.

With this, SLP discovery for a possible masked load-lanes, that is, a
masked load with a uniform mask, produces a splat of a single-lane
sub-graph as the mask SLP operand.  This is a representation that
shouldn't pessimize the masked-load case and allows the masked
load-lanes transform to simply elide this splat.

This fixes the aarch64-sve.exp mask_struct_load*.c testcases with
--param vect-force-slp=1.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I realize we are still quite inconsistent in how we do SLP discovery,
mainly because of my idea to only apply minimal changes at this point.
I would expect that permuted masked loads miss the interleaving
lowering performed by load permutation lowering.  And if we fix that
we again have to decide whether to interleave or load-lane at the same
time.  I'm also not sure how much good the optimize_slp passes do to
VEC_PERMs in the SLP graph and what stops working when there are no
longer any load_permutations in there.

Richard.

	PR tree-optimization/116575
	* tree-vect-slp.cc (vect_get_and_check_slp_defs): Handle gaps,
	aka NULL scalar stmt.
	(vect_build_slp_tree_2): Allow gaps in the middle of a grouped
	mask load.  When the mask of a grouped mask load is uniform
	do single-lane discovery for the mask and insert a splat
	VEC_PERM_EXPR node.
	(vect_optimize_slp_pass::decide_masked_load_lanes): New
	function.
	(vect_optimize_slp_pass::run): Call it.
---
 gcc/tree-vect-slp.cc | 138 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 135 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fca9ae86d2e..037098a96cb 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -641,6 +641,16 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
   unsigned int commutative_op = -1U;
   bool first = stmt_num == 0;
 
+  if (!stmt_info)
+    {
+      for (auto oi : *oprnds_info)
+        {
+          oi->def_stmts.quick_push (NULL);
+          oi->ops.quick_push (NULL_TREE);
+        }
+      return 0;
+    }
+
   if (!is_a <gcall *> (stmt_info->stmt)
       && !is_a <gassign *> (stmt_info->stmt)
       && !is_a <gphi *> (stmt_info->stmt))
@@ -2029,9 +2039,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
            has_gaps = true;
          /* We cannot handle permuted masked loads directly, see
             PR114375.  We cannot handle strided masked loads or masked
-            loads with gaps.  */
+            loads with gaps unless the mask is uniform.  */
          if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
-              && (DR_GROUP_GAP (first_stmt_info) != 0 || has_gaps))
+              && (DR_GROUP_GAP (first_stmt_info) != 0
+                  || (has_gaps
+                      && STMT_VINFO_SLP_VECT_ONLY (first_stmt_info))))
              || STMT_VINFO_STRIDED_P (stmt_info))
            {
              load_permutation.release ();
@@ -2054,7 +2066,12 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
              unsigned i = 0;
              for (stmt_vec_info si = first_stmt_info;
                   si; si = DR_GROUP_NEXT_ELEMENT (si))
-               stmts2[i++] = si;
+               {
+                 if (si != first_stmt_info)
+                   for (unsigned k = 1; k < DR_GROUP_GAP (si); ++k)
+                     stmts2[i++] = NULL;
+                 stmts2[i++] = si;
+               }
              bool *matches2 = XALLOCAVEC (bool, dr_group_size);
              slp_tree unperm_load
                = vect_build_slp_tree (vinfo, stmts2, dr_group_size,
@@ -2719,6 +2736,43 @@ out:
          continue;
        }
 
+      /* When we have a masked load with uniform mask discover this
+        as a single-lane mask with a splat permute.  This way we can
+        recognize this as a masked load-lane by stripping the splat.  */
+      if (is_a <gcall *> (STMT_VINFO_STMT (stmt_info))
+         && gimple_call_internal_p (STMT_VINFO_STMT (stmt_info),
+                                    IFN_MASK_LOAD)
+         && STMT_VINFO_GROUPED_ACCESS (stmt_info)
+         && ! STMT_VINFO_SLP_VECT_ONLY (DR_GROUP_FIRST_ELEMENT (stmt_info)))
+       {
+         vec<stmt_vec_info> def_stmts2;
+         def_stmts2.create (1);
+         def_stmts2.quick_push (oprnd_info->def_stmts[0]);
+         child = vect_build_slp_tree (vinfo, def_stmts2, 1,
+                                      &this_max_nunits,
+                                      matches, limit,
+                                      &this_tree_size, bst_map);
+         if (child)
+           {
+             slp_tree pnode = vect_create_new_slp_node (1, VEC_PERM_EXPR);
+             SLP_TREE_VECTYPE (pnode) = SLP_TREE_VECTYPE (child);
+             SLP_TREE_LANES (pnode) = group_size;
+             SLP_TREE_SCALAR_STMTS (pnode).create (group_size);
+             SLP_TREE_LANE_PERMUTATION (pnode).create (group_size);
+             for (unsigned k = 0; k < group_size; ++k)
+               {
+                 SLP_TREE_SCALAR_STMTS (pnode).quick_push (def_stmts2[0]);
+                 SLP_TREE_LANE_PERMUTATION (pnode)
+                   .quick_push (std::make_pair (0u, 0u));
+               }
+             SLP_TREE_CHILDREN (pnode).quick_push (child);
+             pnode->max_nunits = child->max_nunits;
+             children.safe_push (pnode);
+             oprnd_info->def_stmts = vNULL;
+             continue;
+           }
+       }
+
       if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
                                         group_size, &this_max_nunits,
                                         matches, limit,
@@ -5498,6 +5552,9 @@ private:
   /* Clean-up.  */
   void remove_redundant_permutations ();
 
+  /* Masked load lanes discovery.  */
+  void decide_masked_load_lanes ();
+
   void dump ();
 
   vec_info *m_vinfo;
@@ -7126,6 +7183,80 @@ vect_optimize_slp_pass::dump ()
     }
 }
 
+/* Masked load lanes discovery.  */
+
+void
+vect_optimize_slp_pass::decide_masked_load_lanes ()
+{
+  for (auto v : m_vertices)
+    {
+      slp_tree node = v.node;
+      if (SLP_TREE_DEF_TYPE (node) != vect_internal_def
+         || SLP_TREE_CODE (node) == VEC_PERM_EXPR)
+       continue;
+      stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
+      if (! STMT_VINFO_GROUPED_ACCESS (stmt_info)
+         /* The mask has to be uniform.  */
+         || STMT_VINFO_SLP_VECT_ONLY (stmt_info)
+         || ! is_a <gcall *> (STMT_VINFO_STMT (stmt_info))
+         || ! gimple_call_internal_p (STMT_VINFO_STMT (stmt_info),
+                                      IFN_MASK_LOAD))
+       continue;
+      stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+      if (STMT_VINFO_STRIDED_P (stmt_info)
+         || compare_step_with_zero (m_vinfo, stmt_info) <= 0
+         || vect_load_lanes_supported (SLP_TREE_VECTYPE (node),
+                                       DR_GROUP_SIZE (stmt_info),
+                                       true) == IFN_LAST)
+       continue;
+
+      /* Uniform masks need to be suitably represented.  */
+      slp_tree mask = SLP_TREE_CHILDREN (node)[0];
+      if (SLP_TREE_CODE (mask) != VEC_PERM_EXPR
+         || SLP_TREE_CHILDREN (mask).length () != 1)
+       continue;
+      bool match = true;
+      for (auto perm : SLP_TREE_LANE_PERMUTATION (mask))
+       if (perm.first != 0 || perm.second != 0)
+         {
+           match = false;
+           break;
+         }
+      if (!match)
+       continue;
+
+      /* Now see if the consumer side matches.  */
+      for (graph_edge *pred = m_slpg->vertices[node->vertex].pred;
+          pred; pred = pred->pred_next)
+       {
+         slp_tree pred_node = m_vertices[pred->src].node;
+         /* All consumers should be a permute with a single outgoing lane.  */
+         if (SLP_TREE_CODE (pred_node) != VEC_PERM_EXPR
+             || SLP_TREE_LANES (pred_node) != 1)
+           {
+             match = false;
+             break;
+           }
+         gcc_assert (SLP_TREE_CHILDREN (pred_node).length () == 1);
+       }
+      if (!match)
+       continue;
+      /* Now we can mark the nodes as to use load lanes.  */
+      node->ldst_lanes = true;
+      for (graph_edge *pred = m_slpg->vertices[node->vertex].pred;
+          pred; pred = pred->pred_next)
+       m_vertices[pred->src].node->ldst_lanes = true;
+      /* The catch is we have to massage the mask.  We have arranged
+        analyzed uniform masks to be represented by a splat VEC_PERM
+        which we can now simply elide as we cannot easily re-do SLP
+        discovery here.  */
+      slp_tree new_mask = SLP_TREE_CHILDREN (mask)[0];
+      SLP_TREE_REF_COUNT (new_mask)++;
+      SLP_TREE_CHILDREN (node)[0] = new_mask;
+      vect_free_slp_tree (mask);
+    }
+}
+
 /* Main entry point for the SLP graph optimization pass.  */
 
 void
@@ -7146,6 +7277,7 @@ vect_optimize_slp_pass::run ()
     }
   else
     remove_redundant_permutations ();
+  decide_masked_load_lanes ();
   free_graph (m_slpg);
 }
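As a standalone illustration of the representation the patch relies on (a sketch with invented names, not GCC code): the uniform mask is a VEC_PERM node whose lane permutation selects lane 0 of a single-lane mask child for every group member, and decide_masked_load_lanes recognizes it by checking that every (operand, lane) pair is (0, 0) before eliding the splat.

```cpp
#include <utility>
#include <vector>

// A lane permutation as a list of (operand index, lane index) pairs,
// mirroring SLP_TREE_LANE_PERMUTATION conceptually.
using lane_perm = std::vector<std::pair<unsigned, unsigned>>;

// Build the splat permutation for a group of GROUP_SIZE lanes: every
// output lane reads lane 0 of operand 0 (the single-lane mask node).
lane_perm make_splat_perm (unsigned group_size)
{
  return lane_perm (group_size, std::make_pair (0u, 0u));
}

// Mirror of the match loop in decide_masked_load_lanes: the permute is
// a uniform-mask splat iff every pair is exactly (0, 0).
bool is_uniform_splat (const lane_perm &perm)
{
  for (auto p : perm)
    if (p.first != 0 || p.second != 0)
      return false;
  return true;
}
```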