From patchwork Fri Sep 12 12:20:24 2025
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 120133
Date: Fri, 12 Sep 2025 14:20:24 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: RISC-V
Subject: [PATCH] Do less redundant vect_transform_slp_perm_load calls
Message-Id: <20250912122024.8930B136DB@imap1.dmz-prg2.suse.org>

The following tries to do vect_transform_slp_perm_load exactly
once during analysis and once during transform.  There's a 2nd
case left during analysis in get_load_store_type.  Temporarily
this records n_perms in the load-store info and verifies that
against the value computed at transform stage.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

	* tree-vectorizer.h (vect_load_store_data::n_perms): New.
	* tree-vect-stmts.cc (vectorizable_load): Analyze
	SLP_TREE_LOAD_PERMUTATION only once and remember n_perms.
	Verify the transform-time n_perms against the value stored
	during analysis.
---
 gcc/tree-vect-stmts.cc | 47 +++++++++++++++++++++++------------------
 gcc/tree-vectorizer.h  |  1 +
 2 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7eabf169a2b..d0ae19baebb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9478,6 +9478,7 @@ vectorizable_load (vec_info *vinfo,
 
   /* ??? The following checks should really be part of
      get_load_store_type.  */
+  unsigned n_perms = -1U;
   if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
       && !((memory_access_type == VMAT_ELEMENTWISE
             || mat_gather_scatter_p (memory_access_type))
@@ -9485,7 +9486,7 @@ vectorizable_load (vec_info *vinfo,
     {
       slp_perm = true;
 
-      if (!loop_vinfo)
+      if (!loop_vinfo && cost_vec)
         {
           /* In BB vectorization we may not actually use a loaded vector
              accessing elements in excess of DR_GROUP_SIZE.  */
@@ -9508,17 +9509,21 @@ vectorizable_load (vec_info *vinfo,
             }
         }
 
-      auto_vec<tree> tem;
-      unsigned n_perms;
-      if (!vect_transform_slp_perm_load (vinfo, slp_node, tem, NULL, vf,
-                                         true, &n_perms))
+      if (cost_vec)
         {
-          if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION,
-                             vect_location,
-                             "unsupported load permutation\n");
-          return false;
+          if (!vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL, vf,
+                                             true, &n_perms))
+            {
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+                                 vect_location,
+                                 "unsupported load permutation\n");
+              return false;
+            }
+          ls.n_perms = n_perms;
         }
+      else
+        n_perms = ls.n_perms;
     }
 
   if (slp_node->ldst_lanes
@@ -9989,18 +9994,19 @@ vectorizable_load (vec_info *vinfo,
         }
       if (slp_perm)
         {
-          unsigned n_perms;
           if (costing_p)
             {
-              unsigned n_loads;
-              vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL, vf,
-                                            true, &n_perms, &n_loads);
+              gcc_assert (n_perms != -1U);
               inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm,
                                                slp_node, 0, vect_body);
             }
           else
-            vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, gsi, vf,
-                                          false, &n_perms);
+            {
+              unsigned n_perms2;
+              vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, gsi, vf,
+                                            false, &n_perms2);
+              gcc_assert (n_perms == n_perms2);
+            }
         }
 
       if (costing_p)
@@ -11378,25 +11384,24 @@ vectorizable_load (vec_info *vinfo,
 
       if (slp_perm)
         {
-          unsigned n_perms;
           /* For SLP we know we've seen all possible uses of dr_chain so
              direct vect_transform_slp_perm_load to DCE the unused parts.
              ??? This is a hack to prevent compile-time issues as seen
              in PR101120 and friends.  */
           if (costing_p)
             {
-              vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
-                                            true, &n_perms, nullptr);
+              gcc_assert (n_perms != -1U);
               if (n_perms != 0)
                 inside_cost = record_stmt_cost (cost_vec, n_perms, vec_perm,
                                                 slp_node, 0, vect_body);
             }
           else
             {
+              unsigned n_perms2;
               bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain,
-                                                      gsi, vf, false, &n_perms,
+                                                      gsi, vf, false, &n_perms2,
                                                       nullptr, true);
-              gcc_assert (ok);
+              gcc_assert (ok && n_perms == n_perms2);
             }
           dr_chain.release ();
         }
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 749a9830e07..6ac4299ede2 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -288,6 +288,7 @@ struct vect_load_store_data : vect_data {
   } gs;
   tree strided_offset_vectype; // VMAT_GATHER_SCATTER_IFN, originally strided
   auto_vec<int> elsvals;
+  unsigned n_perms; // SLP_TREE_LOAD_PERMUTATION
 };
 
 /* A computation tree of an SLP instance.  Each node corresponds to a group of