From patchwork Thu Sep 5 13:36:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 97172 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3A77B3864835 for ; Thu, 5 Sep 2024 13:36:56 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by sourceware.org (Postfix) with ESMTPS id B09D5385843B for ; Thu, 5 Sep 2024 13:36:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B09D5385843B Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org B09D5385843B Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725543390; cv=none; b=r6o0DRr9i0cLp93eCw8puLjSRlM55LVIWwMKHP5Cg0cedO0mH24l+3+TbcusysU6BzmbBRDOEQCSJsslTfYpFLxDhfixkaLAl5URG2vGQR+SsIiGFEXk1wM0lMlqpIKQuFi1CuPqA4lacGK0xXAcHapc7E9mSTLUW6BkowXLD64= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725543390; c=relaxed/simple; bh=qEMMo5+Yyga5uDMX9DMxzXpz+8uOeQ4oOzu2JMeksp8=; h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:Date: From:To:Subject:MIME-Version; b=DW6Kgi0+ByHu2d75lLY8ftKoiVfCR9h2XrtIGTLEa/xjBWrVOjsMgzjJej1qcDQPJnsQIPBeFjX3FDO0uaX3Y0b5Il88PkIjwpisaEuGH5wUhXV1Jrj+Ptx6UC6qY9ClZ2MEbU1cjYG7UXOuUUP255BBvTYAbZZWjZTg1YLz4qE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from murzim.nue2.suse.org (unknown [10.168.4.243]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) 
(No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id A9A1921A6A for ; Thu, 5 Sep 2024 13:36:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1725543387; h=from:from:reply-to:date:date:to:to:cc:mime-version:mime-version: content-type:content-type; bh=nZdVYrZ6EDRsU39d17BXmFf7X7TxlT9WPklCFhJXXRQ=; b=Yo9p8FJlMEY2PzZfGlcSFUHhZRex8gfF4hscrTNz4eGmuGmKZOcu+0oqPGXD0FxOcgcRVm 6ANfw7uM/ixt+mb91JPDiUdDKPko9UYhsUzIjLFtQA+o1IUSnhccG4/Td7aFhHTx+E9GXh YgsX8kGfnXU97gQG1d74dyuQAAUlwXs= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1725543387; h=from:from:reply-to:date:date:to:to:cc:mime-version:mime-version: content-type:content-type; bh=nZdVYrZ6EDRsU39d17BXmFf7X7TxlT9WPklCFhJXXRQ=; b=XqFQVYGPgODj5A8+96GwoL1HIJOCHdxtE5Od+fna/cRz+213cb3MjEYJ25Ll3x8f+A6XVq 4yAGVvIaqdXBDICw== Authentication-Results: smtp-out1.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1725543387; h=from:from:reply-to:date:date:to:to:cc:mime-version:mime-version: content-type:content-type; bh=nZdVYrZ6EDRsU39d17BXmFf7X7TxlT9WPklCFhJXXRQ=; b=Yo9p8FJlMEY2PzZfGlcSFUHhZRex8gfF4hscrTNz4eGmuGmKZOcu+0oqPGXD0FxOcgcRVm 6ANfw7uM/ixt+mb91JPDiUdDKPko9UYhsUzIjLFtQA+o1IUSnhccG4/Td7aFhHTx+E9GXh YgsX8kGfnXU97gQG1d74dyuQAAUlwXs= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1725543387; h=from:from:reply-to:date:date:to:to:cc:mime-version:mime-version: content-type:content-type; bh=nZdVYrZ6EDRsU39d17BXmFf7X7TxlT9WPklCFhJXXRQ=; b=XqFQVYGPgODj5A8+96GwoL1HIJOCHdxtE5Od+fna/cRz+213cb3MjEYJ25Ll3x8f+A6XVq 4yAGVvIaqdXBDICw== Date: Thu, 5 Sep 2024 15:36:27 +0200 (CEST) From: Richard Biener To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/3] tree-optimization/116609 - SLP live lane vectorization with partial vectors MIME-Version: 1.0 X-Spam-Level: X-Spamd-Result: default: False [-0.21 / 50.00]; BAYES_HAM(-3.00)[99.99%]; 
MISSING_MID(2.50)[]; NEURAL_SPAM_LONG(0.59)[0.168]; NEURAL_HAM_SHORT(-0.19)[-0.965]; MIME_GOOD(-0.10)[text/plain]; RCPT_COUNT_ONE(0.00)[1]; RCVD_COUNT_ZERO(0.00)[0]; ARC_NA(0.00)[]; MISSING_XM_UA(0.00)[]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; FUZZY_BLOCKED(0.00)[rspamd.com]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; TO_DN_NONE(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; FROM_HAS_DN(0.00)[] X-Spam-Score: -0.21 X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, MISSING_MID, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org Message-Id: <20240905133656.3A77B3864835@sourceware.org> The following implements the simple case of single-lane SLP when using partial vectors which can use the VEC_EXTRACT_LAST code generation without changes. I'll keep the PR open for further enhancements. This avoids FAILs of gcc.target/aarch64/sve/live_1.c when using single-lane SLP for non-grouped stores. PR tree-optimization/116609 * tree-vect-loop.cc (vectorizable_live_operation_1): Support partial vectors for single-lane SLP. --- gcc/tree-vect-loop.cc | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 242d5e2d916..31cdc4bf53d 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -10961,7 +10961,8 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo, where VEC_LHS is the vectorized live-out result and MASK is the loop mask for the final iteration. 
*/ - gcc_assert (ncopies == 1 && !slp_node); + gcc_assert (ncopies == 1 + && (!slp_node || SLP_TREE_LANES (slp_node) == 1)); gimple_seq tem = NULL; gimple_stmt_iterator gsi = gsi_last (tem); tree len = vect_get_loop_len (loop_vinfo, &gsi, @@ -10995,7 +10996,7 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo, where VEC_LHS is the vectorized live-out result and MASK is the loop mask for the final iteration. */ - gcc_assert (!slp_node); + gcc_assert (!slp_node || SLP_TREE_LANES (slp_node) == 1); tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info)); gimple_seq tem = NULL; gimple_stmt_iterator gsi = gsi_last (tem); @@ -11147,7 +11148,7 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, /* No transformation required. */ if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) { - if (slp_node) + if (slp_node && SLP_TREE_LANES (slp_node) != 1) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -11156,7 +11157,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, "the loop.\n"); LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; } - else if (ncopies > 1) + else if (ncopies > 1 + || (slp_node && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -11166,7 +11168,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, } else { - gcc_assert (ncopies == 1 && !slp_node); + gcc_assert (ncopies == 1 + && (!slp_node || SLP_TREE_LANES (slp_node) == 1)); if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype, OPTIMIZE_FOR_SPEED)) vect_record_loop_mask (loop_vinfo, @@ -11213,8 +11216,9 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, if (slp_node) { gcc_assert (!loop_vinfo - || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) - && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))); + || ((!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) + && 
!LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) + || SLP_TREE_LANES (slp_node) == 1)); /* Get the correct slp vectorized stmt. */ vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry]; From patchwork Thu Sep 5 13:36:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 97174 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3AD7A384AB42 for ; Thu, 5 Sep 2024 13:37:53 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2a07:de40:b251:101:10:150:64:2]) by sourceware.org (Postfix) with ESMTPS id 5ADF7385EC1D for ; Thu, 5 Sep 2024 13:36:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5ADF7385EC1D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5ADF7385EC1D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a07:de40:b251:101:10:150:64:2 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725543397; cv=none; b=LEblnIcJmbLNKME7+sE1QjhTVQLlw+CyG/NDLem5HyO6svNHgvk29l7BlAILMEc65+DpoNFyWRqN3yluRyCm8Uh1Yt1rcRtMKW5CvfNm7+12Tk9JsBCUAbDy+f2aILGoHrJya8QaHvnuXwiBCqj3UO3men8cub8b9ygm2nCw9ds= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725543397; c=relaxed/simple; bh=3d5zr12K4dB5vVG5LpVvLi4x+HPIRMuQrHee8p2bgx0=; h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:Date: From:To:Subject:MIME-Version; b=aYvxZj7qVbiQewmSTkRLZ8B4qGq7LMsHD6xixJ+tvunWXoVMJ1Uin5D4gKs6elRV03aI4reZCl7yfc+vMMkD/jRkxKbDFIWSuPs4KY9vTqylKb17Dzu89Bnh+UY9eSB4fOSVrQxnw2IyVkRRbY3VKiAi+93VFrwU9823lskaxOs= ARC-Authentication-Results: i=1; 
server2.sourceware.org Received: from murzim.nue2.suse.org (unknown [10.168.4.243]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 707191F7D4; Thu, 5 Sep 2024 13:36:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1725543394; h=from:from:reply-to:date:date:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=BExDlxvc+kZv9nENKrNaiPZWglk7+yeq8eBpn23aLwE=; b=Pu0cSXX4dQHnUipwSLiKWOdfINNvU65/cdl8fpTnmr6CnxHKa6cB8b7PzhTeyN8VOubfz3 OT4kRO+ZNFugTDcedN8u8/S2nQHR2LoHBaUgUS36aw2ieg7pFdS8tqZmd2TbidA31G+8zm ReNLuxeGq2HsNoZy9fZgHm9ifDin9D0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1725543394; h=from:from:reply-to:date:date:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=BExDlxvc+kZv9nENKrNaiPZWglk7+yeq8eBpn23aLwE=; b=UgnRmR85H6Q8L3z1S4gi9XKArYddU3+GhrOqu7roBoi6S9ooK0VotirRn5gyyeJpFEu54t Dhu4ViAJ1Dty2SCA== Authentication-Results: smtp-out2.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1725543394; h=from:from:reply-to:date:date:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=BExDlxvc+kZv9nENKrNaiPZWglk7+yeq8eBpn23aLwE=; b=Pu0cSXX4dQHnUipwSLiKWOdfINNvU65/cdl8fpTnmr6CnxHKa6cB8b7PzhTeyN8VOubfz3 OT4kRO+ZNFugTDcedN8u8/S2nQHR2LoHBaUgUS36aw2ieg7pFdS8tqZmd2TbidA31G+8zm ReNLuxeGq2HsNoZy9fZgHm9ifDin9D0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1725543394; h=from:from:reply-to:date:date:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=BExDlxvc+kZv9nENKrNaiPZWglk7+yeq8eBpn23aLwE=; b=UgnRmR85H6Q8L3z1S4gi9XKArYddU3+GhrOqu7roBoi6S9ooK0VotirRn5gyyeJpFEu54t Dhu4ViAJ1Dty2SCA== Date: Thu, 5 Sep 2024 15:36:34 +0200 (CEST) From: Richard Biener To: 
gcc-patches@gcc.gnu.org cc: RISC-V CI Subject: [PATCH 2/3] tree-optimization/116610 - wrong SLP induction bias for mask peeling MIME-Version: 1.0 X-Spam-Score: -1.13 X-Spamd-Result: default: False [-1.13 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MISSING_MID(2.50)[]; NEURAL_HAM_LONG(-0.34)[-0.341]; NEURAL_HAM_SHORT(-0.19)[-0.968]; MIME_GOOD(-0.10)[text/plain]; RCPT_COUNT_TWO(0.00)[2]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; FUZZY_BLOCKED(0.00)[rspamd.com]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; FROM_EQ_ENVFROM(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; RCVD_COUNT_ZERO(0.00)[0]; MISSING_XM_UA(0.00)[]; TO_DN_SOME(0.00)[] X-Spam-Level: X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, MISSING_MID, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org Message-Id: <20240905133753.3AD7A384AB42@sourceware.org>

The following fixes a mistake when applying the bias for peeling via
masking to the initial value of SLP inductions.  This resolves
gcc.target/aarch64/sve/peel_ind_1.c (a scan-assembler-only test,
unfortunately) when forcing single-lane SLP for it.

	PR tree-optimization/116610
	* tree-vect-loop.cc (vectorizable_induction): Use MINUS_EXPR
	to apply a mask peeling adjustment.
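As a rough illustration of why the sign matters (a scalar model only, not
the GCC code; the function name and parameters below are hypothetical):
when peeling via masking disables the first `skip` lanes of the first
vector iteration, the first active lane should start at the scalar initial
value, so lane `i` presumably needs `init + (i - skip) * step`.  Adding
the bias instead would shift every lane forward by `2 * skip` steps
relative to that.

```c
#include <assert.h>

/* Hypothetical scalar model of one lane of the initial induction vector.
   LANE is the lane index and SKIP the number of leading lanes that are
   masked off when peeling via masking.  */
static long
induction_lane_init (long init, long step, int lane, int skip)
{
  assert (lane >= 0 && skip >= 0);
  /* The per-lane multiplier is the lane index minus the peeling bias
     (a subtraction, mirroring the MINUS_EXPR in the patch).  */
  long step_mul = (long) lane - (long) skip;
  return init + step_mul * step;
}
```

With `skip` equal to 2, lane 2 is the first active lane and gets exactly
`init`; lanes 0 and 1 hold values that are never used because they are
masked off.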
--- gcc/tree-vect-loop.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 31cdc4bf53d..a879a13bbf0 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -10543,7 +10543,7 @@ vectorizable_induction (loop_vec_info loop_vinfo, vec_steps.safe_push (vec_step); tree step_mul = gimple_build_vector (&init_stmts, &mul_elts); if (peel_mul) - step_mul = gimple_build (&init_stmts, PLUS_EXPR, step_vectype, + step_mul = gimple_build (&init_stmts, MINUS_EXPR, step_vectype, step_mul, peel_mul); if (!init_node) vec_init = gimple_build_vector (&init_stmts, &init_elts); From patchwork Thu Sep 5 13:36:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 97173 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 66EB13865C2B for ; Thu, 5 Sep 2024 13:37:32 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2a07:de40:b251:101:10:150:64:2]) by sourceware.org (Postfix) with ESMTPS id A3BB23864832 for ; Thu, 5 Sep 2024 13:36:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A3BB23864832 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A3BB23864832 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a07:de40:b251:101:10:150:64:2 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725543422; cv=none; b=j7in2iWumFDYaVWATOZwo5cyTnN5eHtDLX0tJZjteCr0fbLJ1rf2suFQW37qQ90Kz3QzWl9hYDyOCqbcoS9v2LwAe+wgfqsIuvM2Weent+kWMCkRdVFEiMNomFaBAxlo1Uva1I6tEwgZhd3NjYLPDcHjrcMrNxRRwXX7MX+Pax0= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725543422; c=relaxed/simple; bh=HNu5Zgy5bIYXtvpL0YPu/debX7ZNx8s1YAFRPzR7/hk=; h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:Date: From:To:Subject:MIME-Version; b=e+chAJZV86NHY14VmrGMMhIfpGtlUJja7OQC798rk+H2hT7+RbTvK1a52+eMoeVdxVYFpIJWu8eFmVYrtgThDScw+mPgbziobFWrZN0rriRb1wqmkb98X0Nijr15zrcdsM+A5hpZfyrxfiV+8gCmpNOmR5hIG05RuFlw9cl/b6U= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from murzim.nue2.suse.org (unknown [10.168.4.243]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id A542B1F7D4; Thu, 5 Sep 2024 13:36:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1725543417; h=from:from:reply-to:date:date:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=sa4eVebgfCveMTAnP+U9vnMAh6Ea7dmt8agwP1ZQnek=; b=cTG22pUOE40l5kWDHoClJUl0ZpYsXwim8nvTfcZQJztHbhs5LoFDx68TC4mmBHlH7L3YwO VCNcdnFw2s2EGmXg05PnBw5XczWr687TcKwYDScbLVR48rGlQ7RedXBLOHZMzHHcgT4+ba FxHPvROQYKdAFWKBZenzAXevQ17YyY0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1725543417; h=from:from:reply-to:date:date:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=sa4eVebgfCveMTAnP+U9vnMAh6Ea7dmt8agwP1ZQnek=; b=MaKwykBlgfiUQ4ypylLgvDqVqHuM/DTXm5MUiMD7EN3T+6uMPC7FCp4gB2U2BxBHqKTTio MYE8lbHkprBHanBA== Authentication-Results: smtp-out2.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1725543416; h=from:from:reply-to:date:date:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=sa4eVebgfCveMTAnP+U9vnMAh6Ea7dmt8agwP1ZQnek=; b=TiHLApWs7HuLRn0bc4JS1FR5LHOepOD5u0ViqKf5dIBXESgj2v+lxAh4r39mQxpQcYcoIS bxOw6Z0hjdhAhcXzaZH0wHCukFpGgYKcU2dGD6kUCT4PWznhVGznTHnRe9pnOPfL6PsuhB 
Zn78wLHMH5keK+od0QwAuueKlsInG/M= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1725543416; h=from:from:reply-to:date:date:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=sa4eVebgfCveMTAnP+U9vnMAh6Ea7dmt8agwP1ZQnek=; b=TlYf/frSTsNPV4NNE8PY4MOobabg9YoRtz+9NsB7upeXyyCGZVfMY5rerkR84eihPCD51s Xf6qL9cExyro5pCQ== Date: Thu, 5 Sep 2024 15:36:56 +0200 (CEST) From: Richard Biener To: gcc-patches@gcc.gnu.org cc: RISC-V CI Subject: [PATCH 3/3] Handle non-grouped stores as single-lane SLP MIME-Version: 1.0 X-Spam-Level: X-Spamd-Result: default: False [-0.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MISSING_MID(2.50)[]; NEURAL_HAM_SHORT(-0.19)[-0.960]; MIME_GOOD(-0.10)[text/plain]; NEURAL_HAM_LONG(-0.01)[-0.006]; MISSING_XM_UA(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; ARC_NA(0.00)[]; RCVD_COUNT_ZERO(0.00)[0]; RCPT_COUNT_TWO(0.00)[2]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; FROM_EQ_ENVFROM(0.00)[]; TO_DN_SOME(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; FUZZY_BLOCKED(0.00)[rspamd.com]; DBL_BLOCKED_OPENRESOLVER(0.00)[gcc.target:url] X-Spam-Score: -0.80 X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, MISSING_MID, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org Message-Id: <20240905133732.66EB13865C2B@sourceware.org> The following enables single-lane loop SLP discovery for non-grouped stores and adjusts vectorizable_store to properly handle those. 
For gfortran.dg/vect/vect-8.f90 we vectorize one additional loop,
not running into the "not falling back to strided accesses" bail-out.
I have not investigated in detail.

There is a set of i386 target assembler test FAILs;
gcc.target/i386/pr88531-2[bc].c in particular fail because the
target cannot identify SLP emulated gathers, see another mail from me.
Others need adjustment; I've adjusted only one with this patch.
In particular there are gcc.target/i386/cond_op_fma_*-1.c FAILs that
occur because we no longer fold a VEC_COND_EXPR during the region
value-numbering we do after vectorization, since we now code-generate
a { 0.0, ... } constant in the VEC_COND_EXPR instead of having a
separate statement which gets forwarded and then triggers folding.
This leads to slightly different code generation.  The solution is
probably to use gimple_build when building stmts or, in this case,
to directly emit .COND_FMA instead of .FMA and a VEC_COND_EXPR.

gcc.dg/vect/slp-19a.c mixes contiguous 8-lane SLP with a single-lane
contiguous store from one lane of the 8-lane load and we expect to
use load-lanes for this reason, but the heuristic for forcing
single-lane rediscovery as implemented doesn't trigger here as it
treats both SLP instances separately.  FAILs on RISC-V.

gcc.dg/vect/slp-19c.c shows we fail to implement an interleaving
scheme for group_size 12 (by extension, using the group_size 3 scheme
to reduce to 4 lanes and then continuing with a pow2 scheme would
work); we are also not considering load-lanes because of the above
reason, but aarch64 cannot do ld12.  FAILs on AARCH64 (load requires
three vectors) and x86_64.

gcc.dg/vect/slp-19c.c FAILs with variable-length vectors because of
"SLP induction not supported for variable-length vectors".

gcc.target/aarch64/pr110449.c will FAIL because the (contested)
optimization in r14-2367-g224fd59b2dc8a5 was only applied to
loop-vect but not SLP vect.
I'll leave it to target maintainers to either XFAIL (the optimization
is bad) or remove the test.

	* tree-vect-slp.cc (vect_analyze_slp): Perform single-lane
	loop SLP discovery for non-grouped stores.  Move check on the root
	for re-doing SLP analysis with a single lane for load/store-lanes
	earlier and make sure we are dealing with a grouped access.
	* tree-vect-stmts.cc (vectorizable_store): Always set
	vec_num for SLP.

	* gcc.dg/vect/O3-pr39675-2.c: Adjust expected number of SLP.
	* gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
	* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-12c.c: Likewise.
	* gcc.dg/vect/slp-19a.c: Likewise.
	* gcc.dg/vect/slp-19b.c: Likewise.
	* gcc.dg/vect/slp-4-big-array.c: Likewise.
	* gcc.dg/vect/slp-4.c: Likewise.
	* gcc.dg/vect/slp-5.c: Likewise.
	* gcc.dg/vect/slp-7.c: Likewise.
	* gcc.dg/vect/slp-perm-7.c: Likewise.
	* gcc.dg/vect/slp-37.c: Likewise.
	* gcc.dg/vect/slp-26.c: RISC-V can now SLP two instances.
	* gcc.dg/vect/vect-outer-slp-3.c: Disable vectorization of
	initialization loop.
	* gcc.dg/vect/slp-reduc-5.c: Likewise.
	* gcc.dg/vect/no-scevccp-outer-12.c: Un-XFAIL.  SLP can handle
	inner loop inductions with multiple vector stmt copies.
	* gfortran.dg/vect/vect-8.f90: Adjust expected number of
	vectorized loops.
	* gcc.target/i386/vectorize1.c: Adjust what we scan for.
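For readers less familiar with the terminology, here is a small sketch of
the two store shapes discussed above.  The function names are made up for
illustration and do not come from the testsuite.  The first loop has one
store per iteration (a non-grouped store, which this patch would discover
as single-lane SLP); the second has two adjacent stores per iteration,
which form an interleaved group of two lanes.

```c
#include <assert.h>
#include <stddef.h>

/* One store per iteration to consecutive elements: a non-grouped store.  */
void
scale_single (int *restrict out, const int *restrict in, size_t n)
{
  assert (out && in);
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] * 2;
}

/* Two adjacent stores per iteration: a grouped (interleaved) store with
   two lanes.  N counts pairs, so OUT and IN hold 2 * N elements.  */
void
scale_pair (int *restrict out, const int *restrict in, size_t n)
{
  assert (out && in);
  for (size_t i = 0; i < n; i++)
    {
      out[2 * i] = in[2 * i] + 1;
      out[2 * i + 1] = in[2 * i + 1] + 2;
    }
}
```

Compiling something like this with -O3 -fdump-tree-vect-details and
looking for "vectorizing stmts using SLP" in the vect dump should show
whether each loop went through SLP; the exact counts depend on the target
and the compiler revision.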
--- gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c | 2 +- .../gcc.dg/vect/fast-math-vect-call-1.c | 2 +- .../gcc.dg/vect/fast-math-vect-call-2.c | 2 +- .../gcc.dg/vect/no-scevccp-outer-12.c | 3 +- gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c | 5 ++- gcc/testsuite/gcc.dg/vect/slp-12b.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-12c.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-19a.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-19b.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-26.c | 3 +- gcc/testsuite/gcc.dg/vect/slp-37.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-4-big-array.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-4.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-5.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-7.c | 4 +- gcc/testsuite/gcc.dg/vect/slp-perm-7.c | 2 +- gcc/testsuite/gcc.dg/vect/slp-reduc-5.c | 3 +- gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c | 1 + gcc/testsuite/gcc.target/i386/vectorize1.c | 4 +- gcc/testsuite/gfortran.dg/vect/vect-8.f90 | 2 +- gcc/tree-vect-slp.cc | 45 +++++++++++++++---- gcc/tree-vect-stmts.cc | 11 +++-- 22 files changed, 69 insertions(+), 36 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c b/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c index c3f0f6dc1be..ddaac56cc0b 100644 --- a/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c +++ b/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c @@ -27,5 +27,5 @@ foo () } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c index ad22f6e82b3..6c9b7c37b6e 100644 --- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c +++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c @@ -101,4 +101,4 @@ main () } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target { 
vect_call_copysignf && vect_call_sqrtf } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { { vect_call_copysignf && vect_call_sqrtf } && vect_perm3_int } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { { vect_call_copysignf && vect_call_sqrtf } && vect_perm3_int } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c index d51e17ff656..ed42a21cedc 100644 --- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c +++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c @@ -132,4 +132,4 @@ main () } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" { target vect_call_lrint } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target vect_call_lrint } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" { target vect_call_lrint } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c index c2d3031bc0c..6ace6ad022e 100644 --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c @@ -46,5 +46,4 @@ int main (void) return 0; } -/* Until we support multiple types in the inner loop */ -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */ +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c index 22817a57ef8..f6ac5f60298 100644 --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c @@ -53,6 +53,7 @@ int main (void) return 0; } +/* We cannot handle grouped accesses in outer loops. 
*/ +/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */ - +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-12b.c b/gcc/testsuite/gcc.dg/vect/slp-12b.c index e2ea24d6c53..8e06e3bfa93 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-12b.c +++ b/gcc/testsuite/gcc.dg/vect/slp-12b.c @@ -47,6 +47,6 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided2 && vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided2 && vect_int_mult } } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided2 && vect_int_mult } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided2 && vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided2 && vect_int_mult } } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-12c.c b/gcc/testsuite/gcc.dg/vect/slp-12c.c index 9c48dff3bf4..a3536e3053b 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-12c.c +++ b/gcc/testsuite/gcc.dg/vect/slp-12c.c @@ -49,5 +49,5 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_int_mult } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int_mult } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! 
vect_int_mult } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-19a.c b/gcc/testsuite/gcc.dg/vect/slp-19a.c
index ca7a0a8e456..6c21416046d 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19a.c
@@ -57,5 +57,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided8 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided8 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided8 } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided8} } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-19b.c b/gcc/testsuite/gcc.dg/vect/slp-19b.c
index 4d53ac698db..10b84aab3b5 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19b.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19b.c
@@ -54,5 +54,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided4 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided4 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
index cfb763bf519..cdb5d9c694b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -50,4 +50,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || loongarch_sx } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target riscv_v } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-37.c b/gcc/testsuite/gcc.dg/vect/slp-37.c
index caee2bb508f..8a430e63847 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-37.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-37.c
@@ -60,4 +60,4 @@ int main (void)
 }
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_hw_misalign } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_hw_misalign } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_hw_misalign } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
index fcda45ff368..f738a613324 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
@@ -131,5 +131,5 @@ int main (void)
 }
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-4.c b/gcc/testsuite/gcc.dg/vect/slp-4.c
index 29e741df02b..1ecad7415ef 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-4.c
@@ -125,5 +125,5 @@ int main (void)
 }
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-5.c b/gcc/testsuite/gcc.dg/vect/slp-5.c
index 6d51f6a7323..484898c2afd 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-5.c
@@ -124,5 +124,5 @@ int main (void)
 }
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 5 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-7.c b/gcc/testsuite/gcc.dg/vect/slp-7.c
index 2845a99dedf..f83fdc96d16 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-7.c
@@ -125,6 +125,6 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target vect_short_mult } } }*/
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { ! { vect_short_mult } } } } }*/
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_short_mult } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! { vect_short_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 5 "vect" { target vect_short_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { ! { vect_short_mult } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
index f15736ef729..9c522ba4705 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
@@ -97,6 +97,6 @@ int main (int argc, const char* argv[])
 }
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm3_int || vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int || vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */
 /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-5.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-5.c
index 11f5a7414cf..0cde79d9e49 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-5.c
@@ -36,6 +36,7 @@ int main (void)
   check_vect ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     c[i] = (i+3) * -1;
@@ -44,6 +45,6 @@ int main (void)
   return 0;
 }
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail vect_no_int_min_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_min_max } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_min_max } } } */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
index 3dce51426b5..d315db5632b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
@@ -30,6 +30,7 @@ int main ()
 {
   check_vect ();
+#pragma GCC novector
   for (int i = 0; i < 40; ++i)
     image[i] = 1.;
diff --git a/gcc/testsuite/gcc.target/i386/vectorize1.c b/gcc/testsuite/gcc.target/i386/vectorize1.c
index f3b9bfba382..14a8c5f28b3 100644
--- a/gcc/testsuite/gcc.target/i386/vectorize1.c
+++ b/gcc/testsuite/gcc.target/i386/vectorize1.c
@@ -1,6 +1,6 @@
 /* PR middle-end/28915 */
 /* { dg-do compile } */
-/* { dg-options "-msse -O2 -ftree-vectorize -fdump-tree-vect" } */
+/* { dg-options "-msse -O2 -ftree-vectorize -fdump-tree-vect-optimized" } */
 extern char lanip[3][40];
 typedef struct
@@ -17,4 +17,4 @@ int set_names (void)
     tt1.t[ln] = lanip[1];
 }
-/* { dg-final { scan-tree-dump "vect_cst" "vect" } } */
+/* { dg-final { scan-tree-dump "optimized: loop vectorized" "vect" } } */
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
index 557a523e2bd..2a3fa90740e 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
@@ -708,5 +708,5 @@ END SUBROUTINE kernel
 ! { dg-final { scan-tree-dump-times "vectorized 2\[56\] loops" 1 "vect" { target aarch64_sve } } }
 ! { dg-final { scan-tree-dump-times "vectorized 2\[45\] loops" 1 "vect" { target { aarch64*-*-* && { ! aarch64_sve } } } } }
-! { dg-final { scan-tree-dump-times "vectorized 2\[234\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
+! { dg-final { scan-tree-dump-times "vectorized 2\[345\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
 ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { target { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b6839c7707b..3d2973698e2 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4548,6 +4548,7 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
 opt_result
 vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 {
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   unsigned int i;
   stmt_vec_info first_element;
   slp_instance instance;
@@ -4564,6 +4565,28 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
     vect_analyze_slp_instance (vinfo, bst_map, first_element,
                                slp_inst_kind_store, max_tree_size, &limit);
+  /* For loops also start SLP discovery from non-grouped stores. */
+  if (loop_vinfo)
+    {
+      data_reference_p dr;
+      FOR_EACH_VEC_ELT (vinfo->shared->datarefs, i, dr)
+        if (DR_IS_WRITE (dr))
+          {
+            stmt_vec_info stmt_info = vinfo->lookup_dr (dr)->stmt;
+            /* Grouped stores are already handled above. */
+            if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+              continue;
+            vec<stmt_vec_info> stmts;
+            vec<stmt_vec_info> roots = vNULL;
+            vec<tree> remain = vNULL;
+            stmts.create (1);
+            stmts.quick_push (stmt_info);
+            vect_build_slp_instance (vinfo, slp_inst_kind_store,
+                                     stmts, roots, remain, max_tree_size,
+                                     &limit, bst_map, NULL);
+          }
+    }
+
   if (bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo))
     {
       for (unsigned i = 0; i < bb_vinfo->roots.length (); ++i)
@@ -4750,6 +4773,18 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
           int group_size = SLP_TREE_LANES (slp_root);
           tree vectype = SLP_TREE_VECTYPE (slp_root);
+          stmt_vec_info rep_info = SLP_TREE_REPRESENTATIVE (slp_root);
+          gimple *rep = STMT_VINFO_STMT (rep_info);
+          bool masked = (is_gimple_call (rep)
+                         && gimple_call_internal_p (rep)
+                         && internal_fn_mask_index
+                              (gimple_call_internal_fn (rep)) != -1);
+          if (!STMT_VINFO_GROUPED_ACCESS (rep_info)
+              || slp_root->ldst_lanes
+              || (vect_store_lanes_supported (vectype, group_size, masked)
+                  == IFN_LAST))
+            continue;
+
           auto_vec<slp_tree> loads;
           hash_set<slp_tree> visited;
           vect_gather_slp_loads (loads, slp_root, visited);
@@ -4773,17 +4808,9 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
                 }
             }
-          gimple *rep = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (slp_root));
-          bool masked = (is_gimple_call (rep)
-                         && gimple_call_internal_p (rep)
-                         && internal_fn_mask_index
-                              (gimple_call_internal_fn (rep)) != -1);
           /* If the loads and stores can use load/store-lanes force re-discovery
              with single lanes. */
-          if (loads_permuted
-              && !slp_root->ldst_lanes
-              && vect_store_lanes_supported (vectype, group_size, masked)
-                 != IFN_LAST)
+          if (loads_permuted)
             {
               bool can_use_lanes = true;
               FOR_EACH_VEC_ELT (loads, j, load_node)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 25b120c158e..f6c5b7a7e87 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8355,10 +8355,12 @@ vectorizable_store (vec_info *vinfo,
       return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies);
     }
-  if (grouped_store)
+  if (grouped_store || slp)
     {
       /* FORNOW */
-      gcc_assert (!loop || !nested_in_vect_loop_p (loop, stmt_info));
+      gcc_assert (!grouped_store
+                  || !loop
+                  || !nested_in_vect_loop_p (loop, stmt_info));
       if (slp)
         {
@@ -8367,8 +8369,9 @@ vectorizable_store (vec_info *vinfo,
              group. */
           vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
           first_stmt_info = SLP_TREE_SCALAR_STMTS (slp_node)[0];
-          gcc_assert (DR_GROUP_FIRST_ELEMENT (first_stmt_info)
-                      == first_stmt_info);
+          gcc_assert (!STMT_VINFO_GROUPED_ACCESS (first_stmt_info)
+                      || (DR_GROUP_FIRST_ELEMENT (first_stmt_info)
+                          == first_stmt_info));
           first_dr_info = STMT_VINFO_DR_INFO (first_stmt_info);
           op = vect_get_store_rhs (first_stmt_info);
         }