From patchwork Tue Jul 16 10:09:55 2024
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 93966
Date: Tue, 16 Jul 2024 12:09:55 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/115841 - reduction epilogue placement issue
Message-Id: <20240716100959.F2833136E5@imap1.dmz-prg2.suse.org>

When emitting the compensation to the vectorized main loop for a
vector reduction value that is re-used in the vectorized epilogue,
we fail to place it in the correct block when the main loop is known
to be entered (there is no loop_vinfo->main_loop_edge) but the
epilogue is not (there is a loop_vinfo->skip_this_loop_edge).  The
code currently disregards this situation.

With the recent znver4 cost fix I could not trigger this situation
with the testcase, but I adjusted it so that it might trigger on
other targets.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

	PR tree-optimization/115841
	* tree-vect-loop.cc (vect_transform_cycle_phi): Correctly
	place the partial vector reduction for the accumulator
	re-use when the main loop cannot be skipped but the
	epilogue can.

	* gcc.dg/vect/pr115841.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr115841.c | 42 ++++++++++++++++++++++++++++
 gcc/tree-vect-loop.cc                |  7 +++--
 2 files changed, 46 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115841.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115841.c b/gcc/testsuite/gcc.dg/vect/pr115841.c
new file mode 100644
index 00000000000..aa5c66004a0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115841.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -fcommon -fvect-cost-model=dynamic --param vect-partial-vector-usage=1" } */
+/* { dg-additional-options "-mavx512vl" { target avx512vl } } */
+
+/* To trigger the bug costing needs to determine that aligning the A170
+   accesses with a prologue is good and there should be a vectorized
+   epilogue with a smaller vector size, re-using the vector accumulator
+   from the vectorized main loop that's statically known to execute
+   but the epilogue loop is not.  */
+
+static unsigned char xl[192];
+unsigned char A170[192*3];
+
+void jerate (unsigned char *, unsigned char *);
+float foo (unsigned n)
+{
+  jerate (xl, A170);
+
+  unsigned i = 32;
+  int kr = 1;
+  float sfn11s = 0.f;
+  float sfn12s = 0.f;
+  do
+    {
+      int krm1 = kr - 1;
+      long j = krm1;
+      float a = (*(float(*)[n])A170)[j];
+      float b = (*(float(*)[n])xl)[j];
+      float c = a * b;
+      float d = c * 6.93149983882904052734375e-1f;
+      float e = (*(float(*)[n])A170)[j+48];
+      float f = (*(float(*)[n])A170)[j+96];
+      float g = d * e;
+      sfn11s = sfn11s + g;
+      float h = f * d;
+      sfn12s = sfn12s + h;
+      kr++;
+    }
+  while (--i != 0);
+  float tem = sfn11s + sfn12s;
+  return tem;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a64b5082bd1..b8124a32128 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9026,14 +9026,15 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
 	  /* And the reduction could be carried out using a different sign.  */
 	  if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
 	    def = gimple_convert (&stmts, vectype_out, def);
-	  if (loop_vinfo->main_loop_edge)
+	  edge e;
+	  if ((e = loop_vinfo->main_loop_edge)
+	      || (e = loop_vinfo->skip_this_loop_edge))
 	    {
 	      /* While we'd like to insert on the edge this will split
 		 blocks and disturb bookkeeping, we also will eventually
 		 need this on the skip edge.  Rely on sinking to
 		 fixup optimal placement and insert in the pred.  */
-	      gimple_stmt_iterator gsi
-		= gsi_last_bb (loop_vinfo->main_loop_edge->src);
+	      gimple_stmt_iterator gsi = gsi_last_bb (e->src);
 	      /* Insert before a cond that eventually skips the
 		 epilogue.  */
 	      if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))