From patchwork Wed Apr 13 12:57:15 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 52851
Return-Path:
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
    by sourceware.org (Postfix) with ESMTP id A06353858C53
    for ; Wed, 13 Apr 2022 12:57:46 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A06353858C53
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
    s=default; t=1649854666;
    bh=MNcBqna8BMk8InjJGZqQ5OW7baue7bYfN8ZYQRSrqcc=;
    h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
    List-Help:List-Subscribe:From:Reply-To:Cc:From;
    b=B8DpXqm1gt1lQI73yvl0OWIpZwgkYheNlQycGATxB4/rdBZZCzKxhLOmXmgng25Il
    iYuCg31yO1whF6WXDSDMIoi94xRGhxFYDWhCWFYu8smJMb2SH6mNmKA07hOQmwCmwx
    mb8Ud7kz8ZU77BL5cWDfNjWbAk4lcVzfuYmeQHtg=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29])
    by sourceware.org (Postfix) with ESMTPS id 68D893858C53
    for ; Wed, 13 Apr 2022 12:57:17 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 68D893858C53
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
    (No client certificate requested)
    by smtp-out2.suse.de (Postfix) with ESMTPS id 825761F856;
    Wed, 13 Apr 2022 12:57:16 +0000 (UTC)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
    (No client certificate requested)
    by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 5CA0F13AB8;
    Wed, 13 Apr 2022 12:57:16 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
    by imap2.suse-dmz.suse.de with ESMTPSA id qFaLFazIVmIAcQAAMHmgww
    (envelope-from ); Wed, 13 Apr 2022 12:57:16 +0000
Date: Wed, 13 Apr 2022 14:57:15 +0200 (CEST)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns
MIME-Version: 1.0
Message-Id: <20220413125716.5CA0F13AB8@imap2.suse-dmz.suse.de>
X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
    DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,
    SPF_HELO_NONE, SPF_PASS, TXREP,
    T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
    server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-Patchwork-Original-From: Richard Biener via Gcc-patches
From: Richard Biener
Reply-To: Richard Biener
Cc: richard.sandiford@arm.com, richard.earnshaw@arm.com
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"

When doing BB vectorization the scalar cost computation is derailed
by patterns, causing lanes to be considered live and thus not costed
on the scalar side.  For the testcase in PR104010 this prevents
vectorization that GCC 11 performed.  PR103941 shows similar cases of
missed optimizations that are fixed by this patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I'm only considering this now because PR104010 is identified as a
regression on arm - Richards, what do you think?  I do think this
will enable vectorization of more stuff now, which might be good or
bad - who knows, but at least it needs to involve patterns.

Thanks,
Richard.

2022-04-13  Richard Biener

        PR tree-optimization/104010
        PR tree-optimization/103941
        * tree-vect-slp.cc (vect_bb_slp_scalar_cost): When we run
        into stmts in patterns continue walking those for uses
        outside of the vectorized region instead of marking the
        lane live.

        * gcc.target/i386/pr103941-1.c: New testcase.
        * gcc.target/i386/pr103941-2.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
 gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
 gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
 3 files changed, 61 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c

diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
new file mode 100644
index 00000000000..524fdd0b4b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+unsigned char ur[16], ua[16], ub[16];
+
+void avgu_v2qi (void)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    ur[i] = (ua[i] + ub[i] + 1) >> 1;
+}
+
+/* { dg-final { scan-assembler "pavgb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
new file mode 100644
index 00000000000..972a32be997
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+void foo (int *c, float *x, float *y)
+{
+  c[0] = x[0] < y[0];
+  c[1] = x[1] < y[1];
+  c[2] = x[2] < y[2];
+  c[3] = x[3] < y[3];
+}
+
+/* { dg-final { scan-assembler "cmpltps" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4ac2b70303c..c7687065374 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
          the scalar cost.  */
       if (!STMT_VINFO_LIVE_P (stmt_info))
         {
-          FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
+          auto_vec worklist;
+          hash_set *worklist_visited = NULL;
+          worklist.quick_push (orig_stmt);
+          do
             {
-              imm_use_iterator use_iter;
-              gimple *use_stmt;
-              FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
-                if (!is_gimple_debug (use_stmt))
-                  {
-                    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
-                    if (!use_stmt_info
-                        || !vectorized_scalar_stmts.contains (use_stmt_info))
+              gimple *work_stmt = worklist.pop ();
+              FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
+                {
+                  imm_use_iterator use_iter;
+                  gimple *use_stmt;
+                  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
+                                         DEF_FROM_PTR (def_p))
+                    if (!is_gimple_debug (use_stmt))
                       {
-                        (*life)[i] = true;
-                        break;
+                        stmt_vec_info use_stmt_info
+                          = vinfo->lookup_stmt (use_stmt);
+                        if (!use_stmt_info
+                            || !vectorized_scalar_stmts.contains (use_stmt_info))
+                          {
+                            if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
+                              {
+                                /* For stmts participating in patterns we have
+                                   to check its uses recursively.  */
+                                if (!worklist_visited)
+                                  worklist_visited = new hash_set ();
+                                if (!worklist_visited->add (use_stmt))
+                                  worklist.safe_push (use_stmt);
+                                continue;
+                              }
+                            (*life)[i] = true;
+                            goto next_lane;
+                          }
                       }
-                  }
+                }
             }
+          while (!worklist.is_empty ());
+next_lane:
+          if (worklist_visited)
+            delete worklist_visited;
           if ((*life)[i])
             continue;
         }
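
For illustration only, and not part of the patch: the worklist walk that the change to vect_bb_slp_scalar_cost introduces can be sketched outside of GCC roughly as below. The Stmt struct, its fields and lane_is_live are hypothetical stand-ins for GIMPLE stmts, STMT_VINFO_IN_PATTERN_P and the (*life)[i] flag; they are assumptions made for this sketch and do not exist in GCC.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a scalar stmt; this is not GCC's data structure.
struct Stmt {
    bool vectorized = false;        // stmt is covered by the SLP instance
    bool in_pattern = false;        // stmt takes part in a recognized pattern
    std::vector<const Stmt*> uses;  // stmts consuming this stmt's result
};

// Returns true if a value reachable from `start` has a use outside the
// vectorized region.  Non-vectorized uses that are part of a pattern are
// not treated as live themselves; their own uses are walked instead.
bool lane_is_live(const Stmt* start) {
    assert(start != nullptr);
    std::vector<const Stmt*> worklist{start};
    std::unordered_set<const Stmt*> visited;  // avoids re-walking pattern stmts
    while (!worklist.empty()) {
        const Stmt* work = worklist.back();
        worklist.pop_back();
        for (const Stmt* use : work->uses) {
            if (use->vectorized)
                continue;  // consumed inside the vectorized region
            if (use->in_pattern) {
                // Pattern stmt: defer the decision to its uses.
                if (visited.insert(use).second)
                    worklist.push_back(use);
                continue;
            }
            return true;  // genuine scalar use outside the region
        }
    }
    return false;
}
```

In this model a definition whose only non-vectorized use is a pattern stmt feeding a vectorized stmt is not live, so its scalar cost would still be counted; adding one plain scalar use anywhere along that chain makes the lane live again.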