From patchwork Wed Jan 19 13:43:38 2022
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 50232
Date: Wed, 19 Jan 2022 14:43:38 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/104112 - add check for vect epilogue reduc reuse

This adds a missing check for the availability of the intermediate
vector types required to re-use the accumulator of a vectorized
reduction in the vectorized epilogue.  For example, for SVE with
-msve-vector-bits=512 and VNx2DF vs. V2DF, V4DF is not available.

In addition, we have to verify that the reduction operation is
supported on those types.  Otherwise, for example on i?86, we get
vector code that is later decomposed again by vector lowering when
trying to use a V2HI epilogue for a V8HI reduction on a target
without TARGET_MMX_WITH_SSE.

We might want -Wvector-operation-performance for all vect.exp
tests, but that seems to have existing regressions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2022-01-19  Richard Biener

	PR tree-optimization/104112
	* tree-vect-loop.cc (vect_find_reusable_accumulator): Check
	for required intermediate vector types.

	* gcc.dg/vect/pr104112-1.c: New testcase.
	* gcc.dg/vect/pr104112-2.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr104112-1.c | 18 ++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/pr104112-2.c | 11 +++++++++++
 gcc/tree-vect-loop.cc                  | 15 ++++++++++++++-
 3 files changed, 43 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr104112-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr104112-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr104112-1.c b/gcc/testsuite/gcc.dg/vect/pr104112-1.c
new file mode 100644
index 00000000000..84e69b85170
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr104112-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast" } */
+/* { dg-additional-options "-march=armv8.2-a+sve -msve-vector-bits=512" { target aarch64-*-* } } */
+
+void
+boom(int n, double *a, double *x)
+{
+  int i, j;
+  double temp;
+
+  for (j = n; j >= 1; --j)
+    {
+      temp = x[j];
+      for (i = j - 1; i >= 1; --i)
+        temp += a[i + j] * x[i];
+      x[j] = temp;
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr104112-2.c b/gcc/testsuite/gcc.dg/vect/pr104112-2.c
new file mode 100644
index 00000000000..7469b3c5d84
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr104112-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* Diagnose vector ops that are later decomposed.  */
+/* { dg-additional-options "-Wvector-operation-performance" } */
+
+unsigned short foo (unsigned short *a, int n)
+{
+  unsigned short sum = 0;
+  for (int i = 0; i < n; ++i)
+    sum += a[i];
+  return sum;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 0fe3529b2d1..0b2785a5ed6 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4979,9 +4979,22 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
   /* Handle the case where we can reduce wider vectors to narrower ones.  */
   tree vectype = STMT_VINFO_VECTYPE (reduc_info);
   tree old_vectype = TREE_TYPE (accumulator->reduc_input);
+  unsigned HOST_WIDE_INT m;
   if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
-                            TYPE_VECTOR_SUBPARTS (vectype)))
+                            TYPE_VECTOR_SUBPARTS (vectype), &m))
     return false;
+  /* Check the intermediate vector types are available.  */
+  while (m > 2)
+    {
+      m /= 2;
+      tree intermediate_vectype = get_related_vectype_for_scalar_type
+        (TYPE_MODE (vectype), TREE_TYPE (vectype),
+         exact_div (TYPE_VECTOR_SUBPARTS (old_vectype), m));
+      if (!intermediate_vectype
+          || !directly_supported_p (STMT_VINFO_REDUC_CODE (reduc_info),
+                                    intermediate_vectype))
+        return false;
+    }
 
   /* Non-SLP reductions might apply an adjustment after the reduction
      operation, in order to simplify the initialization of the accumulator.
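
As a rough illustration of which intermediate types the new loop in
vect_find_reusable_accumulator asks about, here is a standalone C sketch.
It is not GCC code: intermediate_lane_counts is a hypothetical helper,
lane counts are modelled as plain unsigned integers instead of
TYPE_VECTOR_SUBPARTS values, the ratio is assumed to be a power of two,
and the target queries (get_related_vectype_for_scalar_type,
directly_supported_p) are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC.  Given the lane count of the
   main-loop accumulator (old_nunits) and of the epilogue vector type
   (new_nunits), store the lane counts of the intermediate vector types
   in OUT, in the order the loop in the patch visits them.
   Returns the number of entries written, or -1 when old_nunits is not
   a multiple of new_nunits (the constant_multiple_p failure case).
   Assumption: the ratio is a power of two, so every division is exact.  */
int
intermediate_lane_counts (unsigned old_nunits, unsigned new_nunits,
                          unsigned *out, size_t cap)
{
  assert (new_nunits != 0);
  if (old_nunits % new_nunits != 0)
    return -1;

  unsigned m = old_nunits / new_nunits;
  int count = 0;
  /* Mirror the patch: halve the ratio until only one halving step
     (directly to the epilogue type) would remain.  */
  while (m > 2)
    {
      m /= 2;
      assert ((size_t) count < cap);
      out[count++] = old_nunits / m;
    }
  return count;
}
```

With 8 lanes reduced to 2 (VNx2DF at 512 bits down to V2DF) the only
intermediate lane count is 4, which corresponds to the V4DF mentioned in
the description.  With 16 lanes reduced to 2 the sketch yields 4 and
then 8.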