From patchwork Thu Jan 13 01:48:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiufu Guo X-Patchwork-Id: 49943 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 51F403857830 for ; Thu, 13 Jan 2022 01:49:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 51F403857830 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1642038597; bh=dFbZX5S0N9d29m/3QWnYSo6npIs8vIH84tSYC7CkIZc=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=TCpORFsrVH8pIMQTqigsnlC6N3y2ygEJlinoUD6o9a3v6iURB+grwM7RoHGLoFt6+ VMo791uuhkxussKwj8l/bD19O7Y0Ho8XnNziZXJhcfOLfrN9VqCsTmwBf/2AN7cBD5 l5jr6dTHG3R2P84yp2YA3kY7aoJE/OFkvyXPaieE= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 8F8973858D39 for ; Thu, 13 Jan 2022 01:48:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8F8973858D39 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 20D0vTPO030528; Thu, 13 Jan 2022 01:48:31 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3dja0cgnh5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 13 Jan 2022 01:48:31 +0000 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 20D1aJpU030002; Thu, 13 Jan 2022 01:48:30 GMT Received: from ppma03fra.de.ibm.com (6b.4a.5195.ip4.static.sl-reverse.com [149.81.74.107]) by mx0a-001b2d01.pphosted.com with ESMTP id 3dja0cgngn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 13 Jan 2022 01:48:30 +0000 Received: from pps.filterd (ppma03fra.de.ibm.com [127.0.0.1]) by ppma03fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 20D1fVgh012240; Thu, 13 Jan 2022 01:48:28 GMT Received: from b06cxnps4074.portsmouth.uk.ibm.com (d06relay11.portsmouth.uk.ibm.com [9.149.109.196]) by ppma03fra.de.ibm.com with ESMTP id 3df289pe8a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 13 Jan 2022 01:48:28 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 20D1mObU44433916 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 13 Jan 2022 01:48:24 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CC0FEAE057; Thu, 13 Jan 2022 01:48:24 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 97A8FAE04D; Thu, 13 Jan 2022 01:48:23 +0000 (GMT) Received: from pike.rch.stglabs.ibm.com (unknown [9.5.12.127]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 13 Jan 2022 01:48:23 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH 2/2] Add assumption combining iv Date: Thu, 13 Jan 2022 09:48:21 +0800 Message-Id: <20220113014821.168869-2-guojiufu@linux.ibm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220113014821.168869-1-guojiufu@linux.ibm.com> References: <20220113014821.168869-1-guojiufu@linux.ibm.com> X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 29H5WIgVpk3vi0MaRjFrvIhbRDI7ANan X-Proofpoint-ORIG-GUID: 1M6aUa8VOEN8Wyq05o_dCA836JaLoXs8 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-12_08,2022-01-11_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 mlxlogscore=999 malwarescore=0 spamscore=0 lowpriorityscore=0 phishscore=0 clxscore=1015 bulkscore=0 suspectscore=0 mlxscore=0 impostorscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2110150000 definitions=main-2201130002 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jiufu Guo via Gcc-patches From: Jiufu Guo Reply-To: Jiufu Guo Cc: rguenther@suse.de, segher@kernel.crashing.org, wschmidt@linux.ibm.com, jlaw@tachyum.com, dje.gcc@gmail.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This is the second patch for two IVs combining. When one IV is chasing another one, to make it safe, we should check if there is wrap/overflow for either IV. With the assumption, which computed as this patch, the number of iterations can be caculated, even the no_overflow flag is not updated for unsigned IVs, like the test case of this patch. This patch pass bootstrap and regtest on ppc64le and x86_64. Is this ok for trunk, or it may more suitable for stage1. BR, Jiufu PR tree-optimization/102131 gcc/ChangeLog: * tree-ssa-loop-niter.c (get_step_count): New function. (iv_chase_assumption): New function. (number_of_iterations_cond): Call iv_chase_assumption. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr102131.c: New test. --- gcc/tree-ssa-loop-niter.c | 73 ++++++++++++++++++++++++---- gcc/testsuite/gcc.dg/vect/pr102131.c | 47 ++++++++++++++++++ 2 files changed, 110 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/pr102131.c diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 439d595a79f..56261164f28 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -1788,6 +1788,63 @@ dump_affine_iv (FILE *file, affine_iv *iv) } } +/* Generate expr: (HIGH - LOW) / STEP, under UTYPE. */ + +static tree +get_step_count (tree high, tree low, tree step, tree utype, + bool end_inclusive = false) +{ + tree delta = fold_build2 (MINUS_EXPR, TREE_TYPE (low), high, low); + delta = fold_convert (utype, delta); + if (end_inclusive) + delta = fold_build2 (PLUS_EXPR, utype, delta, build_one_cst (utype)); + + if (tree_int_cst_sign_bit (step)) + step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step); + step = fold_convert (utype, step); + + return fold_build2 (FLOOR_DIV_EXPR, utype, delta, step); +} + +/* Get the additional assumption if both two steps are not zero. + Assumptions satisfy that there is no overflow or wrap during + v0 and v1 chasing. */ + +static tree +iv_chase_assumption (affine_iv *iv0, affine_iv *iv1, tree step, + enum tree_code code) +{ + /* Here, no need additional assumptions for NE. */ + if (code == NE_EXPR) + return boolean_true_node; + + /* No need addition assumption for pointer. */ + tree type = TREE_TYPE (iv0->base); + if (POINTER_TYPE_P (type)) + return boolean_true_node; + + bool positive0 = !tree_int_cst_sign_bit (iv0->step); + bool positive1 = !tree_int_cst_sign_bit (iv1->step); + tree utype = unsigned_type_for (type); + bool add1 = code == LE_EXPR; + tree niter = get_step_count (iv1->base, iv0->base, step, utype, add1); + + int prec = TYPE_PRECISION (type); + signop sgn = TYPE_SIGN (type); + tree max = wide_int_to_tree (type, wi::max_value (prec, sgn)); + tree min = wide_int_to_tree (type, wi::min_value (prec, sgn)); + tree valid_niter0, valid_niter1; + + valid_niter0 = positive0 ? get_step_count (max, iv0->base, iv0->step, utype) + : get_step_count (iv0->base, min, iv0->step, utype); + valid_niter1 = positive1 ? get_step_count (max, iv1->base, iv1->step, utype) + : get_step_count (iv1->base, min, iv1->step, utype); + + tree e0 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter0); + tree e1 = fold_build2 (LT_EXPR, boolean_type_node, niter, valid_niter1); + return fold_build2 (TRUTH_AND_EXPR, boolean_type_node, e0, e1); +} + /* Determine the number of iterations according to condition (for staying inside loop) which compares two induction variables using comparison operator CODE. The induction variable on left side of the comparison @@ -1879,11 +1936,10 @@ number_of_iterations_cond (class loop *loop, {iv0.base, iv0.step - iv1.step} cmp_code {iv1.base, 0} provided that either below condition is satisfied: + a. iv0.step and iv1.step are integer. + b. Additional condition: before iv0 chase up v1, iv0 and iv1 should not + step over min or max of the type. */ - a) the test is NE_EXPR; - b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow. - - This rarely occurs in practice, but it is simple enough to manage. */ if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step)) { tree step_type = POINTER_TYPE_P (type) ? sizetype : type; @@ -1894,15 +1950,12 @@ number_of_iterations_cond (class loop *loop, if (tree_int_cst_sign_bit (step)) return false; - if (code != NE_EXPR - && (TREE_CODE (step) != INTEGER_CST - || !iv0->no_overflow || !iv1->no_overflow)) + niter->assumptions = iv_chase_assumption (iv0, iv1, step, code); + if (integer_zerop (niter->assumptions)) return false; iv0->step = step; - if (!POINTER_TYPE_P (type)) - iv0->no_overflow = false; - + iv0->no_overflow = true; iv1->step = build_int_cst (step_type, 0); iv1->no_overflow = true; } diff --git a/gcc/testsuite/gcc.dg/vect/pr102131.c b/gcc/testsuite/gcc.dg/vect/pr102131.c new file mode 100644 index 00000000000..23975cfeadb --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr102131.c @@ -0,0 +1,47 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-O3" } */ +#define MAX ((unsigned int) 0xffffffff) +#define MIN ((unsigned int) (0)) + +int arr[512]; + +#define FUNC(NAME, CODE, S0, S1) \ + unsigned __attribute__ ((noinline)) NAME (unsigned int b0, unsigned int b1) \ + { \ + unsigned int n = 0; \ + unsigned int i0, i1; \ + int *p = arr; \ + for (i0 = b0, i1 = b1; i0 CODE i1; i0 += S0, i1 += S1) \ + { \ + n++; \ + *p++ = i0 + i1; \ + } \ + return n; \ + } + +FUNC (lt_5_1, <, 5, 1); +FUNC (le_1_m5, <=, 1, -5); +FUNC (lt_1_10, <, 1, 10); + +int +main () +{ + int fail = 0; + if (lt_5_1 (MAX - 124, MAX - 27) != 28) + fail++; + + /* to save time, do not run this. */ + /* + if (le_1_m5 (MIN + 1, MIN + 9) != 715827885) + fail++; */ + + if (lt_1_10 (MAX - 1000, MAX - 500) != 51) + fail++; + + if (fail) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */