From patchwork Fri Feb 18 18:20:02 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 51215
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH] PR middle-end/65855: Scalar evolution for quadratic chrecs
Date: Fri, 18 Feb 2022 18:20:02 -0000
Message-ID: <003301d824f4$25ec8f70$71c5ae50$@nextmovesoftware.com>

This patch improves GCC's scalar evolution and final value replacement
optimizations by supporting triangular/quadratic/trapezoid chrecs, which
resolves both PR middle-end/65855 and PR c/80852, but alas not (yet)
PR tree-optimization/46186.  I've listed Richard Biener as co-author, as
this solution is based heavily on his proposed patch in comment #4 of
PR 65855 from 2015, but with several significant changes.  The most
important change is a correction to the formula used.
For the scalar evolution {a, +, {b, +, c}}, there was an off-by-one
error: chrec_apply should return not a + b*x + c*x*(x+1)/2, but
a + b*x + c*x*(x-1)/2, which explains why the original patch didn't
produce the expected results.

Another significant set of changes involves handling the various type
mismatches and overflows.  In chrec_apply the evolution is evaluated
after x iterations (where x has an unsigned integer type, called
orig_type in this patch), but a, b and c may be of any type, including
floating point (see PR tree-opt/23391) and including types that trap on
overflow with -ftrapv (see PR tree-opt/79721); and in the case of
pointer arithmetic, b and c may have a different type (sizetype) from a!
Additionally, the division by two in "c*x*(x-1)/2" requires the correct
top bit in modulo arithmetic, which means that the multiplication
typically needs to be performed with more precision (in a wider mode)
than orig_type [unless c is an even integer constant, or x (the number
of iterations) is a known constant at compile-time].

Finally, final value replacement is only an effective optimization if
the expression used to compute the result of the loop is cheaper than
the loop itself, hence chrec_apply needs to return an optimized, folded
tree with a minimal number of operators.  Hence, if b == c this patch
calculates "a + c*x*(x+1)/2", when c is even we can perform the
division at compile-time, and when x is a non-trivial expression we
wrap it in a SAVE_EXPR so that the lowering to gimple can reuse the
common subexpression.

Correctly handling all of these corner cases results in a patch
significantly larger than the original "proof-of-concept".  There's one
remaining refinement, marked as TODO in the patch, which is to also
support unsigned 64-bit to 128-bit widening multiplications (umulditi3)
on targets that support them, such as x86_64, as used by LLVM, but this
patch provides the core target-independent functionality.
[This last piece would handle/resolve the testcase in PR tree-opt/46186,
which requires 128-bit TImode operations, not suitable for all backend
targets.]

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?

2022-02-18  Roger Sayle
	    Richard Biener

gcc/ChangeLog
	PR middle-end/65855
	PR c/80852
	* tree-chrec.cc (chrec_fold_divide_by_2): New function to divide
	a chrec by two, honoring the type of the chrec.
	(chrec_save_expr): New function to wrap a chrec in a SAVE_EXPR,
	but without TREE_SIDE_EFFECTS.
	(chrec_apply_triangular): New helper function of chrec_apply to
	evaluate a quadratic/triangular chrec.
	(chrec_apply): Expand/clarify comment before function.  Handle
	application of any chrec after zero iterations, i.e. A.
	Recursively handle cases where the iteration count is
	conditional.  Handle quadratic/triangular chrecs by calling the
	new chrec_apply_triangular function.
	(chrec_convert_rhs): Handle conversion of integer constants to
	scalar floating point types here (moved from chrec_apply).
	* tree-scalar-evolution.cc (interpret_expr): Handle SAVE_EXPR by
	(tail call) recursion.
	(expression_expensive_p): Calculate the cost of a SAVE_EXPR as
	half the cost of its operand, i.e. assume it is reused.

gcc/testsuite/ChangeLog
	PR middle-end/65855
	PR c/80852
	* gcc.dg/tree-ssa/pr65855.c: New test case.
	* gcc.dg/tree-ssa/pr80852.c: New test case.
	* gcc.dg/vect/vect-iv-11.c: Update to reflect that the loop is
	no longer vectorized, but calculated by final value replacement.

Roger
--

diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
index c44cea7..63d7522 100644
--- a/gcc/tree-chrec.cc
+++ b/gcc/tree-chrec.cc
@@ -573,6 +573,246 @@ chrec_evaluate (unsigned var, tree chrec, tree n, unsigned int k)
 			 chrec_convert (ctype, chrec, NULL), binomial_n_k);
 }
 
+
+/* Fold the (exact) division of chrec by two.  */
+
+static tree
+chrec_fold_divide_by_2 (tree type, tree op)
+{
+  if (SCALAR_FLOAT_TYPE_P (type))
+    return fold_build2 (MULT_EXPR, type, op, build_real (type, dconsthalf));
+  return fold_build2 (RSHIFT_EXPR, type, op, build_int_cst (type, 1));
+}
+
+
+/* Indicate that a chrec subexpression is used multiple times.  */
+
+static tree
+chrec_save_expr (tree t)
+{
+  if (is_gimple_min_invariant (t))
+    return t;
+  t = save_expr (t);
+  if (TREE_CODE (t) == SAVE_EXPR)
+    TREE_SIDE_EFFECTS (t) = 0;
+  return t;
+}
+
+
+/* Helper function of chrec_apply.  Evaluate a CHREC of the
+   form {A, +, {B, +, C}} at X to A + B*X + C*X*(X-1)/2.
+   The tricky part is avoiding information loss on overflow
+   and the potential mismatch of types between A, B and C
+   (of type TYPE) and X which is an unsigned integral type.  */
+
+static tree
+chrec_apply_triangular (tree type, tree a, tree b, tree c, tree x)
+{
+  tree res, tmp;
+
+  /* Determine whether c is even.  */
+  tree half_c = NULL_TREE;
+  if (TREE_CODE (c) == INTEGER_CST
+      && (wi::to_widest (c) & 1) == 0)
+    half_c = chrec_fold_divide_by_2 (type, c);
+
+  /* Avoid introducing trapping arithmetic with -ftrapv, c.f. PR 79721.  */
+  tree ctype = type;
+  if (INTEGRAL_TYPE_P (type) && !TYPE_OVERFLOW_WRAPS (type))
+    {
+      ctype = unsigned_type_for (type);
+      b = chrec_convert (ctype, b, NULL);
+      c = chrec_convert (ctype, c, NULL);
+      if (half_c)
+	half_c = chrec_convert (ctype, half_c, NULL);
+    }
+
+  /* First, handle the easy case where X is an integer constant,
+     so we can evaluate X*(X-1)/2 using widest_int precision at
+     compile-time.  */
+  if (TREE_CODE (x) == INTEGER_CST)
+    {
+      wi::tree_to_widest_ref wix = wi::to_widest (x);
+      widest_int wit = wix*(wix-1)/2;
+      if (operand_equal_p (b, c))
+	{
+	  wit += wix;
+	  res = SCALAR_FLOAT_TYPE_P (ctype)
+		? build_real_from_wide (ctype, wit, UNSIGNED)
+		: wide_int_to_tree (ctype, wit);
+	  res = chrec_fold_multiply (ctype, b, res);
+	}
+      else
+	{
+	  if (SCALAR_FLOAT_TYPE_P (ctype))
+	    {
+	      res = build_real_from_wide (ctype, wit, UNSIGNED);
+	      tmp = build_real_from_wide (ctype, wix, UNSIGNED);
+	    }
+	  else
+	    {
+	      res = wide_int_to_tree (ctype, wit);
+	      tmp = wide_int_to_tree (ctype, wix);
+	    }
+	  res = chrec_fold_multiply (ctype, c, res);
+	  tmp = chrec_fold_multiply (ctype, b, tmp);
+	  res = chrec_fold_plus (ctype, res, tmp);
+	}
+      if (ctype != type)
+	res = chrec_convert (type, res, NULL);
+      res = chrec_fold_plus (TREE_TYPE (a), a, res);
+      return res;
+    }
+
+  /* Alas, calculating C*X*(X-1)/2 can wrap/overflow incorrectly, unless
+     (i) C is an even integer constant, or (ii) the multiplications are
+     performed in a wider mode.  */
+
+  /* Try to narrow the original type, by stripping zero extensions.  */
+  tree orig_type = TREE_TYPE (x);
+  if (TREE_CODE (x) == NOP_EXPR)
+    {
+      tree nx = TREE_OPERAND (x, 0);
+      tree nt = TREE_TYPE (nx);
+      if (INTEGRAL_TYPE_P (nt)
+	  && TYPE_UNSIGNED (nt)
+	  && TYPE_PRECISION (nt) < TYPE_PRECISION (orig_type))
+	{
+	  orig_type = nt;
+	  x = nx;
+	}
+    }
+  else if (TREE_CODE (x) == PLUS_EXPR
+	   && integer_all_onesp (TREE_OPERAND (x, 1))
+	   && TREE_CODE (TREE_OPERAND (x, 0)) == NOP_EXPR)
+    {
+      /* We know the number of iterations can't be negative.  */
+      tree nt = TREE_TYPE (TREE_OPERAND (TREE_OPERAND (x, 0), 0));
+      if (INTEGRAL_TYPE_P (nt)
+	  && TYPE_UNSIGNED (nt)
+	  && TYPE_PRECISION (nt) < TYPE_PRECISION (orig_type))
+	{
+	  orig_type = nt;
+	  x = fold_convert (nt, x);
+	}
+    }
+
+  /* We require an unsigned type with more precision than PREC.  */
+  int prec = TYPE_PRECISION (orig_type);
+  if (half_c)
+    prec--;
+
+  tree wide_type;
+  /* Prefer to perform multiplications in type CTYPE.  */
+  if (INTEGRAL_TYPE_P (ctype)
+      && TYPE_UNSIGNED (ctype)
+      && TYPE_PRECISION (ctype) > prec)
+    wide_type = ctype;
+  else if (TYPE_PRECISION (unsigned_type_node) > prec)
+    wide_type = unsigned_type_node;
+  else if (TYPE_PRECISION (long_unsigned_type_node) > prec)
+    wide_type = long_unsigned_type_node;
+  else if (TYPE_PRECISION (long_long_unsigned_type_node) > prec)
+    wide_type = long_long_unsigned_type_node;
+  else
+    /* TODO: Try TImode on targets that support umul_widen_optab.  */
+    return chrec_dont_know;
+
+  /* X is used multiple times, so wrap it in a SAVE_EXPR.  */
+  x = chrec_convert_rhs (wide_type, x, NULL);
+  tree save_x = chrec_save_expr (x);
+
+  /* When B == C, "{A, +, {B, +, B}} (x) -> "A + B*X*(X+1)/2".  */
+  if (operand_equal_p (b, c))
+    {
+      /* b*x*(x+1)/2 */
+      tree one = build_int_cst (wide_type, 1);
+      res = chrec_fold_plus (wide_type, save_x, one);
+      res = chrec_fold_multiply (wide_type, res, save_x);
+      if (half_c)
+	{
+	  res = chrec_convert (ctype, res, NULL);
+	  res = chrec_fold_multiply (ctype, half_c, res);
+	}
+      else
+	{
+	  res = chrec_fold_divide_by_2 (wide_type, res);
+	  res = chrec_convert (ctype, res, NULL);
+	  res = chrec_fold_multiply (ctype, b, res);
+	}
+      /* + a */
+      if (ctype != type)
+	res = chrec_convert (type, res, NULL);
+      res = chrec_fold_plus (TREE_TYPE (a), a, res);
+      return res;
+    }
+
+  /* When 2B is a multiple of C, "{A, +, {B, +, C}} (x)" may be
+     optimized to "A + C*X*(X+TMP)/2" where TMP is "2B/C-1".  */
+  if (TREE_CODE (b) == INTEGER_CST && TREE_CODE (c) == INTEGER_CST)
+    {
+      wi::tree_to_widest_ref wic = wi::to_widest (c);
+      wi::tree_to_widest_ref wib = wi::to_widest (b);
+      signop sign = TYPE_SIGN (type);
+      wi::overflow_type overflow = wi::OVF_NONE;
+      widest_int wib2 = wi::add (wib, wib, sign, &overflow);
+      if (wi::multiple_of_p (wib2, wic, sign))
+	{
+	  widest_int tmp = wi::div_trunc (wib2, wic, sign, &overflow);
+	  tmp = wi::sub (tmp, 1, sign, &overflow);
+	  if (!overflow && wi::fits_to_tree_p (tmp, orig_type))
+	    {
+	      /* c*x*(x+tmp)/2 */
+	      res = chrec_fold_plus (wide_type, save_x,
+				     wide_int_to_tree (wide_type, tmp));
+	      res = chrec_fold_multiply (wide_type, res, save_x);
+	      if (half_c)
+		{
+		  res = chrec_convert (ctype, res, NULL);
+		  res = chrec_fold_multiply (ctype, res, half_c);
+		}
+	      else
+		{
+		  res = chrec_fold_divide_by_2 (wide_type, res);
+		  res = chrec_convert (ctype, res, NULL);
+		  res = chrec_fold_multiply (ctype, res, c);
+		}
+	      /* + a */
+	      if (ctype != type)
+		res = chrec_convert (type, res, NULL);
+	      res = chrec_fold_plus (TREE_TYPE (a), a, res);
+	      return res;
+	    }
+	}
+    }
+
+  /* c*x*(x-1)/2 */
+  tree one = build_int_cst (wide_type, 1);
+  res = chrec_fold_minus (wide_type, save_x, one);
+  res = chrec_fold_multiply (wide_type, res, save_x);
+  if (half_c)
+    {
+      res = chrec_convert (ctype, res, NULL);
+      res = chrec_fold_multiply (ctype, half_c, res);
+    }
+  else
+    {
+      res = chrec_fold_divide_by_2 (wide_type, res);
+      res = chrec_convert (ctype, res, NULL);
+      res = chrec_fold_multiply (ctype, c, res);
+    }
+  /* + b*x */
+  tmp = chrec_convert (ctype, save_x, NULL);
+  tmp = chrec_fold_multiply (ctype, b, tmp);
+  res = chrec_fold_plus (ctype, res, tmp);
+  /* + a */
+  if (ctype != type)
+    res = chrec_convert (type, res, NULL);
+  res = chrec_fold_plus (TREE_TYPE (a), a, res);
+  return res;
+}
+
+
 /* Evaluates "CHREC (X)" when the varying variable is VAR.
 
    Example:  Given the following parameters,
@@ -581,7 +821,8 @@ chrec_evaluate (unsigned var, tree chrec, tree n, unsigned int k)
    x = 10
 
    The result is given by the Newton's interpolating formula:
-   3 * \binom{10}{0} + 4 * \binom{10}{1}.
+   3 * \binom{10}{0} + 4 * \binom{10}{1}, which is 3 + 4 * 10 = 43.
+   In general, {a, +, b} (x) = a + b*x.
 */
 
 tree
@@ -604,13 +845,12 @@ chrec_apply (unsigned var,
   if (dump_file && (dump_flags & TDF_SCEV))
     fprintf (dump_file, "(chrec_apply \n");
 
-  if (TREE_CODE (x) == INTEGER_CST && SCALAR_FLOAT_TYPE_P (type))
-    x = build_real_from_int_cst (type, x);
-
   switch (TREE_CODE (chrec))
     {
     case POLYNOMIAL_CHREC:
-      if (evolution_function_is_affine_p (chrec))
+      if (integer_zerop (x) && CHREC_VARIABLE (chrec) == var)
+	res = CHREC_LEFT (chrec);
+      else if (evolution_function_is_affine_p (chrec))
	{
	  if (CHREC_VARIABLE (chrec) != var)
	    return build_polynomial_chrec
@@ -623,6 +863,29 @@ chrec_apply (unsigned var,
	  res = chrec_fold_multiply (TREE_TYPE (x), CHREC_RIGHT (chrec), x);
	  res = chrec_fold_plus (type, CHREC_LEFT (chrec), res);
	}
+      else if (TREE_CODE (x) == COND_EXPR
+	       && integer_zerop (TREE_OPERAND (x, 2))
+	       && CHREC_VARIABLE (chrec) == var
+	       && TREE_CODE (CHREC_LEFT (chrec)) == INTEGER_CST)
+	{
+	  /* Optimize "{a, +, ...} (cond ? x : 0)" by hoisting the ternary
+	     operator as "cond ? ({ a, +, ...} (x)) : a" for simple "a".  */
+	  res = chrec_apply (var, chrec, TREE_OPERAND (x, 1));
+	  if (res != chrec_dont_know)
+	    res = fold_build3 (COND_EXPR, type, TREE_OPERAND (x, 0),
+			       res, CHREC_LEFT (chrec));
+	}
+      else if (CHREC_VARIABLE (chrec) == var
+	       && evolution_function_is_affine_p (CHREC_RIGHT (chrec))
+	       && CHREC_VARIABLE (CHREC_RIGHT (chrec)) == var)
+	{
+	  /* "{a, +, {b, +, c} } (x)" -> "a + b*x + c*x*(x-1)/2".  */
+	  tree chrecr = CHREC_RIGHT (chrec);
+	  res = chrec_apply_triangular (TREE_TYPE (chrecr),
+					CHREC_LEFT (chrec),
+					CHREC_LEFT (chrecr),
+					CHREC_RIGHT (chrecr), x);
+	}
       else if (TREE_CODE (x) == INTEGER_CST
	       && tree_int_cst_sgn (x) == 1)
	/* testsuite/.../ssa-chrec-38.c.  */
@@ -632,7 +895,7 @@ chrec_apply (unsigned var,
       break;
 
     CASE_CONVERT:
-      res = chrec_convert (TREE_TYPE (chrec),
+      res = chrec_convert (type,
			   chrec_apply (var, TREE_OPERAND (chrec, 0), x),
			   NULL);
       break;
@@ -1401,6 +1664,10 @@ chrec_convert_rhs (tree type, tree chrec, gimple *at_stmt)
   if (POINTER_TYPE_P (type))
     type = sizetype;
 
+  if (TREE_CODE (chrec) == INTEGER_CST
+      && SCALAR_FLOAT_TYPE_P (type))
+    return build_real_from_int_cst (type, chrec);
+
   return chrec_convert (type, chrec, at_stmt);
 }
 
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 61d72c2..4cfdc5d 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -1901,6 +1901,9 @@ interpret_expr (class loop *loop, gimple *at_stmt, tree expr)
       || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS)
     return chrec_dont_know;
 
+  if (TREE_CODE (expr) == SAVE_EXPR)
+    return interpret_expr (loop, at_stmt, TREE_OPERAND (expr, 0));
+
   extract_ops_from_tree (expr, &code, &op0, &op1);
 
   return interpret_rhs_expr (loop, at_stmt, type,
@@ -3469,6 +3472,19 @@ expression_expensive_p (tree expr, hash_map &cache,
	  cost += op_cost + 1;
	  return false;
 
+	case tcc_expression:
+	  if (code == SAVE_EXPR)
+	    {
+	      if (expression_expensive_p (TREE_OPERAND (expr, 0), cache,
+					  op_cost))
+		return true;
+	      /* Assume SAVE_EXPR is instanced twice.  */
+	      op_cost /= 2;
+	      *cache.get (expr) += op_cost;
+	      cost += op_cost;
+	      return false;
+	    }
+	  return true;
+
	default:
	  return true;
	}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr65855.c b/gcc/testsuite/gcc.dg/tree-ssa/pr65855.c
new file mode 100644
index 0000000..9553196
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr65855.c
@@ -0,0 +1,14 @@
+/* PR middle-end/65855 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned long foo(unsigned int n)
+{
+  unsigned long t = 0;
+  for (unsigned long i=1; i<=n; i++)
+    t += i;
+  return t;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " goto " 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr80852.c b/gcc/testsuite/gcc.dg/tree-ssa/pr80852.c
new file mode 100644
index 0000000..329e413
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr80852.c
@@ -0,0 +1,14 @@
+/* PR c/80852 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int foo(int num) {
+  int a = 0;
+  for (int x = 0; x < num; x+=2) {
+    if (!(x % 2)) {
+      a += x;
+    }
+  }
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " goto " 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-11.c b/gcc/testsuite/gcc.dg/vect/vect-iv-11.c
index 7dc353f..7a3a5c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-11.c
@@ -9,7 +9,8 @@ main1 (int len)
   int s = 0;
   int i = len;
 
-  /* vectorization of induction with reduction.  */
+  /* This used to test vectorization of induction with reduction.
+   * It now checks correctness of scev's final value replacement.  */
   for ( ; i > 1; i -=2)
     s += i;
 
@@ -28,4 +29,3 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */