From patchwork Thu Nov 18 15:18:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 47889 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0F9FC385E00D for ; Thu, 18 Nov 2021 15:18:23 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id F225C3858433 for ; Thu, 18 Nov 2021 15:18:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F225C3858433 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=OLoTQU62L20lne9BWb37JwMlIuW6aYB9sP1ZPKNy4bE=; b=DX4CIVfNGrcnncSFF29g+q5w8d BFHJ6fbkOhWqXB6uXkQlL8KhEM+xI8o9ccRfnQYD7kRB+0wKgbAATGXO1oyPdQ1Nf18oSUklcca6r 1/FOY9SlNeXaQkfKf/+qQmfHWarJ5NgD/SPo/VDcumfbXPkuY2GJUk6TWx7yY/lOKbawGu9rWNl09 kUQ6S7APiJqKuswm0vLmZ1R7Mh5pt13LWx/TAmb5HjbDTjCHvI+xHBmf3r8I9Yw+bedThfiYl0Ri6 yw0GnzYM68vCyatkHZ+C/W5bYpDLzPekQqumUGmmfQesgr9rhQJiMr5SuS+wl8OrZCiyxXWkP3SUS mJVsHLvQ==; Received: from [185.62.158.67] (port=58090 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1mnjB2-0007t7-Sn; Thu, 18 Nov 2021 10:18:04 -0500 From: "Roger Sayle" To: "'Richard Biener'" , "'Andrew MacLeod'" Subject: [PATCH take 2] ivopts: Improve code generated for very simple loops. Date: Thu, 18 Nov 2021 15:18:02 -0000 Message-ID: <007e01d7dc8f$7b87f260$7297d720$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdfchxE/VzQaaD02R4KQeI3krlFKEQ== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: 'GCC Patches' Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi Richard, Many thanks for the patch review. On Tue, Nov 16, 2021 at 12:38 Richard Biener wrote: > On Mon, Nov 15, 2021 at 2:04 PM Roger Sayle > wrote: > > > > This patch tidies up the code that GCC generates for simple loops, by > > selecting/generating a simpler loop bound expression in ivopts. > > Previously: > > > > void v1 (unsigned long *in, unsigned long *out, unsigned int n) { > > int i; > > for (i = 0; i < n; i++) { > > out[i] = in[i]; > > } > > } > > > > on x86_64 generated: > > v1: testl %edx, %edx > > je .L1 > > movl %edx, %edx > > xorl %eax, %eax > > .L3: movq (%rdi,%rax,8), %rcx > > movq %rcx, (%rsi,%rax,8) > > addq $1, %rax > > cmpq %rax, %rdx > > jne .L3 > > .L1: ret > > > > and now instead generates: > > v1: testl %edx, %edx > > je .L1 > > movl %edx, %edx > > xorl %eax, %eax > > leaq 0(,%rdx,8), %rcx > > .L3: movq (%rdi,%rax), %rdx > > movq %rdx, (%rsi,%rax) > > addq $8, %rax > > cmpq %rax, %rcx > > jne .L3 > > .L1: ret > > Is that actually better? IIRC the addressing modes are both complex and we > now have an extra lea? Technically the induction variable elimination is removing two multiplies by 8 (or left shifts) from the body of the loop and replacing them with a single multiply by 8 prior to the loop, which is exactly the induction variable optimization that ivopts is designed to do. It's true that with x86's complex addressing modes these "multiplications" are free, but ivopts is run on all targets including those without indexed addressing, and even on x86 there's a benefit from shorter instruction encodings. > For this case I see we generate > _15 = n_10(D) + 4294967295; > _8 = (unsigned long) _15; > _7 = _8 + 1; > > where n is unsigned int so if we know that n is not zero we can simplify the > addition and conveniently the loop header test provides this guarantee. > IIRC there were some attempts to enhance match.pd for some cases of such > expressions. Exactly. The loop optimizers are generating the expression (x-1)+1 and assuming that the middle-end will simplify this to x. Unfortunately, the change of modes (designed to prevent overflow) actually makes this impossible to simplify, due to concerns about overflow. > + /* If AFTER_ADJUST is required, the code below generates the equivalent > + * of BASE + NITER * STEP + STEP, when ideally we'd prefer the expression > + * BASE + (NITER + 1) * STEP, especially when NITER is often of the form > + * SSA_NAME - 1. Unfortunately, guaranteeing that adding 1 to NITER > + * doesn't overflow is tricky, so we peek inside the TREE_NITER_DESC > + * class for common idioms that we know are safe. */ > > No '* ' each line. Doh! Thanks. Sometimes I hate vi. > I wonder if the non-overflowing can be captured by > integer_onep (iv->step) > && max_stmt_executions (loop, &max) Unfortunately, max_stmt_executions is intended to return the wide_int count of iterations, either the known constant value or a profile-based estimate, while the optimizations I'm proposing work with symbolic values from variable iteration counts. When the iteration count is a constant, fold-const is already able to simplify (x-1)+1 to an integer constant even with type conversions. > if we then do (niter + 1) * step instead of niter*step + step would that do the > same? Yes. I've extended the scope of the patch to now also handle loops of the form (for i=beg; i That said - what the change does is actually ensure that we CSE niter + 1 > with the bound of the simplified exit test? Not quite, this simply provides a simplified expression for "niter + 1" that takes advantage of the implicit range information we have. You're right in theory that ranger/vrp may be able to help simplify (long)((int)x-1)+1L more generally. In this case, I suspect it's just a historical artifact that much of the literature on (iv) loop optimizations dates back to Fortran and Algol where arrays were 1-based rather than 0-based, so decades later gcc 11 on x86_64 sometimes requires an extra inc/dec instruction for purely historical reasons. > The patch doesn't add any testcase. The three new attached tests check that the critical invariants have a simpler form, and hopefully shouldn't be affected by whether the optimizer and/or backend costs actually decide to perform this iv substitution or not. > Thanks, > Richard. This revised patch has been tested on x86_64-pc-linux-gnu with a make bootstrap and make -k check with no new failures. 2021-11-18 Roger Sayle gcc/ChangeLog * tree-ssa-loop-ivopts.c (cand_value_at): Take a class tree_niter_desc* argument instead of just a tree for NITER. If we require the iv candidate value at the end of the final loop iteration, try using the original loop bound as the NITER for sufficiently simple loops. (may_eliminate_iv): Update (only) call to cand_value_at. gcc/testsuite * gcc.dg/tree-ssa/ivopts-5.c: New test case. * gcc.dg/tree-ssa/ivopts-6.c: New test case. * gcc.dg/tree-ssa/ivopts-7.c: New test case. * gcc.dg/wrapped-binop-simplify.c: Update expected test result. Many thanks in advance (fingers-crossed this is still suitable/grandfathered-in at the start of Stage 3). Roger diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 4a498ab..8f98595 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -5034,28 +5034,57 @@ determine_group_iv_cost_address (struct ivopts_data *data, return !sum_cost.infinite_cost_p (); } -/* Computes value of candidate CAND at position AT in iteration NITER, and - stores it to VAL. */ +/* Computes value of candidate CAND at position AT in iteration DESC->NITER, + and stores it to VAL. */ static void -cand_value_at (class loop *loop, struct iv_cand *cand, gimple *at, tree niter, - aff_tree *val) +cand_value_at (class loop *loop, struct iv_cand *cand, gimple *at, + class tree_niter_desc *desc, aff_tree *val) { aff_tree step, delta, nit; struct iv *iv = cand->iv; tree type = TREE_TYPE (iv->base); + tree niter = desc->niter; + bool after_adjust = stmt_after_increment (loop, cand, at); tree steptype; + if (POINTER_TYPE_P (type)) steptype = sizetype; else steptype = unsigned_type_for (type); + /* If AFTER_ADJUST is required, the code below generates the equivalent + of BASE + NITER * STEP + STEP, when ideally we'd prefer the expression + BASE + (NITER + 1) * STEP, especially when NITER is often of the form + SSA_NAME - 1. Unfortunately, guaranteeing that adding 1 to NITER + doesn't overflow is tricky, so we peek inside the TREE_NITER_DESC + class for common idioms that we know are safe. */ + if (after_adjust + && desc->control.no_overflow + && integer_onep (desc->control.step) + && (desc->cmp == LT_EXPR + || desc->cmp == NE_EXPR) + && TREE_CODE (desc->bound) == SSA_NAME) + { + if (integer_onep (desc->control.base)) + { + niter = desc->bound; + after_adjust = false; + } + else if (TREE_CODE (niter) == MINUS_EXPR + && integer_onep (TREE_OPERAND (niter, 1))) + { + niter = TREE_OPERAND (niter, 0); + after_adjust = false; + } + } + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &step); aff_combination_convert (&step, steptype); tree_to_aff_combination (niter, TREE_TYPE (niter), &nit); aff_combination_convert (&nit, steptype); aff_combination_mult (&nit, &step, &delta); - if (stmt_after_increment (loop, cand, at)) + if (after_adjust) aff_combination_add (&delta, &step); tree_to_aff_combination (iv->base, type, val); @@ -5406,7 +5435,7 @@ may_eliminate_iv (struct ivopts_data *data, return true; } - cand_value_at (loop, cand, use->stmt, desc->niter, &bnd); + cand_value_at (loop, cand, use->stmt, desc, &bnd); *bound = fold_convert (TREE_TYPE (cand->iv->base), aff_combination_to_tree (&bnd)); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c new file mode 100644 index 0000000..7a7661c --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ + +int* +foo (int* mem, int sz, int val) +{ + int i; + for (i = 0; i < sz; i++) + if (mem[i] == val) + return &mem[i]; + return 0; +} + +/* { dg-final { scan-tree-dump "inv_expr \[0-9\]: \\t\\(unsigned long\\) sz_\[0-9\]\\(D\\) \\* 4 \\+ \\(unsigned long\\) mem_\[0-9\]\\(D\\)" "ivopts" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c new file mode 100644 index 0000000..773bfe4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ + +int* +foo (int* mem, int sz, int val) +{ + int i; + for (i = 0; i != sz; i++) + if (mem[i] == val) + return &mem[i]; + return 0; +} + +/* { dg-final { scan-tree-dump "inv_expr \[0-9\]: \\t\\(unsigned long\\) sz_\[0-9\]\\(D\\) \\* 4 \\+ \\(unsigned long\\) mem_\[0-9\]\\(D\\)" "ivopts" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c new file mode 100644 index 0000000..3d2de48 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ + +int* +foo (int* mem, int beg, int end, int val) +{ + int i; + for (i = beg; i < end; i++) + if (mem[i] == val) + return &mem[i]; + return 0; +} +/* { dg-final { scan-tree-dump "inv_expr \[0-9\]: \\t\\(unsigned long\\) \\(\\(unsigned int\\) end_\[0-9\]\\(D\\) - \\(unsigned int\\) beg_\[0-9\]\\(D\\)\\)" "ivopts" } } */ + diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c index a5d953b..706eed8 100644 --- a/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c @@ -1,6 +1,6 @@ /* { dg-do compile { target { { i?86-*-* x86_64-*-* s390*-*-* } && lp64 } } } */ /* { dg-options "-O2 -fdump-tree-vrp2-details" } */ -/* { dg-final { scan-tree-dump-times "gimple_simplified to" 4 "vrp2" } } */ +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 1 "vrp2" } } */ void v1 (unsigned long *in, unsigned long *out, unsigned int n) {