From patchwork Sun Dec 5 21:55:00 2021
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 48515
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] ranger: Optimise irange_union
Date: Sun, 05 Dec 2021 21:55:00 +0000
From: Richard Sandiford <richard.sandiford@arm.com>
When compiling an optabs.ii at -O2 with a release-checking build,
the hottest function in the profile was irange_union.  This patch
tries to optimise it a bit.  The specific changes are:

- Use quick_push rather than safe_push, since the final number
  of entries is known in advance.

- Avoid assigning wi::to_wide & co. to a temporary wide_int,
  such as in:

    wide_int val_j = wi::to_wide (res[j]);

  wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST
  storage.  Assigning the result to wide_int forces an unnecessary
  copy to temporary storage.

  This is one area where "auto" helps a lot.  In the end though,
  it seemed more readable to inline the wi::to_*s rather than
  use auto.

- Use wi::to_widest rather than wi::to_wide.  Both are functionally
  correct, but wi::to_widest is more efficient, for three reasons:

  - to_wide returns a wide-int representation in which the most
    significant element might not be canonically sign-extended.
    This is because we want to allow the storage of an INTEGER_CST
    like 0x1U << 31 to be accessed directly with both a wide_int
    view (where only 32 bits matter) and a widest_int view (where
    many more bits matter, and where the 32 bits are zero-extended
    to match the unsigned type).  However, operating on
    uncanonicalised wide_int forms is less efficient than operating
    on canonicalised forms.

  - to_widest has a constant rather than variable precision, and
    there are never any redundant upper bits to worry about.
  - Using widest_int avoids the need for an overflow check, since
    there is enough precision to add 1 to any IL constant without
    wrap-around.

This gives a ~2% compile-time speed up with the test above.
I also tried adding a path for two single-pair ranges, but it
wasn't a win.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
	* value-range.cc (irange::irange_union): Use quick_push rather
	than safe_push.  Use widest_int rather than wide_int.  Avoid
	assigning wi::to_* results to wide*_int temporaries.
---
 gcc/value-range.cc | 46 +++++++++++++---------------------------------
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 82509fa55a7..d38d0786072 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1550,70 +1550,50 @@ irange::irange_union (const irange &r)
   // the merge is performed.
   //
   // [Xi,Yi]..[Xn,Yn] U [Xj,Yj]..[Xm,Ym] --> [Xk,Yk]..[Xp,Yp]
-  tree ttype = r.type ();
-  signop sign = TYPE_SIGN (ttype);
-
-  auto_vec<tree> res;
-  wide_int u1 ;
-  wi::overflow_type ovf;
+  auto_vec<tree> res (m_num_ranges * 2 + r.m_num_ranges * 2);
   unsigned i = 0, j = 0, k = 0;

   while (i < m_num_ranges * 2 && j < r.m_num_ranges * 2)
     {
       // lower of Xi and Xj is the lowest point.
-      if (wi::le_p (wi::to_wide (m_base[i]), wi::to_wide (r.m_base[j]), sign))
+      if (wi::to_widest (m_base[i]) <= wi::to_widest (r.m_base[j]))
 	{
-	  res.safe_push (m_base[i]);
-	  res.safe_push (m_base[i + 1]);
+	  res.quick_push (m_base[i]);
+	  res.quick_push (m_base[i + 1]);
 	  k += 2;
 	  i += 2;
 	}
       else
 	{
-	  res.safe_push (r.m_base[j]);
-	  res.safe_push (r.m_base[j + 1]);
+	  res.quick_push (r.m_base[j]);
+	  res.quick_push (r.m_base[j + 1]);
 	  k += 2;
 	  j += 2;
 	}
     }
   for ( ; i < m_num_ranges * 2; i += 2)
     {
-      res.safe_push (m_base[i]);
-      res.safe_push (m_base[i + 1]);
+      res.quick_push (m_base[i]);
+      res.quick_push (m_base[i + 1]);
       k += 2;
     }
   for ( ; j < r.m_num_ranges * 2; j += 2)
     {
-      res.safe_push (r.m_base[j]);
-      res.safe_push (r.m_base[j + 1]);
+      res.quick_push (r.m_base[j]);
+      res.quick_push (r.m_base[j + 1]);
       k += 2;
     }

   // Now normalize the vector removing any overlaps.
   i = 2;
-  int prec = TYPE_PRECISION (ttype);
-  wide_int max_val = wi::max_value (prec, sign);
   for (j = 2; j < k ; j += 2)
     {
-      wide_int val_im1 = wi::to_wide (res[i - 1]);
-      if (val_im1 == max_val)
-	break;
-      u1 = wi::add (val_im1, 1, sign, &ovf);
-
-      // Overflow indicates we are at MAX already.
-      // A wide int bug requires the previous max_val check
-      // trigger: gcc.c-torture/compile/pr80443.c with -O3
-      if (ovf == wi::OVF_OVERFLOW)
-	break;
-
-      wide_int val_j = wi::to_wide (res[j]);
-      wide_int val_jp1 = wi::to_wide (res[j+1]);
       // Current upper+1 is >= lower bound next pair, then we merge ranges.
-      if (wi::ge_p (u1, val_j, sign))
+      if (wi::to_widest (res[i - 1]) + 1 >= wi::to_widest (res[j]))
 	{
 	  // New upper bounds is greater of current or the next one.
-	  if (wi::gt_p (val_jp1, val_im1, sign))
-	    res [i - 1] = res[j + 1];
+	  if (wi::to_widest (res[j + 1]) > wi::to_widest (res[i - 1]))
+	    res[i - 1] = res[j + 1];
 	}
       else
 	{
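
For readers without the GCC internals to hand, the two tricks in the patch
(reserving the worst-case result size up front so every push is an unchecked
append, and comparing bounds at a fixed wider precision so that "upper + 1"
cannot wrap even at the type's maximum value) can be sketched with a
self-contained analogue.  Everything below is hypothetical illustration, not
GCC code: ranges are flat vectors of uint32_t [lo, hi] pairs, and int64_t
plays the role of widest_int.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical analogue of irange::irange_union: merge two sorted vectors of
// inclusive [lo, hi] pairs, then normalize away overlapping/adjacent pairs.
static std::vector<uint32_t>
union_sorted_ranges (const std::vector<uint32_t> &a,
		     const std::vector<uint32_t> &b)
{
  std::vector<uint32_t> res;
  // Like the auto_vec change: the final size is bounded by the sum of the
  // inputs, so reserve it once and make each push an unchecked append
  // (the analogue of quick_push vs safe_push).
  res.reserve (a.size () + b.size ());

  std::size_t i = 0, j = 0;
  while (i < a.size () && j < b.size ())
    {
      // Whichever pair has the lower start point goes first.
      if (a[i] <= b[j])
	{
	  res.push_back (a[i]);
	  res.push_back (a[i + 1]);
	  i += 2;
	}
      else
	{
	  res.push_back (b[j]);
	  res.push_back (b[j + 1]);
	  j += 2;
	}
    }
  for (; i < a.size (); i += 2)
    {
      res.push_back (a[i]);
      res.push_back (a[i + 1]);
    }
  for (; j < b.size (); j += 2)
    {
      res.push_back (b[j]);
      res.push_back (b[j + 1]);
    }

  // Normalize: merge pairs that overlap or touch.  Widening to int64_t
  // before adding 1 means the comparison is exact even when the current
  // upper bound is UINT32_MAX, so no separate overflow/max-value check is
  // needed -- the same reason the patch can drop the wi::OVF_OVERFLOW path
  // by switching to widest_int.
  std::size_t k = 2;
  for (std::size_t m = 2; m < res.size (); m += 2)
    {
      if ((int64_t) res[k - 1] + 1 >= (int64_t) res[m])
	{
	  // Overlap: keep the greater of the two upper bounds.
	  if ((int64_t) res[m + 1] > (int64_t) res[k - 1])
	    res[k - 1] = res[m + 1];
	}
      else
	{
	  // Disjoint: copy the pair down to the write position.
	  res[k] = res[m];
	  res[k + 1] = res[m + 1];
	  k += 2;
	}
    }
  res.resize (k);
  return res;
}
```

The sketch keeps the patch's two-phase shape (merge by lower bound, then a
single normalization pass with separate read and write indices); the widening
cast is the cheap stand-in for widest_int's "always enough precision to add 1"
guarantee.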