From: Jan Hubicka
Date: Tue, 7 Dec 2021 16:07:01 +0100
To: gcc-patches@gcc.gnu.org, mjambor@suse.cz
Subject: Limit inlining functions called once
Message-ID: <20211207150701.GK18150@kam.mff.cuni.cz>
X-Patchwork-Id: 48588

Hi,
as discussed in PR ipa/103454, several benchmarks regress with
-finline-functions-called-once.

Runtimes:
 - tramp3d with -Ofast: 31%
 - exchange2 with -Ofast: 11-21%
 - roms with -O2: 9%-10%
 - tonto: 2.5-3.5% with LTO

Build times:
 - specfp2006: 41% (mostly wrf, which builds 71% faster)
 - specint2006: 1.5-3%
 - specfp2017: 64% (again mostly wrf)
 - specint2017: 2.5-3.5%

This patch adds two params to tweak the behaviour:
 1) max-inline-functions-called-once-loop-depth
    limits the loop depth of the call (this is useful primarily for
    exchange2, where the inlined function is at loop depth 9)
 2) max-inline-functions-called-once-insns
    limits the combined size of caller and callee.
    We already have the large-function-insns/growth parameters, but these
    also limit inlining of small functions, so reducing them would regress
    very large functions that are hot.

    Because inlining functions called once is meant just as a cleanup pass,
    I think it makes sense to have a separate limit for it.

I set the parameters to 6 and 4000. 4000 was chosen to keep the fatigue
benchmark happy, and that seems to be the only one holding the value this
high. I opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103585 to track
this.

I plan to reduce the value before Christmas after a bit more testing, since
a lower value seems to be an overall win even if we trade fatigue2
performance, but I would like to get more testing on larger C++ apps first.
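
To illustrate how the two limits combine, here is a minimal standalone sketch of the decision. It is not the GCC code: CalledOnceLimits, CallEdge and exceeds_called_once_limits are hypothetical names made up for this illustration, and the loop depth and size are plain integers rather than values read from the call summaries.

```cpp
#include <cassert>

// Hypothetical stand-in for the two new params.
struct CalledOnceLimits {
  int max_loop_depth;  // models max-inline-functions-called-once-loop-depth
  int max_insns;       // models max-inline-functions-called-once-insns
};

// Defaults proposed in this mail: 6 and 4000.
const CalledOnceLimits kDefaultLimits = {6, 4000};

// Hypothetical, simplified view of one call edge. In the patch the loop
// depth comes from the call summary of the edge and the size is the
// estimated size of the caller after inlining.
struct CallEdge {
  int loop_depth;
  int size_after_inlining;
};

// Returns true when the called-once heuristic should give up on this edge.
// This follows the sense of check_callers, where true means "do not inline".
bool exceeds_called_once_limits(const CallEdge &e,
                                const CalledOnceLimits &limits = kDefaultLimits) {
  // Calls nested too deeply in loops are skipped (register pressure).
  if (e.loop_depth > limits.max_loop_depth)
    return true;
  // Do not produce gigantic functions.
  if (e.size_after_inlining > limits.max_insns)
    return true;
  return false;
}
```

With the defaults, a call at loop depth 9 (the exchange2 case) is rejected regardless of size, while a call at depth 6 that yields exactly 4000 insns is still accepted because both comparisons are strict.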

The benchmarks can be seen here:
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.02&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C19fdeff21d84a2612c9902daa80085f382b88c73%2C67b183fac7b08067fdd3c09abd3efd2691083395%2Ce14bd12e373f7612b00a44f22705950e1f70adcf%2C17f383c6fd95b2b2915aac38327c7628f6160a8d&include_user_branches=on
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.02&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C19fdeff21d84a2612c9902daa80085f382b88c73%2C67b183fac7b08067fdd3c09abd3efd2691083395%2Ce14bd12e373f7612b00a44f22705950e1f70adcf%2C17f383c6fd95b2b2915aac38327c7628f6160a8d&include_user_branches=on

Here the baseline and the first column are the unmodified trunk (to show
the noise), the second column is -fno-inline-functions-called-once, the
third column is the patch with limits set to 6 and 500, and the last column
is the attached version of the patch with limits 6 and 4000.

So in its current form the patch improves exchange2 and WRF build times but
does not affect the other issues (tramp3d, roms, tonto and the rest of the
build time).

Bootstrapped/regtested x86_64-linux, plan to commit tomorrow if there are no
complaints.

	PR ipa/103454
	* ipa-inline.c (check_callers): Handle
	param_inline_functions_called_once_loop_depth and
	param_inline_functions_called_once_insns.
	(edge_badness): Fix linebreaks.
	* params.opt (param=max-inline-functions-called-once-loop-depth,
	param=max-inline-functions-called-once-insns): New params.
	* invoke.texi (max-inline-functions-called-once-loop-depth,
	max-inline-functions-called-once-insns): New parameters.

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 012b326b5e9..54cd085a84d 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1091,20 +1091,30 @@ static bool
 check_callers (struct cgraph_node *node, void *has_hot_call)
 {
   struct cgraph_edge *e;
-  for (e = node->callers; e; e = e->next_caller)
-    {
-      if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once)
-          || !opt_for_fn (e->caller->decl, optimize))
-        return true;
-      if (!can_inline_edge_p (e, true))
-        return true;
-      if (e->recursive_p ())
-        return true;
-      if (!can_inline_edge_by_limits_p (e, true))
-        return true;
-      if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
-        *(bool *)has_hot_call = true;
-    }
+  for (e = node->callers; e; e = e->next_caller)
+    {
+      if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once)
+          || !opt_for_fn (e->caller->decl, optimize))
+        return true;
+      if (!can_inline_edge_p (e, true))
+        return true;
+      if (e->recursive_p ())
+        return true;
+      if (!can_inline_edge_by_limits_p (e, true))
+        return true;
+      /* Inlining large functions to large loop depth is often harmful because
+         of register pressure it implies.  */
+      if ((int)ipa_call_summaries->get (e)->loop_depth
+          > param_inline_functions_called_once_loop_depth)
+        return true;
+      /* Do not produce gigantic functions.  */
+      if (estimate_size_after_inlining (e->caller->inlined_to ?
+                                        e->caller->inlined_to : e->caller, e)
+          > param_inline_functions_called_once_insns)
+        return true;
+      if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
+        *(bool *)has_hot_call = true;
+    }
   return false;
 }
 
@@ -1327,9 +1337,12 @@ edge_badness (struct cgraph_edge *edge, bool dump)
                    " %i (compensated)\n",
                    badness.to_double (),
                    freq.to_double (),
-                   edge->count.ipa ().initialized_p () ? edge->count.ipa ().to_gcov_type () : -1,
-                   caller->count.ipa ().initialized_p () ? caller->count.ipa ().to_gcov_type () : -1,
-                   inlining_speedup (edge, freq, unspec_edge_time, edge_time).to_double (),
+                   edge->count.ipa ().initialized_p ()
+                   ? edge->count.ipa ().to_gcov_type () : -1,
+                   caller->count.ipa ().initialized_p ()
+                   ? caller->count.ipa ().to_gcov_type () : -1,
+                   inlining_speedup (edge, freq, unspec_edge_time,
+                                     edge_time).to_double (),
                    estimate_growth (callee),
                    callee_info->growth, overall_growth);
         }
diff --git a/gcc/params.opt b/gcc/params.opt
index e725c99e5e4..f1b5757461c 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -545,6 +545,14 @@ The maximum expansion factor when copying basic blocks.
 Common Joined UInteger Var(param_max_hoist_depth) Init(30) Param Optimization
 Maximum depth of search in the dominator tree for expressions to hoist.
 
+-param=max-inline-functions-called-once-loop-depth=
+Common Joined UInteger Var(param_inline_functions_called_once_loop_depth) Init(6) Optimization Param
+Maximum loop depth of a call which is considered for inlining functions called once.
+
+-param=max-inline-functions-called-once-insns=
+Common Joined UInteger Var(param_inline_functions_called_once_insns) Init(4000) Optimization Param
+Maximum combined size of caller and callee which is inlined if callee is called once.
+
 -param=max-inline-insns-auto=
 Common Joined UInteger Var(param_max_inline_insns_auto) Init(15) Optimization Param
 The maximum number of instructions when automatically inlining.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3bddfbaae6a..cd03fd93c7c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13587,6 +13587,14 @@ The maximum number of backtrack attempts the scheduler should make
 when modulo scheduling a loop.  Larger values can exponentially increase
 compilation time.
 
+@item max-inline-functions-called-once-loop-depth
+Maximal loop depth of a call considered by inline heuristics that tries to
+inline all functions called once.
+
+@item max-inline-functions-called-once-insns
+Maximal estimated size of functions produced while inlining functions called
+once.
+
 @item max-inline-insns-single
 Several parameters control the tree inliner used in GCC@.  This number sets the
 maximum number of instructions (counted in GCC's internal representation) in a