Limit inlining functions called once
Commit Message
Hi,
as discussed in PR ipa/103454, several benchmarks regress
with -finline-functions-called-once. Runtimes:
- tramp3d with -Ofast: 31%
- exchange2 with -Ofast: 11-21%
- roms with -O2: 9-10%
- tonto with LTO: 2.5-3.5%
Build times:
- specfp2006 41% (mostly wrf that builds 71% faster)
- specint2006 1.5-3%
- specfp2017 64% (again mostly wrf)
- specint2017 2.5-3.5%
This patch adds two params to tweak the behaviour:
1) max-inline-functions-called-once-loop-depth, limiting the loop depth
of the call (this is useful primarily for exchange2, where the inlined
function is at loop depth 9)
2) max-inline-functions-called-once-insns, limiting the combined size of
caller and callee
We already have the large-function-insns/growth parameters, but these
also limit inlining of small functions, so reducing them would regress
very large functions that are hot.
Because inlining functions called once is meant just as a cleanup pass,
I think it makes sense to have a separate limit for it.
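
To make the two limits concrete, here is a minimal, self-contained sketch of the decision they add. It is a simplified model only, not the GCC implementation: the CallSite and OnceLimits types and the within_called_once_limits function are hypothetical names invented for illustration, and the defaults mirror the values 6 and 4000 mentioned in this message.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for illustration; these are not
// GCC's real data structures.
struct CallSite {
  int loop_depth;           // loop nesting depth of the call statement
  int size_after_inlining;  // estimated caller size once the callee is inlined
};

struct OnceLimits {
  int max_loop_depth = 6;   // models max-inline-functions-called-once-loop-depth
  int max_insns = 4000;     // models max-inline-functions-called-once-insns
};

// Returns true when a called-once candidate passes both limits.
// A candidate is rejected only when it strictly exceeds a limit,
// so a call at exactly the limit is still accepted.
bool within_called_once_limits(const CallSite &cs,
                               const OnceLimits &lim = OnceLimits{}) {
  if (cs.loop_depth > lim.max_loop_depth)
    return false;  // deep loop nests: inlining tends to raise register pressure
  if (cs.size_after_inlining > lim.max_insns)
    return false;  // avoid producing gigantic functions
  return true;
}
```

Under this model a call at loop depth 9, as in exchange2, is rejected regardless of size, while a call at depth 3 that yields an estimated 1000 instructions is accepted.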
I set the parameters to 6 and 4000.
4000 was chosen to keep the fatigue benchmark happy, and that seems to be the
only one holding the value this high. I opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103585 to track this.
I plan to reduce the value before Christmas, after a bit more testing, since
it seems to be an overall win even if we trade away fatigue2 performance, but I
would like to get more testing on larger C++ applications first.
The benchmarks can be seen here:
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.02&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C19fdeff21d84a2612c9902daa80085f382b88c73%2C67b183fac7b08067fdd3c09abd3efd2691083395%2Ce14bd12e373f7612b00a44f22705950e1f70adcf%2C17f383c6fd95b2b2915aac38327c7628f6160a8d&include_user_branches=on
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.02&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C19fdeff21d84a2612c9902daa80085f382b88c73%2C67b183fac7b08067fdd3c09abd3efd2691083395%2Ce14bd12e373f7612b00a44f22705950e1f70adcf%2C17f383c6fd95b2b2915aac38327c7628f6160a8d&include_user_branches=on
Here the baseline and the first column are unmodified trunk (to show noise), the
second column is -fno-inline-functions-called-once, and the third column is the
patch with limits set to 6 and 500. The last column is the attached version of
the patch, with limits 6 and 4000.
So in its current form the patch improves exchange2 and WRF build times but
does not affect the other issues (tramp3d, roms, tonto and the rest of the
build time).
Bootstrapped/regtested on x86_64-linux; I plan to commit tomorrow if there
are no complaints.
PR ipa/103454
* ipa-inline.c (check_callers): Handle
param_inline_functions_called_once_loop_depth and
param_inline_functions_called_once_insns.
(edge_badness): Fix linebreaks.
* params.opt (param=max-inline-functions-called-once-loop-depth,
param=max-inline-functions-called-once-insns): New params.
* invoke.texi (max-inline-functions-called-once-loop-depth,
max-inline-functions-called-once-insns): New parameters.
Comments
On Tue, 7 Dec 2021 16:07:01 +0100
Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> Hi,
> as discussed in PR ipa/103454, several benchmarks regress
> with -finline-functions-called-once. Runtimes:
> - tramp3d with -Ofast. 31%
> - exchange2 with -Ofast 11-21%
> - roms O2 9%-10%
> - tonto 2.5-3.5% with LTO
> Build times:
> - specfp2006 41% (mostly wrf that builds 71% faster)
> - specint2006 1.5-3%
> - specfp2017 64% (again mostly wrf)
> - specint2017 2.5-3.5%
>
>
> This patch adds two params to tweak the behaviour:
> 1) max-inline-functions-called-once-loop-depth limiting the loop depth
> (this is useful primarily for exchange where the inlined function is in
> loop depth 9)
> 2) max-inline-functions-called-once-insns
> We already have large-function-insns/growth parameters, but these are
> limiting also inlining small functions, so reducing them will regress
> very large functions that are hot.
>
> Because inlining functions called once is meant just as a cleanup pass
> I think it makes sense to have separate limit for it.
>
> I set the parameters to 6 and 4000.
> 4000 was chosen to make fatigue benchmark happy and that seems to be only one
> holding the value pretty high. I opened
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103585 to track this.
>
> I plan to reduce the value before Christmas after a bit more testing since
> it seems to be overall win even if we trade fatigue2 performance, but I would
> like to get more testing on larger C++ APPs first.
Will this hurt -Os -finline-limit=0 ?
thanks,
> > I plan to reduce the value before Christmas after a bit more testing since
> > it seems to be overall win even if we trade fatigue2 performance, but I would
> > like to get more testing on larger C++ APPs first.
>
> Will this hurt -Os -finline-limit=0 ?
Why do you use -finline-limit=0 with -Os?
The patch does affect inlining even with -Os. In my benchmarks, inlining
very large functions is hit or miss code-size-wise (in pure theory,
inlining those should always be a win, but it is not: even ignoring
build times, we stress regalloc and are more likely to hit various --param
thresholds).
I guess we could experiment with the code size impact and possibly make the -Os
defaults differ from the -O defaults, as we do for some other params.
Honza
> thanks,
@@ -1091,20 +1091,30 @@ static bool
check_callers (struct cgraph_node *node, void *has_hot_call)
{
struct cgraph_edge *e;
- for (e = node->callers; e; e = e->next_caller)
- {
- if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once)
- || !opt_for_fn (e->caller->decl, optimize))
- return true;
- if (!can_inline_edge_p (e, true))
- return true;
- if (e->recursive_p ())
- return true;
- if (!can_inline_edge_by_limits_p (e, true))
- return true;
- if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
- *(bool *)has_hot_call = true;
- }
+ for (e = node->callers; e; e = e->next_caller)
+ {
+ if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once)
+ || !opt_for_fn (e->caller->decl, optimize))
+ return true;
+ if (!can_inline_edge_p (e, true))
+ return true;
+ if (e->recursive_p ())
+ return true;
+ if (!can_inline_edge_by_limits_p (e, true))
+ return true;
+ /* Inlining large functions to large loop depth is often harmful because
+ of register pressure it implies. */
+ if ((int)ipa_call_summaries->get (e)->loop_depth
+ > param_inline_functions_called_once_loop_depth)
+ return true;
+ /* Do not produce gigantic functions. */
+ if (estimate_size_after_inlining (e->caller->inlined_to ?
+ e->caller->inlined_to : e->caller, e)
+ > param_inline_functions_called_once_insns)
+ return true;
+ if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
+ *(bool *)has_hot_call = true;
+ }
return false;
}
@@ -1327,9 +1337,12 @@ edge_badness (struct cgraph_edge *edge, bool dump)
" %i (compensated)\n",
badness.to_double (),
freq.to_double (),
- edge->count.ipa ().initialized_p () ? edge->count.ipa ().to_gcov_type () : -1,
- caller->count.ipa ().initialized_p () ? caller->count.ipa ().to_gcov_type () : -1,
- inlining_speedup (edge, freq, unspec_edge_time, edge_time).to_double (),
+ edge->count.ipa ().initialized_p ()
+ ? edge->count.ipa ().to_gcov_type () : -1,
+ caller->count.ipa ().initialized_p ()
+ ? caller->count.ipa ().to_gcov_type () : -1,
+ inlining_speedup (edge, freq, unspec_edge_time,
+ edge_time).to_double (),
estimate_growth (callee),
callee_info->growth, overall_growth);
}
@@ -545,6 +545,14 @@ The maximum expansion factor when copying basic blocks.
Common Joined UInteger Var(param_max_hoist_depth) Init(30) Param Optimization
Maximum depth of search in the dominator tree for expressions to hoist.
+-param=max-inline-functions-called-once-loop-depth=
+Common Joined UInteger Var(param_inline_functions_called_once_loop_depth) Init(6) Optimization Param
+Maximum loop depth of a call which is considered for inlining functions called once.
+
+-param=max-inline-functions-called-once-insns=
+Common Joined UInteger Var(param_inline_functions_called_once_insns) Init(4000) Optimization Param
+Maximum combined size of caller and callee which is inlined if callee is called once.
+
-param=max-inline-insns-auto=
Common Joined UInteger Var(param_max_inline_insns_auto) Init(15) Optimization Param
The maximum number of instructions when automatically inlining.
@@ -13587,6 +13587,14 @@ The maximum number of backtrack attempts the scheduler should make
when modulo scheduling a loop. Larger values can exponentially increase
compilation time.
+@item max-inline-functions-called-once-loop-depth
+Maximal loop depth of a call considered by the inline heuristics that try to
+inline all functions called once.
+
+@item max-inline-functions-called-once-insns
+Maximal estimated size of functions produced while inlining functions called
+once.
+
@item max-inline-insns-single
Several parameters control the tree inliner used in GCC@. This number sets the
maximum number of instructions (counted in GCC's internal representation) in a