From patchwork Tue Dec  7 15:07:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Hubicka <hubicka@kam.mff.cuni.cz>
X-Patchwork-Id: 48588
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 0B73A3858430
	for <patchwork@sourceware.org>; Tue,  7 Dec 2021 15:07:33 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0B73A3858430
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1638889653;
	bh=1ND8UEpyB+Y/xoZHWVUECWH2RSJkxNbGJtxxUglwaCo=;
	h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=IycRJUv+FKQVjPdKETW1mvNCjjZCC7hpaC0jskB407DU/BFkgMmGYkguLg9lnWUBe
	 fXgc3/1mVp+hapqGkm9ogvJhM+VZuu/cqBvUjkaJiaelX0Gwj9FpOOjHdggf82rYLd
	 IpXBVQ2WIZuysHwARXgZgFgiNJ3SM1zpdRPRyWXU=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16])
 by sourceware.org (Postfix) with ESMTPS id 803963858D3C
 for <gcc-patches@gcc.gnu.org>; Tue,  7 Dec 2021 15:07:02 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 803963858D3C
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)
 id 18341280942; Tue,  7 Dec 2021 16:07:01 +0100 (CET)
Date: Tue, 7 Dec 2021 16:07:01 +0100
To: gcc-patches@gcc.gnu.org, mjambor@suse.cz
Subject: Limit inlining functions called once
Message-ID: <20211207150701.GK18150@kam.mff.cuni.cz>
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, KAM_LOTSOFHASH, KAM_SHORT,
 RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Jan Hubicka via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Jan Hubicka <hubicka@kam.mff.cuni.cz>
Reply-To: Jan Hubicka <hubicka@kam.mff.cuni.cz>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Hi,
as discussed in PR ipa/103454, several benchmarks regress with
-finline-functions-called-once. Runtimes:
 - tramp3d with -Ofast. 31%
 - exchange2 with -Ofast 11-21%
 - roms with -O2 9-10%
 - tonto 2.5-3.5% with LTO
Build times:
 - specfp2006 41% (mostly wrf, which builds 71% faster)
 - specint2006 1.5-3%
 - specfp2017 64% (again mostly wrf)
 - specint2017 2.5-3.5%


This patch adds two params to tweak the behaviour:
 1) max-inline-functions-called-once-loop-depth, which limits the loop depth
    of the call (this is useful primarily for exchange2, where the inlined
    function is at loop depth 9)
 2) max-inline-functions-called-once-insns
    We already have the large-function-insns/growth parameters, but these
    also limit inlining of small functions, so reducing them would regress
    very large functions that are hot.

    Because inlining functions called once is meant just as a cleanup pass,
    I think it makes sense to have a separate limit for it.
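
To illustrate how the two limits combine, here is a minimal standalone sketch
of the decision. It is not the GCC code: struct called_once_limits,
struct call_site and exceeds_called_once_limits are hypothetical names made up
for this example, and the loop depth and size are plain integers rather than
values taken from the call summaries and size estimates used in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two new params.  */
struct called_once_limits
{
  int max_loop_depth;	/* max-inline-functions-called-once-loop-depth  */
  int max_insns;	/* max-inline-functions-called-once-insns  */
};

/* Hypothetical summary of one call site: the loop depth of the call and the
   estimated size of the caller after the callee has been inlined.  */
struct call_site
{
  int loop_depth;
  int size_after_inlining;
};

/* Return true when the call site exceeds either limit, following the
   "return true means do not inline" convention of check_callers.  */
static bool
exceeds_called_once_limits (const struct call_site *site,
			    const struct called_once_limits *limits)
{
  /* Calls nested too deeply in loops are rejected.  */
  if (site->loop_depth > limits->max_loop_depth)
    return true;
  /* Do not produce functions larger than the size limit.  */
  if (site->size_after_inlining > limits->max_insns)
    return true;
  return false;
}
```

Both comparisons are strict, so a call at exactly the maximum depth or a
combined size exactly at the limit is still accepted.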

I set the parameters to 6 and 4000, respectively.
4000 was chosen to keep the fatigue benchmark happy; it seems to be the only
one holding the value this high.  I opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103585 to track this.

I plan to reduce the value before Christmas after a bit more testing, since
it seems to be an overall win even if we trade fatigue2 performance, but I would
like to get more testing on larger C++ apps first.

The benchmarks can be seen here:
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.02&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C19fdeff21d84a2612c9902daa80085f382b88c73%2C67b183fac7b08067fdd3c09abd3efd2691083395%2Ce14bd12e373f7612b00a44f22705950e1f70adcf%2C17f383c6fd95b2b2915aac38327c7628f6160a8d&include_user_branches=on
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.02&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C19fdeff21d84a2612c9902daa80085f382b88c73%2C67b183fac7b08067fdd3c09abd3efd2691083395%2Ce14bd12e373f7612b00a44f22705950e1f70adcf%2C17f383c6fd95b2b2915aac38327c7628f6160a8d&include_user_branches=on

Here the baseline and the first column are the unmodified trunk (to show the
noise), the second column is -fno-inline-functions-called-once, the third
column is the patch with limits set to 6 and 500, and the last column is the
attached version of the patch with limits 6 and 4000.

So in its current form the patch improves exchange2 and WRF build times but
does not affect the other issues (tramp3d, roms, tonto and the rest of the
build time).

Bootstrapped/regtested on x86_64-linux; I plan to commit tomorrow if there
are no complaints.

	PR ipa/103454
	* ipa-inline.c (check_callers): Handle
	param_inline_functions_called_once_loop_depth and
	param_inline_functions_called_once_insns.
	(edge_badness): Fix linebreaks.
	* params.opt (param=max-inline-functions-called-once-loop-depth,
	param=max-inline-functions-called-once-insns): New params.
	* doc/invoke.texi (max-inline-functions-called-once-loop-depth,
	max-inline-functions-called-once-insns): New parameters.

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 012b326b5e9..54cd085a84d 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1091,20 +1091,30 @@ static bool
 check_callers (struct cgraph_node *node, void *has_hot_call)
 {
   struct cgraph_edge *e;
-   for (e = node->callers; e; e = e->next_caller)
-     {
-       if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once)
-	   || !opt_for_fn (e->caller->decl, optimize))
-	 return true;
-       if (!can_inline_edge_p (e, true))
-         return true;
-       if (e->recursive_p ())
-	 return true;
-       if (!can_inline_edge_by_limits_p (e, true))
-         return true;
-       if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
-	 *(bool *)has_hot_call = true;
-     }
+  for (e = node->callers; e; e = e->next_caller)
+    {
+      if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once)
+	  || !opt_for_fn (e->caller->decl, optimize))
+	return true;
+      if (!can_inline_edge_p (e, true))
+	return true;
+      if (e->recursive_p ())
+	return true;
+      if (!can_inline_edge_by_limits_p (e, true))
+	return true;
+      /* Inlining large functions to large loop depth is often harmful because
+	 of register pressure it implies.  */
+      if ((int)ipa_call_summaries->get (e)->loop_depth
+	  > param_inline_functions_called_once_loop_depth)
+	return true;
+      /* Do not produce gigantic functions.  */
+      if (estimate_size_after_inlining (e->caller->inlined_to ?
+					e->caller->inlined_to : e->caller, e)
+	  > param_inline_functions_called_once_insns)
+	return true;
+      if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
+	*(bool *)has_hot_call = true;
+    }
   return false;
 }
 
@@ -1327,9 +1337,12 @@ edge_badness (struct cgraph_edge *edge, bool dump)
 		   " %i (compensated)\n",
 		   badness.to_double (),
 		   freq.to_double (),
-		   edge->count.ipa ().initialized_p () ? edge->count.ipa ().to_gcov_type () : -1,
-		   caller->count.ipa ().initialized_p () ? caller->count.ipa ().to_gcov_type () : -1,
-		   inlining_speedup (edge, freq, unspec_edge_time, edge_time).to_double (),
+		   edge->count.ipa ().initialized_p ()
+		   ? edge->count.ipa ().to_gcov_type () : -1,
+		   caller->count.ipa ().initialized_p ()
+		   ? caller->count.ipa ().to_gcov_type () : -1,
+		   inlining_speedup (edge, freq, unspec_edge_time,
+				     edge_time).to_double (),
 		   estimate_growth (callee),
 		   callee_info->growth, overall_growth);
 	}
diff --git a/gcc/params.opt b/gcc/params.opt
index e725c99e5e4..f1b5757461c 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -545,6 +545,14 @@ The maximum expansion factor when copying basic blocks.
 Common Joined UInteger Var(param_max_hoist_depth) Init(30) Param Optimization
 Maximum depth of search in the dominator tree for expressions to hoist.
 
+-param=max-inline-functions-called-once-loop-depth=
+Common Joined UInteger Var(param_inline_functions_called_once_loop_depth) Init(6) Optimization Param
+Maximum loop depth of a call which is considered for inlining functions called once.
+
+-param=max-inline-functions-called-once-insns=
+Common Joined UInteger Var(param_inline_functions_called_once_insns) Init(4000) Optimization Param
+Maximum combined size of caller and callee which is inlined if callee is called once.
+
 -param=max-inline-insns-auto=
 Common Joined UInteger Var(param_max_inline_insns_auto) Init(15) Optimization Param
 The maximum number of instructions when automatically inlining.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3bddfbaae6a..cd03fd93c7c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13587,6 +13587,14 @@ The maximum number of backtrack attempts the scheduler should make
 when modulo scheduling a loop.  Larger values can exponentially increase
 compilation time.
 
+@item max-inline-functions-called-once-loop-depth
+Maximal loop depth of a call considered by the inline heuristic that tries to
+inline all functions called once.
+
+@item max-inline-functions-called-once-insns
+Maximal estimated size of functions produced while inlining functions called
+once.
+
 @item max-inline-insns-single
 Several parameters control the tree inliner used in GCC@.  This number sets the
 maximum number of instructions (counted in GCC's internal representation) in a