[v2] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

Message ID 20231120144854.676590-1-maxim.kuvyrkov@linaro.org
State New
Headers
Series [v2] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed

Commit Message

Maxim Kuvyrkov Nov. 20, 2023, 2:48 p.m. UTC
  This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===
... in this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA.  After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*.  This gets us N*M new
dependencies.

In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics.  After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.

gcc/ChangeLog:

	PR rtl-optimization/96388
	PR rtl-optimization/111554
	* sched-deps.cc (find_inc): Avoid exponential behavior.
---
 gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)
  

Patch

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index c23218890f3..a286ff319e2 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4779,24 +4779,59 @@  parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
 /* Once a suitable mem reference has been found and the corresponding data
    in MII has been filled in, this function is called to find a suitable
    add or inc insn involving the register we found in the memory
-   reference.  */
+   reference.
+   If successful, this function will create additional dependencies between
+   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
+   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
+*/
 
 static bool
 find_inc (struct mem_inc_info *mii, bool backwards)
 {
   sd_iterator_def sd_it;
   dep_t dep;
+  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
+  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
 
-  sd_it = sd_iterator_start (mii->mem_insn,
-			     backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
     {
       dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
       rtx_insn *pro = DEP_PRO (dep);
       rtx_insn *con = DEP_CON (dep);
-      rtx_insn *inc_cand = backwards ? pro : con;
+      rtx_insn *inc_cand;
+      int n_inc_deps;
+
       if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
 	goto next;
+
+      if (backwards)
+	{
+	  inc_cand = pro;
+	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+	}
+      else
+	{
+	  inc_cand = con;
+	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+	}
+
+      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
+	 for mem_insn.  This by itself is not a problem, since each mem_insn
+	 will have only a few inc_insns associated with it.  However, if
+	 we consider that a single inc_insn may have a lot of mem_insns, AND,
+	 on top of that, a few other inc_insns associated with it --
+	 those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
+	 dependencies created for them.  This may cause an exponential
+	 growth of memory usage and scheduling time.
+	 See PR96388 for details.
+	 We [heuristically] use n_inc_deps as a proxy for the number of MEM
+	 insns, and drop opportunities for breaking modifiable_mem dependencies
+	 when dependency lists grow beyond reasonable size.  */
+      if (n_mem_deps * n_inc_deps
+	  >= param_max_pending_list_length * param_max_pending_list_length)
+	goto next;
+
       if (parse_add_or_inc (mii, inc_cand, backwards))
 	{
 	  struct dep_replacement *desc;
@@ -4838,6 +4873,8 @@  find_inc (struct mem_inc_info *mii, bool backwards)
 	  desc->insn = mii->mem_insn;
 	  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
 			 INSN_SPEC_BACK_DEPS (con));
+
+	  gcc_assert (mii->inc_insn == inc_cand);
 	  if (backwards)
 	    {
 	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)