[draft,v2] sched: Don't skip empty block in scheduling [PR108273]

Message ID c55a3078-56d0-1646-a96c-4e923a90833d@linux.ibm.com
State New
Series [draft,v2] sched: Don't skip empty block in scheduling [PR108273]

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Testing failed

Commit Message

Kewen.Lin Dec. 12, 2023, 7:02 a.m. UTC
  Hi,

on 2023/11/22 17:30, Kewen.Lin wrote:
> on 2023/11/17 20:55, Alexander Monakov wrote:
>>
>> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>>> to put it early in schedule_insns.
>>>
>>> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
>>> instead, since schedule_insns invokes haifa_sched_init.  Although the calls
>>> to rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>>> beforehand, they are all "setup" functions and shouldn't affect or be
>>> affected by this placement.
>>
>> I was worried because sched_init invokes df_analyze, and I'm not sure if
>> cleanup_cfg can invalidate it.
> 
> Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
> does take df into account: compact_blocks checks df, try_optimize_cfg
> invokes df_analyze, etc.  But I agree that moving cleanup_cfg before
> sched_init makes more sense.
> 
>>
>>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>>
>>> By looking into some failures, I found that even though cleanup_cfg is
>>> executed, there would still be some empty blocks left; analyzing a few
>>> failures showed at least these cases:
>>>   1. empty function body
>>>   2. block holding a label for return.
>>>   3. block without any successor.
>>>   4. block which becomes empty after scheduling some other block.
>>>   5. block which looks mergeable with its only successor but is left
>>>      unmerged.
>>>   ...
>>>
>>> For 1 and 2, there is a single successor, the EXIT block, so I think they
>>> don't affect the state transition; for 3, it's the same.  For 4, it
>>> depends on whether we can assume this kind of empty block never gets to
>>> hold a debug insn (that is, any associated debug insn should be moved
>>> along with the real one); I'm not sure.  For 5, a reduced test case is:
>>
>> Oh, I should have thought of cases like these, really sorry about the slip
>> of attention, and thanks for showing a testcase for item 5.  As Richard
>> said in his response, cleanup_cfg cannot be a fix here.  The thing to check
>> would be changing no_real_insns_p to always return false, and seeing if the
>> situation looks recoverable (if it breaks bootstrap, regtest statistics of
>> a non-bootstrapped compiler are still informative).
> 
> As you suggested, I forced no_real_insns_p to return false all the time,
> and some issues got exposed; almost all of them are assertions that a
> NOTE_P insn shouldn't be encountered in those places, so the adjustments
> for most of them are just to account for NOTE_P or this kind of special
> block.  One draft patch is attached; it can be bootstrapped and
> regress-tested on ppc64{,le} and x86.  Btw, it's without the previous
> cleanup_cfg adjustment (hoping it can keep more empty blocks and expose
> more issues).  The draft isn't ready for code review, but I hope it can
> provide some information on what kinds of changes are needed for the
> proposal.  If this is the direction we all agree on, I'll further refine it
> and post a formal patch.  One thing I want to note is that this patch
> disables the assertion below:
> 
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..abd334864fb 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
>      }
> 
>    /* Sanity check: verify that all region insns were scheduled.  */
> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> 
>    sched_finish_ready_list ();
> 
> Some cases can cause this assertion to fail due to a mismatch between the
> to-be-scheduled and scheduled insn counts.  It happens when a block
> initially holds exactly one INSN_P: while scheduling some other block, that
> insn gets moved away too, so we end up with an empty block whose only
> NOTE_P insn then gets counted as scheduled; but since the block wasn't
> empty initially and NOTE_P is skipped in a normal block, the
> to-be-scheduled count never included it.  It can be fixed by
> special-casing this kind of block for the counting: initially record which
> blocks are empty, and if a block that ends up empty wasn't recorded
> before, fix up the count for it accordingly.  Someone may argue that all
> this complication makes the proposal worse than the previous approach of
> special-casing debug insns; I'm looking forward to more comments.
> 
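In code form, that fix-up is roughly the following, placed right after each
block is scheduled in schedule_region (this is what the attached draft
implements via the rgn_init_empty_bb bitmap in sched-rgn.cc):

  /* If block BB ended up holding only a NOTE but was not recorded as
     empty before scheduling, its NOTE was counted as scheduled without
     ever being counted as to-be-scheduled, so bump the expected total
     to keep the sanity check balanced.  */
  if (head == tail
      && NOTE_P (head)
      && !bitmap_bit_p (rgn_init_empty_bb, bb))
    rgn_n_insns++;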

The attached one is the improved draft patch v2 for not skipping empty BBs.
Against the previous draft, it:
  1) uses NONDEBUG_INSN_P in place of !DEBUG_INSN_P && !NOTE_P where
     appropriate (see the sketch after this list);
  2) merges the NOTE_P special handling into that for DEBUG_INSN_P;
  3) fixes an issue on EBB scheduling exposed by broad testing;
  4) introduces rgn_init_empty_bb for the insn count mismatch issue;
  5) adds/refines some comments.
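
For 1), a note on the equivalence being relied on.  The macros below are the
real ones from gcc/rtl.h, but is_real_candidate is a hypothetical helper for
illustration only, not part of the patch:

  /* INSN_P matches real insns plus debug insns; NONDEBUG_INSN_P matches
     real insns only.  A scheduling candidate here is a real insn, a
     debug insn, or (with this patch) the NOTE of an empty block, so
     excluding debug insns and notes leaves exactly the real insns.  */
  static bool
  is_real_candidate (const rtx_insn *insn)
  {
    /* Same as !DEBUG_INSN_P (insn) && !NOTE_P (insn) for such
       candidates.  */
    return NONDEBUG_INSN_P (insn);
  }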

It's bootstrapped and regress-tested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.  I also tested with EBB scheduling turned on by
default: one issue in schedule_ebb got exposed and fixed, and the result is
bootstrapped on x86_64-redhat-linux and powerpc64{,le}-linux-gnu and
regress-tested on powerpc64{,le}-linux-gnu, with one guality failure on
x86_64-redhat-linux.  With selective scheduling turned on by default, it's
bootstrapped on x86_64-redhat-linux, with a few guality failures and some
failures with extra warnings (like with -fvar-tracking-assignments); those
failures are trivial.  Note that I'm unable to test forced selective
scheduling on Power since it's broken even without this patch; there seem to
be a few issues, and I'll file some PRs separately.

Any comments are highly appreciated.  I'll go ahead and make a formal patch
if there is no objection this week.  I think we can still keep
no_real_insns_p, since in some places like free_block_dependencies it's
still good for an early return; we can just remove its uses in some places.
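
For instance, a sketch of the kind of use that would remain (this mirrors the
current free_block_dependencies in sched-rgn.cc; in the formal patch
no_real_insns_p would keep its real body rather than the hard-coded false of
this draft):

  /* An empty block had no dependency analysis done for it, so there is
     nothing to free; no_real_insns_p stays as a cheap early return.  */
  static void
  free_block_dependencies (int bb)
  {
    rtx_insn *head, *tail;

    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

    if (no_real_insns_p (head, tail))
      return;

    sched_free_deps (head, tail, true);
  }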

BR,
Kewen

----
Subject: [PATCH draft v2] sched: Don't skip empty block in scheduling

---
 gcc/haifa-sched.cc | 172 ++++++++++++++++++++++++++-------------------
 gcc/rtl.h          |   4 +-
 gcc/sched-ebb.cc   |   7 +-
 gcc/sched-rgn.cc   |  23 +++++-
 4 files changed, 126 insertions(+), 80 deletions(-)
  

Patch

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..7b4c4a92bb0 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -1207,6 +1207,11 @@  recompute_todo_spec (rtx_insn *next, bool for_backtrack)
   int n_replace = 0;
   bool first_p = true;
 
+  /* Since we don't skip no_real_insns_p block any more, it's
+     possible to meet NOTE insn now, early return if so.  */
+  if (NOTE_P (next))
+    return 0;
+
   if (sd_lists_empty_p (next, SD_LIST_BACK))
     /* NEXT has all its dependencies resolved.  */
     return 0;
@@ -1726,6 +1731,11 @@  setup_insn_reg_pressure_info (rtx_insn *insn)
   int *max_reg_pressure;
   static int death[N_REG_CLASSES];
 
+  /* Since we don't skip no_real_insns_p block any more, it's possible to
+     schedule NOTE insn now, we should check for it first.  */
+  if (NOTE_P (insn))
+    return;
+
   gcc_checking_assert (!DEBUG_INSN_P (insn));
 
   excess_cost_change = 0;
@@ -4017,10 +4027,10 @@  schedule_insn (rtx_insn *insn)
 
   /* Scheduling instruction should have all its dependencies resolved and
      should have been removed from the ready list.  */
-  gcc_assert (sd_lists_empty_p (insn, SD_LIST_HARD_BACK));
+  gcc_assert (NOTE_P (insn) || sd_lists_empty_p (insn, SD_LIST_HARD_BACK));
 
   /* Reset debug insns invalidated by moving this insn.  */
-  if (MAY_HAVE_DEBUG_BIND_INSNS && !DEBUG_INSN_P (insn))
+  if (MAY_HAVE_DEBUG_BIND_INSNS && NONDEBUG_INSN_P (insn))
     for (sd_it = sd_iterator_start (insn, SD_LIST_BACK);
 	 sd_iterator_cond (&sd_it, &dep);)
       {
@@ -4106,61 +4116,66 @@  schedule_insn (rtx_insn *insn)
 
   check_clobbered_conditions (insn);
 
-  /* Update dependent instructions.  First, see if by scheduling this insn
-     now we broke a dependence in a way that requires us to change another
-     insn.  */
-  for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
-       sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+  /* Since we don't skip no_real_insns_p block any more, it's possible to
+     schedule NOTE insn now, we should check for it first.  */
+  if (!NOTE_P (insn))
     {
-      struct dep_replacement *desc = DEP_REPLACE (dep);
-      rtx_insn *pro = DEP_PRO (dep);
-      if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED
-	  && desc != NULL && desc->insn == pro)
-	apply_replacement (dep, false);
-    }
+      /* Update dependent instructions.  First, see if by scheduling this insn
+	 now we broke a dependence in a way that requires us to change another
+	 insn.  */
+      for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
+	   sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+	{
+	  struct dep_replacement *desc = DEP_REPLACE (dep);
+	  rtx_insn *pro = DEP_PRO (dep);
+	  if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED && desc != NULL
+	      && desc->insn == pro)
+	    apply_replacement (dep, false);
+	}
 
-  /* Go through and resolve forward dependencies.  */
-  for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
-       sd_iterator_cond (&sd_it, &dep);)
-    {
-      rtx_insn *next = DEP_CON (dep);
-      bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0;
+      /* Go through and resolve forward dependencies.  */
+      for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
+	   sd_iterator_cond (&sd_it, &dep);)
+	{
+	  rtx_insn *next = DEP_CON (dep);
+	  bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0;
 
-      /* Resolve the dependence between INSN and NEXT.
-	 sd_resolve_dep () moves current dep to another list thus
-	 advancing the iterator.  */
-      sd_resolve_dep (sd_it);
+	  /* Resolve the dependence between INSN and NEXT.
+	     sd_resolve_dep () moves current dep to another list thus
+	     advancing the iterator.  */
+	  sd_resolve_dep (sd_it);
 
-      if (cancelled)
-	{
-	  if (must_restore_pattern_p (next, dep))
-	    restore_pattern (dep, false);
-	  continue;
-	}
+	  if (cancelled)
+	    {
+	      if (must_restore_pattern_p (next, dep))
+		restore_pattern (dep, false);
+	      continue;
+	    }
 
-      /* Don't bother trying to mark next as ready if insn is a debug
-	 insn.  If insn is the last hard dependency, it will have
-	 already been discounted.  */
-      if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next))
-	continue;
+	  /* Don't bother trying to mark next as ready if insn is a debug
+	     insn.  If insn is the last hard dependency, it will have
+	     already been discounted.  */
+	  if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next))
+	    continue;
 
-      if (!IS_SPECULATION_BRANCHY_CHECK_P (insn))
-	{
-	  int effective_cost;
+	  if (!IS_SPECULATION_BRANCHY_CHECK_P (insn))
+	    {
+	      int effective_cost;
 
-	  effective_cost = try_ready (next);
+	      effective_cost = try_ready (next);
 
-	  if (effective_cost >= 0
-	      && SCHED_GROUP_P (next)
-	      && advance < effective_cost)
-	    advance = effective_cost;
-	}
-      else
-	/* Check always has only one forward dependence (to the first insn in
-	   the recovery block), therefore, this will be executed only once.  */
-	{
-	  gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW));
-	  fix_recovery_deps (RECOVERY_BLOCK (insn));
+	      if (effective_cost >= 0 && SCHED_GROUP_P (next)
+		  && advance < effective_cost)
+		advance = effective_cost;
+	    }
+	  else
+	    /* Check always has only one forward dependence (to the first insn
+	       in the recovery block), therefore, this will be executed only
+	       once.  */
+	    {
+	      gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW));
+	      fix_recovery_deps (RECOVERY_BLOCK (insn));
+	    }
 	}
     }
 
@@ -4170,9 +4185,9 @@  schedule_insn (rtx_insn *insn)
      may use this information to decide how the instruction should
      be aligned.  */
   if (issue_rate > 1
+      && NONDEBUG_INSN_P (insn)
       && GET_CODE (PATTERN (insn)) != USE
-      && GET_CODE (PATTERN (insn)) != CLOBBER
-      && !DEBUG_INSN_P (insn))
+      && GET_CODE (PATTERN (insn)) != CLOBBER)
     {
       if (reload_completed)
 	PUT_MODE (insn, clock_var > last_clock_var ? TImode : VOIDmode);
@@ -5036,8 +5051,11 @@  get_ebb_head_tail (basic_block beg, basic_block end,
 /* Return true if there are no real insns in the range [ HEAD, TAIL ].  */
 
 bool
-no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
+no_real_insns_p (const rtx_insn *head ATTRIBUTE_UNUSED,
+		 const rtx_insn *tail ATTRIBUTE_UNUSED)
 {
+  return false;
+#if 0
   while (head != NEXT_INSN (tail))
     {
       if (!NOTE_P (head) && !LABEL_P (head))
@@ -5045,6 +5063,7 @@  no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
       head = NEXT_INSN (head);
     }
   return true;
+#endif
 }
 
 /* Restore-other-notes: NOTE_LIST is the end of a chain of notes
@@ -6224,8 +6243,12 @@  commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb)
        scheduled_insns.iterate (i, &insn);
        i++)
     {
-      if (control_flow_insn_p (last_scheduled_insn)
-	  || current_sched_info->advance_target_bb (*target_bb, insn))
+      /* Since we don't skip no_real_insns_p block any more, it's possible
+	 to schedule NOTE insn now, we should check for it here to avoid
+	 unexpected target bb advance.  */
+      if ((control_flow_insn_p (last_scheduled_insn)
+	   || current_sched_info->advance_target_bb (*target_bb, insn))
+	  && !NOTE_P (insn))
 	{
 	  *target_bb = current_sched_info->advance_target_bb (*target_bb, 0);
 
@@ -6245,7 +6268,7 @@  commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb)
 	(*current_sched_info->begin_move_insn) (insn, last_scheduled_insn);
       move_insn (insn, last_scheduled_insn,
 		 current_sched_info->next_tail);
-      if (!DEBUG_INSN_P (insn))
+      if (NONDEBUG_INSN_P (insn))
 	reemit_notes (insn);
       last_scheduled_insn = insn;
     }
@@ -6296,7 +6319,7 @@  prune_ready_list (state_t temp_state, bool first_cycle_insn_p,
 	  int cost = 0;
 	  const char *reason = "resource conflict";
 
-	  if (DEBUG_INSN_P (insn))
+	  if (DEBUG_INSN_P (insn) || NOTE_P (insn))
 	    continue;
 
 	  if (sched_group_found && !SCHED_GROUP_P (insn)
@@ -6504,7 +6527,7 @@  schedule_block (basic_block *target_bb, state_t init_state)
      and caused problems because schedule_block and compute_forward_dependences
      had different notions of what the "head" insn was.  */
 
-  gcc_assert (head != tail || INSN_P (head));
+  gcc_assert (head != tail || INSN_P (head) || NOTE_P (head));
 
   haifa_recovery_bb_recently_added_p = false;
 
@@ -6539,15 +6562,15 @@  schedule_block (basic_block *target_bb, state_t init_state)
   if (targetm.sched.init)
     targetm.sched.init (sched_dump, sched_verbose, ready.veclen);
 
+  gcc_assert (((NOTE_P (prev_head) || DEBUG_INSN_P (prev_head))
+	       && BLOCK_FOR_INSN (prev_head) == *target_bb)
+	      || (head == tail && NOTE_P (head)));
+
   /* We start inserting insns after PREV_HEAD.  */
   last_scheduled_insn = prev_head;
   last_nondebug_scheduled_insn = NULL;
   nonscheduled_insns_begin = NULL;
 
-  gcc_assert ((NOTE_P (last_scheduled_insn)
-	       || DEBUG_INSN_P (last_scheduled_insn))
-	      && BLOCK_FOR_INSN (last_scheduled_insn) == *target_bb);
-
   /* Initialize INSN_QUEUE.  Q_SIZE is the total number of insns in the
      queue.  */
   q_ptr = 0;
@@ -6725,15 +6748,16 @@  schedule_block (basic_block *target_bb, state_t init_state)
 		}
 	    }
 
-	  /* We don't want md sched reorder to even see debug isns, so put
-	     them out right away.  */
-	  if (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0))
+	  /* We don't want md sched reorder to even see debug and note insns,
+	     so put them out right away.  */
+	  if (ready.n_ready
+	      && !NONDEBUG_INSN_P (ready_element (&ready, 0))
 	      && (*current_sched_info->schedule_more_p) ())
 	    {
-	      while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)))
+	      while (ready.n_ready && !NONDEBUG_INSN_P (ready_element (&ready, 0)))
 		{
 		  rtx_insn *insn = ready_remove_first (&ready);
-		  gcc_assert (DEBUG_INSN_P (insn));
+		  gcc_assert (DEBUG_INSN_P (insn) || NOTE_P (insn));
 		  (*current_sched_info->begin_schedule_ready) (insn);
 		  scheduled_insns.safe_push (insn);
 		  last_scheduled_insn = insn;
@@ -7145,17 +7169,18 @@  schedule_block (basic_block *target_bb, state_t init_state)
 int
 set_priorities (rtx_insn *head, rtx_insn *tail)
 {
+  /* Since we don't skip no_real_insns_p block any more, it's
+     possible to meet NOTE insn now, we don't need to compute
+     priority for such block, so early return.  */
+  if (head == tail && !INSN_P (head))
+    return 1;
+
   rtx_insn *insn;
-  int n_insn;
+  int n_insn = 0;
   int sched_max_insns_priority =
 	current_sched_info->sched_max_insns_priority;
   rtx_insn *prev_head;
 
-  if (head == tail && ! INSN_P (head))
-    gcc_unreachable ();
-
-  n_insn = 0;
-
   prev_head = PREV_INSN (head);
   for (insn = tail; insn != prev_head; insn = PREV_INSN (insn))
     {
@@ -7688,7 +7713,8 @@  fix_tick_ready (rtx_insn *next)
 {
   int tick, delay;
 
-  if (!DEBUG_INSN_P (next) && !sd_lists_empty_p (next, SD_LIST_RES_BACK))
+  if (NONDEBUG_INSN_P (next)
+      && !sd_lists_empty_p (next, SD_LIST_RES_BACK))
     {
       int full_p;
       sd_iterator_def sd_it;
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e4b6cc0dbb5..34b3f31d1ee 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2695,8 +2695,8 @@  do {								        \
 /* During sched, 1 if RTX is an insn that must be scheduled together
    with the preceding insn.  */
 #define SCHED_GROUP_P(RTX)						\
-  (RTL_FLAG_CHECK4 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN,		\
-		    JUMP_INSN, CALL_INSN)->in_struct)
+  (RTL_FLAG_CHECK5 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN,		\
+		    JUMP_INSN, CALL_INSN, NOTE)->in_struct)
 
 /* For a SET rtx, SET_DEST is the place that is set
    and SET_SRC is the value it is set to.  */
diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
index 110fcdbca4d..c07e65696b9 100644
--- a/gcc/sched-ebb.cc
+++ b/gcc/sched-ebb.cc
@@ -478,12 +478,10 @@  schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
      a note or two.  */
   while (head != tail)
     {
-      if (NOTE_P (head) || DEBUG_INSN_P (head))
+      if (LABEL_P (head) || NOTE_P (head) || DEBUG_INSN_P (head))
 	head = NEXT_INSN (head);
       else if (NOTE_P (tail) || DEBUG_INSN_P (tail))
 	tail = PREV_INSN (tail);
-      else if (LABEL_P (head))
-	head = NEXT_INSN (head);
       else
 	break;
     }
@@ -494,7 +492,8 @@  schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
   if (no_real_insns_p (head, tail))
     return BLOCK_FOR_INSN (tail);
 
-  gcc_assert (INSN_P (head) && INSN_P (tail));
+  gcc_assert ((NOTE_P (head) && head == tail)
+	      || (INSN_P (head) && INSN_P (tail)));
 
   if (!bitmap_bit_p (&dont_calc_deps, first_bb->index))
     {
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..658349ba2b6 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -228,6 +228,9 @@  static edgeset *pot_split;
 /* For every bb, a set of its ancestor edges.  */
 static edgeset *ancestor_edges;
 
+/* Indicate the bb is empty initially if set.  */
+static bitmap rgn_init_empty_bb;
+
 #define INSN_PROBABILITY(INSN) (SRC_PROB (BLOCK_TO_BB (BLOCK_NUM (INSN))))
 
 /* Speculative scheduling functions.  */
@@ -3216,6 +3219,14 @@  schedule_region (int rgn)
       /* Clean up.  */
       if (current_nr_blocks > 1)
 	free_trg_info ();
+
+      /* This empty block isn't empty initially, it means the only NOTE
+	 inside was not counted when computing rgn_n_insns, so fix it up
+	 now.  */
+      if (head == tail
+	  && NOTE_P (head)
+	  && !bitmap_bit_p (rgn_init_empty_bb, bb))
+	rgn_n_insns++;
     }
 
   /* Sanity check: verify that all region insns were scheduled.  */
@@ -3448,7 +3459,16 @@  sched_rgn_local_init (int rgn)
 	    continue;
 	  FOR_EACH_EDGE (e, ei, block->succs)
 	    e->aux = NULL;
-        }
+	}
+    }
+
+  rgn_init_empty_bb = BITMAP_ALLOC (NULL);
+  for (bb = 0; bb < current_nr_blocks; bb++)
+    {
+      rtx_insn *head, *tail;
+      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+      if (head == tail && NOTE_P (head))
+	bitmap_set_bit (rgn_init_empty_bb, bb);
     }
 }
 
@@ -3461,6 +3481,7 @@  sched_rgn_local_free (void)
   sbitmap_vector_free (pot_split);
   sbitmap_vector_free (ancestor_edges);
   free (rgn_edges);
+  BITMAP_FREE (rgn_init_empty_bb);
 }
 
 /* Free data computed for the finished region.  */