[5/6] ira: Consider modelling caller-save allocations as loop spills

Message ID: mpty23szzry.fsf@arm.com
State: Committed
Commit: 01f3e6a40e7202310abbeb41c345d325bd69554f
Series: ira: Fix performance regression in exchange2 [PR98782]

Commit Message

Richard Sandiford Jan. 6, 2022, 2:48 p.m. UTC
  If an allocno A in an inner loop L spans a call, a parent allocno AP
can choose to handle a call-clobbered/caller-saved hard register R
in one of two ways:

(1) save R before each call in L and restore R after each call
(2) spill R to memory throughout L

(2) can be cheaper than (1) in some cases, particularly if L does
not reference A.

Before the patch we always did (1).  The patch adds support for
picking (2) instead, when it seems cheaper.  It builds on the
earlier support for not propagating conflicts to parent allocnos.
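
As a rough illustration (the function and variable names below are invented,
not taken from the patch), approach (2) can be the cheaper choice for
something like:

  int g (int);

  int
  f (int x, int n)
  {
    int sum = 0;
    for (int i = 0; i < n; i++)   /* inner loop L */
      sum += g (i);               /* each call clobbers caller-saved regs */
    return sum + x;               /* x is live across L but unused inside it */
  }

If x is assigned a call-clobbered hard register R, (1) saves and restores R
around every call to g, i.e. once per iteration, whereas (2) spills x to
memory on entry to the loop and reloads it on exit.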

gcc/
	PR rtl-optimization/98782
	* ira-int.h (ira_caller_save_cost): New function.
	(ira_caller_save_loop_spill_p): Likewise.
	* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
	cheaper to spill a call-clobbered register throughout a loop rather
	than spill it around each individual call.  If so, treat all
	call-clobbered registers as conflicts and...
	(propagate_allocno_info): ...do not propagate call information
	from the child to the parent.
	* ira-color.c (move_spill_restore): Update accordingly.
	* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.

gcc/testsuite/
	* gcc.target/aarch64/reg-alloc-3.c: New test.
---
 gcc/ira-build.c                               | 23 ++++---
 gcc/ira-color.c                               | 13 ++--
 gcc/ira-costs.c                               |  7 +-
 gcc/ira-int.h                                 | 39 +++++++++++
 .../gcc.target/aarch64/reg-alloc-3.c          | 65 +++++++++++++++++++
 5 files changed, 129 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/reg-alloc-3.c
  

Comments

Vladimir Makarov Jan. 10, 2022, 1:51 p.m. UTC | #1
On 2022-01-06 09:48, Richard Sandiford wrote:
> If an allocno A in an inner loop L spans a call, a parent allocno AP
> can choose to handle a call-clobbered/caller-saved hard register R
> in one of two ways:
>
> (1) save R before each call in L and restore R after each call
> (2) spill R to memory throughout L
>
> (2) can be cheaper than (1) in some cases, particularly if L does
> not reference A.
>
> Before the patch we always did (1).  The patch adds support for
> picking (2) instead, when it seems cheaper.  It builds on the
> earlier support for not propagating conflicts to parent allocnos.
Another cost-calculation improvement for calls would be taking into
account that an allocno can be saved and restored once for several
subsequent calls (e.g. in one BB).
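
A tiny sketch of that situation (hypothetical code, not from the patch):
a value live across two back-to-back calls in one BB is currently charged
two save/restore pairs, although one save before the first call and one
restore after the second would be enough.

  int g1 (void);
  int g2 (void);

  int
  h (int x)
  {
    int a = g1 ();      /* x must survive this call...  */
    int b = g2 ();      /* ...and this one, with no use of x in between,  */
    return x + a + b;   /* so a single save/restore pair would suffice.  */
  }
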
> gcc/
> 	PR rtl-optimization/98782
> 	* ira-int.h (ira_caller_save_cost): New function.
> 	(ira_caller_save_loop_spill_p): Likewise.
> 	* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
> 	cheaper to spill a call-clobbered register throughout a loop rather
> 	than spill it around each individual call.  If so, treat all
> 	call-clobbered registers as conflicts and...
> 	(propagate_allocno_info): ...do not propagate call information
> 	from the child to the parent.
> 	* ira-color.c (move_spill_restore): Update accordingly.
> 	* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.
>
> gcc/testsuite/
> 	* gcc.target/aarch64/reg-alloc-3.c: New test.
OK for me.  Thank you for the patch.
  
Hans-Peter Nilsson Jan. 11, 2022, 4:21 a.m. UTC | #2
> From: Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org>
> Date: Thu, 6 Jan 2022 15:48:01 +0100

> If an allocno A in an inner loop L spans a call, a parent allocno AP
> can choose to handle a call-clobbered/caller-saved hard register R
> in one of two ways:
> 
> (1) save R before each call in L and restore R after each call
> (2) spill R to memory throughout L
> 
> (2) can be cheaper than (1) in some cases, particularly if L does
> not reference A.
> 
> Before the patch we always did (1).  The patch adds support for
> picking (2) instead, when it seems cheaper.  It builds on the
> earlier support for not propagating conflicts to parent allocnos.
> 
> gcc/
> 	PR rtl-optimization/98782
> 	* ira-int.h (ira_caller_save_cost): New function.
> 	(ira_caller_save_loop_spill_p): Likewise.
> 	* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
> 	cheaper to spill a call-clobbered register throughout a loop rather
> 	than spill it around each individual call.  If so, treat all
> 	call-clobbered registers as conflicts and...
> 	(propagate_allocno_info): ...do not propagate call information
> 	from the child to the parent.
> 	* ira-color.c (move_spill_restore): Update accordingly.
> 	* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.

I bisected a broken build for cris-elf to this patch.
Details are in https://gcc.gnu.org/PR103974 and should be
sufficient to find a quick resolution.

(JFTR, as you're already CC:ed by your @gcc.gnu.org account.)

Perhaps some of these patches are better postponed for stage 1?

brgds, H-P
  
Robin Dapp Jan. 11, 2022, 8:38 a.m. UTC | #3
Hi Richard,

this causes a bootstrap error on s390 (where
IRA_HARD_REGNO_ADD_COST_MULTIPLIER is defined). rclass is used in the
#define-guarded area.  I guess you also wanted to move this to the new
function ira_caller_save_cost?

Regards
 Robin

--

../../gcc/ira-costs.c: In function ‘void ira_tune_allocno_costs()’:
../../gcc/ira-costs.c:2354:45: error: ‘rclass’ was not declared in this
scope; did you mean ‘aclass’?
 2354 |        cost += ((ira_memory_move_cost[mode][rclass][0]
      |                                             ^~~~~~
      |                                             aclass
  
Richard Sandiford Jan. 11, 2022, 8:52 a.m. UTC | #4
Robin Dapp <rdapp@linux.ibm.com> writes:
> Hi Richard,
>
> this causes a bootstrap error on s390 (where
> IRA_HARD_REGNO_ADD_COST_MULTIPLIER is defined). rclass is used in the
> #define-guarded area.

Gah, sorry about that.

> I guess you also wanted to move this to the new function
> ira_caller_save_cost?

No, the IRA_HARD_REGNO_ADD_COST_MULTIPLIER heuristic is a separate thing.
It's just that I had to remove the rclass variable to allow bootstrap on
other targets.

Could you try the attached?

Thanks,
Richard
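
The attachment itself is not reproduced on this page.  A minimal sketch of
the shape Richard describes - assuming the only change is to declare rclass
locally inside the #ifdef-guarded heuristic in ira_tune_allocno_costs, with
the heuristic itself left as before - would be:

  #ifdef IRA_HARD_REGNO_ADD_COST_MULTIPLIER
	      {
		/* rclass is only needed by this target-specific heuristic,
		   so scope it here; targets that do not define the macro
		   then no longer see an unused variable.  */
		enum reg_class rclass = REGNO_REG_CLASS (regno);
		cost += ((ira_memory_move_cost[mode][rclass][0]
			  + ira_memory_move_cost[mode][rclass][1])
			 * ALLOCNO_FREQ (a)
			 * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
	      }
  #endif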
  
Robin Dapp Jan. 11, 2022, 11:31 a.m. UTC | #5
Hi Richard,

> Could you try the attached?

Build and bootstrap look OK with it.  The testsuite shows lots of
fallout, but the proper bisect isn't finished yet.  The commit before
your series is still fine - the problem could also lie after it,
though.  Will report back later.

Thanks
 Robin
  
Martin Liška Jan. 11, 2022, 12:43 p.m. UTC | #6
On 1/11/22 09:52, Richard Sandiford via Gcc-patches wrote:
> Robin Dapp <rdapp@linux.ibm.com> writes:
>> Hi Richard,
>>
>> this causes a bootstrap error on s390 (where
>> IRA_HARD_REGNO_ADD_COST_MULTIPLIER is defined). rclass is used in the
>> #define-guarded area.
> 
> Gah, sorry about that.
> 
>> I guess you also wanted to move this to the new function
>> ira_caller_save_cost?
> 
> No, the IRA_HARD_REGNO_ADD_COST_MULTIPLIER heuristic is a separate thing.
> It's just that I had to remove the rclass variable to allow bootstrap on
> other targets.
> 
> Could you try the attached?

Hello.

I noticed the same failure. Please push the patch.

Thanks,
Martin

> 
> Thanks,
> Richard
> 
>
  
Robin Dapp Jan. 11, 2022, 12:47 p.m. UTC | #7
> Could you try the attached?

The series with the patch is OK from a testsuite point of view.  The
other problem appears later.

Regards
 Robin
  
Richard Sandiford Jan. 11, 2022, 12:49 p.m. UTC | #8
Robin Dapp <rdapp@linux.ibm.com> writes:
>> Could you try the attached?
>
> The series with the patch is OK from a testsuite point of view.  The
> other problem appears later.

OK, thanks for checking.  I've pushed the patch as obvious.

Richard
  

Patch

diff --git a/gcc/ira-build.c b/gcc/ira-build.c
index 875b4d8ed7c..ab3e87164e1 100644
--- a/gcc/ira-build.c
+++ b/gcc/ira-build.c
@@ -2000,6 +2000,8 @@  ira_propagate_hard_reg_costs (ira_allocno_t parent_a, ira_allocno_t a,
 			      int spill_cost)
 {
   HARD_REG_SET conflicts = ira_total_conflict_hard_regs (a);
+  if (ira_caller_save_loop_spill_p (parent_a, a, spill_cost))
+    conflicts |= ira_need_caller_save_regs (a);
   conflicts &= ~ira_total_conflict_hard_regs (parent_a);
 
   auto costs = ALLOCNO_HARD_REG_COSTS (a);
@@ -2069,15 +2071,18 @@  propagate_allocno_info (void)
 	  if (!ira_subloop_allocnos_can_differ_p (parent_a))
 	    merge_hard_reg_conflicts (a, parent_a, true);
 
-	  ALLOCNO_CALL_FREQ (parent_a) += ALLOCNO_CALL_FREQ (a);
-	  ALLOCNO_CALLS_CROSSED_NUM (parent_a)
-	    += ALLOCNO_CALLS_CROSSED_NUM (a);
-	  ALLOCNO_CHEAP_CALLS_CROSSED_NUM (parent_a)
-	    += ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a);
-	  ALLOCNO_CROSSED_CALLS_ABIS (parent_a)
-	    |= ALLOCNO_CROSSED_CALLS_ABIS (a);
-	  ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (parent_a)
-	    |= ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a);
+	  if (!ira_caller_save_loop_spill_p (parent_a, a, spill_cost))
+	    {
+	      ALLOCNO_CALL_FREQ (parent_a) += ALLOCNO_CALL_FREQ (a);
+	      ALLOCNO_CALLS_CROSSED_NUM (parent_a)
+		+= ALLOCNO_CALLS_CROSSED_NUM (a);
+	      ALLOCNO_CHEAP_CALLS_CROSSED_NUM (parent_a)
+		+= ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a);
+	      ALLOCNO_CROSSED_CALLS_ABIS (parent_a)
+		|= ALLOCNO_CROSSED_CALLS_ABIS (a);
+	      ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (parent_a)
+		|= ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a);
+	    }
 	  ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (parent_a)
 	    += ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a);
 	  aclass = ALLOCNO_CLASS (a);
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 4344ee6689e..1487afc5ef1 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -3597,11 +3597,16 @@  move_spill_restore (void)
 		 propagate_allocno_info will have propagated
 		 the cost of spilling HARD_REGNO in SUBLOOP_NODE.
 		 (ira_subloop_allocnos_can_differ_p must be true
-		 in that case.)  Otherwise, SPILL_COST acted as
-		 a cap on the propagated register cost, in cases
-		 where the allocations can differ.  */
+		 in that case.)  If HARD_REGNO is a caller-saved
+		 register, we might have modelled it in the same way.
+
+		 Otherwise, SPILL_COST acted as a cap on the propagated
+		 register cost, in cases where the allocations can differ.  */
 	      auto conflicts = ira_total_conflict_hard_regs (subloop_allocno);
-	      if (TEST_HARD_REG_BIT (conflicts, hard_regno))
+	      if (TEST_HARD_REG_BIT (conflicts, hard_regno)
+		  || (ira_need_caller_save_p (subloop_allocno, hard_regno)
+		      && ira_caller_save_loop_spill_p (a, subloop_allocno,
+						       spill_cost)))
 		reg_cost = spill_cost;
 	      else if (ira_subloop_allocnos_can_differ_p (a))
 		reg_cost = MIN (reg_cost, spill_cost);
diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 280befc58da..cbb58d32be8 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2308,7 +2308,7 @@  ira_tune_allocno_costs (void)
 {
   int j, n, regno;
   int cost, min_cost, *reg_costs;
-  enum reg_class aclass, rclass;
+  enum reg_class aclass;
   machine_mode mode;
   ira_allocno_t a;
   ira_allocno_iterator ai;
@@ -2347,12 +2347,9 @@  ira_tune_allocno_costs (void)
 		}
 	      if (skip_p)
 		continue;
-	      rclass = REGNO_REG_CLASS (regno);
 	      cost = 0;
 	      if (ira_need_caller_save_p (a, regno))
-		cost += (ALLOCNO_CALL_FREQ (a)
-			 * (ira_memory_move_cost[mode][rclass][0]
-			    + ira_memory_move_cost[mode][rclass][1]));
+		cost += ira_caller_save_cost (a);
 #ifdef IRA_HARD_REGNO_ADD_COST_MULTIPLIER
 	      cost += ((ira_memory_move_cost[mode][rclass][0]
 			+ ira_memory_move_cost[mode][rclass][1])
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index 8b87498f77f..a78811eb416 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -1660,4 +1660,43 @@  ira_total_conflict_hard_regs (ira_allocno_t a)
   return conflicts;
 }
 
+/* Return the cost of saving a caller-saved register before each call
+   in A's live range and restoring the same register after each call.  */
+inline int
+ira_caller_save_cost (ira_allocno_t a)
+{
+  auto mode = ALLOCNO_MODE (a);
+  auto rclass = ALLOCNO_CLASS (a);
+  return (ALLOCNO_CALL_FREQ (a)
+	  * (ira_memory_move_cost[mode][rclass][0]
+	     + ira_memory_move_cost[mode][rclass][1]));
+}
+
+/* A and SUBLOOP_A are allocnos for the same pseudo register, with A's
+   loop immediately enclosing SUBLOOP_A's loop.  If we allocate to A a
+   hard register R that is clobbered by a call in SUBLOOP_A, decide
+   which of the following approaches should be used for handling the
+   conflict:
+
+   (1) Spill R on entry to SUBLOOP_A's loop, assign memory to SUBLOOP_A,
+       and restore R on exit from SUBLOOP_A's loop.
+
+   (2) Spill R before each necessary call in SUBLOOP_A's live range and
+       restore R after each such call.
+
+   Return true if (1) is better than (2).  SPILL_COST is the cost of
+   doing (1).  */
+inline bool
+ira_caller_save_loop_spill_p (ira_allocno_t a, ira_allocno_t subloop_a,
+			      int spill_cost)
+{
+  if (!ira_subloop_allocnos_can_differ_p (a))
+    return false;
+
+  /* Calculate the cost of saving a call-clobbered register
+     before each call and restoring it afterwards.  */
+  int call_cost = ira_caller_save_cost (subloop_a);
+  return call_cost && call_cost >= spill_cost;
+}
+
 #endif /* GCC_IRA_INT_H */
diff --git a/gcc/testsuite/gcc.target/aarch64/reg-alloc-3.c b/gcc/testsuite/gcc.target/aarch64/reg-alloc-3.c
new file mode 100644
index 00000000000..7acdc432b0c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/reg-alloc-3.c
@@ -0,0 +1,65 @@ 
+/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define PROB 0.1
+
+struct L
+{
+  int data;
+  volatile struct L *next;
+  volatile struct L *inner;
+};
+
+void ext();
+
+/* The thing we're testing here is that the !head->inner path of the outer loop
+   body has no stack accesses.  It's possible that we'll need to update this
+   pattern for unrelated code changes, but the test should be XFAILed rather
+   than changed if any new stack accesses creep into the !head->inner path.  */
+/*
+** foo:
+**	...
+**	ldr	(w[0-9]+), \[(x[0-9]+)\]
+**	add	(w[0-9]+), (?:\3, \1|\1, \3)
+**	ldr	(x[0-9]+), \[\2, #?16\]
+**	str	\3, \[\2\]
+**	ldr	\2, \[\2, #?8\]
+**	cbn?z	\4, .*
+**	...
+**	ret
+*/
+void
+foo (volatile struct L *head, int inc, double *ptr)
+{
+  double d = *ptr;
+  while (head)
+    {
+      /* Clobber all call-preserved GPRs, so that the loop has to use
+	 call-clobbered GPRs if it is to avoid spilling.  */
+      asm volatile ("" :::
+		    "x19", "x20", "x21", "x22", "x23",
+		    "x24", "x25", "x26", "x27", "x28");
+      inc = head->data + inc;
+      volatile struct L *inner = head->inner;
+      head->data = inc;
+      head = head->next;
+      if (__builtin_expect_with_probability (inner != 0, 0, PROB))
+	for (int i = 0; i < 1000; ++i)
+	  {
+	    ext ();
+	    /* Hack to create high register pressure, so that IRA doesn't
+	       collapse this loop into the parent loop.  */
+	    d += 1;
+	    asm volatile ("// foo" :::
+			  "d0", "d1", "d2", "d3",
+			  "d4", "d5", "d6", "d7",
+			  "d8", "d9", "d10", "d11",
+			  "d12", "d13", "d14", "d15",
+			  "d16", "d17", "d18", "d19",
+			  "d20", "d21", "d22", "d23",
+			  "d24", "d25", "d26", "d27",
+			  "d28", "d29", "d30", "d31");
+	  }
+    }
+  *ptr = d;
+}