xtensa: Allow SImode GENERAL_REGS pseudos to spill into FP_REGS hardregs

Message ID a8ede884-34f5-4eee-be25-e458e3c1ba49@yahoo.co.jp
State New
Series xtensa: Allow SImode GENERAL_REGS pseudos to spill into FP_REGS hardregs

Commit Message

Takayuki 'January June' Suwa Sept. 11, 2025, 8:48 p.m. UTC
If the Floating-Point Coprocessor option is configured and the function
contains no floating-point coprocessor instructions, the unused
floating-point registers (f0 through f15) are good targets for spilling
address (integer) registers.

For reference, the ISA manual says that the RFR and WFR machine instructions
are non-arithmetic and do not raise floating-point exceptions.

     /* example */
     int test(int a, int b) {
       asm volatile ("# clobbers":::"a2","a3","a4","a5","a6","a7","a8","a9","a10","a11","a12","a13","a14","a15");
       return a + b * 8;
     }

     ;; before (-O2 -mabi=windowed)
     test:
     	entry	sp, 48
     	s32i.n	a2, sp, 0
     	s32i.n	a3, sp, 4
     	# clobbers
     	l32i.n	a8, sp, 4
     	l32i.n	a9, sp, 0
     	addx8	a2, a8, a9
     	retw.n

     ;; after (-O2 -mabi=windowed)
     test:
     	entry	sp, 48
     	wfr	f1, a2
     	wfr	f0, a3
     	# clobbers
     	rfr	a8, f0
     	rfr	a9, f1
     	addx8	a2, a8, a9
     	retw.n
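
The "after" listing can be read as a plain bit-for-bit round trip through the FP register file. The following is a host-side sketch only, assuming that WFR and RFR copy all 32 bits without interpreting them; the names model_wfr, model_rfr and model_test are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model only: the 16 FP registers viewed as raw 32-bit
   containers.  Assumption: WFR/RFR move the bits unchanged.  */
#define NUM_FP_REGS 16
static uint32_t fp_regs[NUM_FP_REGS];

/* Model of "wfr fN, aM": copy an address register into an FP register.  */
static void model_wfr(unsigned fn, uint32_t ar_value)
{
  assert(fn < NUM_FP_REGS);
  fp_regs[fn] = ar_value;
}

/* Model of "rfr aM, fN": copy an FP register back to an address register.  */
static uint32_t model_rfr(unsigned fn)
{
  assert(fn < NUM_FP_REGS);
  return fp_regs[fn];
}

/* Model of the "after" listing: spill a and b, reload them, then addx8.  */
static int32_t model_test(int32_t a, int32_t b)
{
  model_wfr(1, (uint32_t) a);          /* wfr f1, a2 */
  model_wfr(0, (uint32_t) b);          /* wfr f0, a3 */
  /* ... a2 to a15 are clobbered at this point ... */
  uint32_t a8 = model_rfr(0);          /* rfr a8, f0 */
  uint32_t a9 = model_rfr(1);          /* rfr a9, f1 */
  return (int32_t) ((a8 << 3) + a9);   /* addx8 a2, a8, a9 */
}
```

In this model, model_test(3, 4) yields 35, the same value as a + b * 8, and any bit pattern written with model_wfr is read back unchanged by model_rfr.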

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (machine_function):
	Add a new int field "hard_float_insn_usage".
	(xtensa_spill_class): New prototype and function.
	(TARGET_SPILL_CLASS): Define macro.
	(xtensa_option_override): Modify the organization of the
	"xtensa_hard_regno_mode_ok_p" matrix to allow floating-point
	registers to hold SImode values in addition to SFmode.
	* config/xtensa/xtensa.md (movsi_internal):
	Add two machine instructions for moving from/to floating-point
	registers to the instruction template list.
---
  gcc/config/xtensa/xtensa.cc | 74 ++++++++++++++++++++++++++++++++++++-
  gcc/config/xtensa/xtensa.md | 34 +++++++++--------
  2 files changed, 91 insertions(+), 17 deletions(-)
  

Comments

Max Filippov Sept. 12, 2025, 5:14 a.m. UTC | #1
Hi Suwa-san,

On Thu, Sep 11, 2025 at 1:48 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> If the Floating-Point Coprocessor option is configured and there are no
> floating-point coprocessor instructions in the function, the unused
> floating-point registers (f0 thru f15) can be good targets for spilling
> address (integer) registers.
>
> For reference, the ISA manual says that [RW]FR machine instructions have
> non-arithmetic behavior and do not raise floating-point exceptions.

They don't raise floating-point exceptions, but they would still raise
the "coprocessor disabled" exception if the FP coprocessor is disabled.
Also, all the conditions that you mentioned may be satisfied in Linux
kernel code, yet that code shouldn't clobber the FP registers.
  
Takayuki 'January June' Suwa Sept. 12, 2025, 5:44 a.m. UTC | #2
Hi!

On 2025/09/12 14:14, Max Filippov wrote:
> Hi Suwa-san,
> 
> On Thu, Sep 11, 2025 at 1:48 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
>>
>> If the Floating-Point Coprocessor option is configured and there are no
>> floating-point coprocessor instructions in the function, the unused
>> floating-point registers (f0 thru f15) can be good targets for spilling
>> address (integer) registers.
>>
>> For reference, the ISA manual says that [RW]FR machine instructions have
>> non-arithmetic behavior and do not raise floating-point exceptions.
> 
> They don't raise floating-point exceptions, but they would still raise
> the "coprocessor disabled" exception if the FP coprocessor is disabled.
> Also all the conditions that you mentioned may be satisfied in the linux
> kernel code, yet it shouldn't clobber the FP registers.

Ah, I see.

So instead, let's add a target-specific option that enables this feature.
It will be disabled by default and will not be enabled automatically, even
at high optimization levels.
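
As a rough sketch of that idea (a simplified standalone model, not the GCC hook itself), the decision could look like the code below. The field opt_in_fp_spill stands in for the proposed option, whose actual name is not decided here; the enums and spill_class_model are hypothetical stand-ins for reg_class_t, machine_mode and xtensa_spill_class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for GCC's register classes and machine modes.  */
enum reg_class_model { NO_REGS_M, GENERAL_REGS_M, FP_REGS_M };
enum mode_model { SIMODE_M, SFMODE_M, DIMODE_M };

/* Hypothetical settings; "opt_in_fp_spill" models the proposed
   target-specific option and does not name a real GCC flag.  */
struct target_flags_model
{
  bool hard_float;       /* Floating-Point Coprocessor option configured */
  bool opt_in_fp_spill;  /* hypothetical option, off by default */
};

static enum reg_class_model
spill_class_model(const struct target_flags_model *flags,
                  bool function_uses_fp_insns,
                  enum reg_class_model rclass, enum mode_model mode)
{
  assert(flags != NULL);

  /* The explicit opt-in replaces the optimization-level test, so no -O
     level turns the feature on by itself.  */
  if (!flags->hard_float || !flags->opt_in_fp_spill)
    return NO_REGS_M;

  /* Functions that already use FP instructions keep their FP registers.  */
  if (function_uses_fp_insns)
    return NO_REGS_M;

  /* Only SImode values living in general registers may go to FP_REGS.  */
  if (rclass == GENERAL_REGS_M && mode == SIMODE_M)
    return FP_REGS_M;

  return NO_REGS_M;
}
```

With the flag left at its default (false), the model always returns NO_REGS_M, which is the behaviour wanted for kernel code and for configurations where the coprocessor may be disabled.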

=====
jjsuwa_sys3175@yahoo.co.jp
  
Jeff Law Sept. 12, 2025, 1:06 p.m. UTC | #3
On 9/11/25 11:14 PM, Max Filippov wrote:
> Hi Suwa-san,
> 
> On Thu, Sep 11, 2025 at 1:48 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
>>
>> If the Floating-Point Coprocessor option is configured and there are no
>> floating-point coprocessor instructions in the function, the unused
>> floating-point registers (f0 thru f15) can be good targets for spilling
>> address (integer) registers.
>>
>> For reference, the ISA manual says that [RW]FR machine instructions have
>> non-arithmetic behavior and do not raise floating-point exceptions.
> 
> They don't raise floating-point exceptions, but they would still raise
> the "coprocessor disabled" exception if the FP coprocessor is disabled.
> Also all the conditions that you mentioned may be satisfied in the linux
> kernel code, yet it shouldn't clobber the FP registers.

Also note that moves crossing functional units commonly carry extra
penalties, which often negate any benefit from avoiding the memory
load/store that spilling the GPR to memory would need.

I'd strongly suggest benchmarking this before installing.
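
One possible starting point for such a measurement is sketched below, assuming a GCC-compatible compiler and a working clock() on the target. The kernel is the example from the commit message; spill_kernel and time_kernel are hypothetical names, and the register clobber list only applies when building for Xtensa.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Kernel under test: the example from the commit message.  The clobber
   list is only meaningful on Xtensa; elsewhere it is a compiler barrier.  */
__attribute__((noinline))
static int spill_kernel(int a, int b)
{
#if defined(__XTENSA__)
  __asm__ __volatile__ ("# clobbers" ::: "a2", "a3", "a4", "a5", "a6", "a7",
                        "a8", "a9", "a10", "a11", "a12", "a13", "a14", "a15");
#else
  __asm__ __volatile__ ("" ::: "memory");
#endif
  return a + b * 8;
}

/* Call the kernel `iterations` times.  Returns elapsed CPU seconds and
   stores a checksum so that the calls cannot be optimized away.  */
static double time_kernel(long iterations, uint32_t *checksum)
{
  assert(iterations > 0 && checksum != 0);

  uint32_t sum = 0;
  clock_t start = clock();
  for (long i = 0; i < iterations; i++)
    sum += (uint32_t) spill_kernel((int) (i & 0xff), 3);
  clock_t end = clock();

  *checksum = sum;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Building the same file at -O2 with the unpatched and the patched compiler and comparing the times reported by time_kernel for a large iteration count would show whether the WFR/RFR pair beats the S32I/L32I pair on a given core. A hardware cycle counter, where available, would likely give finer resolution than clock().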

jeff
  

Patch

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 2e7a9df32d2..7573b55b36b 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -110,6 +110,7 @@  struct GTY(()) machine_function
    rtx last_logues_a9_content;
    HARD_REG_SET eliminated_callee_saved;
    hash_map<rtx, int> *litpool_usage;
+  int hard_float_insn_usage;
  };
  
  static void xtensa_option_override (void);
@@ -203,6 +204,7 @@  static rtx xtensa_delegitimize_address (rtx);
  static reg_class_t xtensa_ira_change_pseudo_allocno_class (int, reg_class_t,
  							   reg_class_t);
  static HARD_REG_SET xtensa_zero_call_used_regs (HARD_REG_SET);
+static reg_class_t xtensa_spill_class (reg_class_t, machine_mode);
  
  
  
@@ -376,6 +378,9 @@  static HARD_REG_SET xtensa_zero_call_used_regs (HARD_REG_SET);
  #undef TARGET_ZERO_CALL_USED_REGS
  #define TARGET_ZERO_CALL_USED_REGS xtensa_zero_call_used_regs
  
+#undef TARGET_SPILL_CLASS
+#define TARGET_SPILL_CLASS xtensa_spill_class
+
  struct gcc_target targetm = TARGET_INITIALIZER;
  
  
@@ -3040,7 +3045,7 @@  xtensa_option_override (void)
  	  else if (GP_REG_P (regno))
  	    temp = ((regno & 1) == 0 || (size <= UNITS_PER_WORD));
  	  else if (FP_REG_P (regno))
-	    temp = (TARGET_HARD_FLOAT && (mode == SFmode));
+	    temp = (TARGET_HARD_FLOAT && (mode == SFmode || mode == SImode));
  	  else if (BR_REG_P (regno))
  	    temp = (TARGET_BOOLEANS && (mode == CCmode));
  	  else
@@ -5671,4 +5676,71 @@  xtensa_zero_call_used_regs (HARD_REG_SET selected_regs)
    return selected_regs;
  }
  
+/* Implement TARGET_SPILL_CLASS.
+
+   This hook is called several times during the LRA pass, for each function
+   being compiled.  */
+
+static reg_class_t
+xtensa_spill_class (reg_class_t rclass, machine_mode mode)
+{
+  rtx_insn *insn;
+
+  /* This does not apply unless all of the following conditions are met:
+      - The Floating-Point Coprocessor Option is enabled (prerequisite)
+      - The optimization level is greater than 1
+      - Optimizing for speed or the Code Density Option is not enabled  */
+  if (! (TARGET_HARD_FLOAT
+	 && optimize > 1
+	 && (!optimize_size || !TARGET_DENSITY)))
+    return NO_REGS;
+
+  /* Allow integer registers to be spilled into FP_REGS rclass (to which
+     floating-point registers belong) only if the function currently being
+     compiled does not contain any floating-point coprocessor instructions.  */
+retry:
+  switch (cfun->machine->hard_float_insn_usage)
+    {
+    case 0:	/* Not yet investigated (the initial state).  */
+      /* Scan the insns in the function for floating-point coprocessor
+	 instructions.  */
+      for (insn = get_insns ();
+	   (insn = next_nonnote_nondebug_insn (insn)); )
+	if (recog_memoized (insn) >= 0)
+	  switch (get_attr_type (insn))
+	    {
+	    /* An insn that is a floating-point coprocessor instruction is
+	       found, so record this.  */
+	    case TYPE_FARITH:
+	    case TYPE_FMADD:
+	    case TYPE_FCONV:
+	    case TYPE_FLOAD:
+	    case TYPE_FSTORE:
+	      cfun->machine->hard_float_insn_usage = 1;
+	      goto retry;
+	    default:
+	      break;
+	    }
+      /* No insn that is a floating-point coprocessor instruction was found,
+	 so record this.  */
+      cfun->machine->hard_float_insn_usage = 2;
+      goto retry;
+    case 1:	/* Contains floating-point coprocessor instructions.  */
+      /* Deny spilling to any rclass.  */
+      return NO_REGS;
+    default:	/* Does not contain floating-point coprocessor
+		   instructions.  */
+      break;
+    }
+
+  /* If rclass belongs to GENERAL_REGS and SImode is specified, spilling
+     to FP_REGS is allowed.  */
+  if (reg_class_subset_p (rclass, GENERAL_REGS)
+      && mode == SImode)
+    return FP_REGS;
+
+  /* Deny spilling to any rclass.  */
+  return NO_REGS;
+}
+
  #include "gt-xtensa.h"
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 52ffb161c0f..6371e3976f1 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1293,22 +1293,24 @@ 
  	(match_operand:SI 1 "move_operand"))]
    "xtensa_valid_move (SImode, operands)"
    {@ [cons: =0, 1; attrs: type, length]
-     [ D,  M; move , 2] movi.n\t%0, %x1
-     [ D,  D; move , 2] mov.n\t%0, %1
-     [ D,  d; move , 2] ^
-     [ D,  R; load , 2] %v1l32i.n\t%0, %1
-     [ R,  D; store, 2] %v0s32i.n\t%1, %0
-     [ R,  d; store, 2] ^
-     [ a,  r; move , 3] mov\t%0, %1
-     [ q,  r; move , 3] movsp\t%0, %1
-     [ a,  I; move , 3] movi\t%0, %x1
-     [ a,  Y; load , 3] movi\t%0, %1
-     [ W,  i; move , 6] const16\t%0, %t1\;const16\t%0, %b1
-     [ a,  T; load , 3] %v1l32r\t%0, %1
-     [ a,  U; load , 3] %v1l32i\t%0, %1
-     [ U,  r; store, 3] %v0s32i\t%1, %0
-     [*a, *A; rsr  , 3] rsr\t%0, ACCLO
-     [*A, *r; wsr  , 3] wsr\t%1, ACCLO
+     [ D,  M; move  , 2] movi.n\t%0, %x1
+     [ D,  D; move  , 2] mov.n\t%0, %1
+     [ D,  d; move  , 2] ^
+     [ D,  R; load  , 2] %v1l32i.n\t%0, %1
+     [ R,  D; store , 2] %v0s32i.n\t%1, %0
+     [ R,  d; store , 2] ^
+     [ a,  r; move  , 3] mov\t%0, %1
+     [ q,  r; move  , 3] movsp\t%0, %1
+     [ a,  I; move  , 3] movi\t%0, %x1
+     [ a,  Y; load  , 3] movi\t%0, %1
+     [ W,  i; move  , 6] const16\t%0, %t1\;const16\t%0, %b1
+     [ a,  T; load  , 3] %v1l32r\t%0, %1
+     [ a,  U; load  , 3] %v1l32i\t%0, %1
+     [ U,  r; store , 3] %v0s32i\t%1, %0
+     [*a, *f; farith, 3] rfr\t%0, %1
+     [*f, *r; farith, 3] wfr\t%0, %1
+     [*a, *A; rsr   , 3] rsr\t%0, ACCLO
+     [*A, *r; wsr   , 3] wsr\t%1, ACCLO
    }
    [(set_attr "mode" "SI")])