[v2] xtensa: constantsynth: Reforge to fix some non-fatal issues

Message ID 5d4276f6-b918-4b8a-877d-6c4140cd0ca9@yahoo.co.jp
State New
Headers
Series [v2] xtensa: constantsynth: Reforge to fix some non-fatal issues |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply

Commit Message

Takayuki 'January June' Suwa June 19, 2024, 2:55 a.m. UTC
  The previous constant synthesis logic had some issues that were non-fatal
but worth considering:

- It didn't work with DFmode literals, because those were cast to SImode
   rather SFmode when splitting into two natural-width words by
   split_double().

- It didn't work with large literals when TARGET_AUTO_LITPOOLS was enabled,
   because those were relaxed MOVI immediates rather references to literal
   pool entries,

- It didn't take into account that when literals with the same RTL
   representation are pooled multiple times within a function, those entries
   are shared (especially important when optimizing for size).

This patch addresses the above issues by making appropriate tweaks to the
constant synthesis logic.

gcc/ChangeLog:

	* config/xtensa/xtensa-protos.h (xtensa_constantsynth):
	Change the second argument from HOST_WIDE_INT to rtx.
	* config/xtensa/xtensa.cc (#include):
	Add "context.h" and "pass_manager.h".
	(machine_function): Add a new hash_map field "litpool_usage".
	(xtensa_constantsynth): Make "src" (the second operand) accept
	RTX literal instead of its value, and treat both bare and pooled
	SI/SFmode literals equally by bit-exact canonicalization into
	CONST_INT RTX internally.  And then, make avoid synthesis if
	such multiple identical canonicalized literals are found in same
	function when optimizing for size.  Finally, for literals where
	synthesis is not possible or has been avoided, re-emit "move"
	RTXes with canonicalized ones to increase the chances of sharing
	literal pool entries.
	* config/xtensa/xtensa.md (split patterns for constant synthesis):
	Change to simply invoke xtensa_constantsynth() as mentioned above,
	and add new patterns for when TARGET_AUTO_LITPOOLS is enabled.
---
  gcc/config/xtensa/xtensa-protos.h |  2 +-
  gcc/config/xtensa/xtensa.cc       | 75 ++++++++++++++++++++++++++++---
  gcc/config/xtensa/xtensa.md       | 56 ++++++++++++++---------
  3 files changed, 103 insertions(+), 30 deletions(-)
  

Comments

Max Filippov June 19, 2024, 8:23 a.m. UTC | #1
On Tue, Jun 18, 2024 at 7:56 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> The previous constant synthesis logic had some issues that were non-fatal
> but worth considering:
>
> - It didn't work with DFmode literals, because those were cast to SImode
>    rather SFmode when splitting into two natural-width words by
>    split_double().
>
> - It didn't work with large literals when TARGET_AUTO_LITPOOLS was enabled,
>    because those were relaxed MOVI immediates rather references to literal
>    pool entries,
>
> - It didn't take into account that when literals with the same RTL
>    representation are pooled multiple times within a function, those entries
>    are shared (especially important when optimizing for size).
>
> This patch addresses the above issues by making appropriate tweaks to the
> constant synthesis logic.
>
> gcc/ChangeLog:
>
>         * config/xtensa/xtensa-protos.h (xtensa_constantsynth):
>         Change the second argument from HOST_WIDE_INT to rtx.
>         * config/xtensa/xtensa.cc (#include):
>         Add "context.h" and "pass_manager.h".
>         (machine_function): Add a new hash_map field "litpool_usage".
>         (xtensa_constantsynth): Make "src" (the second operand) accept
>         RTX literal instead of its value, and treat both bare and pooled
>         SI/SFmode literals equally by bit-exact canonicalization into
>         CONST_INT RTX internally.  And then, make avoid synthesis if
>         such multiple identical canonicalized literals are found in same
>         function when optimizing for size.  Finally, for literals where
>         synthesis is not possible or has been avoided, re-emit "move"
>         RTXes with canonicalized ones to increase the chances of sharing
>         literal pool entries.
>         * config/xtensa/xtensa.md (split patterns for constant synthesis):
>         Change to simply invoke xtensa_constantsynth() as mentioned above,
>         and add new patterns for when TARGET_AUTO_LITPOOLS is enabled.
> ---
>   gcc/config/xtensa/xtensa-protos.h |  2 +-
>   gcc/config/xtensa/xtensa.cc       | 75 ++++++++++++++++++++++++++++---
>   gcc/config/xtensa/xtensa.md       | 56 ++++++++++++++---------
>   3 files changed, 103 insertions(+), 30 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.
  

Patch

diff --git a/gcc/config/xtensa/xtensa-protos.h b/gcc/config/xtensa/xtensa-protos.h
index 77553b0453f..8f645e87de9 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -44,7 +44,7 @@  extern int xtensa_expand_scc (rtx *, machine_mode);
  extern int xtensa_expand_block_move (rtx *);
  extern int xtensa_expand_block_set (rtx *);
  extern void xtensa_split_operand_pair (rtx *, machine_mode);
-extern int xtensa_constantsynth (rtx, HOST_WIDE_INT);
+extern int xtensa_constantsynth (rtx, rtx);
  extern int xtensa_emit_move_sequence (rtx *, machine_mode);
  extern rtx xtensa_copy_incoming_a7 (rtx);
  extern void xtensa_expand_nonlocal_goto (rtx *);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 45dc1be3ff5..bc127997ac6 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -58,6 +58,8 @@  along with GCC; see the file COPYING3.  If not see
  #include "insn-attr.h"
  #include "tree-pass.h"
  #include "print-rtl.h"
+#include "context.h"
+#include "pass_manager.h"
  #include <math.h>
  
  /* This file should be included last.  */
@@ -107,6 +109,7 @@  struct GTY(()) machine_function
    bool inhibit_logues_a1_adjusts;
    rtx last_logues_a9_content;
    HARD_REG_SET eliminated_callee_saved;
+  hash_map<rtx, int> *litpool_usage;
  };
  
  static void xtensa_option_override (void);
@@ -1104,7 +1107,7 @@  xtensa_split_operand_pair (rtx operands[4], machine_mode mode)
  }
  
  
-/* Try to emit insns to load srcval (that cannot fit into signed 12-bit)
+/* Try to emit insns to load src (either naked or pooled SI/SF constant)
     into dst with synthesizing a such constant value from a sequence of
     load-immediate / arithmetic ones, instead of a L32R instruction
     (plus a constant in litpool).  */
@@ -1190,11 +1193,67 @@  xtensa_constantsynth_rtx_ADDSUBX (rtx reg, HOST_WIDE_INT imm)
  }
  
  int
-xtensa_constantsynth (rtx dst, HOST_WIDE_INT srcval)
+xtensa_constantsynth (rtx dst, rtx src)
  {
+  HOST_WIDE_INT srcval;
+  static opt_pass *pass_rtl_split2;
+  int *pv;
+
+  /* Derefer if src is litpool entry, and get integer constant value.  */
+  src = avoid_constant_pool_reference (src);
+  if (CONST_INT_P (src))
+    srcval = INTVAL (src);
+  else if (CONST_DOUBLE_P (src) && GET_MODE (src) == SFmode)
+    {
+      long l;
+
+      REAL_VALUE_TO_TARGET_SINGLE (*CONST_DOUBLE_REAL_VALUE (src), l);
+      srcval = (int32_t)l, src = GEN_INT (srcval);
+    }
+  else
+    return 0;
+
+  /* Force dst as SImode.  */
+  gcc_assert (REG_P (dst));
+  if (GET_MODE (dst) != SImode)
+    dst = gen_rtx_REG (SImode, REGNO (dst));
+
+  if (optimize_size)
+    {
+      /* During the first split pass after register allocation (rtl-split2),
+	 record the occurrence of integer src value and do nothing.  */
+      if (!pass_rtl_split2)
+	pass_rtl_split2 = g->get_passes ()->get_pass_by_name ("rtl-split2");
+      if (current_pass == pass_rtl_split2)
+	{
+	  if (!cfun->machine->litpool_usage)
+	    cfun->machine->litpool_usage = hash_map<rtx, int>::create_ggc ();
+	  if ((pv = cfun->machine->litpool_usage->get (src)))
+	    ++*pv;
+	  else
+	    cfun->machine->litpool_usage->put (src, 1);
+	  return 0;
+	}
+
+      /* If two or more identical integer constants appear in the function,
+	 the code size can be reduced by re-emitting a "move" (load from an
+	 either litpool entry or relaxed immediate) instruction in SImode
+	 to increase the chances that the litpool entry will be shared.  */
+      if (cfun->machine->litpool_usage
+	  && (pv = cfun->machine->litpool_usage->get (src))
+	  && *pv > 1)
+	{
+	  emit_move_insn (dst, src);
+	  return 1;
+	}
+    }
+
    /* No need for synthesizing for what fits into MOVI instruction.  */
    if (xtensa_simm12b (srcval))
-    return 0;
+    {
+      emit_move_insn (dst, src);
+      return 1;
+    }
  
    /* 2-insns substitution.  */
    if ((optimize_size || (optimize && xtensa_extra_l32r_costs >= 1))
@@ -1240,9 +1299,6 @@  xtensa_constantsynth (rtx dst, HOST_WIDE_INT srcval)
  		  emit_insn (gen_rotlsi3 (dst, dst, GEN_INT (shift)));
  		  return 1;
  		}
-	    }
-	  for (shift = 1; shift < 12; ++shift)
-	    {
  	      v = (int32_t)(((uint32_t)srcval << shift)
  			    | ((uint32_t)srcval >> (32 - shift)));
  	      if (xtensa_simm12b(v))
@@ -1255,7 +1311,12 @@  xtensa_constantsynth (rtx dst, HOST_WIDE_INT srcval)
  	}
      }
  
-  return 0;
+  /* If cannot synthesize the value and also cannot fit into MOVI instruc-
+     tion, re-emit a "move" (load from an either litpool entry or relaxed
+     immediate) instruction in SImode in order to increase the chances that
+     the litpool entry will be shared.  */
+  emit_move_insn (dst, src);
+  return 1;
  }
  
  
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index ef826b63e44..8709ef6d7a7 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1285,17 +1285,25 @@ 
  (define_split
    [(set (match_operand:SHI 0 "register_operand")
  	(match_operand:SHI 1 "constantpool_operand"))]
-  "! optimize_debug && reload_completed"
+  "!optimize_debug && reload_completed"
    [(const_int 0)]
  {
-  rtx x = avoid_constant_pool_reference (operands[1]), dst = operands[0];
-  if (! CONST_INT_P (x))
-    FAIL;
-  if (<MODE>mode == HImode)
-    dst = gen_rtx_REG (SImode, REGNO (dst));
-  if (! xtensa_constantsynth (dst, INTVAL (x)))
-    emit_move_insn (dst, x);
-  DONE;
+  if (xtensa_constantsynth (operands[0], operands[1]))
+    DONE;
+  FAIL;
+})
+
+(define_split
+  [(set (match_operand:SHI 0 "register_operand")
+	(match_operand:SHI 1 "const_int_operand"))]
+  "!optimize_debug && reload_completed
+   && !TARGET_CONST16 && TARGET_AUTO_LITPOOLS
+   && ! xtensa_simm12b (INTVAL (operands[1]))"
+  [(const_int 0)]
+{
+  if (xtensa_constantsynth (operands[0], operands[1]))
+    DONE;
+  FAIL;
  })
  
  ;; 16-bit Integer moves
@@ -1503,21 +1511,25 @@ 
  
  (define_split
    [(set (match_operand:SF 0 "register_operand")
-	(match_operand:SF 1 "constantpool_operand"))]
-  "! optimize_debug && reload_completed"
+	(match_operand 1 "constantpool_operand"))]
+  "!optimize_debug && reload_completed"
    [(const_int 0)]
  {
-  rtx x = avoid_constant_pool_reference (operands[1]);
-  long l;
-  HOST_WIDE_INT value;
-  if (! CONST_DOUBLE_P (x) || GET_MODE (x) != SFmode)
-    FAIL;
-  REAL_VALUE_TO_TARGET_SINGLE (*CONST_DOUBLE_REAL_VALUE (x), l);
-  x = gen_rtx_REG (SImode, REGNO (operands[0]));
-  value = (int32_t)l;
-  if (! xtensa_constantsynth (x, value))
-    emit_move_insn (x, GEN_INT (value));
-  DONE;
+  if (xtensa_constantsynth (operands[0], operands[1]))
+    DONE;
+  FAIL;
+})
+
+(define_split
+  [(set (match_operand:SF 0 "register_operand")
+	(match_operand 1 "const_double_operand"))]
+  "!optimize_debug && reload_completed
+   && !TARGET_CONST16 && TARGET_AUTO_LITPOOLS"
+  [(const_int 0)]
+{
+  if (xtensa_constantsynth (operands[0], operands[1]))
+    DONE;
+  FAIL;
  })
  
  ;; 64-bit floating point moves