[x86] PR target/106303: Fix TImode STV related failures.

Message ID 051501d89e66$55fcbff0$01f63fd0$@nextmovesoftware.com
State New
Headers
Series [x86] PR target/106303: Fix TImode STV related failures. |

Commit Message

Roger Sayle July 23, 2022, 7:32 a.m. UTC
  This patch resolves PR target/106303 (and the related PRs 106347,
106404, 106407) which are ICEs caused by my improvements to x86_64's
128-bit TImode to V1TImode Scalar to Vector (STV) pass.  My apologies
for the breakage.  The issue is that data flow analysis is used to
partition usage of each TImode pseudo into "chains", where each
chain is analyzed and if suitable converted to vector operations.
The problems appears when some chains for a pseudo are converted,
and others aren't as RTL sharing can result in some mode changes
leaking into other instructions that aren't/shouldn't/can't be
converted, which eventually leads to an ICE for mismatched modes.

My first approach to a fix was to unify more of the STV infrastructure,
reasoning that if TImode STV was exhibiting these problems, but DImode
and SImode STV weren't, the issue was likely to be caused/resolved by
these remaining differences.  This appeared to fix some but not all of
the reported PRs.  A better solution was then proposed by H.J. Lu in
Bugzilla (thanks!) that we need to iterate the removal of candidates in the
function timode_remove_non_convertible_regs until there are no further
changes.  As each chain is removed from consideration, it in turn may
affect whether other insns/chains can safely be converted.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-07-23  Roger Sayle  <roger@nextmovesoftware.com>
            H.J. Lu  <hjl.tools@gmail.com>

gcc/ChangeLog
        PR target/106303
        PR target/106347
        * config/i386/i386-features.cc (make_vector_copies): Move from
        general_scalar_chain to scalar_chain.
        (convert_reg): Likewise.
        (convert_insn_common): New scalar_chain method split out from
        general_scalar_chain convert_insn.
        (convert_registers): Move from general_scalar_chain to
        scalar_chain.
        (scalar_chain::convert): Call convert_insn_common before calling
        convert_insn.
        (timode_remove_non_convertible_regs): Iterate until there are
        no further changes to the candidates.
        * config/i386/i386-features.h (scalar_chain::hash_map): Move
        from general_scalar_chain.
        (scalar_chain::convert_reg): Likewise.
        (scalar_chain::convert_insn_common): New shared method.
        (scalar_chain::make_vector_copies): Move from general_scalar_chain.
        (scalar_chain::convert_registers): Likewise.  No longer virtual.
        (general_scalar_chain::hash_map): Delete.  Moved to scalar_chain.
        (general_scalar_chain::convert_reg): Likewise.
        (general_scalar_chain::make_vector_copies): Likewise.
        (general_scalar_chain::convert_registers): Delete virtual method.
        (timode_scalar_chain::convert_registers): Likewise.

gcc/testsuite/ChangeLog
        PR target/106303
        PR target/106347
        * gcc.target/i386/pr106303.c: New test case.
        * gcc.target/i386/pr106347.c: New test case.


Thanks in advance (and sorry again for the inconvenience),
Roger
--
  

Comments

Uros Bizjak July 24, 2022, 10:45 a.m. UTC | #1
On Sat, Jul 23, 2022 at 9:32 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch resolves PR target/106303 (and the related PRs 106347,
> 106404, 106407) which are ICEs caused by my improvements to x86_64's
> 128-bit TImode to V1TImode Scalar to Vector (STV) pass.  My apologies
> for the breakage.  The issue is that data flow analysis is used to
> partition usage of each TImode pseudo into "chains", where each
> chain is analyzed and if suitable converted to vector operations.
> The problems appears when some chains for a pseudo are converted,
> and others aren't as RTL sharing can result in some mode changes
> leaking into other instructions that aren't/shouldn't/can't be
> converted, which eventually leads to an ICE for mismatched modes.
>
> My first approach to a fix was to unify more of the STV infrastructure,
> reasoning that if TImode STV was exhibiting these problems, but DImode
> and SImode STV weren't, the issue was likely to be caused/resolved by
> these remaining differences.  This appeared to fix some but not all of
> the reported PRs.  A better solution was then proposed by H.J. Lu in
> Bugzilla (thanks!) that we need to iterate the removal of candidates in the
> function timode_remove_non_convertible_regs until there are no further
> changes.  As each chain is removed from consideration, it in turn may
> affect whether other insns/chains can safely be converted.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-07-23  Roger Sayle  <roger@nextmovesoftware.com>
>             H.J. Lu  <hjl.tools@gmail.com>
>
> gcc/ChangeLog
>         PR target/106303
>         PR target/106347
>         * config/i386/i386-features.cc (make_vector_copies): Move from
>         general_scalar_chain to scalar_chain.
>         (convert_reg): Likewise.
>         (convert_insn_common): New scalar_chain method split out from
>         general_scalar_chain convert_insn.
>         (convert_registers): Move from general_scalar_chain to
>         scalar_chain.
>         (scalar_chain::convert): Call convert_insn_common before calling
>         convert_insn.
>         (timode_remove_non_convertible_regs): Iterate until there are
>         no further changes to the candidates.
>         * config/i386/i386-features.h (scalar_chain::hash_map): Move
>         from general_scalar_chain.
>         (scalar_chain::convert_reg): Likewise.
>         (scalar_chain::convert_insn_common): New shared method.
>         (scalar_chain::make_vector_copies): Move from general_scalar_chain.
>         (scalar_chain::convert_registers): Likewise.  No longer virtual.
>         (general_scalar_chain::hash_map): Delete.  Moved to scalar_chain.
>         (general_scalar_chain::convert_reg): Likewise.
>         (general_scalar_chain::make_vector_copies): Likewise.
>         (general_scalar_chain::convert_registers): Delete virtual method.
>         (timode_scalar_chain::convert_registers): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/106303
>         PR target/106347
>         * gcc.target/i386/pr106303.c: New test case.
>         * gcc.target/i386/pr106347.c: New test case.

OK. (git diff -w shows much clearly what was changed).

Thanks,
Uros.
  

Patch

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 813b203..aa5de71 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -708,7 +708,7 @@  gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr)
    and replace its uses in a chain.  */
 
 void
-general_scalar_chain::make_vector_copies (rtx_insn *insn, rtx reg)
+scalar_chain::make_vector_copies (rtx_insn *insn, rtx reg)
 {
   rtx vreg = *defs_map.get (reg);
 
@@ -772,7 +772,7 @@  general_scalar_chain::make_vector_copies (rtx_insn *insn, rtx reg)
    scalar uses outside of the chain.  */
 
 void
-general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
+scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
 {
   start_sequence ();
   if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
@@ -973,10 +973,10 @@  scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
 			 UNSPEC_PTEST);
 }
 
-/* Convert INSN to vector mode.  */
+/* Helper function for converting INSN to vector mode.  */
 
 void
-general_scalar_chain::convert_insn (rtx_insn *insn)
+scalar_chain::convert_insn_common (rtx_insn *insn)
 {
   /* Generate copies for out-of-chain uses of defs and adjust debug uses.  */
   for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
@@ -1037,7 +1037,13 @@  general_scalar_chain::convert_insn (rtx_insn *insn)
 	    XEXP (note, 0) = *vreg;
 	  *DF_REF_REAL_LOC (ref) = *vreg;
 	}
+}
+
+/* Convert INSN to vector mode.  */
 
+void
+general_scalar_chain::convert_insn (rtx_insn *insn)
+{
   rtx def_set = single_set (insn);
   rtx src = SET_SRC (def_set);
   rtx dst = SET_DEST (def_set);
@@ -1475,7 +1481,7 @@  timode_scalar_chain::convert_insn (rtx_insn *insn)
    Also populates defs_map which is used later by convert_insn.  */
 
 void
-general_scalar_chain::convert_registers ()
+scalar_chain::convert_registers ()
 {
   bitmap_iterator bi;
   unsigned id;
@@ -1510,7 +1516,9 @@  scalar_chain::convert ()
 
   EXECUTE_IF_SET_IN_BITMAP (insns, 0, id, bi)
     {
-      convert_insn (DF_INSN_UID_GET (id)->insn);
+      rtx_insn *insn = DF_INSN_UID_GET (id)->insn;
+      convert_insn_common (insn);
+      convert_insn (insn);
       converted_insns++;
     }
 
@@ -1843,56 +1851,62 @@  timode_remove_non_convertible_regs (bitmap candidates)
   bitmap_iterator bi;
   unsigned id;
   bitmap regs = BITMAP_ALLOC (NULL);
+  bool changed;
 
-  EXECUTE_IF_SET_IN_BITMAP (candidates, 0, id, bi)
-    {
-      rtx def_set = single_set (DF_INSN_UID_GET (id)->insn);
-      rtx dest = SET_DEST (def_set);
-      rtx src = SET_SRC (def_set);
-
-      if ((!REG_P (dest)
-	   || bitmap_bit_p (regs, REGNO (dest))
-	   || HARD_REGISTER_P (dest))
-	  && (!REG_P (src)
-	      || bitmap_bit_p (regs, REGNO (src))
-	      || HARD_REGISTER_P (src)))
-	continue;
-
-      if (REG_P (dest))
-	timode_check_non_convertible_regs (candidates, regs,
-					   REGNO (dest));
-
-      if (REG_P (src))
-	timode_check_non_convertible_regs (candidates, regs,
-					   REGNO (src));
-    }
+  do {
+    changed = false;
+    EXECUTE_IF_SET_IN_BITMAP (candidates, 0, id, bi)
+      {
+	rtx def_set = single_set (DF_INSN_UID_GET (id)->insn);
+	rtx dest = SET_DEST (def_set);
+	rtx src = SET_SRC (def_set);
+
+	if ((!REG_P (dest)
+	     || bitmap_bit_p (regs, REGNO (dest))
+	     || HARD_REGISTER_P (dest))
+	    && (!REG_P (src)
+		|| bitmap_bit_p (regs, REGNO (src))
+		|| HARD_REGISTER_P (src)))
+	  continue;
+
+	if (REG_P (dest))
+	  timode_check_non_convertible_regs (candidates, regs,
+					     REGNO (dest));
+
+	if (REG_P (src))
+	  timode_check_non_convertible_regs (candidates, regs,
+					     REGNO (src));
+      }
 
-  EXECUTE_IF_SET_IN_BITMAP (regs, 0, id, bi)
-    {
-      for (df_ref def = DF_REG_DEF_CHAIN (id);
-	   def;
-	   def = DF_REF_NEXT_REG (def))
-	if (bitmap_bit_p (candidates, DF_REF_INSN_UID (def)))
-	  {
-	    if (dump_file)
-	      fprintf (dump_file, "Removing insn %d from candidates list\n",
-		       DF_REF_INSN_UID (def));
+    EXECUTE_IF_SET_IN_BITMAP (regs, 0, id, bi)
+      {
+	for (df_ref def = DF_REG_DEF_CHAIN (id);
+	     def;
+	     def = DF_REF_NEXT_REG (def))
+	  if (bitmap_bit_p (candidates, DF_REF_INSN_UID (def)))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Removing insn %d from candidates list\n",
+			 DF_REF_INSN_UID (def));
 
-	    bitmap_clear_bit (candidates, DF_REF_INSN_UID (def));
-	  }
+	      bitmap_clear_bit (candidates, DF_REF_INSN_UID (def));
+	      changed = true;
+	    }
 
-      for (df_ref ref = DF_REG_USE_CHAIN (id);
-	   ref;
-	   ref = DF_REF_NEXT_REG (ref))
-	if (bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)))
-	  {
-	    if (dump_file)
-	      fprintf (dump_file, "Removing insn %d from candidates list\n",
-		       DF_REF_INSN_UID (ref));
+	for (df_ref ref = DF_REG_USE_CHAIN (id);
+	     ref;
+	     ref = DF_REF_NEXT_REG (ref))
+	  if (bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Removing insn %d from candidates list\n",
+			 DF_REF_INSN_UID (ref));
 
-	    bitmap_clear_bit (candidates, DF_REF_INSN_UID (ref));
-	  }
-    }
+	      bitmap_clear_bit (candidates, DF_REF_INSN_UID (ref));
+	      changed = true;
+	    }
+      }
+  } while (changed);
 
   BITMAP_FREE (regs);
 }
diff --git a/gcc/config/i386/i386-features.h b/gcc/config/i386/i386-features.h
index 88b222d..3d88a88 100644
--- a/gcc/config/i386/i386-features.h
+++ b/gcc/config/i386/i386-features.h
@@ -149,6 +149,7 @@  class scalar_chain
   bitmap defs_conv;
 
   bitmap insns_conv;
+  hash_map<rtx, rtx> defs_map;
   unsigned n_sse_to_integer;
   unsigned n_integer_to_sse;
 
@@ -161,12 +162,15 @@  class scalar_chain
   void emit_conversion_insns (rtx insns, rtx_insn *pos);
   rtx convert_compare (rtx op1, rtx op2, rtx_insn *insn);
   void mark_dual_mode_def (df_ref def);
+  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
+  void convert_insn_common (rtx_insn *insn);
+  void make_vector_copies (rtx_insn *, rtx);
+  void convert_registers ();
 
  private:
   void add_insn (bitmap candidates, unsigned insn_uid);
   void analyze_register_chain (bitmap candidates, df_ref ref);
   virtual void convert_insn (rtx_insn *insn) = 0;
-  virtual void convert_registers () = 0;
   virtual void convert_op (rtx *op, rtx_insn *insn) = 0;
 };
 
@@ -178,12 +182,8 @@  class general_scalar_chain : public scalar_chain
   int compute_convert_gain () final override;
 
  private:
-  hash_map<rtx, rtx> defs_map;
   void convert_insn (rtx_insn *insn) final override;
-  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
   void convert_op (rtx *op, rtx_insn *insn);
-  void make_vector_copies (rtx_insn *, rtx);
-  void convert_registers () final override;
   int vector_const_cost (rtx exp);
 };
 
@@ -197,8 +197,6 @@  class timode_scalar_chain : public scalar_chain
   void fix_debug_reg_uses (rtx reg);
   void convert_insn (rtx_insn *insn) final override;
   void convert_op (rtx *op, rtx_insn *insn);
-  /* We don't convert registers to different size.  */
-  void convert_registers () final override {}
 };
 
 } // anon namespace
diff --git a/gcc/testsuite/gcc.target/i386/pr106303.c b/gcc/testsuite/gcc.target/i386/pr106303.c
new file mode 100644
index 0000000..19cce40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106303.c
@@ -0,0 +1,25 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-inline-small-functions" } */
+
+struct a {
+  int b;
+  int c;
+  int d;
+  int e;
+} i, j;
+int f, g, h;
+struct a k() {
+  while (f)
+    i = j;
+  if (g) {
+    for (; h; h++)
+      i = j;
+    return j;
+  }
+  return i;
+}
+int main() {
+  k();
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr106347.c b/gcc/testsuite/gcc.target/i386/pr106347.c
new file mode 100644
index 0000000..003dd1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106347.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -fno-expensive-optimizations" } */
+
+__int128 m;
+int n;
+
+__attribute__ ((noinline)) int
+return_zero (void)
+{
+  return 0;
+}
+
+void
+foo (void)
+{
+  while (m < 0)
+    {
+      if (n || return_zero ())
+        __builtin_trap ();
+
+      ++m;
+    }
+}
+