Add debug counters to back threader.

Message ID 20211101095443.727405-1-aldyh@redhat.com
State New
Headers
Series Add debug counters to back threader. |

Commit Message

Aldy Hernandez Nov. 1, 2021, 9:54 a.m. UTC
  Chasing down stage3 miscomparisons is never fun, and having no way to
distinguish between jump threads registered by a particular
pass, is even harder.  This patch adds debug counters for the individual
back threading passes.  I've left the ethread pass alone, as that one is
usually benign, but we could easily add it if needed.

The fact that we can only pass one boolean argument to the passes
infrastructure has us do all sorts of gymnastics to differentiate
between the various back threading passes.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

	* dbgcnt.def: Add debug counter for back_thread[12] and
	back_threadfull[12].
	* passes.def: Pass "first" argument to each back threading pass.
	* tree-ssa-threadbackward.c (back_threader::back_threader): Add
	first argument.
	(back_threader::debug_counter): New.
	(back_threader::maybe_register_path): Call debug_counter.
---
 gcc/dbgcnt.def                |  4 ++
 gcc/passes.def                | 10 ++---
 gcc/tree-ssa-threadbackward.c | 71 ++++++++++++++++++++++++++++++++---
 3 files changed, 74 insertions(+), 11 deletions(-)
  

Comments

Jeff Law Nov. 1, 2021, 1:02 p.m. UTC | #1
On 11/1/2021 3:54 AM, Aldy Hernandez wrote:
> Chasing down stage3 miscomparisons is never fun, and having no way to
> distinguish between jump threads registered by a particular
> pass, is even harder.  This patch adds debug counters for the individual
> back threading passes.  I've left the ethread pass alone, as that one is
> usually benign, but we could easily add it if needed.
>
> The fact that we can only pass one boolean argument to the passes
> infrastructure has us do all sorts of gymnastics to differentiate
> between the various back threading passes.
>
> Tested on x86-64 Linux.
>
> OK?
>
> gcc/ChangeLog:
>
> 	* dbgcnt.def: Add debug counter for back_thread[12] and
> 	back_threadfull[12].
> 	* passes.def: Pass "first" argument to each back threading pass.
> 	* tree-ssa-threadbackward.c (back_threader::back_threader): Add
> 	first argument.
> 	(back_threader::debug_counter): New.
> 	(back_threader::maybe_register_path): Call debug_counter.
OK
jeff
  
Richard Biener Nov. 2, 2021, 1:27 p.m. UTC | #2
On Mon, Nov 1, 2021 at 2:03 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 11/1/2021 3:54 AM, Aldy Hernandez wrote:
> > Chasing down stage3 miscomparisons is never fun, and having no way to
> > distinguish between jump threads registered by a particular
> > pass, is even harder.  This patch adds debug counters for the individual
> > back threading passes.  I've left the ethread pass alone, as that one is
> > usually benign, but we could easily add it if needed.
> >
> > The fact that we can only pass one boolean argument to the passes
> > infrastructure has us do all sorts of gymnastics to differentiate
> > between the various back threading passes.
> >
> > Tested on x86-64 Linux.
> >
> > OK?
> >
> > gcc/ChangeLog:
> >
> >       * dbgcnt.def: Add debug counter for back_thread[12] and
> >       back_threadfull[12].
> >       * passes.def: Pass "first" argument to each back threading pass.
> >       * tree-ssa-threadbackward.c (back_threader::back_threader): Add
> >       first argument.
> >       (back_threader::debug_counter): New.
> >       (back_threader::maybe_register_path): Call debug_counter.
> OK

But it's ugly.  Very.  Why isn't a single debug-counter good enough?
You should be able to reduce to a single threading pass via
-fdisable-tree-xyz and then bisect with the debug counter.

Alternatively at least store the debug counter to query somewhere
so you can have the "hot" path query a single one.  So instead of

if (!dbg_cnt (back_thread1))

do

if (!dbg_cnt (curr_cnt))

and compute curr_cnt somewhere.

Richard.

> jeff
>
  
Aldy Hernandez Nov. 2, 2021, 1:35 p.m. UTC | #3
On Tue, Nov 2, 2021 at 2:27 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Nov 1, 2021 at 2:03 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 11/1/2021 3:54 AM, Aldy Hernandez wrote:
> > > Chasing down stage3 miscomparisons is never fun, and having no way to
> > > distinguish between jump threads registered by a particular
> > > pass, is even harder.  This patch adds debug counters for the individual
> > > back threading passes.  I've left the ethread pass alone, as that one is
> > > usually benign, but we could easily add it if needed.
> > >
> > > The fact that we can only pass one boolean argument to the passes
> > > infrastructure has us do all sorts of gymnastics to differentiate
> > > between the various back threading passes.
> > >
> > > Tested on x86-64 Linux.
> > >
> > > OK?
> > >
> > > gcc/ChangeLog:
> > >
> > >       * dbgcnt.def: Add debug counter for back_thread[12] and
> > >       back_threadfull[12].
> > >       * passes.def: Pass "first" argument to each back threading pass.
> > >       * tree-ssa-threadbackward.c (back_threader::back_threader): Add
> > >       first argument.
> > >       (back_threader::debug_counter): New.
> > >       (back_threader::maybe_register_path): Call debug_counter.
> > OK
>
> But it's ugly.  Very.  Why isn't a single debug-counter good enough?
> You should be able to reduce to a single threading pass via
> -fdisable-tree-xyz and then bisect with the debug counter.

Indeed.  I'm not a big fan either.

The -fdisable-tree-xyz approach is my first line of defense, but
sometimes the problem is a combination of threading passes working in
tandem.  For example, thread1 threads a path that causes a later
thread99 pass to find another path.  So I can't just have one counter.
We need to be able to bisect the thread1 path, and then, if there's
still a problem, bisect the thread99 pass.

I was fighting a bootstrap miscomparison bug when I could reduce the
problem to 2 threading passes, and then further to thread1's 123 path,
and thread2's 567 and 890 paths.  Not fun.

Aldy

>
> Alternatively at least store the debug counter to query somewhere
> so you can have the "hot" path query a single one.  So instead of
>
> if (!dbg_cnt (back_thread1))
>
> do
>
> if (!dbg_cnt (curr_cnt))
>
> and compute curr_cnt somewhere.
>
> Richard.
>
> > jeff
> >
>
  
Richard Biener Nov. 2, 2021, 2:12 p.m. UTC | #4
On Tue, Nov 2, 2021 at 2:36 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Tue, Nov 2, 2021 at 2:27 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Nov 1, 2021 at 2:03 PM Jeff Law via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > >
> > >
> > > On 11/1/2021 3:54 AM, Aldy Hernandez wrote:
> > > > Chasing down stage3 miscomparisons is never fun, and having no way to
> > > > distinguish between jump threads registered by a particular
> > > > pass, is even harder.  This patch adds debug counters for the individual
> > > > back threading passes.  I've left the ethread pass alone, as that one is
> > > > usually benign, but we could easily add it if needed.
> > > >
> > > > The fact that we can only pass one boolean argument to the passes
> > > > infrastructure has us do all sorts of gymnastics to differentiate
> > > > between the various back threading passes.
> > > >
> > > > Tested on x86-64 Linux.
> > > >
> > > > OK?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >       * dbgcnt.def: Add debug counter for back_thread[12] and
> > > >       back_threadfull[12].
> > > >       * passes.def: Pass "first" argument to each back threading pass.
> > > >       * tree-ssa-threadbackward.c (back_threader::back_threader): Add
> > > >       first argument.
> > > >       (back_threader::debug_counter): New.
> > > >       (back_threader::maybe_register_path): Call debug_counter.
> > > OK
> >
> > But it's ugly.  Very.  Why isn't a single debug-counter good enough?
> > You should be able to reduce to a single threading pass via
> > -fdisable-tree-xyz and then bisect with the debug counter.
>
> Indeed.  I'm not a big fan either.
>
> The -fdisable-tree-xyz approach is my first line of defense, but
> sometimes the problem is a combination of threading passes working in
> tandem.  For example, thread1 threads a path that causes a later
> thread99 pass to find another path.  So I can't just have one counter.
> We need to be able to bisect the thread1 path, and then, if there's
> still a problem, bisect the thread99 pass.
>
> I was fighting a bootstrap miscomparison bug when I could reduce the
> problem to 2 threading passes, and then further to thread1's 123 path,
> and thread2's 567 and 890 paths.  Not fun.

Btw, you can now do -fdbg-cnt=thread:5:892-11111 to continue
bisecting the number space of thread2 after fixing '5' for thread1.

And -fdbg-cnt-list will tell you the upper bound of the counters
(for the 11111).

But yes, not fun.  But really "bisecting" multiple counters at the same
time doesn't work better than "bisecting" a single counter into
multiple slices.

Richard.

> Aldy
>
> >
> > Alternatively at least store the debug counter to query somewhere
> > so you can have the "hot" path query a single one.  So instead of
> >
> > if (!dbg_cnt (back_thread1))
> >
> > do
> >
> > if (!dbg_cnt (curr_cnt))
> >
> > and compute curr_cnt somewhere.
> >
> > Richard.
> >
> > > jeff
> > >
> >
>
  
Aldy Hernandez Nov. 6, 2021, 3:53 p.m. UTC | #5
On Tue, Nov 2, 2021 at 3:13 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Nov 2, 2021 at 2:36 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 2:27 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Nov 1, 2021 at 2:03 PM Jeff Law via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > >
> > > >
> > > > On 11/1/2021 3:54 AM, Aldy Hernandez wrote:
> > > > > Chasing down stage3 miscomparisons is never fun, and having no way to
> > > > > distinguish between jump threads registered by a particular
> > > > > pass, is even harder.  This patch adds debug counters for the individual
> > > > > back threading passes.  I've left the ethread pass alone, as that one is
> > > > > usually benign, but we could easily add it if needed.
> > > > >
> > > > > The fact that we can only pass one boolean argument to the passes
> > > > > infrastructure has us do all sorts of gymnastics to differentiate
> > > > > between the various back threading passes.
> > > > >
> > > > > Tested on x86-64 Linux.
> > > > >
> > > > > OK?
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >       * dbgcnt.def: Add debug counter for back_thread[12] and
> > > > >       back_threadfull[12].
> > > > >       * passes.def: Pass "first" argument to each back threading pass.
> > > > >       * tree-ssa-threadbackward.c (back_threader::back_threader): Add
> > > > >       first argument.
> > > > >       (back_threader::debug_counter): New.
> > > > >       (back_threader::maybe_register_path): Call debug_counter.
> > > > OK
> > >
> > > But it's ugly.  Very.  Why isn't a single debug-counter good enough?
> > > You should be able to reduce to a single threading pass via
> > > -fdisable-tree-xyz and then bisect with the debug counter.
> >
> > Indeed.  I'm not a big fan either.
> >
> > The -fdisable-tree-xyz approach is my first line of defense, but
> > sometimes the problem is a combination of threading passes working in
> > tandem.  For example, thread1 threads a path that causes a later
> > thread99 pass to find another path.  So I can't just have one counter.
> > We need to be able to bisect the thread1 path, and then, if there's
> > still a problem, bisect the thread99 pass.
> >
> > I was fighting a bootstrap miscomparison bug when I could reduce the
> > problem to 2 threading passes, and then further to thread1's 123 path,
> > and thread2's 567 and 890 paths.  Not fun.
>
> Btw, you can now do -fdbg-cnt=thread:5:892-11111 to continue
> bisecting the number space of thread2 after fixing '5' for thread1.

I think you meant -fdbg-cnt=thread:5-5:892-11111, since just the 5 is
1-5 AFAICT.

>
> And -fdbg-cnt-list will tell you the upper bound of the counters
> (for the 11111).
>
> But yes, not fun.  But really "bisecting" multiple counters at the same
> time doesn't work better than "bisecting" a single counter into
> multiple slices.

I find it easier to bisect using multiple counters, but if the general
consensus is that bisecting using a single counter and multiple slices
is easier, I'll switch it back.  However, I'd like to get some
feedback from the bugzilla gurus here, as they're doing most of the
heavy lifting here.

Martin, pinskia, do you have opinions here?

Aldy
  
Martin Liška Nov. 8, 2021, 12:22 p.m. UTC | #6
n 11/6/21 16:53, Aldy Hernandez wrote:
> Martin, pinskia, do you have opinions here?

Hello.

I do prefer having the 4 counters as introduced:

DEBUG_COUNTER (back_thread1)
DEBUG_COUNTER (back_thread2)
DEBUG_COUNTER (back_threadfull1)
DEBUG_COUNTER (back_threadfull2)

Cheers,
Martin
  

Patch

diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index c2bcc4eef5e..3a85665a1d7 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -144,6 +144,10 @@  echo ubound: $ub
    Please keep the list sorted in alphabetic order.  */
 DEBUG_COUNTER (asan_use_after_scope)
 DEBUG_COUNTER (auto_inc_dec)
+DEBUG_COUNTER (back_thread1)
+DEBUG_COUNTER (back_thread2)
+DEBUG_COUNTER (back_threadfull1)
+DEBUG_COUNTER (back_threadfull2)
 DEBUG_COUNTER (ccp)
 DEBUG_COUNTER (cfg_cleanup)
 DEBUG_COUNTER (cprop)
diff --git a/gcc/passes.def b/gcc/passes.def
index 29921f80ed9..0f541454e7f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -81,7 +81,7 @@  along with GCC; see the file COPYING3.  If not see
 	  /* After CCP we rewrite no longer addressed locals into SSA
 	     form if possible.  */
 	  NEXT_PASS (pass_forwprop);
-          NEXT_PASS (pass_early_thread_jumps);
+          NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
 	  NEXT_PASS (pass_sra_early);
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
@@ -210,7 +210,7 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_return_slot);
       NEXT_PASS (pass_fre, true /* may_iterate */);
       NEXT_PASS (pass_merge_phi);
-      NEXT_PASS (pass_thread_jumps_full);
+      NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
       NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
@@ -233,7 +233,7 @@  along with GCC; see the file COPYING3.  If not see
 	 propagations have already run, but before some more dead code
 	 is removed, and this place fits nicely.  Remember this when
 	 trying to move or duplicate pass_dominator somewhere earlier.  */
-      NEXT_PASS (pass_thread_jumps);
+      NEXT_PASS (pass_thread_jumps, /*first=*/true);
       NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
       /* Threading can leave many const/copy propagations in the IL.
 	 Clean them up.  Failure to do so well can lead to false
@@ -332,10 +332,10 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_fre, false /* may_iterate */);
       /* After late FRE we rewrite no longer addressed locals into SSA
          form if possible.  */
-      NEXT_PASS (pass_thread_jumps);
+      NEXT_PASS (pass_thread_jumps, /*first=*/false);
       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
       NEXT_PASS (pass_strlen);
-      NEXT_PASS (pass_thread_jumps_full);
+      NEXT_PASS (pass_thread_jumps_full, /*first=*/false);
       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
       /* Run CCP to compute alignment and nonzero bits.  */
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index c66e74d159a..8e4a59744c5 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -44,6 +44,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-pretty-print.h"
 #include "cfghooks.h"
+#include "dbgcnt.h"
 
 // Path registry for the backwards threader.  After all paths have been
 // registered with register_path(), thread_through_all_blocks() is called
@@ -80,12 +81,13 @@  private:
 class back_threader
 {
 public:
-  back_threader (function *fun, unsigned flags);
+  back_threader (function *fun, unsigned flags, bool first);
   ~back_threader ();
   unsigned thread_blocks ();
 private:
   void maybe_thread_block (basic_block bb);
   void find_paths (basic_block bb, tree name);
+  bool debug_counter ();
   edge maybe_register_path ();
   bool find_paths_to_names (basic_block bb, bitmap imports);
   bool resolve_def (tree name, bitmap interesting, vec<tree> &worklist);
@@ -120,14 +122,19 @@  private:
   // VARYING.  Setting to true is more precise but slower.
   function *m_fun;
   unsigned m_flags;
+  // Set to TRUE for the first of each thread[12] pass or the first of
+  // each threadfull[12] pass.  This is used to differentiate between
+  // the different threading passes so we can set up debug counters.
+  bool m_first;
 };
 
 // Used to differentiate unreachable edges, so we may stop the search
 // in a the given direction.
 const edge back_threader::UNREACHABLE_EDGE = (edge) -1;
 
-back_threader::back_threader (function *fun, unsigned flags)
-  : m_profit (flags & BT_SPEED)
+back_threader::back_threader (function *fun, unsigned flags, bool first)
+  : m_profit (flags & BT_SPEED),
+    m_first (first)
 {
   if (flags & BT_SPEED)
     loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES);
@@ -149,6 +156,37 @@  back_threader::~back_threader ()
   loop_optimizer_finalize ();
 }
 
+// A wrapper for the various debug counters for the threading passes.
+// Returns TRUE if it's OK to register the current threading
+// candidate.
+
+bool
+back_threader::debug_counter ()
+{
+  // We don't have any counters for the ethread pass as that
+  // one is usually harmless ;-).
+  if ((m_flags & BT_SPEED) == 0)
+    return true;
+
+  if (m_flags & BT_RESOLVE)
+    {
+      if (m_first && !dbg_cnt (back_threadfull1))
+	return false;
+
+      if (!m_first && !dbg_cnt (back_threadfull2))
+	return false;
+    }
+  else
+    {
+      if (m_first && !dbg_cnt (back_thread1))
+	return false;
+
+      if (!m_first && !dbg_cnt (back_thread2))
+	return false;
+    }
+  return true;
+}
+
 // Register the current path for jump threading if it's profitable to
 // do so.
 //
@@ -172,6 +210,9 @@  back_threader::maybe_register_path ()
 
       if (profitable)
 	{
+	  if (!debug_counter ())
+	    return NULL;
+
 	  m_registry.register_path (m_path, taken_edge);
 
 	  if (irreducible)
@@ -974,15 +1015,21 @@  public:
   {
     return new pass_early_thread_jumps (m_ctxt);
   }
+  void set_pass_param (unsigned int, bool param) override
+  {
+    m_first = param;
+  }
   bool gate (function *) override
   {
     return flag_thread_jumps;
   }
   unsigned int execute (function *fun) override
   {
-    back_threader threader (fun, BT_NONE);
+    back_threader threader (fun, BT_NONE, m_first);
     return threader.thread_blocks ();
   }
+private:
+  bool m_first;
 };
 
 // Jump threading pass without resolving of unknown SSAs.
@@ -996,15 +1043,21 @@  public:
   {
     return new pass_thread_jumps (m_ctxt);
   }
+  void set_pass_param (unsigned int, bool param) override
+  {
+    m_first = param;
+  }
   bool gate (function *) override
   {
     return flag_thread_jumps && flag_expensive_optimizations;
   }
   unsigned int execute (function *fun) override
   {
-    back_threader threader (fun, BT_SPEED);
+    back_threader threader (fun, BT_SPEED, m_first);
     return threader.thread_blocks ();
   }
+private:
+  bool m_first;
 };
 
 // Jump threading pass that fully resolves unknown SSAs.
@@ -1018,15 +1071,21 @@  public:
   {
     return new pass_thread_jumps_full (m_ctxt);
   }
+  void set_pass_param (unsigned int, bool param) override
+  {
+    m_first = param;
+  }
   bool gate (function *) override
   {
     return flag_thread_jumps && flag_expensive_optimizations;
   }
   unsigned int execute (function *fun) override
   {
-    back_threader threader (fun, BT_SPEED | BT_RESOLVE);
+    back_threader threader (fun, BT_SPEED | BT_RESOLVE, m_first);
     return threader.thread_blocks ();
   }
+private:
+  bool m_first;
 };
 
 } // namespace {