diff mbox series

Try to resolve paths in threader without looking further back.

Message ID 20211020102816.656714-1-aldyh@redhat.com
State New
Headers show
Series Try to resolve paths in threader without looking further back. | expand

Commit Message

Aldy Hernandez Oct. 20, 2021, 10:28 a.m. UTC
Sometimes we can solve a candidate path without having to recurse
further back.  This can mostly happen in fully resolving mode, because
we can ask the ranger what the range on entry to the path is, but
there's no reason this can't always apply.  This one-liner removes
the fully-resolving restriction.

I'm tickled pink to see how many things we now get quite early
in the compilation.  I actually had to disable jump threading entirely
for a few tests because the early threader was catching things
disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
cleanups happening.

I was going to commit this as obvious, but I think the test changes
merit discussion.

We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
some time.  Every time a threading pass gets smarter, we push the
check further down the pipeline.  We've officially run out of dumb
threading passes to disable ;-).  In the last year we've gone up from a
handful of threads, to 34 threads with the current combination of
options.  I doubt this is testing anything useful any more, so I've
removed it.

Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
jump threads, but they were disallowed because of loop rotation.  Then
we started catching more jump threads in VRP2 threading so we tested
there.  With this patch though, we triple the number of threads found
from 11 to 31.  I believe this test has outlived its usefulness, and
I've removed it.  Note that even though we have these outrageous
possibilities for this test, the block copier ultimately chops them
down (23 survive though).

Likewise for ssa-dom-thread-7.c.  The number of threads in this test has
been growing consistently over the years.  There's no way to test
what is possible, especially because improvements in one threader open
up possibilities for another.  With this patch we're up to 41 registered
jump threads and they're spread over 4 passes.  There's no way to get the
amount right, and this test has become a source of useless busywork.

All in all, I believe the simpler jump threading tests, as well as the
gimple FE tests I've added, more than adequately cover us.

Tested on x86-64 Linux.

OK for trunk?

p.s. As usual, some warning pass gets thrown off.  Martin, I've XFAILed
it.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (back_threader::find_paths_to_names):
	Always try to resolve path without looking back.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/scop-dsyr2k-2.c: Adjust for jump threading changes.
	* gcc.dg/graphite/scop-dsyr2k.c: Same.
	* gcc.dg/graphite/scop-dsyrk-2.c: Same.
	* gcc.dg/graphite/scop-dsyrk.c: Same.
	* gcc.dg/tree-ssa/pr20701.c: Same.
	* gcc.dg/tree-ssa/pr20702.c: Same.
	* gcc.dg/tree-ssa/pr21086.c: Same.
	* gcc.dg/tree-ssa/pr25382.c: Same.
	* gcc.dg/tree-ssa/pr58480.c: Same.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
	* gcc.dg/tree-ssa/vrp08.c: Same.
	* gcc.dg/tree-ssa/vrp55.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Removed.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Removed.
	* gcc.dg/tree-ssa/ssa-thread-11.c: Removed.
	* gcc.dg/uninit-pr89230-1.c: xfail.
---
 gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c   |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c  |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c    |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr20701.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr20702.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21086.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr25382.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr58480.c       |   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-4.c        |  60 --------
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        | 134 ------------------
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c |  50 -------
 .../gcc.dg/tree-ssa/ssa-vrp-thread-1.c        |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp08.c         |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp55.c         |   6 +-
 gcc/testsuite/gcc.dg/uninit-pr89230-1.c       |   3 +-
 gcc/tree-ssa-threadbackward.c                 |   4 +-
 17 files changed, 19 insertions(+), 258 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c

Comments

Martin Sebor Oct. 20, 2021, 2:35 p.m. UTC | #1
On 10/20/21 4:28 AM, Aldy Hernandez via Gcc-patches wrote:
> Sometimes we can solve a candidate path without having to recurse
> further back.  This can mostly happen in fully resolving mode, because
> we can ask the ranger what the range on entry to the path is, but
> there's no reason this can't always apply.  This one-liner removes
> the fully-resolving restriction.
> 
> I'm tickled pink to see how many things we now get quite early
> in the compilation.  I actually had to disable jump threading entirely
> for a few tests because the early threader was catching things
> disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
> cleanups happening.
> 
> I was going to commit this as obvious, but I think the test changes
> merit discussion.
> 
> We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
> some time.  Every time a threading pass gets smarter, we push the
> check further down the pipeline.  We've officially run out of dumb
> threading passes to disable ;-).  In the last year we've gone up from a
> handful of threads, to 34 threads with the current combination of
> options.  I doubt this is testing anything useful any more, so I've
> removed it.
> 
> Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
> jump threads, but they were disallowed because of loop rotation.  Then
> we started catching more jump threads in VRP2 threading so we tested
> there.  With this patch though, we triple the number of threads found
> from 11 to 31.  I believe this test has outlived its usefulness, and
> I've removed it.  Note that even though we have these outrageous
> possibilities for this test, the block copier ultimately chops them
> down (23 survive though).
> 
> Likewise for ssa-dom-thread-7.c.  The number of threads in this test has
> been growing consistently over the years.  There's no way to test
> what is possible, especially because improvements in one threader open
> up possibilities for another.  With this patch we're up to 41 registered
> jump threads and they're spread over 4 passes.  There's no way to get the
> amount right, and this test has become a source of useless busywork.
> 
> All in all, I believe the simpler jump threading tests, as well as the
> gimple FE tests I've added, more than adequately cover us.
> 
> Tested on x86-64 Linux.
> 
> OK for trunk?
> 
> p.s. As usual, some warning pass gets thrown off.  Martin, I've XFAILed
> it.

I appreciate the heads up.  I'm happy that the threader has
improved.  I'm obviously not pleased that it has led to regressions
in warnings but I understand that in some cases they might be due
to limitations in the warning code.  I think the test case you have
xfailed might be one such example.  The uninitialized warnings are
exquisitely sensitive to these types of changes.  If/when this patch
is applied please reopen PR 89230 and reference this commit.

Having said that, to maintain the quality of diagnostics,
the work that goes into these nice optimizer improvements needs
to be balanced by an effort to either update the warning code
to cope with the IL changes, or the optimizers need to take care
to avoid exposing undefined code that the warnings are designed
to detect.  I'm concerned not just that the quality of GCC 12
diagnostics has been eroding, but also that it seems to be not
just acceptable but expected.

Martin

> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-threadbackward.c (back_threader::find_paths_to_names):
> 	Always try to resolve path without looking back.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/graphite/scop-dsyr2k-2.c: Adjust for jump threading changes.
> 	* gcc.dg/graphite/scop-dsyr2k.c: Same.
> 	* gcc.dg/graphite/scop-dsyrk-2.c: Same.
> 	* gcc.dg/graphite/scop-dsyrk.c: Same.
> 	* gcc.dg/tree-ssa/pr20701.c: Same.
> 	* gcc.dg/tree-ssa/pr20702.c: Same.
> 	* gcc.dg/tree-ssa/pr21086.c: Same.
> 	* gcc.dg/tree-ssa/pr25382.c: Same.
> 	* gcc.dg/tree-ssa/pr58480.c: Same.
> 	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
> 	* gcc.dg/tree-ssa/vrp08.c: Same.
> 	* gcc.dg/tree-ssa/vrp55.c: Same.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Removed.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Removed.
> 	* gcc.dg/tree-ssa/ssa-thread-11.c: Removed.
> 	* gcc.dg/uninit-pr89230-1.c: xfail.
> ---
>   gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c |   1 +
>   gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c   |   1 +
>   gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c  |   1 +
>   gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c    |   1 +
>   gcc/testsuite/gcc.dg/tree-ssa/pr20701.c       |   2 +-
>   gcc/testsuite/gcc.dg/tree-ssa/pr20702.c       |   2 +-
>   gcc/testsuite/gcc.dg/tree-ssa/pr21086.c       |   2 +-
>   gcc/testsuite/gcc.dg/tree-ssa/pr25382.c       |   2 +-
>   gcc/testsuite/gcc.dg/tree-ssa/pr58480.c       |   2 +-
>   .../gcc.dg/tree-ssa/ssa-dom-thread-4.c        |  60 --------
>   .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        | 134 ------------------
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c |  50 -------
>   .../gcc.dg/tree-ssa/ssa-vrp-thread-1.c        |   4 +-
>   gcc/testsuite/gcc.dg/tree-ssa/vrp08.c         |   2 +-
>   gcc/testsuite/gcc.dg/tree-ssa/vrp55.c         |   6 +-
>   gcc/testsuite/gcc.dg/uninit-pr89230-1.c       |   3 +-
>   gcc/tree-ssa-threadbackward.c                 |   4 +-
>   17 files changed, 19 insertions(+), 258 deletions(-)
>   delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
>   delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
>   delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c
> 
> diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c
> index 06aa19a8577..42e23fc157e 100644
> --- a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c
> +++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c
> @@ -1,4 +1,5 @@
>   /* { dg-require-effective-target size32plus } */
> +/* { dg-additional-options "-fno-thread-jumps" } */
>   #define NMAX 3000
>   
>   static double a[NMAX][NMAX], b[NMAX][NMAX], c[NMAX][NMAX];
> diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
> index 41c91b97b57..feb99358ac7 100644
> --- a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
> +++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
> @@ -1,4 +1,5 @@
>   /* { dg-require-effective-target size32plus } */
> +/* { dg-additional-options "-fno-thread-jumps" } */
>   #define NMAX 3000
>   
>   static double a[NMAX][NMAX], b[NMAX][NMAX], c[NMAX][NMAX];
> diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c
> index 5622dce4798..935ade3fb6e 100644
> --- a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c
> +++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c
> @@ -1,4 +1,5 @@
>   /* { dg-require-effective-target size32plus } */
> +/* { dg-additional-options "-fno-thread-jumps" } */
>   #define NMAX 3000
>   #define MEASURE_TIME 1
>   
> diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
> index e01a517be11..5c65e406589 100644
> --- a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
> +++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
> @@ -1,4 +1,5 @@
>   /* { dg-require-effective-target size32plus } */
> +/* { dg-additional-options "-fno-thread-jumps" } */
>   #define NMAX 3000
>   #define MEASURE_TIME 1
>   
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr20701.c b/gcc/testsuite/gcc.dg/tree-ssa/pr20701.c
> index 2f914589e32..496c4256733 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr20701.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr20701.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1 -fno-early-inlining -fdelete-null-pointer-checks" } */
> +/* { dg-options "-O2 -fdump-tree-vrp1 -fno-early-inlining -fdelete-null-pointer-checks -fdisable-tree-thread1" } */
>   
>   typedef struct {
>     int code;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr20702.c b/gcc/testsuite/gcc.dg/tree-ssa/pr20702.c
> index c896857748c..81129674d8a 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr20702.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr20702.c
> @@ -4,7 +4,7 @@
>      immediate successors of the basic block.  */
>   
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fno-tree-dominator-opts -fdisable-tree-evrp -fdump-tree-vrp1-details -fdelete-null-pointer-checks" } */
> +/* { dg-options "-O2 -fno-thread-jumps -fdisable-tree-evrp -fdump-tree-vrp1-details -fdelete-null-pointer-checks" } */
>   
>   extern void bar (int);
>   
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21086.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21086.c
> index aadd53e2237..9b93d39d4e4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr21086.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21086.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdump-tree-dce2 -fdelete-null-pointer-checks" } */
> +/* { dg-options "-O2 -fno-thread-jumps -fdisable-tree-evrp -fdump-tree-vrp1 -fdump-tree-dce2 -fdelete-null-pointer-checks" } */
>   
>   int
>   foo (int *p)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr25382.c b/gcc/testsuite/gcc.dg/tree-ssa/pr25382.c
> index d74765551c2..8634c0a7895 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr25382.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr25382.c
> @@ -3,7 +3,7 @@
>      Check that VRP now gets ranges from BIT_AND_EXPRs.  */
>   
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fno-tree-ccp -fdisable-tree-evrp -fdump-tree-vrp1" } */
> +/* { dg-options "-O2 -fno-thread-jumps -fno-tree-ccp -fdisable-tree-evrp -fdump-tree-vrp1" } */
>   
>   int
>   foo (int a)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr58480.c b/gcc/testsuite/gcc.dg/tree-ssa/pr58480.c
> index 42898e72d4e..f11623b7c6b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr58480.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr58480.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile { target { ! keeps_null_pointer_checks } } } */
> -/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdelete-null-pointer-checks" } */
> +/* { dg-options "-O2 -fno-thread-jumps -fdisable-tree-evrp -fdump-tree-vrp1 -fdelete-null-pointer-checks" } */
>   
>   extern void eliminate (void);
>   extern void* f1 (void *a, void *b) __attribute__((nonnull));
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
> deleted file mode 100644
> index 9cd463571c4..00000000000
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
> +++ /dev/null
> @@ -1,60 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp-thread2-details -fdump-tree-dom2-details -std=gnu89 --param logical-op-non-short-circuit=1" } */
> -struct bitmap_head_def;
> -typedef struct bitmap_head_def *bitmap;
> -typedef const struct bitmap_head_def *const_bitmap;
> -typedef unsigned long BITMAP_WORD;
> -typedef struct bitmap_element_def
> -{
> -  struct bitmap_element_def *next;
> -  unsigned int indx;
> -} bitmap_element;
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -unsigned char
> -bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
> -		      const_bitmap kill)
> -{
> -  unsigned char changed = 0;
> -
> -  bitmap_element *dst_elt;
> -  const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;
> -
> -  while (a_elt || b_elt)
> -    {
> -      unsigned char new_element = 0;
> -
> -      if (b_elt)
> -	while (kill_elt && kill_elt->indx < b_elt->indx)
> -	  kill_elt = kill_elt->next;
> -
> -      if (b_elt && kill_elt && kill_elt->indx == b_elt->indx
> -	  && (!a_elt || a_elt->indx >= b_elt->indx))
> -	{
> -	  bitmap_element tmp_elt;
> -	  unsigned ix;
> -
> -	  BITMAP_WORD ior = 0;
> -
> -	      changed = bitmap_elt_ior (dst, dst_elt, dst_prev,
> -					a_elt, &tmp_elt, changed);
> -
> -	}
> -
> -    }
> -
> -
> -  return changed;
> -}
> -/* We used to catch 3 jump threads in vrp-thread1, but they all
> -   rotated the loop, so they were disallowed.  This in turn created
> -   other opportunities for the other threaders which result in the the
> -   post-loop threader (vrp-thread2) catching more.  */
> -/* { dg-final { scan-tree-dump-times "Registering jump thread" 5 "vrp-thread2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> deleted file mode 100644
> index 1da00a691c8..00000000000
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> +++ /dev/null
> @@ -1,134 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
> -
> -/* { dg-final { scan-tree-dump "Jumps threaded: 12"  "thread3" } } */
> -/* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
> -
> -/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
> -   to change decisions in switch expansion which in turn can expose new
> -   jump threading opportunities.  Skip the later tests on aarch64.  */
> -/* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" { target { ! aarch64*-*-* } } } } */
> -/* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp-thread2" { target { ! aarch64*-*-* } } } } */
> -
> -enum STATE {
> -  S0=0,
> -  SI,
> -  S1,
> -  S2,
> -  S3,
> -  S4,
> -  S5,
> -  S6
> -};
> -
> -int bar (enum STATE s);
> -
> -enum STATE foo (unsigned char **y, unsigned *c)
> -{
> -  unsigned char *x = *y;
> -  unsigned char n;
> -  enum STATE s = S0;
> -
> -  for( ; *x && s != SI; x++ )
> -    {
> -      n = *x;
> -      if (n == 'x')
> -	{
> -	  x++;
> -	  break;
> -	}
> -      switch(s)
> -	{
> -	case S0:
> -	  if(bar(n))
> -	    s = S3;
> -	  else if( n == 'a' || n == 'b' )
> -	    s = S1;
> -	  else if( n == 'c' )
> -	    s = S4;
> -	  else
> -	    {
> -	      s = SI;
> -	      c[SI]++;
> -	    }
> -	  c[S0]++;
> -	  break;
> -	case S1:
> -	  if(bar(n))
> -	    {
> -	      s = S3;
> -	      c[S1]++;
> -	    }
> -	  else if( n == 'c' )
> -	    {
> -	      s = S4;
> -	      c[S1]++;
> -	    }
> -	  else
> -	    {
> -	      s = SI;
> -	      c[S1]++;
> -	    }
> -	  break;
> -	case S3:
> -	  if( n == 'c' )
> -	    {
> -	      s = S4;
> -	      c[S3]++;
> -	    }
> -	  else if(!bar(n))
> -	    {
> -	      s = SI;
> -	      c[S3]++;
> -	    }
> -	  break;
> -	case S4:
> -	  if( n == 'E' || n == 'e' )
> -	    {
> -	      s = S2;
> -	      c[S4]++;
> -	    }
> -	  else if(!bar(n))
> -	    {
> -	      s = SI;
> -	      c[S4]++;
> -	    }
> -	  break;
> -	case S2:
> -	  if( n == 'a' || n == 'b' )
> -	    {
> -	      s = S5;
> -	      c[S2]++;
> -	    }
> -	  else
> -	    {
> -	      s = SI;
> -	      c[S2]++;
> -	    }
> -	  break;
> -	case S5:
> -	  if(bar(n))
> -	    {
> -	      s = S6;
> -	      c[S5]++;
> -	    }
> -	  else
> -	    {
> -	      s = SI;
> -	      c[S5]++;
> -	    }
> -	  break;
> -	case S6:
> -	  if(!bar(n))
> -	    {
> -	      s = SI;
> -	      c[SI]++;
> -	    }
> -	  break;
> -	default:
> -	  break;
> -	}
> -    }
> -  *y=x;
> -  return s;
> -}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c
> deleted file mode 100644
> index 672a54e07db..00000000000
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c
> +++ /dev/null
> @@ -1,50 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp2-details --param logical-op-non-short-circuit=1" } */
> -/* { dg-additional-options "-fdisable-tree-ethread -fdisable-tree-thread1 -fdisable-tree-thread2" } */
> -/* { dg-final { scan-tree-dump-not "IRREDUCIBLE_LOOP" "vrp2" } } */
> -
> -void abort (void);
> -typedef struct bitmap_head_def *bitmap;
> -typedef const struct bitmap_head_def *const_bitmap;
> -typedef struct bitmap_obstack
> -{
> -  struct bitmap_obstack *next;
> -  unsigned int indx;
> -}
> -bitmap_element;
> -typedef struct bitmap_head_def
> -{
> -  bitmap_element *first;
> -}
> -bitmap_head;
> -static __inline__ unsigned char
> -bitmap_elt_ior (bitmap dst, bitmap_element * dst_elt,
> -		bitmap_element * dst_prev, const bitmap_element * a_elt,
> -		const bitmap_element * b_elt)
> -{
> -  ((void) (!(a_elt || b_elt) ? abort (), 0 : 0));
> -}
> -
> -unsigned char
> -bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
> -		      const_bitmap kill)
> -{
> -  bitmap_element *dst_elt = dst->first;
> -  const bitmap_element *a_elt = a->first;
> -  const bitmap_element *b_elt = b->first;
> -  const bitmap_element *kill_elt = kill->first;
> -  bitmap_element *dst_prev = ((void *) 0);
> -  while (a_elt || b_elt)
> -    {
> -      if (b_elt && kill_elt && kill_elt->indx == b_elt->indx
> -	  && (!a_elt || a_elt->indx >= b_elt->indx));
> -      else
> -	{
> -	  bitmap_elt_ior (dst, dst_elt, dst_prev, a_elt, b_elt);
> -	  if (a_elt && b_elt && a_elt->indx == b_elt->indx)
> -	    ;
> -	  else if (a_elt && (!b_elt || a_elt->indx <= b_elt->indx))
> -	    a_elt = a_elt->next;
> -	}
> -    }
> -}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> index 86d07ef9bdb..f3ca140bd26 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -fdelete-null-pointer-checks" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details -fdelete-null-pointer-checks" } */
>   /* { dg-skip-if "" keeps_null_pointer_checks } */
>   
>   void oof (void);
> @@ -29,5 +29,5 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
>   
>   /* ARM Cortex-M defined LOGICAL_OP_NON_SHORT_CIRCUIT to false,
>      so skip below test.  */
> -/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp-thread1" { target { ! arm_cortex_m } } } } */
> +/* { dg-final { scan-tree-dump-times "Registering jump thread" 1 "thread1" { target { ! arm_cortex_m } } } } */
>   
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp08.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp08.c
> index c2da30b4b68..2c6742b76c7 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp08.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp08.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details -fdelete-null-pointer-checks" } */
> +/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details -fdisable-tree-thread1 -fdelete-null-pointer-checks" } */
>   
>   /* Compile with -fno-tree-fre -O2 to prevent CSEing *p.  */
>   int
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
> index a478a6981ac..0ef57d935e4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp-thread1-blocks-vops-details -fdelete-null-pointer-checks" } */
> +/* { dg-options "-O2 -fdump-tree-ethread-details -fdelete-null-pointer-checks" } */
>   
>   void arf (void);
>   
> @@ -12,6 +12,6 @@ fu (char *p, int x)
>       arf ();
>   }
>   
> -/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp-thread1" { target { ! keeps_null_pointer_checks } } } } */
> -/* { dg-final { scan-tree-dump-times "Threaded jump" 0 "vrp-thread1" { target {   keeps_null_pointer_checks } } } } */
> +/* { dg-final { scan-tree-dump-times "Registering jump thread" 1 "ethread" { target { ! keeps_null_pointer_checks } } } } */
> +/* { dg-final { scan-tree-dump-times "Registering jump thread" 0 "ethread" { target {   keeps_null_pointer_checks } } } } */
>   
> diff --git a/gcc/testsuite/gcc.dg/uninit-pr89230-1.c b/gcc/testsuite/gcc.dg/uninit-pr89230-1.c
> index 1c07c4f6d78..dfc87a5b1a0 100644
> --- a/gcc/testsuite/gcc.dg/uninit-pr89230-1.c
> +++ b/gcc/testsuite/gcc.dg/uninit-pr89230-1.c
> @@ -8,7 +8,8 @@ struct S { int i, j; };
>   
>   int g (void)
>   {
> -  struct S *p = f (), *q;
> +  struct S *p = f ();
> +  struct S *q; // { dg-bogus "may be used uninitialized" "uninitialized" { xfail *-*-* } }
>   
>     if (p->i || !(q = f ()) || p->j != q->i)
>      {
> diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
> index d94e3b962db..edb396b3d6f 100644
> --- a/gcc/tree-ssa-threadbackward.c
> +++ b/gcc/tree-ssa-threadbackward.c
> @@ -374,8 +374,8 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting)
>         return false;
>       }
>   
> -  // Try to resolve the path with nothing but ranger knowledge.
> -  if (m_resolve && m_path.length () > 1 && maybe_register_path ())
> +  // Try to resolve the path without looking back.
> +  if (m_path.length () > 1 && maybe_register_path ())
>       {
>         m_path.pop ();
>         m_visited_bbs.remove (bb);
>
Aldy Hernandez Oct. 20, 2021, 3:15 p.m. UTC | #2
On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor <msebor@gmail.com> wrote:

> I appreciate the heads up.  I'm happy that the threader has
> improved.  I'm obviously not pleased that it has led to regressions
> in warnings but I understand that in some cases they might be due
> to limitations in the warning code.  I think the test case you have
> xfailed might be one such example.  The uninitialized warnings are
> exquisitely sensitive to these types of changes.  If/when this patch
> is applied please reopen PR 89230 and reference this commit.
>
> Having said that, to maintain the quality of diagnostics,
> the work that goes into these nice optimizer improvements needs
> to be balanced by an effort to either update the warning code
> to cope with the IL changes, or the optimizers need to take care
> to avoid exposing undefined code that the warnings are designed
> to detect.  I'm concerned not just that the quality of GCC 12
> diagnostics has been eroding, but also that it seems to be not
> just acceptable but expected.

You make a very good point.  It is certainly not my intention to make
life difficult for the warning maintainers, but I'm afraid I don't
have sufficient knowledge in the area to improve them.

There may be some low hanging fruit though.  At least in the warnings
that use the ranger, there's no reason to run these passes so late in
the pipeline.  You could run the warning code as early as you want,
insofar as SSA is available and the CFG has been built.  Heck, you may
even be able to run at -O0, though we may need some sort of value
numbering.  I believe Richi even suggested this a while back.

Another thing you could do is rewrite your passes to use actual
ranges, not the pair of ranges that the sprintf code uses (for
example).  We've put a lot of work into making infinite ranges work.
There's no reason to keep using value_ranges, anti ranges, and all
that.  I've mentioned this many times in the past 2+ years.  I even
tried my hand at doing the conversion myself, but I'm afraid I don't
understand the intricacies of the code very well.  I've done all I can
in this area, including porting all the evrp consumers to the ranger,
and recently providing patches for converting the strlen / sprintf
passes.

I'm sorry to rant here, but I've made lots of suggestions in the past
couple years, and they go unheeded.  I eventually get frustrated and
end up doing  a half assed conversion of what I can, myself.

Furthermore, I've mentioned in the past that the sprintf warnings (and
possibly the overflow ones)  depend on precise internal ranges that
evrp or the ranger are calculating.  The fact that they're hard to
read, make it less likely that others (well me, anyhow) will chime in
to help.  It's hard enough to understand the warnings, let alone the
code generating them.

Andrew has also suggested some things in this area.  ISTR there was
some overloading of the base functions you could do to represent
pointer offsets, though I can't remember the details.

Barring the above, at the top of our list for next year is full
support for pointers and floats.  The pointer bits should help, but
you must convert your passes for that to happen.  Pairs of integers
are far too fragile.

And finally, if none of that improves things sufficiently, Andrew and
I have batted around providing an infrastructure for predication that
can replace the uninit stuff which is showing its age, but alas that's
a ways away.

Aldy
Jeff Law Oct. 20, 2021, 8:01 p.m. UTC | #3
On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
> On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor <msebor@gmail.com> wrote:
>
>> I appreciate the heads up.  I'm happy that the threader has
>> improved.  I'm obviously not pleased that it has led to regressions
>> in warnings but I understand that in some cases they might be due
>> to limitations in the warning code.  I think the test case you have
>> xfailed might be one such example.  The uninitialized warnings are
>> exquisitely sensitive to these types of changes.  If/when this patch
>> is applied please reopen PR 89230 and reference this commit.
>>
>> Having said that, to maintain the quality of diagnostics,
>> the work that goes into these nice optimizer improvements needs
>> to be balanced by an effort to either update the warning code
>> to cope with the IL changes, or the optimizers need to take care
>> to avoid exposing undefined code that the warnings are designed
>> to detect.  I'm concerned not just that the quality of GCC 12
>> diagnostics has been eroding, but also that it seems to be not
>> just acceptable but expected.
> You make a very good point.  It is certainly not my intention to make
> life difficult for the warning maintainers, but I'm afraid I don't
> have sufficient knowledge in the area to improve them.
>
> There may be some low hanging fruit though.  At least in the warnings
> that use the ranger, there's no reason to run these passes so late in
> the pipeline.  You could run the warning code as early as you want,
> insofar as SSA is available and the CFG has been built.  Heck, you may
> even be able to run at -O0, though we may need some sort of value
> numbering.  I believe Richi even suggested this a while back.
Running them later in the pipeline is to take advantage of the 
optimizers removing dead and unreachable code as much as possible. In 
fact, that's critical to -Wuninitialized.  Optimizing away unreachable 
paths  to avoid Wuninitialized false positives has been the major driver 
of jump threading improvements for the last 15 years.

jeff
Jeff Law Oct. 20, 2021, 8:19 p.m. UTC | #4
On 10/20/2021 4:28 AM, Aldy Hernandez wrote:
> Sometimes we can solve a candidate path without having to recurse
> further back.  This can mostly happen in fully resolving mode, because
> we can ask the ranger what the range on entry to the path is, but
> there's no reason this can't always apply.  This one-liner removes
> the fully-resolving restriction.
>
> I'm tickled pink to see how many things we now get quite early
> in the compilation.  I actually had to disable jump threading entirely
> for a few tests because the early threader was catching things
> disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
> cleanups happening.
>
> I was going to commit this as obvious, but I think the test changes
> merit discussion.
>
> We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
> some time.  Every time a threading pass gets smarter, we push the
> check further down the pipeline.  We've officially run out of dumb
> threading passes to disable ;-).  In the last year we've gone up from a
> handful of threads, to 34 threads with the current combination of
> options.  I doubt this is testing anything useful any more, so I've
> removed it.
>
> Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
> jump threads, but they were disallowed because of loop rotation.  Then
> we started catching more jump threads in VRP2 threading so we tested
> there.  With this patch though, we triple the number of threads found
> from 11 to 31.  I believe this test has outlived its usefulness, and
> I've removed it.  Note that even though we have these outrageous
> possibilities for this test, the block copier ultimately chops them
> down (23 survive though).
>
> Likewise for ssa-dom-thread-7.c.  The number of threads in this test has
> been growing consistently over the years.  There's no way to test
> what is possible, especially because improvements in one threader open
> up possibilities for another.  With this patch we're up to 41 registered
> jump threads and they're spread over 4 passes.  There's no way to get the
> amount right, and this test has become a source of useless busywork.
So we want to keep some form of ssa-dom-thread-7.  That' s the canonical 
testcase for the case for the FSM optimization.

What we need to verify is that we thread jumps across the backedge of 
the loop through the switch statement to a particular case (thus 
bypassing the indirect jump for the switch statement).  How to do that 
in a way that's easier to manage?  I have no clue.  I guess a gimple-fe 
based test might help.

Jeff
Aldy Hernandez Oct. 21, 2021, 7:17 a.m. UTC | #5
On Wed, Oct 20, 2021 at 10:01 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
> > On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor <msebor@gmail.com> wrote:
> >
> >> I appreciate the heads up.  I'm happy that the threader has
> >> improved.  I'm obviously not pleased that it has led to regressions
> >> in warnings but I understand that in some cases they might be due
> >> to limitations in the warning code.  I think the test case you have
> >> xfailed might be one such example.  The uninitialized warnings are
> >> exquisitely sensitive to these types of changes.  If/when this patch
> >> is applied please reopen PR 89230 and reference this commit.
> >>
> >> Having said that, to maintain the quality of diagnostics,
> >> the work that goes into these nice optimizer improvements needs
> >> to be balanced by an effort to either update the warning code
> >> to cope with the IL changes, or the optimizers need to take care
> >> to avoid exposing undefined code that the warnings are designed
> >> to detect.  I'm concerned not just that the quality of GCC 12
> >> diagnostics has been eroding, but also that it seems to be not
> >> just acceptable but expected.
> > You make a very good point.  It is certainly not my intention to make
> > life difficult for the warning maintainers, but I'm afraid I don't
> > have sufficient knowledge in the area to improve them.
> >
> > There may be some low hanging fruit though.  At least in the warnings
> > that use the ranger, there's no reason to run these passes so late in
> > the pipeline.  You could run the warning code as early as you want,
> > insofar as SSA is available and the CFG has been built.  Heck, you may
> > even be able to run at -O0, though we may need some sort of value
> > numbering.  I believe Richi even suggested this a while back.
> Running them later in the pipeline is to take advantage of the
> optimizers removing dead and unreachable code as much as possible. In
> fact, that's critical to -Wuninitialized.  Optimizing away unreachable
> paths  to avoid Wuninitialized false positives has been the major driver
> of jump threading improvements for the last 15 years.

Ughh, that's unfortunate.  We're gonna have to come up with
improvements to the Wuninitialized code, or a different paradigm
altogether.  I'm afraid this will only get worse.

It is a bit ironic that jump threading helps reduce Wuninitialized
false positives, but yet too much of it causes even more false
positives.

Aldy
Richard Biener Oct. 21, 2021, 7:22 a.m. UTC | #6
On Wed, Oct 20, 2021 at 10:02 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
> > On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor <msebor@gmail.com> wrote:
> >
> >> I appreciate the heads up.  I'm happy that the threader has
> >> improved.  I'm obviously not pleased that it has led to regressions
> >> in warnings but I understand that in some cases they might be due
> >> to limitations in the warning code.  I think the test case you have
> >> xfailed might be one such example.  The uninitialized warnings are
> >> exquisitely sensitive to these types of changes.  If/when this patch
> >> is applied please reopen PR 89230 and reference this commit.
> >>
> >> Having said that, to maintain the quality of diagnostics,
> >> the work that goes into these nice optimizer improvements needs
> >> to be balanced by an effort to either update the warning code
> >> to cope with the IL changes, or the optimizers need to take care
> >> to avoid exposing undefined code that the warnings are designed
> >> to detect.  I'm concerned not just that the quality of GCC 12
> >> diagnostics has been eroding, but also that it seems to be not
> >> just acceptable but expected.
> > You make a very good point.  It is certainly not my intention to make
> > life difficult for the warning maintainers, but I'm afraid I don't
> > have sufficient knowledge in the area to improve them.
> >
> > There may be some low hanging fruit though.  At least in the warnings
> > that use the ranger, there's no reason to run these passes so late in
> > the pipeline.  You could run the warning code as early as you want,
> > insofar as SSA is available and the CFG has been built.  Heck, you may
> > even be able to run at -O0, though we may need some sort of value
> > numbering.  I believe Richi even suggested this a while back.
> Running them later in the pipeline is to take advantage of the
> optimizers removing dead and unreachable code as much as possible. In
> fact, that's critical to -Wuninitialized.  Optimizing away unreachable
> paths  to avoid Wuninitialized false positives has been the major driver
> of jump threading improvements for the last 15 years.

Well, the good old idea is to simply queue the diagnostics and only emit
them if a stmt (with the location?) gets expanded.  Of course the details
on what exactly to key the warning on is tricky.

Richard.

> jeff
Aldy Hernandez Oct. 21, 2021, 10:15 a.m. UTC | #7
On Wed, Oct 20, 2021 at 10:19 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> So we want to keep some form of ssa-dom-thread-7.  That' s the canonical
> testcase for the case for the FSM optimization.
>
> What we need to verify is that we thread jumps across the backedge of
> the loop through the switch statement to a particular case (thus
> bypassing the indirect jump for the switch statement).  How to do that
> in a way that's easier to manage?  I have no clue.  I guess a gimple-fe
> based test might help.

Ah, I see.

After lots of pain, I was able to distill a tiny testcase that does
just that (tree-ssa/ssa-thread-backedge.c), and is easy to maintain.
I've added a "backedge" marker to the path dumping code for easy
matching in the test.  An alternative would be to convert it to a
gimple FE test examining the exact thread sequence through the
backedge, but I think that's overkill since the test is so small.

Phew, I think we're finally converging on a useful set of threading tests :).

OK for trunk?
Aldy
Martin Sebor Oct. 21, 2021, 2:50 p.m. UTC | #8
On 10/21/21 1:17 AM, Aldy Hernandez wrote:
> On Wed, Oct 20, 2021 at 10:01 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
>>> On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor <msebor@gmail.com> wrote:
>>>
>>>> I appreciate the heads up.  I'm happy that the threader has
>>>> improved.  I'm obviously not pleased that it has led to regressions
>>>> in warnings but I understand that in some cases they might be due
>>>> to limitations in the warning code.  I think the test case you have
>>>> xfailed might be one such example.  The uninitialized warnings are
>>>> exquisitely sensitive to these types of changes.  If/when this patch
>>>> is applied please reopen PR 89230 and reference this commit.
>>>>
>>>> Having said that, to maintain the quality of diagnostics,
>>>> the work that goes into these nice optimizer improvements needs
>>>> to be balanced by an effort to either update the warning code
>>>> to cope with the IL changes, or the optimizers need to take care
>>>> to avoid exposing undefined code that the warnings are designed
>>>> to detect.  I'm concerned not just that the quality of GCC 12
>>>> diagnostics has been eroding, but also that it seems to be not
>>>> just acceptable but expected.
>>> You make a very good point.  It is certainly not my intention to make
>>> life difficult for the warning maintainers, but I'm afraid I don't
>>> have sufficient knowledge in the area to improve them.
>>>
>>> There may be some low hanging fruit though.  At least in the warnings
>>> that use the ranger, there's no reason to run these passes so late in
>>> the pipeline.  You could run the warning code as early as you want,
>>> insofar as SSA is available and the CFG has been built.  Heck, you may
>>> even be able to run at -O0, though we may need some sort of value
>>> numbering.  I believe Richi even suggested this a while back.
>> Running them later in the pipeline is to take advantage of the
>> optimizers removing dead and unreachable code as much as possible. In
>> fact, that's critical to -Wuninitialized.  Optimizing away unreachable
>> paths  to avoid Wuninitialized false positives has been the major driver
>> of jump threading improvements for the last 15 years.
> 
> Ughh, that's unfortunate.  We're gonna have to come up with
> improvements to the Wuninitialized code, or a different paradigm
> altogether.  I'm afraid this will only get worse.
> 
> It is a bit ironic that jump threading helps reduce Wuninitialized
> false positives, but yet too much of it causes even more false
> positives.

For best results they need to work together.  That was also my
point.  All middle end warnings necessarily rely on optimizers
for one thing or another.  When we change one such optimization
and break the assumptions the warnings make (right or wrong),
the warnings break too.  To minimize the fallout we need to
do two things: decouple them as much as possible (minimize
the assumptions) and, when we change the optimizers, we
might need to update the warnings.  If we don't, then yes,
things are going to keep getting worse.

As an aside, in GCC 12 a good number of middle end warnings
already do run at -O0: all those in gimple-ssa-warn-access.c.
I'd like to see gimple-ssa-array-bounds invoked from the access
pass too (instead of from VRP), and eventually -Wrestrict as well.
I'm not sure about the strlen/sprintf warnings; those might need
to stay where they are because they run as part of the optimizers
there.

(By the way, I don't see range info in the access pass at -O0.
Should I?)

Martin
Jeff Law Oct. 21, 2021, 5:59 p.m. UTC | #9
On 10/21/2021 1:17 AM, Aldy Hernandez wrote:
> On Wed, Oct 20, 2021 at 10:01 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>> On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
>>> On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor <msebor@gmail.com> wrote:
>>>
>>>> I appreciate the heads up.  I'm happy that the threader has
>>>> improved.  I'm obviously not pleased that it has led to regressions
>>>> in warnings but I understand that in some cases they might be due
>>>> to limitations in the warning code.  I think the test case you have
>>>> xfailed might be one such example.  The uninitialized warnings are
>>>> exquisitely sensitive to these types of changes.  If/when this patch
>>>> is applied please reopen PR 89230 and reference this commit.
>>>>
>>>> Having said that, to maintain the quality of diagnostics,
>>>> the work that goes into these nice optimizer improvements needs
>>>> to be balanced by an effort to either update the warning code
>>>> to cope with the IL changes, or the optimizers need to take care
>>>> to avoid exposing undefined code that the warnings are designed
>>>> to detect.  I'm concerned not just that the quality of GCC 12
>>>> diagnostics has been eroding, but also that it seems to be not
>>>> just acceptable but expected.
>>> You make a very good point.  It is certainly not my intention to make
>>> life difficult for the warning maintainers, but I'm afraid I don't
>>> have sufficient knowledge in the area to improve them.
>>>
>>> There may be some low hanging fruit though.  At least in the warnings
>>> that use the ranger, there's no reason to run these passes so late in
>>> the pipeline.  You could run the warning code as early as you want,
>>> insofar as SSA is available and the CFG has been built.  Heck, you may
>>> even be able to run at -O0, though we may need some sort of value
>>> numbering.  I believe Richi even suggested this a while back.
>> Running them later in the pipeline is to take advantage of the
>> optimizers removing dead and unreachable code as much as possible. In
>> fact, that's critical to -Wuninitialized.  Optimizing away unreachable
>> paths  to avoid Wuninitialized false positives has been the major driver
>> of jump threading improvements for the last 15 years.
> Ughh, that's unfortunate.  We're gonna have to come up with
> improvements to the Wuninitialized code, or a different paradigm
> altogether.  I'm afraid this will only get worse.
Well, good luck with a different paradigm :-)  It's a tough little nut.  
As long as we want to reduce false positives, then we're going to be 
dependent on optimization and related analysis.

And as I've noted before, we're generally better off fixing the 
optimizers when we stumble over a false positive from Wuninitialized.  
When that's not possible, we should look to fix the predicate analysis 
code.

>
> It is a bit ironic that jump threading helps reduce Wuninitialized
> false positives, but yet too much of it causes even more false
> positives.
I would expect the latter to be relatively rare for Wuninitialized. I 
think some of the other middle end warnings may be in a different boat 
though.



jeff
Jeff Law Oct. 22, 2021, 3:34 a.m. UTC | #10
On 10/21/2021 4:15 AM, Aldy Hernandez wrote:
> On Wed, Oct 20, 2021 at 10:19 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>> So we want to keep some form of ssa-dom-thread-7.  That' s the canonical
>> testcase for the case for the FSM optimization.
>>
>> What we need to verify is that we thread jumps across the backedge of
>> the loop through the switch statement to a particular case (thus
>> bypassing the indirect jump for the switch statement).  How to do that
>> in a way that's easier to manage?  I have no clue.  I guess a gimple-fe
>> based test might help.
> Ah, I see.
>
> After lots of pain, I was able to distill a tiny testcase that does
> just that (tree-ssa/ssa-thread-backedge.c), and is easy to maintain.
> I've added a "backedge" marker to the path dumping code for easy
> matching in the test.  An alternative would be to convert it to a
> gimple FE test examining the exact thread sequence through the
> backedge, but I think that's overkill since the test is so small.
Well, and the worry with a smaller testcase is reducing too far with the 
result not really being representative of the issue.  This actually 
happened during the development of the FSM bits.  I got a test from the 
ARM guys, evaluated it and concluded it could be addressed with the 
forward threader....  Then I did the implementation work.  Once done 
they said it didn't work and gave me a better testcase which had more 
"join" blocks we would have had to copy to realize the important jump 
threads.  At which point Steve E's FSM threader was the only viable choice.


>
> Phew, I think we're finally converging on a useful set of threading tests :).
>
> OK for trunk?
Mostly, I just worry about losing the key test for the FSM optimization.

Jeff
Aldy Hernandez Oct. 22, 2021, 3:53 a.m. UTC | #11
On Fri, Oct 22, 2021, 05:34 Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 10/21/2021 4:15 AM, Aldy Hernandez wrote:
> > On Wed, Oct 20, 2021 at 10:19 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >> So we want to keep some form of ssa-dom-thread-7.  That' s the canonical
> >> testcase for the case for the FSM optimization.
> >>
> >> What we need to verify is that we thread jumps across the backedge of
> >> the loop through the switch statement to a particular case (thus
> >> bypassing the indirect jump for the switch statement).  How to do that
> >> in a way that's easier to manage?  I have no clue.  I guess a gimple-fe
> >> based test might help.
> > Ah, I see.
> >
> > After lots of pain, I was able to distill a tiny testcase that does
> > just that (tree-ssa/ssa-thread-backedge.c), and is easy to maintain.
> > I've added a "backedge" marker to the path dumping code for easy
> > matching in the test.  An alternative would be to convert it to a
> > gimple FE test examining the exact thread sequence through the
> > backedge, but I think that's overkill since the test is so small.
> Well, and the worry with a smaller testcase is reducing too far with the
> result not really being representative of the issue.  This actually
> happened during the development of the FSM bits.  I got a test from the
> ARM guys, evaluated it and concluded it could be addressed with the
> forward threader....  Then I did the implementation work.  Once done
> they said it didn't work and gave me a better testcase which had more
> "join" blocks we would have had to copy to realize the important jump
> threads.  At which point Steve E's FSM threader was the only viable choice.
>
>
> >
> > Phew, I think we're finally converging on a useful set of threading
> tests :).
> >
> > OK for trunk?
> Mostly, I just worry about losing the key test for the FSM optimization.


With the provided test, the forward threaders can't thread through the
backedge and into the switch. Disabling the other threaders was just a
precaution. I just wanted to make sure it happened late because of the loop
restrictions we have in place. I could enable the forward threaders to
prove they can't get it.

Also, we have an assert noting that we never thread through backedges in
the forward threaders. It was part of the refactor. So the forward
threaders can't even do it.

I could add more cases and check that we have N or more threads through the
back edges. .and if it makes you feel safer, we could even convert the test
to gimple and test the specific thread sequence. It's just that the gimple
FE test is bound to get large and difficult to decipher if I start adding
many switch cases.

I'm just trying to avoid a huge test with 40 potential threads where no one
really knows how many we should get....as every threading pass opens up
possibilities for other passes.

Ughhhh....we could put the test back, check for some random large number,
and come up with a more satisfactory test later? ;-)

Aldy
Aldy Hernandez Oct. 22, 2021, 11:22 a.m. UTC | #12
On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor <msebor@gmail.com> wrote:

> I'd like to see gimple-ssa-array-bounds invoked from the access
> pass too (instead of from VRP), and eventually -Wrestrict as well.

You can do that right now.  The pass has been converted to the new API
and it would just require calling it with a ranger instead of the
vr_values from VRP:

      array_bounds_checker array_checker (fun, &vrp_vr_values);
      array_checker.check ();

That is, move it where you want and pass it a fresh new gimple_ranger.
If there are any regressions, we'd be glad to look at them.

> I'm not sure about the strlen/sprintf warnings; those might need
> to stay where they are because they run as part of the optimizers
> there.
>
> (By the way, I don't see range info in the access pass at -O0.
> Should I?)

I assume you mean you don't see anything in the dump files.

None of the VRP passes (evrp included) run at -O0, so you wouldn't see
anything in the IL.  You *may* be able to see some global ranges that
DOM's use of the evrp engine exported, but I'm not sure.

You're going to have to instantiate a gimple_ranger and use it if you
want to have range info available, but that's not going to show up in
the IL, even after you use it, because it doesn't export global ranges
by default.  What are you trying to do?

Aldy
Martin Sebor Oct. 22, 2021, 2:27 p.m. UTC | #13
On 10/22/21 5:22 AM, Aldy Hernandez wrote:
> On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor <msebor@gmail.com> wrote:
> 
>> I'd like to see gimple-ssa-array-bounds invoked from the access
>> pass too (instead of from VRP), and eventually -Wrestrict as well.
> 
> You can do that right now.  The pass has been converted to the new API
> and it would just require calling it with a ranger instead of the
> vr_values from VRP:
> 
>        array_bounds_checker array_checker (fun, &vrp_vr_values);
>        array_checker.check ();
> 
> That is, move it where you want and pass it a fresh new gimple_ranger.
> If there are any regressions, we'd be glad to look at them.

I appreciate that and I'm not worried about regressions due to
ranger code.

It's not so simple as it seems because of the optimizer
dependencies I mentioned.  -Warray-bounds runs before vectorization
and the access pass after it.  Moving it into the access pass will
mean dealing with the fallout: either accepting regressions in
the quality of warnings (bad locations due to vectorization
merging distinct stores into one) or running the access pass at
a different point in the pipeline, and facing regressions in
the other warnings due to that.  Running it twice, once earlier
for -Warray-bounds and then again later for -Wstringop-overflow
etc, would be less than optimal because they all do the same
thing (compute object sizes and offsets) and should be able to
share the same data (the pointer query cache).  So the ideal
solution is to find a middle ground where all these warnings
can run from the same pass with optimal results.

-Warray-bounds might also need to be adjusted for -O0 to avoid
warning on unreachable code, although, surprisingly, that hasn't
been an issue for the other warnings now enabled at -O0.

All this will take some time, which I'm out of for this stage 1.

> 
>> I'm not sure about the strlen/sprintf warnings; those might need
>> to stay where they are because they run as part of the optimizers
>> there.
>>
>> (By the way, I don't see range info in the access pass at -O0.
>> Should I?)
> 
> I assume you mean you don't see anything in the dump files.

I mean that I don't get accurate range info from the ranger
instance in any function.  I'd like the example below to trigger
a warning even at -O0 but it doesn't because n's range is
[0, UINT_MAX] instead of [7, UINT_MAX]:

   char a[4];

   void f (unsigned n)
   {
     if (n < 7)
       n = 7;
     __builtin_memset (a, 0, n);
   }

> 
> None of the VRP passes (evrp included) run at -O0, so you wouldn't see
> anything in the IL.  You *may* be able to see some global ranges that
> DOM's use of the evrp engine exported, but I'm not sure.
> 
> You're going to have to instantiate a gimple_ranger and use it if you
> want to have range info available, but that's not going to show up in
> the IL, even after you use it, because it doesn't export global ranges
> by default.  What are you trying to do?

The above.  The expected warning runs in the access warning pass.
It uses the per-function instance of the ranger but it gets back
a range for the type.  To see that put a breakpoint in
get_size_range() in pointer-query.cc and compile the above with
-O0.

Martin
Aldy Hernandez Oct. 22, 2021, 3:18 p.m. UTC | #14
On Fri, Oct 22, 2021 at 4:27 PM Martin Sebor <msebor@gmail.com> wrote:
>
> On 10/22/21 5:22 AM, Aldy Hernandez wrote:
> > On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor <msebor@gmail.com> wrote:
> >
> >> I'd like to see gimple-ssa-array-bounds invoked from the access
> >> pass too (instead of from VRP), and eventually -Wrestrict as well.
> >
> > You can do that right now.  The pass has been converted to the new API
> > and it would just require calling it with a ranger instead of the
> > vr_values from VRP:
> >
> >        array_bounds_checker array_checker (fun, &vrp_vr_values);
> >        array_checker.check ();
> >
> > That is, move it where you want and pass it a fresh new gimple_ranger.
> > If there are any regressions, we'd be glad to look at them.
>
> I appreciate that and I'm not worried about regressions due to
> ranger code.
>
> It's not so simple as it seems because of the optimizer
> dependencies I mentioned.  -Warray-bounds runs before vectorization
> and the access pass after it.  Moving it into the access pass will
> mean dealing with the fallout: either accepting regressions in
> the quality of warnings (bad locations due to vectorization
> merging distinct stores into one) or running the access pass at
> a different point in the pipeline, and facing regressions in
> the other warnings due to that.  Running it twice, once earlier
> for -Warray-bounds and then again later for -Wstringop-overflow
> etc, would be less than optimal because they all do the same
> thing (compute object sizes and offsets) and should be able to
> share the same data (the pointer query cache).  So the ideal
> solution is to find a middle ground where all these warnings
> can run from the same pass with optimal results.
>
> -Warray-bounds might also need to be adjusted for -O0 to avoid
> warning on unreachable code, although, surprisingly, that hasn't
> been an issue for the other warnings now enabled at -O0.
>
> All this will take some time, which I'm out of for this stage 1.
>
> >
> >> I'm not sure about the strlen/sprintf warnings; those might need
> >> to stay where they are because they run as part of the optimizers
> >> there.
> >>
> >> (By the way, I don't see range info in the access pass at -O0.
> >> Should I?)
> >
> > I assume you mean you don't see anything in the dump files.
>
> I mean that I don't get accurate range info from the ranger
> instance in any function.  I'd like the example below to trigger
> a warning even at -O0 but it doesn't because n's range is
> [0, UINT_MAX] instead of [7, UINT_MAX]:
>
>    char a[4];
>
>    void f (unsigned n)
>    {
>      if (n < 7)
>        n = 7;
>      __builtin_memset (a, 0, n);
>    }

Breakpoint 5, get_size_range (query=0x0, bound=<ssa_name
0x7fffe9fefdc8 1>, range=0x7fffffffda10,
    bndrng=0x7fffffffdc98) at
/home/aldyh/src/gcc/gcc/gimple-ssa-warn-access.cc:1196
(gdb) p debug_ranger()
;; Function f

=========== BB 2 ============
Imports: n_3(D)
Exports: n_3(D)
n_3(D)    unsigned int VARYING
    <bb 2> :
    if (n_3(D) <= 6)
      goto <bb 3>; [INV]
    else
      goto <bb 4>; [INV]

2->3  (T) n_3(D) :     unsigned int [0, 6]
2->4  (F) n_3(D) :     unsigned int [7, +INF]

=========== BB 3 ============
    <bb 3> :
    n_4 = 7;

n_4 : unsigned int [7, 7]

=========== BB 4 ============
    <bb 4> :
    # n_2 = PHI <n_3(D)(2), n_4(3)>
    _1 = (long unsigned int) n_2;
    __builtin_memset (&a, 0, _1);
    return;

_1 : long unsigned int [7, 4294967295]
n_2 : unsigned int [7, +INF]
Non-varying global ranges:
=========================:
_1  : long unsigned int [7, 4294967295]
n_2  : unsigned int [7, +INF]
n_4  : unsigned int [7, 7]

From the above it looks like _1 at BB4 is [7, 4294967295].   You probably wan:

  range_of_expr (r, tree_for_ssa_1, gimple_for_the_memset_call)

BTW, debug_ranger() tells you everything ranger would know for the
given IL.  It's meant as a debugging aid.  You may want to look at
it's source to see how it calls the ranger.

Aldy
Martin Sebor Oct. 22, 2021, 3:59 p.m. UTC | #15
On 10/22/21 9:18 AM, Aldy Hernandez wrote:
> On Fri, Oct 22, 2021 at 4:27 PM Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 10/22/21 5:22 AM, Aldy Hernandez wrote:
>>> On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor <msebor@gmail.com> wrote:
>>>
>>>> I'd like to see gimple-ssa-array-bounds invoked from the access
>>>> pass too (instead of from VRP), and eventually -Wrestrict as well.
>>>
>>> You can do that right now.  The pass has been converted to the new API
>>> and it would just require calling it with a ranger instead of the
>>> vr_values from VRP:
>>>
>>>         array_bounds_checker array_checker (fun, &vrp_vr_values);
>>>         array_checker.check ();
>>>
>>> That is, move it where you want and pass it a fresh new gimple_ranger.
>>> If there are any regressions, we'd be glad to look at them.
>>
>> I appreciate that and I'm not worried about regressions due to
>> ranger code.
>>
>> It's not so simple as it seems because of the optimizer
>> dependencies I mentioned.  -Warray-bounds runs before vectorization
>> and the access pass after it.  Moving it into the access pass will
>> mean dealing with the fallout: either accepting regressions in
>> the quality of warnings (bad locations due to vectorization
>> merging distinct stores into one) or running the access pass at
>> a different point in the pipeline, and facing regressions in
>> the other warnings due to that.  Running it twice, once earlier
>> for -Warray-bounds and then again later for -Wstringop-overflow
>> etc, would be less than optimal because they all do the same
>> thing (compute object sizes and offsets) and should be able to
>> share the same data (the pointer query cache).  So the ideal
>> solution is to find a middle ground where all these warnings
>> can run from the same pass with optimal results.
>>
>> -Warray-bounds might also need to be adjusted for -O0 to avoid
>> warning on unreachable code, although, surprisingly, that hasn't
>> been an issue for the other warnings now enabled at -O0.
>>
>> All this will take some time, which I'm out of for this stage 1.
>>
>>>
>>>> I'm not sure about the strlen/sprintf warnings; those might need
>>>> to stay where they are because they run as part of the optimizers
>>>> there.
>>>>
>>>> (By the way, I don't see range info in the access pass at -O0.
>>>> Should I?)
>>>
>>> I assume you mean you don't see anything in the dump files.
>>
>> I mean that I don't get accurate range info from the ranger
>> instance in any function.  I'd like the example below to trigger
>> a warning even at -O0 but it doesn't because n's range is
>> [0, UINT_MAX] instead of [7, UINT_MAX]:
>>
>>     char a[4];
>>
>>     void f (unsigned n)
>>     {
>>       if (n < 7)
>>         n = 7;
>>       __builtin_memset (a, 0, n);
>>     }
> 
> Breakpoint 5, get_size_range (query=0x0, bound=<ssa_name
> 0x7fffe9fefdc8 1>, range=0x7fffffffda10,
>      bndrng=0x7fffffffdc98) at
> /home/aldyh/src/gcc/gcc/gimple-ssa-warn-access.cc:1196
> (gdb) p debug_ranger()
> ;; Function f
> 
> =========== BB 2 ============
> Imports: n_3(D)
> Exports: n_3(D)
> n_3(D)    unsigned int VARYING
>      <bb 2> :
>      if (n_3(D) <= 6)
>        goto <bb 3>; [INV]
>      else
>        goto <bb 4>; [INV]
> 
> 2->3  (T) n_3(D) :     unsigned int [0, 6]
> 2->4  (F) n_3(D) :     unsigned int [7, +INF]
> 
> =========== BB 3 ============
>      <bb 3> :
>      n_4 = 7;
> 
> n_4 : unsigned int [7, 7]
> 
> =========== BB 4 ============
>      <bb 4> :
>      # n_2 = PHI <n_3(D)(2), n_4(3)>
>      _1 = (long unsigned int) n_2;
>      __builtin_memset (&a, 0, _1);
>      return;
> 
> _1 : long unsigned int [7, 4294967295]
> n_2 : unsigned int [7, +INF]
> Non-varying global ranges:
> =========================:
> _1  : long unsigned int [7, 4294967295]
> n_2  : unsigned int [7, +INF]
> n_4  : unsigned int [7, 7]
> 
>  From the above it looks like _1 at BB4 is [7, 4294967295].

Great!

>   You probably wan:
> 
>    range_of_expr (r, tree_for_ssa_1, gimple_for_the_memset_call)

That's what the function does.  But its caller doesn't have
access to the Gimple statement so it passes in null instead.
Presumably without it, range_of_expr() doesn't have enough
context to know what BB I'm asking about.  It does work
without the statement at -O but then there's just one BB
(the if statement becomes a MAX_EXPR) so there's just one
range.

> 
> BTW, debug_ranger() tells you everything ranger would know for the
> given IL.  It's meant as a debugging aid.  You may want to look at
> it's source to see how it calls the ranger.

Thanks for the tip.  I should do that.  There's a paradigm
shift from the old ways of working with ranges and the new
way, and it will take a bit of adjusting to.  I just haven't
spent enough time working with Ranger to be there.  But this
exchange alone was already very helpful!

Martin
Aldy Hernandez Oct. 23, 2021, 8:31 a.m. UTC | #16
On 10/22/21 5:59 PM, Martin Sebor wrote:
> On 10/22/21 9:18 AM, Aldy Hernandez wrote:
>> On Fri, Oct 22, 2021 at 4:27 PM Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> On 10/22/21 5:22 AM, Aldy Hernandez wrote:
>>>> On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor <msebor@gmail.com> wrote:

>>>>> (By the way, I don't see range info in the access pass at -O0.
>>>>> Should I?)
>>>>
>>>> I assume you mean you don't see anything in the dump files.
>>>
>>> I mean that I don't get accurate range info from the ranger
>>> instance in any function.  I'd like the example below to trigger
>>> a warning even at -O0 but it doesn't because n's range is
>>> [0, UINT_MAX] instead of [7, UINT_MAX]:
>>>
>>>     char a[4];
>>>
>>>     void f (unsigned n)
>>>     {
>>>       if (n < 7)
>>>         n = 7;
>>>       __builtin_memset (a, 0, n);
>>>     }
>>
>> Breakpoint 5, get_size_range (query=0x0, bound=<ssa_name
>> 0x7fffe9fefdc8 1>, range=0x7fffffffda10,
>>      bndrng=0x7fffffffdc98) at
>> /home/aldyh/src/gcc/gcc/gimple-ssa-warn-access.cc:1196
>> (gdb) p debug_ranger()
>> ;; Function f
>>
>> =========== BB 2 ============
>> Imports: n_3(D)
>> Exports: n_3(D)
>> n_3(D)    unsigned int VARYING
>>      <bb 2> :
>>      if (n_3(D) <= 6)
>>        goto <bb 3>; [INV]
>>      else
>>        goto <bb 4>; [INV]
>>
>> 2->3  (T) n_3(D) :     unsigned int [0, 6]
>> 2->4  (F) n_3(D) :     unsigned int [7, +INF]
>>
>> =========== BB 3 ============
>>      <bb 3> :
>>      n_4 = 7;
>>
>> n_4 : unsigned int [7, 7]
>>
>> =========== BB 4 ============
>>      <bb 4> :
>>      # n_2 = PHI <n_3(D)(2), n_4(3)>
>>      _1 = (long unsigned int) n_2;
>>      __builtin_memset (&a, 0, _1);
>>      return;
>>
>> _1 : long unsigned int [7, 4294967295]
>> n_2 : unsigned int [7, +INF]
>> Non-varying global ranges:
>> =========================:
>> _1  : long unsigned int [7, 4294967295]
>> n_2  : unsigned int [7, +INF]
>> n_4  : unsigned int [7, 7]
>>
>>  From the above it looks like _1 at BB4 is [7, 4294967295].
> 
> Great!
> 
>>   You probably want:
>>
>>    range_of_expr (r, tree_for_ssa_1, gimple_for_the_memset_call)
> 
> That's what the function does.  But its caller doesn't have
> access to the Gimple statement so it passes in null instead.
> Presumably without it, range_of_expr() doesn't have enough
> context to know what BB I'm asking about.  It does work
> without the statement at -O but then there's just one BB
> (the if statement becomes a MAX_EXPR) so there's just one
> range.
> 
>>
>> BTW, debug_ranger() tells you everything ranger would know for the
>> given IL.  It's meant as a debugging aid.  You may want to look at
>> it's source to see how it calls the ranger.
> 
> Thanks for the tip.  I should do that.  There's a paradigm
> shift from the old ways of working with ranges and the new
> way, and it will take a bit of adjusting to.  I just haven't
> spent enough time working with Ranger to be there.  But this
> exchange alone was already very helpful!


You can query the ranger on any point in the IL.  However, if you don't 
give it context, it'll just return the globally known range.  In this 
case it'll be the global SSA range, which is unset because the usual 
setters haven't run at -O0 (evrp, VRP*).  So yes, you need to pass it 
correct context.

Yes, it's a paradigm shift.  The evrp engine worked by pushing state as 
you did a dom walk, so you could ask for SSA ranges that were specific 
to the path sensitive point you were in the walk.  The ranger is far 
more versatile, in which you can ask for a range on an edge, block, or 
statement, regardless of how you're iterating through the gimple.

You can also use the ranger to indirectly tell you about reachability. 
For example, if you ask for the range of x_6 at a point in the IL and 
the answer comes out as UNDEFINED, that point is unreachable.  That is, 
assuming x_6 is considered live at the query point.  IIRC, if you ask 
for x_6  at a point not dominated by the definition of x_6, the ranger 
will also return UNDEFINED.

The ranger API is designed to be minimal and simple:

bool range_of_stmt (irange &r, gimple *, tree name = NULL);
bool range_of_expr (irange &r, tree name, gimple * = NULL);
bool range_on_edge (irange &r, edge e, tree name);
void range_on_exit (irange &r, basic_block bb, tree name);

Andrew keeps threatening he'll write up some articles this year on the 
ranger and its reusable components. *prod* :)

Aldy
Jeff Law Oct. 24, 2021, 4:57 p.m. UTC | #17
On 10/21/2021 9:53 PM, Aldy Hernandez wrote:
>
>
>     >
>     > Phew, I think we're finally converging on a useful set of
>     threading tests :).
>     >
>     > OK for trunk?
>     Mostly, I just worry about losing the key test for the FSM
>     optimization.
>
>
> With the provided test, the forward threaders can't thread through the 
> backedge and into the switch. Disabling the other threaders was just a 
> precaution. I just wanted to make sure it happened late because of the 
> loop restrictions we have in place. I could enable the forward 
> threaders to prove they can't get it.
Right.  There was a time when the forward threaders handled the 
backedge, but it's much better handled by the backwards threader.

> I could add more cases and check that we have N or more threads 
> through the back edges. .and if it makes you feel safer, we could even 
> convert the test to gimple and test the specific thread sequence. It's 
> just that the gimple FE test is bound to get large and difficult to 
> decipher if I start adding many switch cases.
I would love if we could turn the testcase into a gimple based test.  I 
just shudder at the thought of trying to pull that together.  And yes, 
it's awful hard to decipher, both in terms of test behavior and in terms 
of what the key jump threads are.

>
> I'm just trying to avoid a huge test with 40 potential threads where 
> no one really knows how many we should get....as every threading pass 
> opens up possibilities for other passes.
Understood.  To some degree it's inherent in the problem.  The smarter 
our threaders get the more likely they are to discover new 
opportunities, so there's clearly a maintenance burden to these tests 
over time.  It's made worse by the interactions with BRANCH_COST as well 
as the heuristics for switch conversion.

Gimple based tests would significantly help the the latter issues, but I 
don't know how to tackle the problem of exposing more jump threads as 
our threaders get better.

>
> Ughhhh....we could put the test back, check for some random large 
> number, and come up with a more satisfactory test later? ;-)
I thought our "counting" based tests could only check equality (ie, 
expect to see this string precisely N times).  Though if we could check 
that # threads realized was > some low water mark, that'd probably be 
better than what we've got right now.


jeff
Bernhard Reutner-Fischer Oct. 24, 2021, 5:55 p.m. UTC | #18
On Sun, 24 Oct 2021 10:57:05 -0600
Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

> I thought our "counting" based tests could only check equality (ie, 
> expect to see this string precisely N times).  Though if we could check 
> that # threads realized was > some low water mark, that'd probably be 
> better than what we've got right now.

That's what i was about to suggest yesterday but didn't dare, with a
reference to
testsuite/lib/scanasm.exp # proc object-size
for the cmp.
Richard Biener Oct. 24, 2021, 6:21 p.m. UTC | #19
On October 24, 2021 6:57:05 PM GMT+02:00, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
>
>On 10/21/2021 9:53 PM, Aldy Hernandez wrote:
>>
>>
>>     >
>>     > Phew, I think we're finally converging on a useful set of
>>     threading tests :).
>>     >
>>     > OK for trunk?
>>     Mostly, I just worry about losing the key test for the FSM
>>     optimization.
>>
>>
>> With the provided test, the forward threaders can't thread through the 
>> backedge and into the switch. Disabling the other threaders was just a 
>> precaution. I just wanted to make sure it happened late because of the 
>> loop restrictions we have in place. I could enable the forward 
>> threaders to prove they can't get it.
>Right.  There was a time when the forward threaders handled the 
>backedge, but it's much better handled by the backwards threader.
>
>> I could add more cases and check that we have N or more threads 
>> through the back edges. .and if it makes you feel safer, we could even 
>> convert the test to gimple and test the specific thread sequence. It's 
>> just that the gimple FE test is bound to get large and difficult to 
>> decipher if I start adding many switch cases.
>I would love if we could turn the testcase into a gimple based test.  I 
>just shudder at the thought of trying to pull that together.  And yes, 
>it's awful hard to decipher, both in terms of test behavior and in terms 
>of what the key jump threads are.
>
>>
>> I'm just trying to avoid a huge test with 40 potential threads where 
>> no one really knows how many we should get....as every threading pass 
>> opens up possibilities for other passes.
>Understood.  To some degree it's inherent in the problem.  The smarter 
>our threaders get the more likely they are to discover new 
>opportunities, so there's clearly a maintenance burden to these tests 
>over time.  It's made worse by the interactions with BRANCH_COST as well 
>as the heuristics for switch conversion.
>
>Gimple based tests would significantly help the the latter issues, but I 
>don't know how to tackle the problem of exposing more jump threads as 
>our threaders get better.

Well, you'd feed the specific GIMPLE to a single threading pass and check its dump file. With GIMPLE based tests you can nearly do unit testing... 

>>
>> Ughhhh....we could put the test back, check for some random large 
>> number, and come up with a more satisfactory test later? ;-)

Maybe we can dump the source location of conditions we thread through when dumping the threading pass. 

>I thought our "counting" based tests could only check equality (ie, 
>expect to see this string precisely N times).  Though if we could check 
>that # threads realized was > some low water mark, that'd probably be 
>better than what we've got right now.
>
>
>jeff
Aldy Hernandez Oct. 24, 2021, 6:25 p.m. UTC | #20
On 10/24/21 6:57 PM, Jeff Law wrote:

>> Ughhhh....we could put the test back, check for some random large 
>> number, and come up with a more satisfactory test later? ;-)
> I thought our "counting" based tests could only check equality (ie, 
> expect to see this string precisely N times).  Though if we could check 
> that # threads realized was > some low water mark, that'd probably be 
> better than what we've got right now.

Andrew actually had a patch for a dejagnu construct doing just that 
(scan-tree-dump-minimum), but I just noticed it didn't work quite right 
for this test.

This is a bit embarrassing, but upon further analysis I've just noticed 
that the number of threadable candidates has been exploding over the 
year, but the ones that actually make it past the block copier 
restrictions plus rewire_first_differing_edge, etc, only changed by 1 
with this patch.  So perhaps we don't need to bend over backward (just 
yet anyhow).

I can leave the simple gimple FE test since I've already coded it.   Up 
to you.

How does this look?

Aldy
Aldy Hernandez Oct. 25, 2021, 6:47 a.m. UTC | #21
On Sun, Oct 24, 2021 at 8:25 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On 10/24/21 6:57 PM, Jeff Law wrote:
>
> >> Ughhhh....we could put the test back, check for some random large
> >> number, and come up with a more satisfactory test later? ;-)
> > I thought our "counting" based tests could only check equality (ie,
> > expect to see this string precisely N times).  Though if we could check
> > that # threads realized was > some low water mark, that'd probably be
> > better than what we've got right now.
>
> Andrew actually had a patch for a dejagnu construct doing just that
> (scan-tree-dump-minimum), but I just noticed it didn't work quite right
> for this test.
>
> This is a bit embarrassing, but upon further analysis I've just noticed
> that the number of threadable candidates has been exploding over the
> year, but the ones that actually make it past the block copier
> restrictions plus rewire_first_differing_edge, etc, only changed by 1
> with this patch.  So perhaps we don't need to bend over backward (just
> yet anyhow).

For the record, the differing thread is expected and correct.

Before the patch we have these 3 threads:

[59] Registering jump thread: (6, 10) incoming edge; (10, 32) normal
(32, 33) normal (33, 36) nocopy;
[60] Registering jump thread: (7, 10) incoming edge; (10, 32) normal
(32, 33) normal (33, 36) nocopy;
[61] Registering jump thread: (8, 10) incoming edge; (10, 32) normal
(32, 33) normal (33, 36) nocopy;

Whereas with the new patch, we don't need to go as far back to thread:

[59] Registering jump thread: (10, 32) incoming edge; (32, 33) normal
(33, 36) nocopy;

Aldy
Andrew MacLeod Oct. 25, 2021, 4:58 p.m. UTC | #22
On 10/20/21 6:28 AM, Aldy Hernandez wrote:
> Sometimes we can solve a candidate path without having to recurse
> further back.  This can mostly happen in fully resolving mode, because
> we can ask the ranger what the range on entry to the path is, but
> there's no reason this can't always apply.  This one-liner removes
> the fully-resolving restriction.
>
> I'm tickled pink to see how many things we now get quite early
> in the compilation.  I actually had to disable jump threading entirely
> for a few tests because the early threader was catching things
> disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
> cleanups happening.
>
> I was going to commit this as obvious, but I think the test changes
> merit discussion.
>
> We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
> some time.  Every time a threading pass gets smarter, we push the
> check further down the pipeline.  We've officially run out of dumb
> threading passes to disable ;-).  In the last year we've gone up from a
> handful of threads, to 34 threads with the current combination of
> options.  I doubt this is testing anything useful any more, so I've
> removed it.
>
> Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
> jump threads, but they were disallowed because of loop rotation.  Then
> we started catching more jump threads in VRP2 threading so we tested
> there.  With this patch though, we triple the number of threads found
> from 11 to 31.  I believe this test has outlived its usefulness, and
> I've removed it.  Note that even though we have these outrageous
> possibilities for this test, the block copier ultimately chops them
> down (23 survive though).

Im running into an issue with ssa-dom-thread-4.c when trying to run 
ranger for the VRP2 pass.  It reduces the number of threads to 2, and 
upon closer inspection as to why, I see:

unsigned char
bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
                       const_bitmap kill)
{
   unsigned char changed = 0;

   bitmap_element *dst_elt;
   const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;

   while (a_elt || b_elt)
     {

Ranger determines that the uses of a_elt and b_elt in the guard are used 
before defined, so assumed UNDFINED and removes a condition check.

So it seems like this entire test case is predicated on undefined 
behaviour?  fwiw, If I initialize them, I get 0 threads...

Andrew
Aldy Hernandez Oct. 25, 2021, 5:01 p.m. UTC | #23
On Mon, Oct 25, 2021 at 6:58 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 10/20/21 6:28 AM, Aldy Hernandez wrote:
> > Sometimes we can solve a candidate path without having to recurse
> > further back.  This can mostly happen in fully resolving mode, because
> > we can ask the ranger what the range on entry to the path is, but
> > there's no reason this can't always apply.  This one-liner removes
> > the fully-resolving restriction.
> >
> > I'm tickled pink to see how many things we now get quite early
> > in the compilation.  I actually had to disable jump threading entirely
> > for a few tests because the early threader was catching things
> > disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
> > cleanups happening.
> >
> > I was going to commit this as obvious, but I think the test changes
> > merit discussion.
> >
> > We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
> > some time.  Every time a threading pass gets smarter, we push the
> > check further down the pipeline.  We've officially run out of dumb
> > threading passes to disable ;-).  In the last year we've gone up from a
> > handful of threads, to 34 threads with the current combination of
> > options.  I doubt this is testing anything useful any more, so I've
> > removed it.
> >
> > Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
> > jump threads, but they were disallowed because of loop rotation.  Then
> > we started catching more jump threads in VRP2 threading so we tested
> > there.  With this patch though, we triple the number of threads found
> > from 11 to 31.  I believe this test has outlived its usefulness, and
> > I've removed it.  Note that even though we have these outrageous
> > possibilities for this test, the block copier ultimately chops them
> > down (23 survive though).
>
> Im running into an issue with ssa-dom-thread-4.c when trying to run
> ranger for the VRP2 pass.  It reduces the number of threads to 2, and
> upon closer inspection as to why, I see:
>
> unsigned char
> bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
>                        const_bitmap kill)
> {
>    unsigned char changed = 0;
>
>    bitmap_element *dst_elt;
>    const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;
>
>    while (a_elt || b_elt)
>      {
>
> Ranger determines that the uses of a_elt and b_elt in the guard are used
> before defined, so assumed UNDFINED and removes a condition check.
>
> So it seems like this entire test case is predicated on undefined
> behaviour?  fwiw, If I initialize them, I get 0 threads...

Hah.  That makes sense.  As the threaders have gotten smarter, the
scan-tree-dump-times has moved later on in the pipeline to less
capable threaders.  Currently it's testing for 3 threads in the first
VRP threader pass.  With my proposed patch I bet we see an UNDEFINED
somewhere in the calculation, and bail on the entire thread as
unreachable.

Aldy
Jeff Law Oct. 25, 2021, 5:02 p.m. UTC | #24
On 10/25/2021 10:58 AM, Andrew MacLeod wrote:
> On 10/20/21 6:28 AM, Aldy Hernandez wrote:
>> Sometimes we can solve a candidate path without having to recurse
>> further back.  This can mostly happen in fully resolving mode, because
>> we can ask the ranger what the range on entry to the path is, but
>> there's no reason this can't always apply.  This one-liner removes
>> the fully-resolving restriction.
>>
>> I'm tickled pink to see how many things we now get quite early
>> in the compilation.  I actually had to disable jump threading entirely
>> for a few tests because the early threader was catching things
>> disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
>> cleanups happening.
>>
>> I was going to commit this as obvious, but I think the test changes
>> merit discussion.
>>
>> We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
>> some time.  Every time a threading pass gets smarter, we push the
>> check further down the pipeline.  We've officially run out of dumb
>> threading passes to disable ;-).  In the last year we've gone up from a
>> handful of threads, to 34 threads with the current combination of
>> options.  I doubt this is testing anything useful any more, so I've
>> removed it.
>>
>> Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
>> jump threads, but they were disallowed because of loop rotation.  Then
>> we started catching more jump threads in VRP2 threading so we tested
>> there.  With this patch though, we triple the number of threads found
>> from 11 to 31.  I believe this test has outlived its usefulness, and
>> I've removed it.  Note that even though we have these outrageous
>> possibilities for this test, the block copier ultimately chops them
>> down (23 survive though).
>
> Im running into an issue with ssa-dom-thread-4.c when trying to run 
> ranger for the VRP2 pass.  It reduces the number of threads to 2, and 
> upon closer inspection as to why, I see:
>
> unsigned char
> bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
>                       const_bitmap kill)
> {
>   unsigned char changed = 0;
>
>   bitmap_element *dst_elt;
>   const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;
>
>   while (a_elt || b_elt)
>     {
>
> Ranger determines that the uses of a_elt and b_elt in the guard are 
> used before defined, so assumed UNDFINED and removes a condition check.
>
> So it seems like this entire test case is predicated on undefined 
> behaviour?  fwiw, If I initialize them, I get 0 threads...
It may have been over-reduced.  It looks like one of mine :-)

I'd argue that in this case the test is compromised and should just be 
removed.

Jeff
Jeff Law Oct. 25, 2021, 6:42 p.m. UTC | #25
On 10/24/2021 12:25 PM, Aldy Hernandez wrote:
> On 10/24/21 6:57 PM, Jeff Law wrote:
>
>>> Ughhhh....we could put the test back, check for some random large 
>>> number, and come up with a more satisfactory test later? ;-)
>> I thought our "counting" based tests could only check equality (ie, 
>> expect to see this string precisely N times).  Though if we could 
>> check that # threads realized was > some low water mark, that'd 
>> probably be better than what we've got right now.
>
> Andrew actually had a patch for a dejagnu construct doing just that 
> (scan-tree-dump-minimum), but I just noticed it didn't work quite 
> right for this test.
>
> This is a bit embarrassing, but upon further analysis I've just 
> noticed that the number of threadable candidates has been exploding 
> over the year, but the ones that actually make it past the block 
> copier restrictions plus rewire_first_differing_edge, etc, only 
> changed by 1 with this patch.  So perhaps we don't need to bend over 
> backward (just yet anyhow).
>
> I can leave the simple gimple FE test since I've already coded it.   
> Up to you.
I'd keep the gimple FE test.  I can easily see coming back to this ;-)

>
> How does this look?
Looks good for the trunk to me.

jeff
Aldy Hernandez Oct. 25, 2021, 6:49 p.m. UTC | #26
On Mon, Oct 25, 2021 at 8:42 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 10/24/2021 12:25 PM, Aldy Hernandez wrote:
> > On 10/24/21 6:57 PM, Jeff Law wrote:
> >
> >>> Ughhhh....we could put the test back, check for some random large
> >>> number, and come up with a more satisfactory test later? ;-)
> >> I thought our "counting" based tests could only check equality (ie,
> >> expect to see this string precisely N times).  Though if we could
> >> check that # threads realized was > some low water mark, that'd
> >> probably be better than what we've got right now.
> >
> > Andrew actually had a patch for a dejagnu construct doing just that
> > (scan-tree-dump-minimum), but I just noticed it didn't work quite
> > right for this test.
> >
> > This is a bit embarrassing, but upon further analysis I've just
> > noticed that the number of threadable candidates has been exploding
> > over the year, but the ones that actually make it past the block
> > copier restrictions plus rewire_first_differing_edge, etc, only
> > changed by 1 with this patch.  So perhaps we don't need to bend over
> > backward (just yet anyhow).
> >
> > I can leave the simple gimple FE test since I've already coded it.
> > Up to you.
> I'd keep the gimple FE test.  I can easily see coming back to this ;-)
>
> >
> > How does this look?
> Looks good for the trunk to me.

Thanks Jeff.

I will commit the other patch from this series as well as the
testsuite change, both of which you approved.  Also, I was going to
commit the following as obvious until I noticed it depended on the
other patches:

https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582232.html

I think it's now obvious, but if you have an objection, let me know.

It'll be a while, cause I need to rest everything again on x86 and
ppc64.  I'm tired of getting mail from CI bots :).

Thanks for your feedback and patience.
Aldy
Jeff Law Oct. 25, 2021, 6:58 p.m. UTC | #27
On 10/25/2021 12:49 PM, Aldy Hernandez wrote:
> On Mon, Oct 25, 2021 at 8:42 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>> On 10/24/2021 12:25 PM, Aldy Hernandez wrote:
>>> On 10/24/21 6:57 PM, Jeff Law wrote:
>>>
>>>>> Ughhhh....we could put the test back, check for some random large
>>>>> number, and come up with a more satisfactory test later? ;-)
>>>> I thought our "counting" based tests could only check equality (ie,
>>>> expect to see this string precisely N times).  Though if we could
>>>> check that # threads realized was > some low water mark, that'd
>>>> probably be better than what we've got right now.
>>> Andrew actually had a patch for a dejagnu construct doing just that
>>> (scan-tree-dump-minimum), but I just noticed it didn't work quite
>>> right for this test.
>>>
>>> This is a bit embarrassing, but upon further analysis I've just
>>> noticed that the number of threadable candidates has been exploding
>>> over the year, but the ones that actually make it past the block
>>> copier restrictions plus rewire_first_differing_edge, etc, only
>>> changed by 1 with this patch.  So perhaps we don't need to bend over
>>> backward (just yet anyhow).
>>>
>>> I can leave the simple gimple FE test since I've already coded it.
>>> Up to you.
>> I'd keep the gimple FE test.  I can easily see coming back to this ;-)
>>
>>> How does this look?
>> Looks good for the trunk to me.
> Thanks Jeff.
>
> I will commit the other patch from this series as well as the
> testsuite change, both of which you approved.  Also, I was going to
> commit the following as obvious until I noticed it depended on the
> other patches:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582232.html
Just to be explicit, that patch is fine too.

>
> I think it's now obvious, but if you have an objection, let me know.
>
> It'll be a while, cause I need to rest everything again on x86 and
> ppc64.  I'm tired of getting mail from CI bots :).
>
> Thanks for your feedback and patience.
Thanks for digging into this stuff.  It's ripe for some developer love.

jeff
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c
index 06aa19a8577..42e23fc157e 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target size32plus } */
+/* { dg-additional-options "-fno-thread-jumps" } */
 #define NMAX 3000
 
 static double a[NMAX][NMAX], b[NMAX][NMAX], c[NMAX][NMAX];
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
index 41c91b97b57..feb99358ac7 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target size32plus } */
+/* { dg-additional-options "-fno-thread-jumps" } */
 #define NMAX 3000
 
 static double a[NMAX][NMAX], b[NMAX][NMAX], c[NMAX][NMAX];
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c
index 5622dce4798..935ade3fb6e 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target size32plus } */
+/* { dg-additional-options "-fno-thread-jumps" } */
 #define NMAX 3000
 #define MEASURE_TIME 1
 
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
index e01a517be11..5c65e406589 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target size32plus } */
+/* { dg-additional-options "-fno-thread-jumps" } */
 #define NMAX 3000
 #define MEASURE_TIME 1
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr20701.c b/gcc/testsuite/gcc.dg/tree-ssa/pr20701.c
index 2f914589e32..496c4256733 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr20701.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr20701.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fno-early-inlining -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fno-early-inlining -fdelete-null-pointer-checks -fdisable-tree-thread1" } */
 
 typedef struct {
   int code;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr20702.c b/gcc/testsuite/gcc.dg/tree-ssa/pr20702.c
index c896857748c..81129674d8a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr20702.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr20702.c
@@ -4,7 +4,7 @@ 
    immediate successors of the basic block.  */
 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-dominator-opts -fdisable-tree-evrp -fdump-tree-vrp1-details -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fno-thread-jumps -fdisable-tree-evrp -fdump-tree-vrp1-details -fdelete-null-pointer-checks" } */
 
 extern void bar (int);
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21086.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21086.c
index aadd53e2237..9b93d39d4e4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr21086.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21086.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdump-tree-dce2 -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fno-thread-jumps -fdisable-tree-evrp -fdump-tree-vrp1 -fdump-tree-dce2 -fdelete-null-pointer-checks" } */
 
 int
 foo (int *p)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr25382.c b/gcc/testsuite/gcc.dg/tree-ssa/pr25382.c
index d74765551c2..8634c0a7895 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr25382.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr25382.c
@@ -3,7 +3,7 @@ 
    Check that VRP now gets ranges from BIT_AND_EXPRs.  */
 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-ccp -fdisable-tree-evrp -fdump-tree-vrp1" } */
+/* { dg-options "-O2 -fno-thread-jumps -fno-tree-ccp -fdisable-tree-evrp -fdump-tree-vrp1" } */
 
 int
 foo (int a)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr58480.c b/gcc/testsuite/gcc.dg/tree-ssa/pr58480.c
index 42898e72d4e..f11623b7c6b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr58480.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr58480.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile { target { ! keeps_null_pointer_checks } } } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fno-thread-jumps -fdisable-tree-evrp -fdump-tree-vrp1 -fdelete-null-pointer-checks" } */
 
 extern void eliminate (void);
 extern void* f1 (void *a, void *b) __attribute__((nonnull));
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
deleted file mode 100644
index 9cd463571c4..00000000000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
+++ /dev/null
@@ -1,60 +0,0 @@ 
-/* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp-thread2-details -fdump-tree-dom2-details -std=gnu89 --param logical-op-non-short-circuit=1" } */
-struct bitmap_head_def;
-typedef struct bitmap_head_def *bitmap;
-typedef const struct bitmap_head_def *const_bitmap;
-typedef unsigned long BITMAP_WORD;
-typedef struct bitmap_element_def
-{
-  struct bitmap_element_def *next;
-  unsigned int indx;
-} bitmap_element;
-
-
-
-
-
-
-
-
-
-unsigned char
-bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
-		      const_bitmap kill)
-{
-  unsigned char changed = 0;
-
-  bitmap_element *dst_elt;
-  const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;
-
-  while (a_elt || b_elt)
-    {
-      unsigned char new_element = 0;
-
-      if (b_elt)
-	while (kill_elt && kill_elt->indx < b_elt->indx)
-	  kill_elt = kill_elt->next;
-
-      if (b_elt && kill_elt && kill_elt->indx == b_elt->indx
-	  && (!a_elt || a_elt->indx >= b_elt->indx))
-	{
-	  bitmap_element tmp_elt;
-	  unsigned ix;
-
-	  BITMAP_WORD ior = 0;
-
-	      changed = bitmap_elt_ior (dst, dst_elt, dst_prev,
-					a_elt, &tmp_elt, changed);
-
-	}
-
-    }
-
-
-  return changed;
-}
-/* We used to catch 3 jump threads in vrp-thread1, but they all
-   rotated the loop, so they were disallowed.  This in turn created
-   other opportunities for the other threaders which result in the the
-   post-loop threader (vrp-thread2) catching more.  */
-/* { dg-final { scan-tree-dump-times "Registering jump thread" 5 "vrp-thread2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
deleted file mode 100644
index 1da00a691c8..00000000000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ /dev/null
@@ -1,134 +0,0 @@ 
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-
-/* { dg-final { scan-tree-dump "Jumps threaded: 12"  "thread3" } } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
-
-/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
-   to change decisions in switch expansion which in turn can expose new
-   jump threading opportunities.  Skip the later tests on aarch64.  */
-/* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" { target { ! aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp-thread2" { target { ! aarch64*-*-* } } } } */
-
-enum STATE {
-  S0=0,
-  SI,
-  S1,
-  S2,
-  S3,
-  S4,
-  S5,
-  S6
-};
-
-int bar (enum STATE s);
-
-enum STATE foo (unsigned char **y, unsigned *c)
-{
-  unsigned char *x = *y;
-  unsigned char n;
-  enum STATE s = S0;
-
-  for( ; *x && s != SI; x++ )
-    {
-      n = *x;
-      if (n == 'x')
-	{
-	  x++;
-	  break;
-	}
-      switch(s)
-	{
-	case S0:
-	  if(bar(n))
-	    s = S3;
-	  else if( n == 'a' || n == 'b' )
-	    s = S1;
-	  else if( n == 'c' )
-	    s = S4;
-	  else
-	    {
-	      s = SI;
-	      c[SI]++;
-	    }
-	  c[S0]++;
-	  break;
-	case S1:
-	  if(bar(n))
-	    {
-	      s = S3;
-	      c[S1]++;
-	    }
-	  else if( n == 'c' )
-	    {
-	      s = S4;
-	      c[S1]++;
-	    }
-	  else
-	    {
-	      s = SI;
-	      c[S1]++;
-	    }
-	  break;
-	case S3:
-	  if( n == 'c' )
-	    {
-	      s = S4;
-	      c[S3]++;
-	    }
-	  else if(!bar(n))
-	    {
-	      s = SI;
-	      c[S3]++;
-	    }
-	  break;
-	case S4:
-	  if( n == 'E' || n == 'e' )
-	    {
-	      s = S2;
-	      c[S4]++;
-	    }
-	  else if(!bar(n))
-	    {
-	      s = SI;
-	      c[S4]++;
-	    }
-	  break;
-	case S2:
-	  if( n == 'a' || n == 'b' )
-	    {
-	      s = S5;
-	      c[S2]++;
-	    }
-	  else
-	    {
-	      s = SI;
-	      c[S2]++;
-	    }
-	  break;
-	case S5:
-	  if(bar(n))
-	    {
-	      s = S6;
-	      c[S5]++;
-	    }
-	  else
-	    {
-	      s = SI;
-	      c[S5]++;
-	    }
-	  break;
-	case S6:
-	  if(!bar(n))
-	    {
-	      s = SI;
-	      c[SI]++;
-	    }
-	  break;
-	default:
-	  break;
-	}
-    }
-  *y=x;
-  return s;
-}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c
deleted file mode 100644
index 672a54e07db..00000000000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c
+++ /dev/null
@@ -1,50 +0,0 @@ 
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp2-details --param logical-op-non-short-circuit=1" } */
-/* { dg-additional-options "-fdisable-tree-ethread -fdisable-tree-thread1 -fdisable-tree-thread2" } */
-/* { dg-final { scan-tree-dump-not "IRREDUCIBLE_LOOP" "vrp2" } } */
-
-void abort (void);
-typedef struct bitmap_head_def *bitmap;
-typedef const struct bitmap_head_def *const_bitmap;
-typedef struct bitmap_obstack
-{
-  struct bitmap_obstack *next;
-  unsigned int indx;
-}
-bitmap_element;
-typedef struct bitmap_head_def
-{
-  bitmap_element *first;
-}
-bitmap_head;
-static __inline__ unsigned char
-bitmap_elt_ior (bitmap dst, bitmap_element * dst_elt,
-		bitmap_element * dst_prev, const bitmap_element * a_elt,
-		const bitmap_element * b_elt)
-{
-  ((void) (!(a_elt || b_elt) ? abort (), 0 : 0));
-}
-
-unsigned char
-bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
-		      const_bitmap kill)
-{
-  bitmap_element *dst_elt = dst->first;
-  const bitmap_element *a_elt = a->first;
-  const bitmap_element *b_elt = b->first;
-  const bitmap_element *kill_elt = kill->first;
-  bitmap_element *dst_prev = ((void *) 0);
-  while (a_elt || b_elt)
-    {
-      if (b_elt && kill_elt && kill_elt->indx == b_elt->indx
-	  && (!a_elt || a_elt->indx >= b_elt->indx));
-      else
-	{
-	  bitmap_elt_ior (dst, dst_elt, dst_prev, a_elt, b_elt);
-	  if (a_elt && b_elt && a_elt->indx == b_elt->indx)
-	    ;
-	  else if (a_elt && (!b_elt || a_elt->indx <= b_elt->indx))
-	    a_elt = a_elt->next;
-	}
-    }
-}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
index 86d07ef9bdb..f3ca140bd26 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdelete-null-pointer-checks" } */
 /* { dg-skip-if "" keeps_null_pointer_checks } */
 
 void oof (void);
@@ -29,5 +29,5 @@  build_omp_regions_1 (basic_block bb, struct omp_region *parent,
 
 /* ARM Cortex-M defined LOGICAL_OP_NON_SHORT_CIRCUIT to false,
    so skip below test.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp-thread1" { target { ! arm_cortex_m } } } } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 1 "thread1" { target { ! arm_cortex_m } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp08.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp08.c
index c2da30b4b68..2c6742b76c7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp08.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp08.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details -fdisable-tree-thread1 -fdelete-null-pointer-checks" } */
 
 /* Compile with -fno-tree-fre -O2 to prevent CSEing *p.  */
 int
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
index a478a6981ac..0ef57d935e4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp-thread1-blocks-vops-details -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fdump-tree-ethread-details -fdelete-null-pointer-checks" } */
 
 void arf (void);
 
@@ -12,6 +12,6 @@  fu (char *p, int x)
     arf ();
 }
 
-/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp-thread1" { target { ! keeps_null_pointer_checks } } } } */
-/* { dg-final { scan-tree-dump-times "Threaded jump" 0 "vrp-thread1" { target {   keeps_null_pointer_checks } } } } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 1 "ethread" { target { ! keeps_null_pointer_checks } } } } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 0 "ethread" { target {   keeps_null_pointer_checks } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/uninit-pr89230-1.c b/gcc/testsuite/gcc.dg/uninit-pr89230-1.c
index 1c07c4f6d78..dfc87a5b1a0 100644
--- a/gcc/testsuite/gcc.dg/uninit-pr89230-1.c
+++ b/gcc/testsuite/gcc.dg/uninit-pr89230-1.c
@@ -8,7 +8,8 @@  struct S { int i, j; };
 
 int g (void)
 {
-  struct S *p = f (), *q;
+  struct S *p = f ();
+  struct S *q; // { dg-bogus "may be used uninitialized" "uninitialized" { xfail *-*-* } }
 
   if (p->i || !(q = f ()) || p->j != q->i)
    {
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index d94e3b962db..edb396b3d6f 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -374,8 +374,8 @@  back_threader::find_paths_to_names (basic_block bb, bitmap interesting)
       return false;
     }
 
-  // Try to resolve the path with nothing but ranger knowledge.
-  if (m_resolve && m_path.length () > 1 && maybe_register_path ())
+  // Try to resolve the path without looking back.
+  if (m_path.length () > 1 && maybe_register_path ())
     {
       m_path.pop ();
       m_visited_bbs.remove (bb);