Avoid invalid loop transformations in jump threading registry.

Message ID 20210923111531.964654-1-aldyh@redhat.com
State Committed
Commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Headers
Series Avoid invalid loop transformations in jump threading registry. |

Commit Message

Aldy Hernandez Sept. 23, 2021, 11:15 a.m. UTC
  My upcoming improvements to the forward jump threader make it thread
more aggressively.  In investigating some "regressions", I noticed
that it has always allowed threading through empty latches and across
loop boundaries.  As we have discussed recently, this should be avoided
until after loop optimizations have run their course.

Note that this wasn't much of a problem before because DOM/VRP
couldn't find these opportunities, but with a smarter solver, we trip
over them more easily.

Because the forward threader doesn't have an independent localized cost
model like the new threader (profitable_path_p), it is difficult to
catch these things at discovery.  However, we can catch them at
registration time, with the added benefit that all the threaders
(forward and backward) can share the handcuffs.

This patch is an adaptation of what we do in the backward threader, but
it is not meant to catch everything we do there, as some of the
restrictions there are due to limitations of the different block
copiers (for example, the generic copier does not re-use existing
threading paths).

We could ideally remove the now redundant bits in profitable_path_p, but
I would prefer not to for two reasons.  First, the backward threader uses
profitable_path_p as it discovers paths to avoid discovering paths in
unprofitable directions.  Second, I would like to merge all the forward
cost restrictions into the profitability class in the backward threader,
not the other way around.  Alas, that reshuffling will have to wait for
the next release.

As usual, there are quite a few tests that needed adjustments.  It seems
we were quite happily threading improper scenarios.  With most of them,
as can be seen in pr77445-2.c, we're merely shifting the threading to
after loop optimizations.

Tested on x86-64 Linux.

OK for trunk?

p.s. "Sure, sounds like fun... how hard can improving the threaders be?"

gcc/ChangeLog:

	* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
	New.
	(jt_path_registry::register_jump_thread): Call
	cancel_invalid_paths.
	* tree-ssa-threadupdate.h (class jt_path_registry): Add
	cancel_invalid_paths.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/20030714-2.c: Adjust.
	* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
	* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
	* gcc.dg/vect/bb-slp-16.c: Adjust.
---
 gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c    |  7 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c     | 19 ++++--
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c     |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-18.c       |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        |  4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-16.c         |  7 --
 gcc/tree-ssa-threadupdate.c                   | 67 +++++++++++++++----
 gcc/tree-ssa-threadupdate.h                   |  1 +
 8 files changed, 78 insertions(+), 35 deletions(-)
  

Comments

Jeff Law Sept. 23, 2021, 4:10 p.m. UTC | #1
On 9/23/2021 5:15 AM, Aldy Hernandez wrote:
> My upcoming improvements to the forward jump threader make it thread
> more aggressively.  In investigating some "regressions", I noticed
> that it has always allowed threading through empty latches and across
> loop boundaries.  As we have discussed recently, this should be avoided
> until after loop optimizations have run their course.
>
> Note that this wasn't much of a problem before because DOM/VRP
> couldn't find these opportunities, but with a smarter solver, we trip
> over them more easily.
We used to be much more aggressive in this space -- but we removed the 
equivalency tracking on backedges in the main part of DOM which had the 
side effect to reducing the number of threads related to back edges in 
loops.

Of course that was generally a positive thing given the issues we've 
been discussing.


>
> Because the forward threader doesn't have an independent localized cost
> model like the new threader (profitable_path_p), it is difficult to
> catch these things at discovery.  However, we can catch them at
> registration time, with the added benefit that all the threaders
> (forward and backward) can share the handcuffs.
In an ideal world profitability and correctness would be separated -- 
but they're still intertwined and I don't think this makes that 
situation particularly worse.  And I do like that having a single choke 
point.

Obviously you're cleaning this up, so I think a significant degree of 
freedom should be given here....


>
> This patch is an adaptation of what we do in the backward threader, but
> it is not meant to catch everything we do there, as some of the
> restrictions there are due to limitations of the different block
> copiers (for example, the generic copier does not re-use existing
> threading paths).
>
> We could ideally remove the now redundant bits in profitable_path_p, but
> I would prefer not to for two reasons.  First, the backward threader uses
> profitable_path_p as it discovers paths to avoid discovering paths in
> unprofitable directions.  Second, I would like to merge all the forward
> cost restrictions into the profitability class in the backward threader,
> not the other way around.  Alas, that reshuffling will have to wait for
> the next release.
>
> As usual, there are quite a few tests that needed adjustments.  It seems
> we were quite happily threading improper scenarios.  With most of them,
> as can be seen in pr77445-2.c, we're merely shifting the threading to
> after loop optimizations.
>
> Tested on x86-64 Linux.
>
> OK for trunk?
>
> p.s. "Sure, sounds like fun... how hard can improving the threaders be?"
>
> gcc/ChangeLog:
>
> 	* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
> 	New.
> 	(jt_path_registry::register_jump_thread): Call
> 	cancel_invalid_paths.
> 	* tree-ssa-threadupdate.h (class jt_path_registry): Add
> 	cancel_invalid_paths.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/tree-ssa/20030714-2.c: Adjust.
> 	* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
> 	* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
> 	* gcc.dg/vect/bb-slp-16.c: Adjust.
OK
jeff
  
Aldy Hernandez Sept. 24, 2021, 11:34 a.m. UTC | #2
On 9/23/21 6:10 PM, Jeff Law wrote:
> 
> 
> On 9/23/2021 5:15 AM, Aldy Hernandez wrote:
>> My upcoming improvements to the forward jump threader make it thread
>> more aggressively.  In investigating some "regressions", I noticed
>> that it has always allowed threading through empty latches and across
>> loop boundaries.  As we have discussed recently, this should be avoided
>> until after loop optimizations have run their course.
>>
>> Note that this wasn't much of a problem before because DOM/VRP
>> couldn't find these opportunities, but with a smarter solver, we trip
>> over them more easily.
> We used to be much more aggressive in this space -- but we removed the 
> equivalency tracking on backedges in the main part of DOM which had the 
> side effect to reducing the number of threads related to back edges in 
> loops.

I thought we couldn't thread through back edges at all in the old 
threader, or are we talking about the same thing?  We have a hard fail 
on backedge thread attempts for anything but the backward threader and 
its custom copier.

> 
> Of course that was generally a positive thing given the issues we've 
> been discussing.

Yeah.  These tweaks have reduced the number of jump threads in my 
bootstrap .ii by 6%, so a considerable amount.  But they were 
problematic threading paths to begin with.

For example, it removed the regression introduced by the backward 
threader rewrite in gcc.dg/vect/bb-slp-16.c.  Interestingly, for all the 
checks we do in the backward threader, some threading through loop 
boundaries seep through.  In particular the check for loop crossing 
excludes the taken edge, which IMO is a mistake.  If the entire path is 
in a loop, but the taken edge points to another loop, that by definition 
is a loop crossing.  Note, we have an exception for the first block in a 
path being in another loop, but that's something else ;-).

Anywhoo... we're catching it now.  We really should clean this up and 
merge the differing implementations.  But I'm way over my time budget 
for this ;-).

> 
> 
>>
>> Because the forward threader doesn't have an independent localized cost
>> model like the new threader (profitable_path_p), it is difficult to
>> catch these things at discovery.  However, we can catch them at
>> registration time, with the added benefit that all the threaders
>> (forward and backward) can share the handcuffs.
> In an ideal world profitability and correctness would be separated -- 
> but they're still intertwined and I don't think this makes that 
> situation particularly worse.  And I do like that having a single choke 
> point.

Huh, I hadn't though about it that way, but you're right.  The 
profitable_path_p code is catching both correctness as well as 
profitability issues.  It seems all the profitability stuff is keyed by 
param*fsm* compile options, though.  It should be easy enough to separate.

> 
> Obviously you're cleaning this up, so I think a significant degree of 
> freedom should be given here....

Much appreciated.

Aldy
  
Christophe Lyon Sept. 24, 2021, 1:15 p.m. UTC | #3
On 23/09/2021 13:15, Aldy Hernandez via Gcc-patches wrote:
> My upcoming improvements to the forward jump threader make it thread
> more aggressively.  In investigating some "regressions", I noticed
> that it has always allowed threading through empty latches and across
> loop boundaries.  As we have discussed recently, this should be avoided
> until after loop optimizations have run their course.
>
> Note that this wasn't much of a problem before because DOM/VRP
> couldn't find these opportunities, but with a smarter solver, we trip
> over them more easily.
>
> Because the forward threader doesn't have an independent localized cost
> model like the new threader (profitable_path_p), it is difficult to
> catch these things at discovery.  However, we can catch them at
> registration time, with the added benefit that all the threaders
> (forward and backward) can share the handcuffs.
>
> This patch is an adaptation of what we do in the backward threader, but
> it is not meant to catch everything we do there, as some of the
> restrictions there are due to limitations of the different block
> copiers (for example, the generic copier does not re-use existing
> threading paths).
>
> We could ideally remove the now redundant bits in profitable_path_p, but
> I would prefer not to for two reasons.  First, the backward threader uses
> profitable_path_p as it discovers paths to avoid discovering paths in
> unprofitable directions.  Second, I would like to merge all the forward
> cost restrictions into the profitability class in the backward threader,
> not the other way around.  Alas, that reshuffling will have to wait for
> the next release.
>
> As usual, there are quite a few tests that needed adjustments.  It seems
> we were quite happily threading improper scenarios.  With most of them,
> as can be seen in pr77445-2.c, we're merely shifting the threading to
> after loop optimizations.
>
> Tested on x86-64 Linux.
>
> OK for trunk?
>
> p.s. "Sure, sounds like fun... how hard can improving the threaders be?"
>
> gcc/ChangeLog:
>
> 	* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
> 	New.
> 	(jt_path_registry::register_jump_thread): Call
> 	cancel_invalid_paths.
> 	* tree-ssa-threadupdate.h (class jt_path_registry): Add
> 	cancel_invalid_paths.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/tree-ssa/20030714-2.c: Adjust.
> 	* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
> 	* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
> 	* gcc.dg/vect/bb-slp-16.c: Adjust.


After your commit r12-3876, I've noticed that some of the updated tests 
fail on some arm targets:

FAIL: gcc:gcc.dg/tree-ssa/tree-ssa.exp=gcc.dg/tree-ssa/pr66752-3.c scan-tree-dump-not thread3 "if .flag"
FAIL: gcc:gcc.dg/tree-ssa/tree-ssa.exp=gcc.dg/tree-ssa/pr77445-2.c scan-tree-dump thread1 "Jumps threaded: 9"

when cpu is:

* cortex-a5 (fpu = vfpv3-d16-fp16)

* cortex-m0, m3, m4, m7 and m55 (with assorted -march/-mfloat-abi)

See 
https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/r12-3876-g4a960d548b7d7d942f316c5295f6d849b74214f5/report-build-info.html

for more details (you can ignore the regressions in libstdc++, they are 
related to random timeouts)

Thanks,


Christophe


> ---
>   gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c    |  7 +-
>   gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c     | 19 ++++--
>   gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c     |  4 +-
>   .../gcc.dg/tree-ssa/ssa-dom-thread-18.c       |  4 +-
>   .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        |  4 +-
>   gcc/testsuite/gcc.dg/vect/bb-slp-16.c         |  7 --
>   gcc/tree-ssa-threadupdate.c                   | 67 +++++++++++++++----
>   gcc/tree-ssa-threadupdate.h                   |  1 +
>   8 files changed, 78 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
> index eb663f2ff5b..9585ff11307 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
> @@ -32,7 +32,8 @@ get_alias_set (t)
>       }
>   }
>   
> -/* There should be exactly three IF conditionals if we thread jumps
> -   properly.  */
> -/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
> +/* There should be exactly 4 IF conditionals if we thread jumps
> +   properly.  There used to be 3, but one thread was crossing
> +   loops.  */
> +/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
>    
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
> index e1464e21170..922a331b217 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */
>   
>   extern int status, pt;
>   extern int count;
> @@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a)
>      pt--;
>   }
>   
> -/* There are 4 jump threading opportunities, all of which will be
> -   realized, which will eliminate testing of FLAG, completely.  */
> -/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
> +/* There are 2 jump threading opportunities (which don't cross loops),
> +   all of which will be realized, which will eliminate testing of
> +   FLAG, completely.  */
> +/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */
>   
> -/* There should be no assignments or references to FLAG, verify they're
> -   eliminated as early as possible.  */
> -/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
> +/* We used to remove references to FLAG by DCE2, but this was
> +   depending on early threaders threading through loop boundaries
> +   (which we shouldn't do).  However, the late threading passes, which
> +   run after loop optimizations , can successfully eliminate the
> +   references to FLAG.  Verify that ther are no references by the late
> +   threading passes.  */
> +/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
> index f9fc212f49e..01a0f1f197d 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
> @@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
>      aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
>      to change decisions in switch expansion which in turn can expose new
>      jump threading opportunities.  Skip the later tests on aarch64.  */
> -/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
> -/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
> +/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
> +/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
>   /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
>   /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
>   /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
> index 60d4f76f076..2d78d045516 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
> @@ -21,5 +21,7 @@
>   	 condition.
>   
>      All the cases are picked up by VRP1 as jump threads.  */
> -/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
> +
> +/* There used to be 6 jump threads found by thread1, but they all
> +   depended on threading through distinct loops in ethread.  */
>   /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> index e3d4b311c03..16abcde5053 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> @@ -1,8 +1,8 @@
>   /* { dg-do compile } */
>   /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
>   
> -/* { dg-final { scan-tree-dump "Jumps threaded: 18"  "thread1" } } */
> -/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump "Jumps threaded: 12"  "thread1" } } */
> +/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
>   /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
>   
>   /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> index 664e93e9b60..e68a9b62535 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> @@ -1,8 +1,5 @@
>   /* { dg-require-effective-target vect_int } */
>   
> -/* See note below as to why we disable threading.  */
> -/* { dg-additional-options "-fdisable-tree-thread1" } */
> -
>   #include <stdarg.h>
>   #include "tree-vect.h"
>   
> @@ -30,10 +27,6 @@ main1 (int dummy)
>         *pout++ = *pin++ + a;
>         *pout++ = *pin++ + a;
>         *pout++ = *pin++ + a;
> -      /* In some architectures like ppc64, jump threading may thread
> -	 the iteration where i==0 such that we no longer optimize the
> -	 BB.  Another alternative to disable jump threading would be
> -	 to wrap the read from `i' into a function returning i.  */
>         if (arr[i] = i)
>           a = i;
>         else
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index baac11280fa..2b9b8f81274 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
>     return retval;
>   }
>   
> +bool
> +jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
> +{
> +  gcc_checking_assert (!path.is_empty ());
> +  edge taken_edge = path[path.length () - 1]->e;
> +  loop_p loop = taken_edge->src->loop_father;
> +  bool seen_latch = false;
> +  bool path_crosses_loops = false;
> +
> +  for (unsigned int i = 0; i < path.length (); i++)
> +    {
> +      edge e = path[i]->e;
> +
> +      if (e == NULL)
> +	{
> +	  // NULL outgoing edges on a path can happen for jumping to a
> +	  // constant address.
> +	  cancel_thread (&path, "Found NULL edge in jump threading path");
> +	  return true;
> +	}
> +
> +      if (loop->latch == e->src || loop->latch == e->dest)
> +	seen_latch = true;
> +
> +      // The first entry represents the block with an outgoing edge
> +      // that we will redirect to the jump threading path.  Thus we
> +      // don't care about that block's loop father.
> +      if ((i > 0 && e->src->loop_father != loop)
> +	  || e->dest->loop_father != loop)
> +	path_crosses_loops = true;
> +
> +      if (flag_checking && !m_backedge_threads)
> +	gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
> +    }
> +
> +  if (cfun->curr_properties & PROP_loop_opts_done)
> +    return false;
> +
> +  if (seen_latch && empty_block_p (loop->latch))
> +    {
> +      cancel_thread (&path, "Threading through latch before loop opts "
> +		     "would create non-empty latch");
> +      return true;
> +    }
> +  if (path_crosses_loops)
> +    {
> +      cancel_thread (&path, "Path crosses loops");
> +      return true;
> +    }
> +  return false;
> +}
> +
>   /* Register a jump threading opportunity.  We queue up all the jump
>      threading opportunities discovered by a pass and update the CFG
>      and SSA form all at once.
> @@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
>         return false;
>       }
>   
> -  /* First make sure there are no NULL outgoing edges on the jump threading
> -     path.  That can happen for jumping to a constant address.  */
> -  for (unsigned int i = 0; i < path->length (); i++)
> -    {
> -      if ((*path)[i]->e == NULL)
> -	{
> -	  cancel_thread (path, "Found NULL edge in jump threading path");
> -	  return false;
> -	}
> -
> -      if (flag_checking && !m_backedge_threads)
> -	gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
> -    }
> +  if (cancel_invalid_paths (*path))
> +    return false;
>   
>     if (dump_file && (dump_flags & TDF_DETAILS))
>       dump_jump_thread_path (dump_file, *path, true);
> diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
> index 8b48a671212..d68795c9f27 100644
> --- a/gcc/tree-ssa-threadupdate.h
> +++ b/gcc/tree-ssa-threadupdate.h
> @@ -75,6 +75,7 @@ protected:
>     unsigned long m_num_threaded_edges;
>   private:
>     virtual bool update_cfg (bool peel_loop_headers) = 0;
> +  bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
>     jump_thread_path_allocator m_allocator;
>     // True if threading through back edges is allowed.  This is only
>     // allowed in the generic copier in the backward threader.
  
Jeff Law Sept. 24, 2021, 4:09 p.m. UTC | #4
On 9/24/2021 5:34 AM, Aldy Hernandez wrote:
>
>
> On 9/23/21 6:10 PM, Jeff Law wrote:
>>
>>
>> On 9/23/2021 5:15 AM, Aldy Hernandez wrote:
>>> My upcoming improvements to the forward jump threader make it thread
>>> more aggressively.  In investigating some "regressions", I noticed
>>> that it has always allowed threading through empty latches and across
>>> loop boundaries.  As we have discussed recently, this should be avoided
>>> until after loop optimizations have run their course.
>>>
>>> Note that this wasn't much of a problem before because DOM/VRP
>>> couldn't find these opportunities, but with a smarter solver, we trip
>>> over them more easily.
>> We used to be much more aggressive in this space -- but we removed 
>> the equivalency tracking on backedges in the main part of DOM which 
>> had the side effect to reducing the number of threads related to back 
>> edges in loops.
>
> I thought we couldn't thread through back edges at all in the old 
> threader, or are we talking about the same thing?  We have a hard fail 
> on backedge thread attempts for anything but the backward threader and 
> its custom copier.
We used to have it in the distant past IIRC.

Jeff
  

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
index eb663f2ff5b..9585ff11307 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
@@ -32,7 +32,8 @@  get_alias_set (t)
     }
 }
 
-/* There should be exactly three IF conditionals if we thread jumps
-   properly.  */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+   properly.  There used to be 3, but one thread was crossing
+   loops.  */
+/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
  
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index e1464e21170..922a331b217 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */
 
 extern int status, pt;
 extern int count;
@@ -32,10 +32,15 @@  foo (int N, int c, int b, int *a)
    pt--;
 }
 
-/* There are 4 jump threading opportunities, all of which will be
-   realized, which will eliminate testing of FLAG, completely.  */
-/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
+/* There are 2 jump threading opportunities (which don't cross loops),
+   all of which will be realized, which will eliminate testing of
+   FLAG, completely.  */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */
 
-/* There should be no assignments or references to FLAG, verify they're
-   eliminated as early as possible.  */
-/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
+/* We used to remove references to FLAG by DCE2, but this was
+   depending on early threaders threading through loop boundaries
+   (which we shouldn't do).  However, the late threading passes, which
+   run after loop optimizations , can successfully eliminate the
+   references to FLAG.  Verify that ther are no references by the late
+   threading passes.  */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index f9fc212f49e..01a0f1f197d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,8 +123,8 @@  enum STATES FMS( u8 **in , u32 *transitions) {
    aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
    to change decisions in switch expansion which in turn can expose new
    jump threading opportunities.  Skip the later tests on aarch64.  */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 60d4f76f076..2d78d045516 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -21,5 +21,7 @@ 
 	 condition.
 
    All the cases are picked up by VRP1 as jump threads.  */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
+
+/* There used to be 6 jump threads found by thread1, but they all
+   depended on threading through distinct loops in ethread.  */
 /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index e3d4b311c03..16abcde5053 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,8 @@ 
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 
-/* { dg-final { scan-tree-dump "Jumps threaded: 18"  "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12"  "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
 
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 664e93e9b60..e68a9b62535 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,8 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
 
-/* See note below as to why we disable threading.  */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
-
 #include <stdarg.h>
 #include "tree-vect.h"
 
@@ -30,10 +27,6 @@  main1 (int dummy)
       *pout++ = *pin++ + a;
       *pout++ = *pin++ + a;
       *pout++ = *pin++ + a;
-      /* In some architectures like ppc64, jump threading may thread
-	 the iteration where i==0 such that we no longer optimize the
-	 BB.  Another alternative to disable jump threading would be
-	 to wrap the read from `i' into a function returning i.  */
       if (arr[i] = i)
         a = i;
       else
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index baac11280fa..2b9b8f81274 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2757,6 +2757,58 @@  fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
   return retval;
 }
 
+bool
+jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
+{
+  gcc_checking_assert (!path.is_empty ());
+  edge taken_edge = path[path.length () - 1]->e;
+  loop_p loop = taken_edge->src->loop_father;
+  bool seen_latch = false;
+  bool path_crosses_loops = false;
+
+  for (unsigned int i = 0; i < path.length (); i++)
+    {
+      edge e = path[i]->e;
+
+      if (e == NULL)
+	{
+	  // NULL outgoing edges on a path can happen for jumping to a
+	  // constant address.
+	  cancel_thread (&path, "Found NULL edge in jump threading path");
+	  return true;
+	}
+
+      if (loop->latch == e->src || loop->latch == e->dest)
+	seen_latch = true;
+
+      // The first entry represents the block with an outgoing edge
+      // that we will redirect to the jump threading path.  Thus we
+      // don't care about that block's loop father.
+      if ((i > 0 && e->src->loop_father != loop)
+	  || e->dest->loop_father != loop)
+	path_crosses_loops = true;
+
+      if (flag_checking && !m_backedge_threads)
+	gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
+    }
+
+  if (cfun->curr_properties & PROP_loop_opts_done)
+    return false;
+
+  if (seen_latch && empty_block_p (loop->latch))
+    {
+      cancel_thread (&path, "Threading through latch before loop opts "
+		     "would create non-empty latch");
+      return true;
+    }
+  if (path_crosses_loops)
+    {
+      cancel_thread (&path, "Path crosses loops");
+      return true;
+    }
+  return false;
+}
+
 /* Register a jump threading opportunity.  We queue up all the jump
    threading opportunities discovered by a pass and update the CFG
    and SSA form all at once.
@@ -2776,19 +2828,8 @@  jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
       return false;
     }
 
-  /* First make sure there are no NULL outgoing edges on the jump threading
-     path.  That can happen for jumping to a constant address.  */
-  for (unsigned int i = 0; i < path->length (); i++)
-    {
-      if ((*path)[i]->e == NULL)
-	{
-	  cancel_thread (path, "Found NULL edge in jump threading path");
-	  return false;
-	}
-
-      if (flag_checking && !m_backedge_threads)
-	gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
-    }
+  if (cancel_invalid_paths (*path))
+    return false;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_jump_thread_path (dump_file, *path, true);
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 8b48a671212..d68795c9f27 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -75,6 +75,7 @@  protected:
   unsigned long m_num_threaded_edges;
 private:
   virtual bool update_cfg (bool peel_loop_headers) = 0;
+  bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
   jump_thread_path_allocator m_allocator;
   // True if threading through back edges is allowed.  This is only
   // allowed in the generic copier in the backward threader.