Replace VRP threader with a hybrid forward threader.

Message ID 20210924154653.1108992-1-aldyh@redhat.com
State New
Series Replace VRP threader with a hybrid forward threader.

Commit Message

Aldy Hernandez Sept. 24, 2021, 3:46 p.m. UTC
  This patch implements the new hybrid forward threader and replaces the
embedded VRP threader with it.

With all the pieces that have gone in, the implementation of the hybrid
threader is straightforward: convert the current state into
SSA imports that the solver will understand, and let the path solver
precompute ranges and relations for the path.  After this setup is done,
we can use the range_query API to solve gimple statements in the threader.
The forward threader is now engine agnostic so there are no changes to
the threader per se.
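
To make the flow above concrete, here is a minimal standalone sketch of the idea: ranges are precomputed for the names on a path, and the final conditional is then folded against them. Everything in it (toy_range, toy_cond, toy_path_solver) is hypothetical and only models the concept; it does not use GCC's path solver or range_query API.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an integer range [lo, hi]; not GCC's irange.
struct toy_range
{
  long lo = LONG_MIN;
  long hi = LONG_MAX;
  bool empty () const { return lo > hi; }
};

// One condition on a name: NAME < BOUND or NAME >= BOUND.
struct toy_cond
{
  std::string name;
  bool less_than;   // true: name < bound; false: name >= bound
  long bound;
};

// Hypothetical path solver: narrows a range per name from the
// conditions known to hold along the path.
class toy_path_solver
{
public:
  void compute_ranges (const std::vector<toy_cond> &path)
  {
    m_ranges.clear ();
    for (const toy_cond &c : path)
      {
	assert (c.bound > LONG_MIN);	// keeps bound - 1 from overflowing
	toy_range &r = m_ranges[c.name];
	if (c.less_than)
	  r.hi = std::min (r.hi, c.bound - 1);
	else
	  r.lo = std::max (r.lo, c.bound);
      }
  }

  // Fold a conditional against the precomputed ranges.
  // Returns 1 (always true), 0 (always false) or -1 (unknown).
  int fold (const toy_cond &c) const
  {
    std::map<std::string, toy_range>::const_iterator it
      = m_ranges.find (c.name);
    if (it == m_ranges.end () || it->second.empty ())
      return -1;
    const toy_range &r = it->second;
    bool always_less = r.hi < c.bound;
    bool never_less = r.lo >= c.bound;
    if (always_less)
      return c.less_than ? 1 : 0;
    if (never_less)
      return c.less_than ? 0 : 1;
    return -1;
  }

private:
  std::map<std::string, toy_range> m_ranges;
};
```

A path on which x_1 < 10 holds lets the sketch fold a later x_1 < 20 to true, which is the kind of decision that makes an edge threadable.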

I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
because they will also be used in the evrp removal of the DOM/threader,
which is my next task.

Most of the patch is actually test changes.  I have gone through every
single one and verified that we're correct.  Most were trivial dump
file name changes, but others required going through the IL and
certifying that the different IL was expected.

For example, in pr59597.c, we have one less thread because the
ASSERT_EXPR was getting in the way, and making it seem like things were
not crossing loops.  The hybrid threader sees the correct representation
of the IL, and avoids threading this one case.

The final numbers are a 12.16% improvement in jump threads immediately
after VRP, and a 0.82% improvement in overall jump threads.  The
performance drop is 0.6% (plus the 1.43% hit from moving the embedded
threader into its own pass).  As I've said, I'd prefer to keep the
threader in its own pass, but if this is an issue, we can address this
with a shared ranger when VRP is replaced with an evrp instance
(upcoming).

Note that these numbers are slightly different from what I originally
posted.  A few correctness tweaks, plus restricting loop threads, made
the difference.  That being said, I was aiming for par.  A 12% gain is
just gravy ;-).  When we merge the threaders, we should see even better
numbers-- and we'll have the benefit of an entire release stress testing
the solver.

As I mentioned in my introductory note, paths ending in a MEM_REF
conditional are missing.  In reality, this didn't make a difference, as
it was so rare.  However, as a follow-up, I will distill a test and add
a suitable PR to keep us honest.

There is a one-line change to libgomp/team.c silencing a new
used-uninitialized warning.  As my previous work with the threaders has
shown, warnings flare up after each improvement to jump threading.  I
expect this to be no different.  I've promised Jakub to investigate
fully, so I will analyze and add the appropriate PR for the warning
experts.

Oh yeah, the new pass dump is called vrp-thread[12] to match each
VRP[12] pass.  However, there's no reason for it to either be named
vrp-thread, or for it to live in tree-vrp.c.

Tested on x86-64 Linux.

OK?

p.s. "Did I say 5 weeks?  My bad, I meant 5 months."

gcc/ChangeLog:

	* passes.def (pass_vrp_threader): New.
	* tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader.
	* tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New.
	(hybrid_jt_simplifier::hybrid_jt_simplifier): New.
	(hybrid_jt_simplifier::simplify): New.
	(hybrid_jt_simplifier::compute_ranges_from_state): New.
	* tree-ssa-threadedge.h (class hybrid_jt_state): New.
	(class hybrid_jt_simplifier): New.
	* tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump
	threader.
	(class hybrid_threader): New.
	(hybrid_threader::hybrid_threader): New.
	(hybrid_threader::~hybrid_threader): New.
	(hybrid_threader::before_dom_children): New.
	(hybrid_threader::after_dom_children): New.
	(execute_vrp_threader): New.
	(class pass_vrp_threader): New.
	(make_pass_vrp_threader): New.

libgomp/ChangeLog:

	* team.c: Initialize start_data.
	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust.
	* testsuite/libgomp.graphite/force-parallel-8.c: Adjust.

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr55107.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
	* gcc.dg/tree-ssa/pr21559.c: Adjust.
	* gcc.dg/tree-ssa/pr59597.c: Adjust.
	* gcc.dg/tree-ssa/pr61839_1.c: Adjust.
	* gcc.dg/tree-ssa/pr61839_3.c: Adjust.
	* gcc.dg/tree-ssa/pr71437.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Adjust.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Adjust.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Adjust.
	* gcc.dg/tree-ssa/vrp106.c: Adjust.
	* gcc.dg/tree-ssa/vrp55.c: Adjust.
---
 gcc/passes.def                                |   2 +
 gcc/testsuite/gcc.dg/torture/pr55107.c        |   2 +-
 .../gcc.dg/tree-ssa/phi_on_compare-1.c        |   4 +-
 .../gcc.dg/tree-ssa/phi_on_compare-2.c        |   4 +-
 .../gcc.dg/tree-ssa/phi_on_compare-3.c        |   4 +-
 .../gcc.dg/tree-ssa/phi_on_compare-4.c        |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21559.c       |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr59597.c       |  13 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c     |  10 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c     |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr71437.c       |   4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-11.c       |   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-16.c       |   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-18.c       |   4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-2a.c       |   6 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-4.c        |   5 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c |   4 +-
 .../gcc.dg/tree-ssa/ssa-vrp-thread-1.c        |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp106.c        |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp55.c         |   2 +-
 gcc/tree-pass.h                               |   1 +
 gcc/tree-ssa-threadedge.c                     |  71 +++++++++
 gcc/tree-ssa-threadedge.h                     |  20 +++
 gcc/tree-vrp.c                                | 143 +++++++++++++++---
 libgomp/team.c                                |   2 +-
 .../libgomp.graphite/force-parallel-4.c       |   2 +-
 .../libgomp.graphite/force-parallel-8.c       |   2 +-
 27 files changed, 268 insertions(+), 61 deletions(-)
  

Comments

Bernhard Reutner-Fischer Sept. 25, 2021, 7:25 p.m. UTC | #1
On Fri, 24 Sep 2021 17:46:53 +0200
Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

> p.s. "Did I say 5 weeks?  My bad, I meant 5 months."

heh. units (.oO~"xkcd.com/1047/")

> +static unsigned int
> +execute_vrp_threader (function *fun)
> +{
> +  hybrid_threader threader;
> +  threader.thread_jumps (fun);
> +  threader.thread_through_all_blocks ();
> +  return 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_vrp_threader =
> +{
> +  GIMPLE_PASS, /* type */
> +  "vrp-thread", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_TREE_VRP, /* tv_id */
> +  PROP_ssa, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */
> +};

So shouldn't non-jumpy or flat code avoid the cleanup_cfg or
update_ssa iff neither the function nor anything else was threaded?

thanks,
  
Jeff Law Sept. 27, 2021, 3:01 p.m. UTC | #2
On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
> This patch implements the new hybrid forward threader and replaces the
> embedded VRP threader with it.
But most importantly, it pulls it out of the VRP pass as we no longer 
need the VRP data or ASSERT_EXPRs.

>
> With all the pieces that have gone in, the implementation of the hybrid
> threader is straightforward: convert the current state into
> SSA imports that the solver will understand, and let the path solver
> precompute ranges and relations for the path.  After this setup is done,
> we can use the range_query API to solve gimple statements in the threader.
> The forward threader is now engine agnostic so there are no changes to
> the threader per se.
So the big question is do we think it's going to be this clean when we 
try to divorce the threading from DOM?

>
> I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
> because they will also be used in the evrp removal of the DOM/threader,
> which is my next task.
Sweet.

>
> Most of the patch is actually test changes.  I have gone through every
> single one and verified that we're correct.  Most were trivial dump
> file name changes, but others required going through the IL and
> certifying that the different IL was expected.
>
> For example, in pr59597.c, we have one less thread because the
> ASSERT_EXPR was getting in the way, and making it seem like things were
> not crossing loops.  The hybrid threader sees the correct representation
> of the IL, and avoids threading this one case.
>
> The final numbers are a 12.16% improvement in jump threads immediately
> after VRP, and a 0.82% improvement in overall jump threads.  The
> performance drop is 0.6% (plus the 1.43% hit from moving the embedded
> threader into its own pass).  As I've said, I'd prefer to keep the
> threader in its own pass, but if this is an issue, we can address this
> with a shared ranger when VRP is replaced with an evrp instance
> (upcoming).
Presumably we're also seeing a cannibalization of threads from later 
passes.   And just to be clear, this is good.

And the big question, is the pass running after VRP2 doing anything 
particularly useful?  Do we want to try and kill it now, or later?


> As I mentioned in my introductory note, paths ending in MEM_REF
> conditional are missing.  In reality, this didn't make a difference, as
> it was so rare.  However, as a follow-up, I will distill a test and add
> a suitable PR to keep us honest.
Yea, I don't think these are going to be a notable issue for the 
threaders that were previously run out of VRP.  I'm less sure about DOM 
though.

>
> There is a one-line change to libgomp/team.c silencing a new used
> uninitialized warning.  As my previous work with the threaders has
> shown, warnings flare up after each improvement to jump threading.  I
> expect this to be no different.  I've promised Jakub to investigate
> fully, so I will analyze and add the appropriate PR for the warning
> experts.
ACK.


>
> Oh yeah, the new pass dump is called vrp-threader[12] to match each
> VRP[12] pass.  However, there's no reason for it to either be named
> vrp-threader, or for it to live in tree-vrp.c.
>
> Tested on x86-64 Linux.
>
> OK?
>
> p.s. "Did I say 5 weeks?  My bad, I meant 5 months."
>
> gcc/ChangeLog:
>
> 	* passes.def (pass_vrp_threader): New.
> 	* tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader.
> 	* tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New.
> 	(hybrid_jt_simplifier::hybrid_jt_simplifier): New.
> 	(hybrid_jt_simplifier::simplify): New.
> 	(hybrid_jt_simplifier::compute_ranges_from_state): New.
> 	* tree-ssa-threadedge.h (class hybrid_jt_state): New.
> 	(class hybrid_jt_simplifier): New.
> 	* tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump
> 	threader.
> 	(class hybrid_threader): New.
> 	(hybrid_threader::hybrid_threader): New.
> 	(hybrid_threader::~hybrid_threader): New.
> 	(hybrid_threader::before_dom_children): New.
> 	(hybrid_threader::after_dom_children): New.
> 	(execute_vrp_threader): New.
> 	(class pass_vrp_threader): New.
> 	(make_pass_vrp_threader): New.
>
> libgomp/ChangeLog:
>
> 	* team.c: Initialize start_data.
> 	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust.
> 	* testsuite/libgomp.graphite/force-parallel-8.c: Adjust.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/torture/pr55107.c: Adjust.
> 	* gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust.
> 	* gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust.
> 	* gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust.
> 	* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
> 	* gcc.dg/tree-ssa/pr21559.c: Adjust.
> 	* gcc.dg/tree-ssa/pr59597.c: Adjust.
> 	* gcc.dg/tree-ssa/pr61839_1.c: Adjust.
> 	* gcc.dg/tree-ssa/pr61839_3.c: Adjust.
> 	* gcc.dg/tree-ssa/pr71437.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-thread-14.c: Adjust.
> 	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Adjust.
> 	* gcc.dg/tree-ssa/vrp106.c: Adjust.
> 	* gcc.dg/tree-ssa/vrp55.c: Adjust.
OK
jeff
  
Aldy Hernandez Sept. 27, 2021, 3:27 p.m. UTC | #3
On 9/27/21 5:01 PM, Jeff Law wrote:
> 
> 
> On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
>> This patch implements the new hybrid forward threader and replaces the
>> embedded VRP threader with it.
> But most importantly, it pulls it out of the VRP pass as we no longer 
> need the VRP data or ASSERT_EXPRs.

Yes, I have a follow-up patch removing the old mini-pass.

> 
>>
>> With all the pieces that have gone in, the implementation of the hybrid
>> threader is straightforward: convert the current state into
>> SSA imports that the solver will understand, and let the path solver
>> precompute ranges and relations for the path.  After this setup is done,
>> we can use the range_query API to solve gimple statements in the 
>> threader.
>> The forward threader is now engine agnostic so there are no changes to
>> the threader per se.
> So the big question is do we think it's going to be this clean when we 
> try to divorce the threading from DOM?

Interestingly, yes.  With all the refactoring I've done, it turns out 
that divorcing evrp from the DOM threader is a matter of having 
dom_jt_simplifier inherit from hybrid_jt_simplifier instead of the base 
class.  Then we have simplify() look at the const_copies/avails, 
otherwise let the hybrid simplifier do its thing.  Yes, I was amazed too.
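
The layering described above can be sketched in isolation as follows. This is only an illustration under assumptions: toy_hybrid_simplifier and toy_dom_simplifier are hypothetical stand-ins, the lookup tables replace the real const_copies/avails and range machinery, and the signatures are not those of the GCC classes.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the range-based simplifier.
class toy_hybrid_simplifier
{
public:
  virtual ~toy_hybrid_simplifier () {}

  // Pretend range query: succeeds only for names recorded as singletons.
  virtual bool simplify (const std::string &name, long *result) const
  {
    std::map<std::string, long>::const_iterator it = m_singletons.find (name);
    if (it == m_singletons.end ())
      return false;
    *result = it->second;
    return true;
  }

  void record_singleton (const std::string &name, long value)
  {
    m_singletons[name] = value;
  }

private:
  std::map<std::string, long> m_singletons;
};

// Hypothetical DOM-side simplifier: consult its own table first,
// otherwise defer to the hybrid simplifier it inherits from.
class toy_dom_simplifier : public toy_hybrid_simplifier
{
public:
  void record_const_copy (const std::string &name, long value)
  {
    m_const_copies[name] = value;
  }

  bool simplify (const std::string &name, long *result) const
  {
    std::map<std::string, long>::const_iterator it
      = m_const_copies.find (name);
    if (it != m_const_copies.end ())
      {
	*result = it->second;
	return true;
      }
    return toy_hybrid_simplifier::simplify (name, result);
  }

private:
  std::map<std::string, long> m_const_copies;
};
```

The point of the sketch is the fallback order: the table lookup wins when it has an answer, and everything else goes to the base class.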

As usual there are caveats:

First, notice that we'd still depend on const_copies/avails, because 
we'd need them for floats anyhow.  But this has the added benefit of 
catching a few things in the presence of the IL changing from under us.

Second, it turns out that DOM has other uses of evrp that need to be 
addressed-- particularly its use of evrp to do its simple copy prop.

Be that as it may, none of these are show stoppers.  I have a proof of 
concept that converts everything with a few lines of code.

The big issue now is performance.  Plugging in the full ranger makes it 
uncomfortably slower than just using evrp.  Andrew has some ideas for a 
super fast ranger that doesn't do full look-ups, so we have finally
found a good use case for something we had on the back burner.

Now, numbers...

Converting the DOM threader to a hybrid client improves DOM threading 
counts by 4%, but it's all at the expense of other passes.  The total
threading count was unchanged (well, it got worse by 0.05%).  It
doesn't look like there's any gain.  We're shuffling things around at 
this point.

> 
>>
>> I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
>> because they will also be used in the evrp removal of the DOM/threader,
>> which is my next task.
> Sweet.
> 
>>
>> Most of the patch is actually test changes.  I have gone through every
>> single one and verified that we're correct.  Most were trivial dump
>> file name changes, but others required going through the IL and
>> certifying that the different IL was expected.
>>
>> For example, in pr59597.c, we have one less thread because the
>> ASSERT_EXPR was getting in the way, and making it seem like things were
>> not crossing loops.  The hybrid threader sees the correct representation
>> of the IL, and avoids threading this one case.
>>
>> The final numbers are a 12.16% improvement in jump threads immediately
>> after VRP, and a 0.82% improvement in overall jump threads.  The
>> performance drop is 0.6% (plus the 1.43% hit from moving the embedded
>> threader into its own pass).  As I've said, I'd prefer to keep the
>> threader in its own pass, but if this is an issue, we can address this
>> with a shared ranger when VRP is replaced with an evrp instance
>> (upcoming).
> Presumably we're also seeing a cannibalization of threads from later 
> passes.   And just to be clear, this is good.
> 
> And the big question, is the pass running after VRP2 doing anything 
> particularly useful?  Do we want to try and kill it now, or later?

Interesting question.  Perhaps if we convert DOM threading to a hybrid 
model, it will render the post-VRP threader completely useless.  Huhh... 
That could kill 2 birds with one stone... we get rid of a threading 
pass, and we don't need to worry as much about the super-fast ranger.

Huh...good idea.  I will experiment.

Thanks.
Aldy
  
Aldy Hernandez Sept. 27, 2021, 4:07 p.m. UTC | #4
On 9/27/21 5:27 PM, Aldy Hernandez wrote:
> 
> 
> On 9/27/21 5:01 PM, Jeff Law wrote:
>>
>>
>> On 9/24/2021 9:46 AM, Aldy Hernandez wrote:

>> And the big question, is the pass running after VRP2 doing anything 
>> particularly useful?  Do we want to try and kill it now, or later?
> 
> Interesting question.  Perhaps if we convert DOM threading to a hybrid 
> model, it will render the post-VRP threader completely useless.  Huhh... 
> That could kill 2 birds with one stone... we get rid of a threading 
> pass, and we don't need to worry as much about the super-fast ranger.

These are just a few of the threading passes at -O2:

a.c.192t.thread3   <-- bck threader
a.c.193t.dom3      <-- fwd threader
a.c.194t.strlen1
a.c.195t.thread4   <-- bck threader
a.c.196t.vrp2
a.c.197t.vrp-thread2 <-- fwd threader

That's almost 4 back to back threaders!

*pause for effect*

Aldy
  
Richard Biener Sept. 27, 2021, 5:28 p.m. UTC | #5
On September 27, 2021 6:07:40 PM GMT+02:00, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
>
>On 9/27/21 5:27 PM, Aldy Hernandez wrote:
>> 
>> 
>> On 9/27/21 5:01 PM, Jeff Law wrote:
>>>
>>>
>>> On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
>
>>> And the big question, is the pass running after VRP2 doing anything 
>>> particularly useful?  Do we want to try and kill it now, or later?
>> 
>> Interesting question.  Perhaps if we convert DOM threading to a hybrid 
>> model, it will render the post-VRP threader completely useless.  Huhh... 
>> That could kill 2 birds with one stone... we get rid of a threading 
>> pass, and we don't need to worry as much about the super-fast ranger.
>
>These are just a few of the threading passes at -O2:
>
>a.c.192t.thread3   <-- bck threader
>a.c.193t.dom3      <-- fwd threader
>a.c.194t.strlen1
>a.c.195t.thread4   <-- bck threader
>a.c.196t.vrp2
>a.c.197t.vrp-thread2 <-- fwd threader
>
>That's almost 4 back to back threaders!
>
>*pause for effect*

We've always known we have too many of these once Jeff triplicated all the backwards threading ones. I do hope we manage to reduce the number for GCC 12. Esp. if the new ones are slower because they no longer use simple lattices.

Richard. 

>Aldy
>
  
Bernhard Reutner-Fischer Sept. 29, 2021, 9:20 a.m. UTC | #6
On Wed, 29 Sep 2021 10:10:00 +0200
Aldy Hernandez <aldyh@redhat.com> wrote:

> Jeff has requested we slow down changes in the threading space while
> we chased down regressions.

Sure. Take your time.
> 
> That being said, thank you for your suggestion.  I am putting the
> attached patch in my queue for testing.

LGTM but cannot approve it.

~hybrid_threader wouldn't want to free_dominance_info() would it?
But you certainly looked for such leaks.
thanks,
> 
> Aldy
> 
> On Wed, Sep 29, 2021 at 7:43 AM Bernhard Reutner-Fischer
> <rep.dot.nop@gmail.com> wrote:
> >
> > Aldy, ping?
> >
> > On 25 September 2021 21:25:44 CEST, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
> > >On Fri, 24 Sep 2021 17:46:53 +0200
> > >Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > >> +static unsigned int
> > >> +execute_vrp_threader (function *fun)
> > >> +{
> > >> +  hybrid_threader threader;
> > >> +  threader.thread_jumps (fun);
> > >> +  threader.thread_through_all_blocks ();
> > >> +  return 0;
> > >> +}
> > >> +
> > >> +namespace {
> > >> +
> > >> +const pass_data pass_data_vrp_threader =
> > >> +{
> > >> +  GIMPLE_PASS, /* type */
> > >> +  "vrp-thread", /* name */
> > >> +  OPTGROUP_NONE, /* optinfo_flags */
> > >> +  TV_TREE_VRP, /* tv_id */
> > >> +  PROP_ssa, /* properties_required */
> > >> +  0, /* properties_provided */
> > >> +  0, /* properties_destroyed */
> > >> +  0, /* todo_flags_start */
> > >> +  ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */
> > >> +};
> > >
> > >So shouldn't non-jumpy or flat code avoid the cleanup_cfg or
> > >update_ssa iff neither the function nor anything else was threaded?
> > >
> > >thanks,
> >
  
Jeff Law Sept. 29, 2021, 3:45 p.m. UTC | #7
On 9/29/2021 3:20 AM, Bernhard Reutner-Fischer wrote:
> On Wed, 29 Sep 2021 10:10:00 +0200
> Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Jeff has requested we slow down changes in the threading space while
>> we chased down regressions.
> Sure. Take your time.
>> That being said, thank you for your suggestion.  I am putting the
>> attached patch in my queue for testing.
> LGTM but cannot approve it.
>
> ~hybrid_threader wouldn't want to free_dominance_info() would it?
> But you certainly looked for such leaks.
> thanks,
>> Aldy
>>
>> On Wed, Sep 29, 2021 at 7:43 AM Bernhard Reutner-Fischer
>> <rep.dot.nop@gmail.com> wrote:
>>> Aldy, ping?
>>>
>>> On 25 September 2021 21:25:44 CEST, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
>>>> On Fri, 24 Sep 2021 17:46:53 +0200
>>>> Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>>> +static unsigned int
>>>>> +execute_vrp_threader (function *fun)
>>>>> +{
>>>>> +  hybrid_threader threader;
>>>>> +  threader.thread_jumps (fun);
>>>>> +  threader.thread_through_all_blocks ();
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +namespace {
>>>>> +
>>>>> +const pass_data pass_data_vrp_threader =
>>>>> +{
>>>>> +  GIMPLE_PASS, /* type */
>>>>> +  "vrp-thread", /* name */
>>>>> +  OPTGROUP_NONE, /* optinfo_flags */
>>>>> +  TV_TREE_VRP, /* tv_id */
>>>>> +  PROP_ssa, /* properties_required */
>>>>> +  0, /* properties_provided */
>>>>> +  0, /* properties_destroyed */
>>>>> +  0, /* todo_flags_start */
>>>>> +  ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */
>>>>> +};
>>>> So shouldn't non-jumpy or flat code avoid the cleanup_cfg or
>>>> update_ssa iff neither the function nor anything else was threaded?
>>>>
>>>> thanks,
>
> 0001-Avoid-CFG-updates-in-VRP-threader-if-nothing-changed.patch
>
>  From 34877c42f4442653bdc534d135f1f44f63175ce6 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez <aldyh@redhat.com>
> Date: Wed, 29 Sep 2021 10:02:12 +0200
> Subject: [PATCH] Avoid CFG updates in VRP threader if nothing changed.
>
> There is no need to update the CFG or SSAs if nothing has changed in VRP
> threading.
>
> gcc/ChangeLog:
>
> 	* tree-vrp.c (thread_through_all_blocks): Return bool.
> 	(execute_vrp_threader): Return TODO_* flags.
> 	(pass_data_vrp_threader): Set todo_flags_finish to 0.
OK
jeff
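
As a rough illustration of the ChangeLog entries quoted above, the sketch below has the threading step report whether it changed anything and has the execute function return cleanup flags only in that case. It is a standalone model, not the committed patch: toy_threader, toy_execute_threader and the TOY_TODO_* values are hypothetical (the real TODO_* constants live in GCC's tree-pass.h).

```cpp
// Toy flag values standing in for TODO_cleanup_cfg and TODO_update_ssa.
const unsigned int TOY_TODO_cleanup_cfg = 1u << 0;
const unsigned int TOY_TODO_update_ssa = 1u << 1;

// Hypothetical threader that only tracks how many paths are registered.
class toy_threader
{
public:
  explicit toy_threader (unsigned int pending) : m_pending (pending) {}

  // Returns true if at least one registered path was threaded.
  bool thread_through_all_blocks ()
  {
    bool changed = m_pending > 0;
    m_pending = 0;
    return changed;
  }

private:
  unsigned int m_pending;
};

// Request CFG cleanup and SSA update only when something was threaded;
// otherwise return 0 so the pass manager has nothing extra to do.
unsigned int
toy_execute_threader (toy_threader &threader)
{
  if (threader.thread_through_all_blocks ())
    return TOY_TODO_cleanup_cfg | TOY_TODO_update_ssa;
  return 0;
}
```

With the static todo_flags_finish set to 0, the flags returned from the execute function are the only cleanup work requested, so flat code with no threadable paths skips it entirely.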
  
Gerald Pfeifer Oct. 1, 2021, 10:55 a.m. UTC | #8
On Fri, 24 Sep 2021, Aldy Hernandez via Gcc-patches wrote:
> This patch implements the new hybrid forward threader and replaces the
> embedded VRP threader with it.

I'm not sure this is the right one of the patches to follow up on,
but between Jeff writing

  "Note we've got massive failures in the tester starting sometime 
  yesterday and I suspect all the threader work. So I'm going to slow 
  down on reviews of that code as we stabilize stuff."

in another thread and you 

  "There seems to be a memory consumption issue on 32 bit hosts after 
  the hybrid threader patchset.  I'm having a hard time reproducing..."

in yet another I can report that my i586-unknown-freebsd11 nightly tester 
started to fail on Sep 28 at 00:40 UTC, still failed Sep 29 and Sep 30,
and successfully passed last night.

Failures were all at the same point in all-stage2-gcc:

   cc1plus: out of memory allocating 65536 bytes after a total of 0 bytes
   gmake[3]: *** [Makefile:1136: insn-emit.o] Error 1

   cc1plus: out of memory allocating 65536 bytes after a total of 0 bytes
   gmake[3]: *** [Makefile:1136: insn-emit.o] Error 1

   cc1plus: out of memory allocating 86776 bytes after a total of 0 bytes
   gmake[3]: *** [Makefile:1136: insn-emit.o] Error 1


Is this under control now, or was last night just a lucky one?

Since that reproduced somewhat regularly, how may I be able to help?

Gerald
  
Aldy Hernandez Oct. 1, 2021, 11:03 a.m. UTC | #9
On 10/1/21 12:55 PM, Gerald Pfeifer wrote:
> On Fri, 24 Sep 2021, Aldy Hernandez via Gcc-patches wrote:
>> This patch implements the new hybrid forward threader and replaces the
>> embedded VRP threader with it.
> 
> I'm not sure this is the right of the patches to follow-up around this,
> but between Jeff writing
> 
>    "Note we've got massive failures in the tester starting sometime
>    yesterday and I suspect all the threader work. So I'm going to slow
>    down on reviews of that code as we stabilize stuff."

Most of this has been resolved.  Some of it was due to out-of-tree patches
Jeff had in his tree, and the rest was tests that needed
adjustments on other architectures.  Both have been fixed.

That being said, the visium & bfin embedded targets have some failures 
I've yet to look at.

On a similar topic, disallowing threading paths that cross loops has 
brought some problems that I'm looking at.

> 
> in another thread and you
> 
>    "There seems to be a memory consumption issue on 32 bit hosts after
>    the hybrid threader patchset.  I'm having a hard time reproducing..."

This has been fixed by:

commit 64dd46dbc682fbbc03a74e0298f7ac471c5e80f2
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Thu Sep 30 02:19:36 2021 +0200

     Plug memory leak in hybrid_threader.

     Tested on x86-64 Linux.

Aldy
  
Aldy Hernandez Oct. 14, 2021, 12:29 p.m. UTC | #10
On Mon, Sep 27, 2021 at 7:29 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On September 27, 2021 6:07:40 PM GMT+02:00, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >On 9/27/21 5:27 PM, Aldy Hernandez wrote:
> >>
> >>
> >> On 9/27/21 5:01 PM, Jeff Law wrote:
> >>>
> >>>
> >>> On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
> >
> >>> And the big question, is the pass running after VRP2 doing anything
> >>> particularly useful?  Do we want to try and kill it now, or later?
> >>
> >> Interesting question.  Perhaps if we convert DOM threading to a hybrid
> >> model, it will render the post-VRP threader completely useless.  Huhh...
> >> That could kill 2 birds with one stone... we get rid of a threading
> >> pass, and we don't need to worry as much about the super-fast ranger.
> >
> >These are just a few of the threading passes at -O2:
> >
> >a.c.192t.thread3   <-- bck threader
> >a.c.193t.dom3      <-- fwd threader
> >a.c.194t.strlen1
> >a.c.195t.thread4   <-- bck threader
> >a.c.196t.vrp2
> >a.c.197t.vrp-thread2 <-- fwd threader
> >
> >That's almost 4 back to back threaders!
> >
> >*pause for effect*
>
> We've always known we have too many of these once Jeff triplicated all the backwards threading ones. I do hope we manage to reduce the number for GCC 12. Esp. if the new ones are slower because they no longer use simple lattices.

By the way, what is the blessed way of knowing which of the N passes
we are in?  For instance, there are 4 back threading passes (well 5
with ethread).  I'd like to know how to tell if I'm in the 4th one,
which is the one that runs before VRP2 threading.  I know there's
gimple_opt_pass::set_pass_param, but that seems to only take a bool.

Thanks.
Aldy
  
Richard Biener Oct. 14, 2021, 12:47 p.m. UTC | #11
On Thu, Oct 14, 2021 at 2:29 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Mon, Sep 27, 2021 at 7:29 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On September 27, 2021 6:07:40 PM GMT+02:00, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > >
> > >
> > >On 9/27/21 5:27 PM, Aldy Hernandez wrote:
> > >>
> > >>
> > >> On 9/27/21 5:01 PM, Jeff Law wrote:
> > >>>
> > >>>
> > >>> On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
> > >
> > >>> And the big question, is the pass running after VRP2 doing anything
> > >>> particularly useful?  Do we want to try and kill it now, or later?
> > >>
> > >> Interesting question.  Perhaps if we convert DOM threading to a hybrid
> > >> model, it will render the post-VRP threader completely useless.  Huhh...
> > >> That could kill 2 birds with one stone... we get rid of a threading
> > >> pass, and we don't need to worry as much about the super-fast ranger.
> > >
> > >These are just a few of the threading passes at -O2:
> > >
> > >a.c.192t.thread3   <-- bck threader
> > >a.c.193t.dom3      <-- fwd threader
> > >a.c.194t.strlen1
> > >a.c.195t.thread4   <-- bck threader
> > >a.c.196t.vrp2
> > >a.c.197t.vrp-thread2 <-- fwd threader
> > >
> > >That's almost 4 back to back threaders!
> > >
> > >*pause for effect*
> >
> > We've always known we have too many of these once Jeff triplicated all the backwards threading ones. I do hope we manage to reduce the number for GCC 12. Esp. if the new ones are slower because they no longer use simple lattices.
>
> By the way, what is the blessed way of knowing which of the N passes
> we are in?  For instance, there are 4 back threading passes (well 5
> with ethread).  I'd like to know how to know if I'm in the 4th one,
> which is the one that runs before VRP2 threading.  I know there's
> gimple_opt_pass::set_pass_param, but that seems to only take a bool.

There's no blessed way other than to make distinct passes (but you could
add multiple params emulating a binary encoding of a sequence number, eh).
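
The binary-encoding idea can be sketched as below. This is a standalone model that only assumes the (index, bool) shape of set_pass_param discussed in this thread; toy_pass and sequence_number are hypothetical names, and the 8-parameter limit is arbitrary.

```cpp
#include <cassert>

// Hypothetical pass that rebuilds a sequence number from bool parameters:
// parameter N supplies bit N of the number.
class toy_pass
{
public:
  toy_pass () : m_seq (0) {}

  void set_pass_param (unsigned int n, bool param)
  {
    assert (n < 8);	// arbitrary cap for the sketch
    if (param)
      m_seq |= 1u << n;
    else
      m_seq &= ~(1u << n);
  }

  // Which instance of the pass this is, as encoded by the parameters.
  unsigned int sequence_number () const { return m_seq; }

private:
  unsigned int m_seq;
};
```

Passing (false, false, true) as parameters 0, 1 and 2 would identify instance number 4, at the cost of one parameter per bit.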

But then I'd really just try removing either the threader before dom or the
one before vrp (I suppose removing the one before VRP makes most sense).

Richard.

>
> Thanks.
> Aldy
>
  

Patch

diff --git a/gcc/passes.def b/gcc/passes.def
index d7a1f8c97a6..9115da7beb6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -212,6 +212,7 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_merge_phi);
       NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
+      NEXT_PASS (pass_vrp_threader);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
       /* pass_stdarg is always run and at this point we execute
@@ -337,6 +338,7 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
+      NEXT_PASS (pass_vrp_threader);
       /* Threading can leave many const/copy propagations in the IL.
 	 Clean them up.  Instead of just copy_prop, we use ccp to
 	 compute alignment and nonzero bits.  */
diff --git a/gcc/testsuite/gcc.dg/torture/pr55107.c b/gcc/testsuite/gcc.dg/torture/pr55107.c
index d757c041220..2edb75f7541 100644
--- a/gcc/testsuite/gcc.dg/torture/pr55107.c
+++ b/gcc/testsuite/gcc.dg/torture/pr55107.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-additional-options "-fno-split-loops" } */
+/* { dg-additional-options "-fno-split-loops -w" } */
 
 typedef unsigned short uint16_t;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
index 5227c87fbf4..59663dd5314 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp1" } */
+/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */
 
 void g (int);
 void g1 (int);
@@ -27,4 +27,4 @@  f (long a, long b, long c, long d, long x)
   g (a);
 }
 
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
index eaf89bb4581..0c2f6e0e878 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp1" } */
+/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */
 
 void g (void);
 void g1 (void);
@@ -20,4 +20,4 @@  f (long a, long b, long c, long d, int x)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c
index d5a1e0b3b98..6a3d3595d8c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp1" } */
+/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */
 
 void g (void);
 void g1 (void);
@@ -22,4 +22,4 @@  f (long a, long b, long c, long d, int x)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c
index 53acabc7b84..9bc4c6db8e8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp1" } */
+/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */
 
 void g (int);
 void g1 (int);
@@ -37,4 +37,4 @@  f (long a, long b, long c, long d, int x)
   g (c + d);
 }
 
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c
index b4065668ff8..51b3b7ac755 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-vrp1-details" } */
+/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-vrp-thread1-details" } */
 
 static int blocksize = 4096;
 
@@ -39,6 +39,6 @@  void foo (void)
    statement.  We also realize that the final bytes == 0 test is useless,
    and thread over it.  We also know that toread != 0 is useless when
    entering while loop and thread over it.  */
-/* { dg-final { scan-tree-dump-times "Threaded jump" 3 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Threaded jump" 3 "vrp-thread1" } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
index dab16abe522..2caa1f532ea 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp1-details" } */
+/* { dg-options "-Ofast -fdump-tree-vrp-thread1-details" } */
 
 typedef unsigned short u16;
 typedef unsigned char u8;
@@ -56,6 +56,11 @@  main (int argc, char argv[])
   return crc;
 }
 
-/* { dg-final { scan-tree-dump-times "Registering jump thread" 3 "vrp1" } } */
-/* { dg-final { scan-tree-dump-not "joiner" "vrp1" } } */
-/* { dg-final { scan-tree-dump-times "Threaded jump" 3 "vrp1" } } */
+/* Previously we had 3 jump threads, but one of them crossed loops.
+   The old threader allowed it because an ASSERT_EXPR was getting in
+   the way.  Without the ASSERT_EXPR, we have an empty pre-header
+   block as the final block in the thread, which the threader will
+   simply join with the next block, which *is* in a different
+   loop.  */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 2 "vrp-thread1" } } */
+/* { dg-final { scan-tree-dump-not "joiner" "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
index ddc53fbfbcc..0229a823ab7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
@@ -1,6 +1,6 @@ 
 /* PR tree-optimization/61839.  */
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdisable-tree-evrp -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdump-tree-vrp-thread1 -fdisable-tree-evrp -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */
 /* { dg-require-effective-target int32plus } */
 
 __attribute__ ((noinline))
@@ -38,7 +38,11 @@  int main ()
 }
 
 /* Scan for c = 972195717) >> [0, 1] in function foo.  */
-/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1  "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1  "vrp-thread1" } } */
+
+/* Previously we were checking for two ?: with constant PHI arguments,
+   but now we collapse them into one.  */
 /* Scan for c = 972195717) >> [2, 3] in function bar.  */
-/* { dg-final { scan-tree-dump-times "243048929 : 121524464" 2  "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "243048929 : 121524464" 1  "vrp-thread1" } } */
+
 /* { dg-final { scan-tree-dump-times "486097858" 0  "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
index cc322d6e703..7be1873282c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
@@ -1,6 +1,6 @@ 
 /* PR tree-optimization/61839.  */
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdump-tree-vrp-thread1 -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */
 
 __attribute__ ((noinline))
 int foo (int a, unsigned b)
@@ -22,5 +22,5 @@  int main ()
 }
 
 /* Scan for c [12, 13] << 8 in function foo.  */
-/* { dg-final { scan-tree-dump-times "3072 : 3328" 2  "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "3072 : 3328" 1  "vrp-thread1" } } */
 /* { dg-final { scan-tree-dump-times "3072" 0  "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c
index 66a54053270..a2386ba19f0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-ffast-math -O3 -fdump-tree-vrp1-details" } */
+/* { dg-options "-ffast-math -O3 -fdump-tree-vrp-thread1-details" } */
 
 int I = 50, J = 50;
 int S, L;
@@ -39,4 +39,4 @@  void foo (int K)
 	bar (LD, SD);
     }
 }
-/* { dg-final { scan-tree-dump-times "Threaded jump " 2 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Threaded jump " 2 "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c
index 856ab389439..73969bbe1e5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details --param logical-op-non-short-circuit=1 -fdisable-tree-thread1 -fdisable-tree-thread2" } */
+/* { dg-options "-O2 -fdump-tree-dom2-details --param logical-op-non-short-circuit=1 -fdisable-tree-thread1 -fdisable-tree-thread2 -fdisable-tree-vrp-thread1" } */
 
 static int *bb_ticks;
 extern void frob (void);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c
index ffbdc988e0a..1b677f44b40 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details -w --param logical-op-non-short-circuit=1" } */
+/* { dg-options "-O2 -fdump-tree-dom2-details -w --param logical-op-non-short-circuit=1 -fdisable-tree-vrp-thread1" } */
 unsigned char
 validate_subreg (unsigned int offset, unsigned int isize, unsigned int osize, int zz, int qq)
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 2d78d045516..0246ebf3c63 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp1-details -fdump-tree-thread1-details -std=gnu89 --param logical-op-non-short-circuit=0" } */
+/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -std=gnu89 --param logical-op-non-short-circuit=0" } */
 
 #include "ssa-dom-thread-4.c"
 
@@ -24,4 +24,4 @@ 
 
 /* There used to be 6 jump threads found by thread1, but they all
    depended on threading through distinct loops in ethread.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
index b972f649442..8f0a12c12ee 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats" } */
+/* { dg-options "-O2 -fdump-tree-vrp-thread1-stats -fdump-tree-dom2-stats" } */
 
 void bla();
 
@@ -16,6 +16,6 @@  void thread_entry_through_header (void)
 
 /* There's a single jump thread that should be handled by the VRP
    jump threading pass.  */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp-thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp-thread1"} } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
index 521754f8d79..46e464ff26a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp1-details -fdump-tree-dom2-details -std=gnu89 --param logical-op-non-short-circuit=1" } */
+/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -fdump-tree-dom2-details -std=gnu89 --param logical-op-non-short-circuit=1" } */
 struct bitmap_head_def;
 typedef struct bitmap_head_def *bitmap;
 typedef const struct bitmap_head_def *const_bitmap;
@@ -58,4 +58,5 @@  bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
    code we missed the edge when the first conditional is false
    (b_elt is zero, which means the second conditional is always
    zero.  VRP1 catches all three.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 3 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 2 "vrp-thread1" } } */
+/* { dg-final { scan-tree-dump-times "Path crosses loops" 1 "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
index f9152b9358f..8c5cc8228fb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
@@ -1,7 +1,7 @@ 
 /* { dg-do compile } */
-/* { dg-additional-options "-O2 -fdump-tree-vrp-details --param logical-op-non-short-circuit=1" }  */
+/* { dg-additional-options "-O2 -fdump-tree-vrp-thread1-details --param logical-op-non-short-circuit=1" }  */
 /* { dg-additional-options "-fdisable-tree-thread1" } */
-/* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */
+/* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp-thread1" } }  */
 
 void foo (void);
 void bar (void);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
index ef5611fa32e..86d07ef9bdb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-details -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -fdelete-null-pointer-checks" } */
 /* { dg-skip-if "" keeps_null_pointer_checks } */
 
 void oof (void);
@@ -29,5 +29,5 @@  build_omp_regions_1 (basic_block bb, struct omp_region *parent,
 
 /* ARM Cortex-M defined LOGICAL_OP_NON_SHORT_CIRCUIT to false,
    so skip below test.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp1" { target { ! arm_cortex_m } } } } */
+/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp-thread1" { target { ! arm_cortex_m } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c
index e2e48d8deb9..f25ea9c3826 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c
@@ -1,6 +1,6 @@ 
 /* PR tree-optimization/18046  */
-/* { dg-options "-O2 -fdump-tree-vrp1-details" }  */
-/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp1" } }  */
+/* { dg-options "-O2 -fdump-tree-vrp-thread1-details" }  */
+/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp-thread1" } }  */
 /* During VRP we expect to thread the true arm of the conditional through the switch
    and to the BB that corresponds to the 7 ... 9 case label.  */
 extern void foo (void);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
index 8ae9b8d2160..57317cd5b17 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-blocks-vops-details -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fdump-tree-vrp-thread1-blocks-vops-details -fdelete-null-pointer-checks" } */
 
 void arf (void);
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index eb75eb17951..84477a47b88 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -462,6 +462,7 @@  extern gimple_opt_pass *make_pass_copy_prop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_isolate_erroneous_paths (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_early_vrp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vrp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_vrp_threader (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_uncprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_return_slot (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_reassoc (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index ae77e5eb396..29ed60a98b0 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -39,6 +39,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "vr-values.h"
 #include "gimple-ssa-evrp-analyze.h"
 #include "gimple-range.h"
+#include "gimple-range-path.h"
 
 /* To avoid code explosion due to jump threading, we limit the
    number of statements we are going to copy.  This variable
@@ -1397,3 +1398,73 @@  jt_state::register_equivs_stmt (gimple *stmt, basic_block bb,
     register_equiv (gimple_get_lhs (stmt), cached_lhs,
 		    /*update_range=*/false);
 }
+
+// Hybrid threader implementation.
+
+
+void
+hybrid_jt_state::register_equivs_stmt (gimple *, basic_block, jt_simplifier *)
+{
+  // Ranger has no need to simplify anything to improve equivalences.
+}
+
+hybrid_jt_simplifier::hybrid_jt_simplifier (gimple_ranger *r,
+					    path_range_query *q)
+{
+  m_ranger = r;
+  m_query = q;
+}
+
+tree
+hybrid_jt_simplifier::simplify (gimple *stmt, gimple *, basic_block,
+				jt_state *state)
+{
+  int_range_max r;
+
+  compute_ranges_from_state (stmt, state);
+
+  if (gimple_code (stmt) == GIMPLE_COND
+      || gimple_code (stmt) == GIMPLE_ASSIGN)
+    {
+      tree ret;
+      if (m_query->range_of_stmt (r, stmt) && r.singleton_p (&ret))
+	return ret;
+    }
+  else if (gimple_code (stmt) == GIMPLE_SWITCH)
+    {
+      gswitch *switch_stmt = dyn_cast <gswitch *> (stmt);
+      tree index = gimple_switch_index (switch_stmt);
+      if (m_query->range_of_expr (r, index, stmt))
+	return find_case_label_range (switch_stmt, &r);
+    }
+  return NULL;
+}
+
+// Use STATE to generate the list of imports needed for the solver,
+// and calculate the ranges along the path.
+
+void
+hybrid_jt_simplifier::compute_ranges_from_state (gimple *stmt, jt_state *state)
+{
+  auto_bitmap imports;
+  gori_compute &gori = m_ranger->gori ();
+
+  state->get_path (m_path);
+
+  // Start with the imports to the final conditional.
+  bitmap_copy (imports, gori.imports (m_path[0]));
+
+  // Add any other interesting operands we may have missed.
+  if (gimple_bb (stmt) != m_path[0])
+    {
+      for (unsigned i = 0; i < gimple_num_ops (stmt); ++i)
+	{
+	  tree op = gimple_op (stmt, i);
+	  if (op
+	      && TREE_CODE (op) == SSA_NAME
+	      && irange::supports_type_p (TREE_TYPE (op)))
+	    bitmap_set_bit (imports, SSA_NAME_VERSION (op));
+	}
+    }
+  m_query->precompute_ranges (m_path, imports);
+}
diff --git a/gcc/tree-ssa-threadedge.h b/gcc/tree-ssa-threadedge.h
index 0b47a521053..ac605a3ac30 100644
--- a/gcc/tree-ssa-threadedge.h
+++ b/gcc/tree-ssa-threadedge.h
@@ -53,6 +53,26 @@  public:
   virtual tree simplify (gimple *, gimple *, basic_block, jt_state *) = 0;
 };
 
+class hybrid_jt_state : public jt_state
+{
+private:
+  void register_equivs_stmt (gimple *, basic_block, jt_simplifier *) override;
+};
+
+class hybrid_jt_simplifier : public jt_simplifier
+{
+public:
+  hybrid_jt_simplifier (class gimple_ranger *r, class path_range_query *q);
+
+private:
+  tree simplify (gimple *stmt, gimple *, basic_block, jt_state *) override;
+  void compute_ranges_from_state (gimple *stmt, jt_state *);
+
+  gimple_ranger *m_ranger;
+  path_range_query *m_query;
+  auto_vec<basic_block> m_path;
+};
+
 // This is the high level threader.  The entry point is
 // thread_outgoing_edges(), which calculates and registers paths to be
 // threaded.  When all candidates have been registered,
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index a5079ee48aa..c55a7499c14 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -66,6 +66,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "range-op.h"
 #include "value-range-equiv.h"
 #include "gimple-array-bounds.h"
+#include "gimple-range.h"
+#include "gimple-range-path.h"
 #include "tree-ssa-dom.h"
 
 /* Set of SSA names found live during the RPO traversal of the function
@@ -4591,11 +4593,6 @@  execute_vrp (struct function *fun, bool warn_array_bounds_p)
       array_checker.check ();
     }
 
-  /* We must identify jump threading opportunities before we release
-     the datastructures built by VRP.  */
-  vrp_jump_threader threader (fun, &vrp_vr_values);
-  threader.thread_jumps ();
-
   simplify_casted_conds (fun, &vrp_vr_values);
 
   free_numbers_of_iterations_estimates (fun);
@@ -4605,21 +4602,6 @@  execute_vrp (struct function *fun, bool warn_array_bounds_p)
      does not properly handle ASSERT_EXPRs.  */
   assert_engine.remove_range_assertions ();
 
-  /* If we exposed any new variables, go ahead and put them into
-     SSA form now, before we handle jump threading.  This simplifies
-     interactions between rewriting of _DECL nodes into SSA form
-     and rewriting SSA_NAME nodes into SSA form after block
-     duplication and CFG manipulation.  */
-  update_ssa (TODO_update_ssa);
-
-  /* We identified all the jump threading opportunities earlier, but could
-     not transform the CFG at that time.  This routine transforms the
-     CFG and arranges for the dominator tree to be rebuilt if necessary.
-
-     Note the SSA graph update will occur during the normal TODO
-     processing by the pass manager.  */
-  threader.thread_through_all_blocks ();
-
   scev_finalize ();
   loop_optimizer_finalize ();
   return 0;
@@ -4669,3 +4651,124 @@  make_pass_vrp (gcc::context *ctxt)
 {
   return new pass_vrp (ctxt);
 }
+
+// This is the dom walker for the hybrid threader.  It lives here,
+// rather than in the generic threading files, because the only other
+// client would be DOM, which has its own custom walker.
+
+class hybrid_threader : public dom_walker
+{
+public:
+  hybrid_threader ();
+  ~hybrid_threader ();
+
+  void thread_jumps (function *fun)
+  {
+    walk (fun->cfg->x_entry_block_ptr);
+  }
+  void thread_through_all_blocks ()
+  {
+    m_threader->thread_through_all_blocks (false);
+  }
+
+private:
+  edge before_dom_children (basic_block) override;
+  void after_dom_children (basic_block bb) override;
+
+  hybrid_jt_simplifier *m_simplifier;
+  jump_threader *m_threader;
+  jt_state *m_state;
+  gimple_ranger *m_ranger;
+  path_range_query *m_query;
+};
+
+hybrid_threader::hybrid_threader () : dom_walker (CDI_DOMINATORS, REACHABLE_BLOCKS)
+{
+  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
+  scev_initialize ();
+  calculate_dominance_info (CDI_DOMINATORS);
+  mark_dfs_back_edges ();
+
+  m_ranger = new gimple_ranger;
+  m_query = new path_range_query (*m_ranger, /*resolve=*/true);
+  m_simplifier = new hybrid_jt_simplifier (m_ranger, m_query);
+  m_state = new hybrid_jt_state;
+  m_threader = new jump_threader (m_simplifier, m_state);
+}
+
+hybrid_threader::~hybrid_threader ()
+{
+  delete m_simplifier;
+  delete m_threader;
+  delete m_state;
+  delete m_ranger;
+
+  scev_finalize ();
+  loop_optimizer_finalize ();
+}
+
+edge
+hybrid_threader::before_dom_children (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  int_range<2> r;
+
+  for (gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      m_ranger->range_of_stmt (r, stmt);
+    }
+  return NULL;
+}
+
+void
+hybrid_threader::after_dom_children (basic_block bb)
+{
+  m_threader->thread_outgoing_edges (bb);
+}
+
+static unsigned int
+execute_vrp_threader (function *fun)
+{
+  hybrid_threader threader;
+  threader.thread_jumps (fun);
+  threader.thread_through_all_blocks ();
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_vrp_threader =
+{
+  GIMPLE_PASS, /* type */
+  "vrp-thread", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_VRP, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */
+};
+
+class pass_vrp_threader : public gimple_opt_pass
+{
+public:
+  pass_vrp_threader (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_vrp_threader, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  opt_pass * clone () { return new pass_vrp_threader (m_ctxt); }
+  virtual bool gate (function *) { return flag_tree_vrp != 0; }
+  virtual unsigned int execute (function *fun)
+    { return execute_vrp_threader (fun); }
+};
+
+} // namespace {
+
+gimple_opt_pass *
+make_pass_vrp_threader (gcc::context *ctxt)
+{
+  return new pass_vrp_threader (ctxt);
+}
diff --git a/libgomp/team.c b/libgomp/team.c
index ba57152db8a..11a2521057f 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -312,7 +312,7 @@  gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads,
 		 unsigned flags, struct gomp_team *team,
 		 struct gomp_taskgroup *taskgroup)
 {
-  struct gomp_thread_start_data *start_data;
+  struct gomp_thread_start_data *start_data = NULL;
   struct gomp_thread *thr, *nthr;
   struct gomp_task *task;
   struct gomp_task_icv *icv;
diff --git a/libgomp/testsuite/libgomp.graphite/force-parallel-4.c b/libgomp/testsuite/libgomp.graphite/force-parallel-4.c
index ef6f64d229b..2cbe8ffd819 100644
--- a/libgomp/testsuite/libgomp.graphite/force-parallel-4.c
+++ b/libgomp/testsuite/libgomp.graphite/force-parallel-4.c
@@ -1,5 +1,5 @@ 
 /* Autopar with IF conditions.  */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
+/* { dg-additional-options "-fdisable-tree-thread1 -fdisable-tree-vrp-thread1" } */
 
 void abort();
 
diff --git a/libgomp/testsuite/libgomp.graphite/force-parallel-8.c b/libgomp/testsuite/libgomp.graphite/force-parallel-8.c
index a97eb97acf6..95870230d1d 100644
--- a/libgomp/testsuite/libgomp.graphite/force-parallel-8.c
+++ b/libgomp/testsuite/libgomp.graphite/force-parallel-8.c
@@ -1,4 +1,4 @@ 
-/* { dg-additional-options "-fdisable-tree-thread1" } */
+/* { dg-additional-options "-fdisable-tree-thread1 -fdisable-tree-vrp-thread1" } */
 
 #define N 1500